05 - Known Anthropic Cache Billing Bugs and Fixes

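Background

This document records the full investigation and root causes of the cache billing problems with Vendor2/Claude-4.6-Opus (served through the Anthropic-compatible proxy at api.gpugeek.com), and gives configuration recommendations.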

Bug 1: all cache costs are $0

Symptom

Both cache_creation_input_tokens and cache_read_input_tokens are billed at $0; only base input and output are billed correctly.

Root cause

litellm_params contains input_cost_per_token/output_cost_per_token but none of the cache price fields.

1. use_custom_pricing_for_model() sees that input_cost_per_token is not None → returns True → the UUID path is taken.
2. When _create_deployment() registers the UUID entry, it merges only the non-None fields from litellm_params (router.py:6197-6199).
3. litellm_params has no cache fields → in the UUID entry, cache_creation_input_token_cost = None.
4. _get_cost_per_unit() encounters None → returns default_value=0.0.
5. Billing result: cache tokens × 0 = $0.
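
The chain above can be reproduced with a small simulation. This is a minimal sketch, not LiteLLM's code: `merge_non_none` and `cost_per_unit` are hypothetical stand-ins for the merge and lookup behaviour described in the steps, and the token counts are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical stand-ins for the behaviour described above; these are NOT
# LiteLLM's actual functions.
def merge_non_none(litellm_params: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only fields that are set, as the UUID entry registration is described to do."""
    return {k: v for k, v in litellm_params.items() if v is not None}

def cost_per_unit(model_info: Dict[str, Any], field: str, default: float = 0.0) -> float:
    """Return the configured rate, or the default when the field is missing."""
    value = model_info.get(field)
    return default if value is None else float(value)

litellm_params = {
    "model": "anthropic/Vendor2/Claude-4.6-Opus",
    "input_cost_per_token": 5e-06,
    "output_cost_per_token": 2.5e-05,
    # no cache_* price fields, as in the buggy configuration
}

uuid_entry = merge_non_none(litellm_params)
cache_write_rate = cost_per_unit(uuid_entry, "cache_creation_input_token_cost")
cache_read_rate = cost_per_unit(uuid_entry, "cache_read_input_token_cost")

# Illustrative token counts: both rates fall back to 0.0, so the cache cost is 0.
cache_cost = 10_000 * cache_write_rate + 50_000 * cache_read_rate
print(cache_cost)
```

The point of the sketch is that nothing in the chain raises an error: a missing price field simply becomes a rate of 0.0.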

Fix

In the UI's LiteLLM Params, set all cache price fields together:

| Field | Value |
| --- | --- |
| `cache_creation_input_token_cost` | `0.00000625` |
| `cache_read_input_token_cost` | `0.0000005` |
| `cache_creation_input_token_cost_above_1hr` | `0.00001` |
| `cache_creation_input_token_cost_above_200k_tokens` | `0.0000125` |
| `cache_read_input_token_cost_above_200k_tokens` | `0.000001` |

Alternatively, switch to the JSON path (see Bug 3).
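
Because a missing field fails silently, it may help to check a litellm_params dict before saving it. The following is a minimal sketch; `REQUIRED_CACHE_FIELDS` and `missing_cache_fields` are hypothetical names, and the field list is simply the five cache price fields named in this section.

```python
from typing import Any, Dict, List

# The five cache price fields named in this section.
REQUIRED_CACHE_FIELDS = [
    "cache_creation_input_token_cost",
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost_above_1hr",
    "cache_creation_input_token_cost_above_200k_tokens",
    "cache_read_input_token_cost_above_200k_tokens",
]

def missing_cache_fields(litellm_params: Dict[str, Any]) -> List[str]:
    """Return the cache price fields that are absent or None."""
    return [f for f in REQUIRED_CACHE_FIELDS if litellm_params.get(f) is None]

# Example: a partially configured deployment.
params = {
    "input_cost_per_token": 5e-06,
    "output_cost_per_token": 2.5e-05,
    "cache_creation_input_token_cost": 6.25e-06,
    "cache_read_input_token_cost": 5e-07,
}
print(missing_cache_fields(params))
```

An empty list means every cache price field has an explicit value; any name in the list would be billed at 0 under the behaviour described in this document.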

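Bug 2: a missing `cache_creation_input_token_cost_above_1hr` makes 1hr cache silently free

Symptom

Request 3 (cache creation only, no cache read) is billed either correctly or too low, depending on whether Anthropic returns ephemeral_5m_input_tokens or ephemeral_1h_input_tokens.

Specifically:

- If Anthropic returns only ephemeral_5m_input_tokens (5-minute cache) → billed at cache_creation_input_token_cost ($6.25/1M) → correct.
- If Anthropic returns ephemeral_1h_input_tokens (1-hour cache) → billed at cache_creation_cost_above_1hr (the rate read from cache_creation_input_token_cost_above_1hr, which is 0.0 when the field is missing) → silently free, i.e. undercharged.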

Root cause

The default value of _get_cost_per_unit() is 0.0:

# near llm_cost_calc/utils.py:319
def _get_cost_per_unit(model_info, field_name, default_value=0.0):
    value = model_info.get(field_name)
    if value is None:
        return default_value   # ← silently returns 0, no error is raised
    return value

When cache_creation_input_token_cost_above_1hr is not set in the DB litellm_params, the field is None in the UUID entry, _get_cost_per_unit returns 0.0, and 1hr cache tokens are billed at 0.

Key code

# calculate_cache_writing_cost (utils.py:360-391)
tokens_1h = cache_creation_token_details.ephemeral_1h_input_tokens
total_cost += tokens_1h * cache_creation_cost_above_1hr if tokens_1h is not None else 0.0
# If cache_creation_cost_above_1hr = 0.0 (because the field is missing), then tokens_1h * 0.0 = 0.0
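
To make the size of the undercharge concrete, here is a minimal sketch of the 5m/1h split. `cache_write_cost` is a hypothetical simplification written for this document, not the library's calculate_cache_writing_cost, and the token count is illustrative.

```python
from typing import Optional

def cache_write_cost(
    tokens_5m: Optional[int],
    tokens_1h: Optional[int],
    rate_5m: float,
    rate_1h: float,
) -> float:
    """Sum the cache-write cost over the 5-minute and 1-hour buckets."""
    total = 0.0
    if tokens_5m is not None:
        total += tokens_5m * rate_5m
    if tokens_1h is not None:
        total += tokens_1h * rate_1h
    return total

# 100,000 tokens written to the 1-hour cache, none to the 5-minute cache.
configured = cache_write_cost(0, 100_000, rate_5m=6.25e-06, rate_1h=1e-05)
missing = cache_write_cost(0, 100_000, rate_5m=6.25e-06, rate_1h=0.0)

print(f"with 1hr rate configured: ${configured:.4f}")
print(f"with 1hr rate missing:    ${missing:.4f}")
```

With the rate configured the request costs about $1; with the rate missing the same request costs nothing, and no warning is produced.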

Fix

The DB litellm_params must include cache_creation_input_token_cost_above_1hr. It can be added on the LiteLLM Params edit page in the UI.

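Bug 3: configuring path A (the JSON path)

If you would rather not maintain price fields in the UI, use the JSON path, which bypasses DB prices entirely.

Configuration steps

Step 1: Add a model entry to litellm/model_prices_and_context_window_backup.json (using "anthropic/Vendor2/Claude-4.6-Opus" as the key gives it the highest match priority):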

"anthropic/Vendor2/Claude-4.6-Opus": {
    "input_cost_per_token": 5e-06,
    "input_cost_per_token_above_200k_tokens": 1e-05,
    "output_cost_per_token": 2.5e-05,
    "output_cost_per_token_above_200k_tokens": 3.75e-05,
    "cache_creation_input_token_cost": 6.25e-06,
    "cache_creation_input_token_cost_above_200k_tokens": 1.25e-05,
    "cache_creation_input_token_cost_above_1hr": 1e-05,
    "cache_read_input_token_cost": 5e-07,
    "cache_read_input_token_cost_above_200k_tokens": 1e-06,
    "litellm_provider": "anthropic",
    "max_input_tokens": 1000000,
    "max_output_tokens": 128000,
    "max_tokens": 128000,
    "mode": "chat",
    "supports_prompt_caching": true,
    "supports_function_calling": true,
    "supports_vision": true,
    "supports_reasoning": true,
    "supports_pdf_input": true,
    "supports_computer_use": true,
    "supports_assistant_prefill": false,
    "supports_response_schema": true,
    "supports_tool_choice": true
}

Step 2: Set the environment variable:

LITELLM_LOCAL_MODEL_COST_MAP=true

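Step 3: In the UI's LiteLLM Params, clear all price fields (keep only api_base, model, and custom_llm_provider) so that use_custom_pricing_for_model() returns False.

⚠️ Note: "clearing" a field in the UI does not delete it from the DB (see the pitfall notes in 03-ui-pricing.md). Delete the field by operating on the DB directly, or set its value to 0.

Step 4: Restart LiteLLM.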

Diagnostic tools

Check the actual contents of litellm_params in the DB

docker exec litellm_local_pg psql -U llmproxy -d litellm \
  -c "SELECT model_name, litellm_params FROM \"LiteLLM_ProxyModelTable\" WHERE model_name = 'claude-opus-4-6';"

If the litellm_params JSON contains any _CUSTOM_PRICING_KEYS field (even an encrypted one), the UUID path is taken.
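
The rule above can be applied to the JSON returned by the query. This is a minimal sketch under stated assumptions: `ASSUMED_PRICING_KEYS` is only the set of price fields mentioned in this document, not the library's actual _CUSTOM_PRICING_KEYS list, and `takes_uuid_path` is a hypothetical helper.

```python
import json

# Assumption: only the price fields named in this document. The library's
# real _CUSTOM_PRICING_KEYS list may contain more entries.
ASSUMED_PRICING_KEYS = {
    "input_cost_per_token",
    "output_cost_per_token",
    "cache_creation_input_token_cost",
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost_above_1hr",
    "cache_creation_input_token_cost_above_200k_tokens",
    "cache_read_input_token_cost_above_200k_tokens",
}

def takes_uuid_path(litellm_params_json: str) -> bool:
    """True if any assumed pricing key is present with a non-None value."""
    params = json.loads(litellm_params_json)
    return any(params.get(key) is not None for key in ASSUMED_PRICING_KEYS)

# An encrypted value is still a non-None string, so it still counts.
row_with_price = '{"model": "anthropic/Vendor2/Claude-4.6-Opus", "input_cost_per_token": "gAAAA-encrypted"}'
row_without_price = '{"model": "anthropic/Vendor2/Claude-4.6-Opus", "api_base": "https://api.gpugeek.com"}'

print(takes_uuid_path(row_with_price))
print(takes_uuid_path(row_without_price))
```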

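Determine which path is in use

Search the logs (DEBUG level required) for:

custom_pricing = True   → UUID path
custom_pricing = False  → JSON path

Alternatively, check the x-litellm-model-id response header; if it is present and in UUID format, billing looks up prices in litellm.model_cost[UUID].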
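
Checking "UUID format" by eye is error-prone, so a small helper can do it. This is a minimal sketch using only the standard library; `looks_like_uuid` is a hypothetical name and the sample header values are made up.

```python
import uuid

def looks_like_uuid(value: str) -> bool:
    """True if value is a UUID in canonical hyphenated form (case-insensitive)."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex; require canonical form.
    return str(parsed) == value.lower()

# Made-up header values for illustration.
print(looks_like_uuid("123e4567-e89b-12d3-a456-426614174000"))
print(looks_like_uuid("claude-opus-4-6"))
```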

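Price verification formula

Using Anthropic's official claude-opus-4 prices as the reference:

| Token type | Price |
| --- | --- |
| Base input | $5 / 1M tokens |
| Output | $25 / 1M tokens |
| Cache write (5min) | $6.25 / 1M tokens |
| Cache write (1hr) | $10 / 1M tokens |
| Cache read | $0.5 / 1M tokens |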
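
The configuration fields are expressed per token while the table is per 1M tokens, so a conversion check can catch a misplaced decimal. This is a minimal sketch; `PRICES_PER_MILLION`, `CONFIGURED` and `price_mismatches` are hypothetical names, and the configured values are the ones used in the JSON entry in this document.

```python
import math
from typing import Dict, List

# Reference prices in USD per 1M tokens, from the table above.
PRICES_PER_MILLION: Dict[str, float] = {
    "input_cost_per_token": 5.0,
    "output_cost_per_token": 25.0,
    "cache_creation_input_token_cost": 6.25,
    "cache_creation_input_token_cost_above_1hr": 10.0,
    "cache_read_input_token_cost": 0.5,
}

# Per-token values as written in the JSON entry in this document.
CONFIGURED: Dict[str, float] = {
    "input_cost_per_token": 5e-06,
    "output_cost_per_token": 2.5e-05,
    "cache_creation_input_token_cost": 6.25e-06,
    "cache_creation_input_token_cost_above_1hr": 1e-05,
    "cache_read_input_token_cost": 5e-07,
}

def price_mismatches(per_million: Dict[str, float], configured: Dict[str, float]) -> List[str]:
    """Return fields whose per-token value is missing or does not match price / 1M."""
    bad = []
    for field, usd_per_million in per_million.items():
        expected = usd_per_million / 1_000_000
        actual = configured.get(field, float("nan"))  # NaN never compares close
        if not math.isclose(expected, actual, rel_tol=1e-9):
            bad.append(field)
    return bad

print(price_mismatches(PRICES_PER_MILLION, CONFIGURED))
```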

Verification formula (assuming cache_creation_token_details is entirely 5m):

total = (prompt_tokens - cache_creation - cache_read) × $5/1M
      + cache_creation × $6.25/1M
      + cache_read × $0.5/1M
      + completion_tokens × $25/1M

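If the cost shown in the logs deviates from this formula by more than 0.1%, there is a billing bug; check it against the root causes in this document one by one.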
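
The formula and the 0.1% threshold can be sketched as follows. `expected_cost` and `has_billing_bug` are hypothetical helpers, the token counts are illustrative, and the sketch assumes, as the formula does, that prompt_tokens includes the cache tokens and that all cache writes are 5m.

```python
def expected_cost(
    prompt_tokens: int,
    cache_creation: int,
    cache_read: int,
    completion_tokens: int,
) -> float:
    """Expected cost in USD using the reference per-1M prices (5m cache writes only)."""
    base_input = prompt_tokens - cache_creation - cache_read
    total_per_million = (
        base_input * 5.0
        + cache_creation * 6.25
        + cache_read * 0.5
        + completion_tokens * 25.0
    )
    return total_per_million / 1_000_000

def has_billing_bug(logged_cost: float, expected: float, tolerance: float = 0.001) -> bool:
    """True if the logged cost deviates from the expected cost by more than the tolerance."""
    if expected == 0:
        return logged_cost != 0
    return abs(logged_cost - expected) / expected > tolerance

# Illustrative request: 12,000 prompt tokens, of which 2,000 were written to
# cache and 8,000 were read from cache, plus 500 completion tokens.
expected = expected_cost(12_000, 2_000, 8_000, 500)
print(f"expected: ${expected:.6f}")

# A logged cost of $0.0225 matches the Bug 1 pattern (cache tokens billed at $0).
print(has_billing_bug(0.0225, expected))
```

In this example the base input and output alone account for $0.0225, so a logged cost at exactly that value suggests the cache tokens were billed at zero.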