04: Complete Cost Calculation Flow in the Routing Layer

Overview

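After each LLM request completes, LiteLLM computes the cost from the token usage in the response and the prices in the model_cost map. There are two cost paths: the UUID path (custom pricing) and the JSON path (standard pricing). use_custom_pricing_for_model() decides at runtime which one applies.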

Where Cost Calculation Is Triggered

Function: LiteLLMLoggingObj.calculate_request_cost()
File: litellm/litellm_core_utils/litellm_logging.py:1381-1491

# litellm_logging.py:1381
def calculate_request_cost(self):
    ...
    custom_pricing = use_custom_pricing_for_model(
        litellm_params=(self.litellm_params if hasattr(self, "litellm_params") else None)
    )
    ...
    response_cost = cost_calculator.response_cost_calculator(
        response_object=self.model_response,
        model=model,
        custom_llm_provider=custom_llm_provider,
        custom_pricing=custom_pricing,
        ...
    )

When it runs: called from success_handler and async_success_handler after each LLM request returns successfully.

The Two Cost Paths

flowchart TD
    A[LLM request completes] --> B[calculate_request_cost\nlitellm_logging.py:1381]
    B --> C{use_custom_pricing_for_model\nlitellm_params}
    C -- any _CUSTOM_PRICING_KEYS field is not None --> D[custom_pricing = True]
    C -- no pricing fields --> E[custom_pricing = False]

    D --> F[_select_model_name_for_cost_calc\nreturns router_model_id UUID]
    F --> G{UUID in litellm.model_cost?}
    G -- yes --> H[get_model_info model=UUID\nreads model_cost UUID]
    G -- no --> I[fallback: look up by model name]

    E --> J[_select_model_name_for_cost_calc\nreturns model name]
    J --> K[_get_model_info_helper model name\n5-step fallback lookup]

    H --> L[generic_cost_per_token\nllm_cost_calc/utils.py:580]
    I --> K
    K --> L
    L --> M[_parse_prompt_tokens_details\nextracts cache token details]
    M --> N[_get_token_base_cost\nreads each price field]
    N --> O[_calculate_input_cost]
    O --> P[calculate_cache_writing_cost\n5m vs 1hr split]
    P --> Q[final prompt_cost + completion_cost]

Actual Billing Path vs. `model_map_key` (standard logging)

After a request completes there are two independent call sites: one computes the actual cost, and the other writes the model_map_key / model_map_value fields into standard_logging_payload.

Call Site 1: Actual Billing

LiteLLMLoggingObj._response_cost_calculator() (litellm_logging.py:1373-1495):

# litellm_logging.py:1416-1419
elif (
    router_model_id is None and "model_id" in hidden_params
):
    router_model_id = hidden_params["model_id"]   # ← UUID taken from completion_response._hidden_params

# litellm_logging.py:1437-1459
response_cost_calculator_kwargs = {
    "response_object": result,
    "model": litellm_model_name or self.model,
    "custom_pricing": custom_pricing,
    "router_model_id": router_model_id,           # ← UUID passed in
    ...
}
response_cost = litellm.response_cost_calculator(**response_cost_calculator_kwargs)

Source of router_model_id: during routing, the Router injects the deployment's model_info.id (a UUID) into completion_response._hidden_params["model_id"].

Inside _select_model_name_for_cost_calc() (cost_calculator.py:641-645):

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id          # take the UUID path
    else:
        return_model = model                    # fall back

Result: when custom_pricing=True and the UUID is registered, actual billing uses litellm.model_cost[UUID].

Call Site 2: Building `model_map_key`

StandardLoggingPayloadSetup.get_model_cost_information() (litellm_logging.py:4725-4760):

model_cost_name = _select_model_name_for_cost_calc(
    model=None,
    completion_response=init_response_obj,
    base_model=base_model,
    custom_pricing=custom_pricing,
    # router_model_id is not passed → defaults to None
)

router_model_id defaults to None here. Inside _select_model_name_for_cost_calc():

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id
    else:
        return_model = model               # ← model=None at this call site
# ... later falls back to completion_response_model
if return_model is None and completion_response_model is not None:
    return_model = completion_response_model

The final return_model comes from completion_response.model (for example "Vendor2/Claude-4.6-Opus"). That name is then looked up in litellm.model_cost, and the result is stored in StandardLoggingModelInformation:

model_cost_information = StandardLoggingModelInformation(
    model_map_key=model_cost_name,          # ← i.e. "Vendor2/Claude-4.6-Opus"
    model_map_value=_model_cost_information,
)

Data Sources of the Two Call Sites Compared

Field	Call site	Key used	Passes `router_model_id`	Purpose
Actual billing `response_cost`	`_response_cost_calculator` (`litellm_logging.py:1373`)	`router_model_id` (UUID, when present)	✅ taken from `hidden_params["model_id"]`	Actual charge
`model_map_key`	`get_model_cost_information` (`litellm_logging.py:4725`)	`completion_response.model`, etc.	❌ defaults to None	Written to `standard_logging_payload` for auditing

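The string in model_map_key reflects the name that standard logging itself used to look up the model_cost map. It is computed independently of the router_model_id used for actual billing. The two are derived from different inputs and need not point to the same model_cost entry.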
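
The divergence between the two call sites can be sketched as follows. This is a simplified stand-in that models only the branches quoted above, not the LiteLLM implementation; `MODEL_COST`, `DEPLOYMENT_UUID`, `select_model_name` and the prices are hypothetical names and values chosen for illustration.

```python
from typing import Dict, Optional

# Hypothetical deployment UUID and stand-in for litellm.model_cost.
DEPLOYMENT_UUID = "3f2c9a1e-0000-4000-8000-000000000001"
MODEL_COST: Dict[str, dict] = {
    DEPLOYMENT_UUID: {"input_cost_per_token": 2e-06},            # custom price
    "Vendor2/Claude-4.6-Opus": {"input_cost_per_token": 5e-06},  # standard price
}


def select_model_name(
    model: Optional[str],
    response_model: Optional[str],
    custom_pricing: bool,
    router_model_id: Optional[str] = None,
) -> Optional[str]:
    """Simplified selection: registered UUID, else model, else response model."""
    return_model: Optional[str] = None
    if custom_pricing:
        if router_model_id is not None and router_model_id in MODEL_COST:
            return_model = router_model_id
        else:
            return_model = model
    if return_model is None and response_model is not None:
        return_model = response_model
    return return_model


# Call site 1 (actual billing): router_model_id is passed in.
billing_key = select_model_name(
    model="Vendor2/Claude-4.6-Opus",
    response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
    router_model_id=DEPLOYMENT_UUID,
)

# Call site 2 (standard logging): model=None and no router_model_id.
logging_key = select_model_name(
    model=None,
    response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
)

print(billing_key)  # the UUID entry
print(logging_key)  # the model-name entry
```

In this sketch the two keys resolve to entries with different prices, which is why model_map_value in the logging payload should not be assumed to show the price that was actually charged.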

`use_custom_pricing_for_model()` Decision Logic

Defined at: litellm_logging.py:4385-4410

_CUSTOM_PRICING_KEYS: frozenset = frozenset(
    CustomPricingLiteLLMParams.model_fields.keys()
)  # litellm_logging.py:215-216

def use_custom_pricing_for_model(litellm_params: Optional[dict]) -> bool:
    if litellm_params is None:
        return False

    # Check the top-level fields of litellm_params
    matching_keys = _CUSTOM_PRICING_KEYS & litellm_params.keys()
    for key in matching_keys:
        if litellm_params.get(key) is not None:
            return True

    # Check litellm_params["metadata"]["model_info"]
    metadata: dict = litellm_params.get("metadata", {}) or {}
    model_info: dict = metadata.get("model_info", {}) or {}
    if model_info:
        matching_keys = _CUSTOM_PRICING_KEYS & model_info.keys()
        for key in matching_keys:
            if model_info.get(key) is not None:
                return True

    return False

Data source: litellm_params comes from the Router deployment object, i.e. the LiteLLM_Params instance that is decrypted from the DB at startup and held in memory.
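
A minimal sketch of how this check behaves on a few inputs. `PRICING_KEYS` is an assumed three-key subset used only for illustration (the real set is derived from CustomPricingLiteLLMParams.model_fields), and `has_custom_pricing` is a hypothetical re-implementation, not the library function.

```python
from typing import Optional

# Assumed subset of the pricing keys, for illustration only.
PRICING_KEYS = frozenset(
    {
        "input_cost_per_token",
        "output_cost_per_token",
        "cache_creation_input_token_cost",
    }
)


def has_custom_pricing(litellm_params: Optional[dict]) -> bool:
    """True if any pricing key is set to a non-None value."""
    if litellm_params is None:
        return False

    # Top-level fields of litellm_params.
    for key in PRICING_KEYS & litellm_params.keys():
        if litellm_params.get(key) is not None:
            return True

    # Nested litellm_params["metadata"]["model_info"]; tolerate None values.
    metadata = litellm_params.get("metadata") or {}
    model_info = metadata.get("model_info") or {}
    for key in PRICING_KEYS & model_info.keys():
        if model_info.get(key) is not None:
            return True

    return False


examples = {
    "no params": None,
    "top-level price": {"model": "m", "input_cost_per_token": 1e-06},
    "price key set to None": {"model": "m", "input_cost_per_token": None},
    "price in model_info": {
        "model": "m",
        "metadata": {"model_info": {"output_cost_per_token": 2e-06}},
    },
    "metadata is None": {"model": "m", "metadata": None},
}
results = {name: has_custom_pricing(params) for name, params in examples.items()}
print(results)
```

Note that a single non-None pricing field is enough to switch the whole request to the UUID path, even if the other price fields are left unset.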

UUID Path: Custom Price Lookup

Related function: _select_model_name_for_cost_calc() (cost_calculator.py:612-677)

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id  # use the UUID directly
    else:
        return_model = model            # fall back to the model name

Source of router_model_id (litellm_logging.py:1417-1419):

if router_model_id is None and "model_id" in hidden_params:
    router_model_id = hidden_params["model_id"]

hidden_params["model_id"] is injected by the Router during routing and equals the deployment's UUID (model_info.id).

The UUID is registered in litellm.model_cost when Router._create_deployment() is called (router.py:6204-6208).

JSON Path: 5-Step Fallback on the Model Name

This path is taken when custom_pricing=False, or when the UUID is not in model_cost (a brief window after a hot reload).

Function: _get_model_info_helper() (litellm/utils.py:5430-5547)

Candidate key generation (_get_potential_model_names(), utils.py:5317-5360):

Taking model="Vendor2/Claude-4.6-Opus", custom_llm_provider="anthropic" as an example:

Priority	Variable	Computed value
1	`combined_model_name`	`"anthropic/Vendor2/Claude-4.6-Opus"`
2	`model` (original)	`"Vendor2/Claude-4.6-Opus"`
3	`combined_stripped_model_name`	`"anthropic/Vendor2/Claude-4.6-Opus"` (same as 1; there is no version suffix to strip)
4	`stripped_model_name`	`"Vendor2/Claude-4.6-Opus"` (same as 2)
5	`split_model`	`"Vendor2/Claude-4.6-Opus"` (same as 2)

Lookup flow (utils.py:5500-5547):

for candidate in [combined_model_name, model, combined_stripped_model_name,
                  stripped_model_name, split_model]:
    _matched_key = _get_model_cost_key(candidate)  # case-insensitive
    if _matched_key is not None:
        _model_info = _get_model_info_from_model_cost(key=_matched_key)
        if _check_provider_match(_model_info, custom_llm_provider):
            break  # hit
        else:
            _model_info = None  # provider mismatch, keep going

if _model_info is None:
    raise ValueError("This model isn't mapped yet.")

_check_provider_match() (utils.py:5265-5303): rejects a candidate only when model_info["litellm_provider"] != custom_llm_provider. Here anthropic == anthropic, so the candidate passes.
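
The lookup loop can be illustrated with a reduced sketch. It models only the first two candidates (provider-prefixed name, then the raw name), assumes the key match is a plain case-insensitive comparison, and reduces the provider check to strict equality. `MODEL_COST`, `find_cost_key` and `lookup_model_info` are hypothetical names, not LiteLLM functions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for litellm.model_cost with a single lowercase key.
MODEL_COST: Dict[str, dict] = {
    "vendor2/claude-4.6-opus": {
        "litellm_provider": "anthropic",
        "input_cost_per_token": 5e-06,
    },
}


def find_cost_key(candidate: str, model_cost: Dict[str, dict]) -> Optional[str]:
    """Return the stored key matching `candidate`, ignoring case."""
    if candidate in model_cost:
        return candidate
    lowered = candidate.lower()
    for key in model_cost:
        if key.lower() == lowered:
            return key
    return None


def lookup_model_info(
    model: str, provider: str, model_cost: Dict[str, dict]
) -> Tuple[str, dict]:
    """Try each candidate in order; skip entries whose provider differs."""
    candidates = [f"{provider}/{model}", model]  # steps 1 and 2 only
    for candidate in candidates:
        key = find_cost_key(candidate, model_cost)
        if key is None:
            continue  # no such key, try the next candidate
        info = model_cost[key]
        if info.get("litellm_provider") != provider:
            continue  # provider mismatch, try the next candidate
        return key, info
    raise ValueError("This model isn't mapped yet.")


matched_key, matched_info = lookup_model_info(
    "Vendor2/Claude-4.6-Opus", "anthropic", MODEL_COST
)
print(matched_key)

try:
    lookup_model_info("Vendor2/Claude-4.6-Opus", "openai", MODEL_COST)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

In this sketch the prefixed candidate misses, the raw name matches case-insensitively, and a provider mismatch on every candidate ends in the ValueError.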

Cache Price Calculation in Detail

Full Call Chain

generic_cost_per_token()                    [llm_cost_calc/utils.py:580]
  ├─ _parse_prompt_tokens_details(usage)    [llm_cost_calc/utils.py:406]
  │    └─ extracts cache_creation_token_details (with the 5m / 1hr token counts)
  ├─ _get_token_base_cost(model_info, usage) [llm_cost_calc/utils.py:158]
  │    ├─ prompt_base_cost = _get_cost_per_unit(model_info, "input_cost_per_token")
  │    ├─ cache_creation_cost = _get_cost_per_unit(model_info, "cache_creation_input_token_cost")
  │    ├─ cache_creation_cost_above_1hr = _get_cost_per_unit(model_info, "cache_creation_input_token_cost_above_1hr")
  │    │    └─ ⚠️ if the field is missing → returns default_value=0.0 (silently!)
  │    └─ cache_read_cost = _get_cost_per_unit(model_info, "cache_read_input_token_cost")
  ├─ _calculate_input_cost(...)             [llm_cost_calc/utils.py:513]
  └─ calculate_cache_writing_cost(...)      [llm_cost_calc/utils.py:360]
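
Because a missing price field silently becomes 0.0, it can help to audit custom entries before relying on them. The sketch below is a hypothetical helper, not part of LiteLLM: `cost_per_unit` imitates the default-to-zero read described in the call chain, and `missing_cache_price_fields` lists the cache price fields an entry does not set. The prices are illustrative.

```python
from typing import Any, Dict, List

CACHE_PRICE_FIELDS = (
    "cache_creation_input_token_cost",
    "cache_creation_input_token_cost_above_1hr",
    "cache_read_input_token_cost",
)


def cost_per_unit(model_info: Dict[str, Any], key: str, default: float = 0.0) -> float:
    """Read a price field; a missing or None field yields the default."""
    value = model_info.get(key)
    if value is None:
        return default
    return float(value)


def missing_cache_price_fields(model_info: Dict[str, Any]) -> List[str]:
    """List the cache price fields that would be billed at the default."""
    return [field for field in CACHE_PRICE_FIELDS if model_info.get(field) is None]


# Hypothetical custom-priced entry that omits the 1hr cache-write price.
custom_entry = {
    "input_cost_per_token": 5e-06,
    "cache_creation_input_token_cost": 6.25e-06,
    "cache_read_input_token_cost": 5e-07,
}

one_hour_price = cost_per_unit(
    custom_entry, "cache_creation_input_token_cost_above_1hr"
)
missing = missing_cache_price_fields(custom_entry)
print(one_hour_price)
print(missing)
```

Running such a check when a deployment is saved would surface the omission before any 1hr cache writes are billed at zero.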

How `_parse_prompt_tokens_details()` Extracts the 5m/1hr Split

Defined at: llm_cost_calc/utils.py:406-464

cache_creation_token_details = getattr(
    usage.prompt_tokens_details, "cache_creation_token_details", None
)
# cache_creation_token_details comes from the Anthropic response:
#   usage.cache_creation.ephemeral_5m_input_tokens
#   usage.cache_creation.ephemeral_1h_input_tokens

Anthropic response transformation (anthropic/chat/transformation.py:1590-1594):

CacheCreationTokenDetails(
    ephemeral_5m_input_tokens=_usage["cache_creation"].get("ephemeral_5m_input_tokens"),
    ephemeral_1h_input_tokens=_usage["cache_creation"].get("ephemeral_1h_input_tokens"),
)

Core Logic of `calculate_cache_writing_cost()`

Defined at: llm_cost_calc/utils.py:360-391

def calculate_cache_writing_cost(
    cache_creation_tokens: int,
    cache_creation_token_details: Optional[CacheCreationTokenDetails],
    cache_creation_cost_above_1hr: float,  # 0.0 when not configured
    cache_creation_cost: float,            # 5m price
) -> float:
    total_cost = 0.0
    if cache_creation_token_details is not None:
        # Newer Anthropic responses: split into 5m and 1hr buckets
        tokens_5m = cache_creation_token_details.ephemeral_5m_input_tokens
        tokens_1h = cache_creation_token_details.ephemeral_1h_input_tokens
        total_cost += (tokens_5m * cache_creation_cost) if tokens_5m is not None else 0.0
        total_cost += (tokens_1h * cache_creation_cost_above_1hr) if tokens_1h is not None else 0.0
    else:
        # Legacy responses: everything is billed at cache_creation_cost
        total_cost += cache_creation_tokens * cache_creation_cost
    return total_cost
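
A worked example makes the effect of a missing 1hr price concrete. The sketch below re-implements the same arithmetic in a self-contained form. `CacheCreationDetails` and `cache_writing_cost` are hypothetical stand-ins for the library types, and the token counts and prices are illustrative, not real rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheCreationDetails:
    """Stand-in for the 5m / 1hr cache-write token breakdown."""

    ephemeral_5m_input_tokens: Optional[int] = None
    ephemeral_1h_input_tokens: Optional[int] = None


def cache_writing_cost(
    cache_creation_tokens: int,
    details: Optional[CacheCreationDetails],
    cost_5m: float,
    cost_1h: float,
) -> float:
    """Bill 5m and 1hr cache writes separately when a breakdown exists."""
    if details is None:
        # No breakdown: every cache-write token is billed at the 5m price.
        return cache_creation_tokens * cost_5m
    total = 0.0
    if details.ephemeral_5m_input_tokens is not None:
        total += details.ephemeral_5m_input_tokens * cost_5m
    if details.ephemeral_1h_input_tokens is not None:
        total += details.ephemeral_1h_input_tokens * cost_1h
    return total


details = CacheCreationDetails(
    ephemeral_5m_input_tokens=1000, ephemeral_1h_input_tokens=2000
)

# Both prices configured: 1000 * 3.75e-06 + 2000 * 6e-06
priced = cache_writing_cost(3000, details, cost_5m=3.75e-06, cost_1h=6e-06)

# 1hr price missing (defaults to 0.0): the 2000 1hr tokens cost nothing.
unpriced = cache_writing_cost(3000, details, cost_5m=3.75e-06, cost_1h=0.0)

# No breakdown in the response: 3000 * 3.75e-06
legacy = cache_writing_cost(3000, None, cost_5m=3.75e-06, cost_1h=6e-06)

print(priced, unpriced, legacy)
```

With these illustrative numbers the fully priced request costs about 0.01575, while the same request with the 1hr price unset costs about 0.00375: the 1hr portion disappears from the bill without any error.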

text_tokens Double-Counting Correction

generic_cost_per_token(), at roughly lines 620-643, handles the case where text_tokens may already include cached tokens:

total_details = text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
has_double_counting = cache_hit > 0 and total_details > usage.prompt_tokens

if (text_tokens == 0 and prompt_tokens_details["image_count"] == 0) or has_double_counting:
    text_tokens = (
        usage.prompt_tokens
        - cache_hit
        - audio_tokens
        - cache_creation    # ← cache_creation is subtracted from text_tokens
        - image_tokens
    )

Note: when text_tokens == 0 (common in the usage format that Anthropic returns), this correction fires and yields the correct number of plain-text tokens.
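
The correction can be checked on three small cases. The function below is a hypothetical, self-contained restatement of the quoted condition, with `image_count` passed as a plain argument; the token counts are made up for illustration.

```python
def corrected_text_tokens(
    prompt_tokens: int,
    text_tokens: int,
    cache_hit: int,
    cache_creation: int,
    audio_tokens: int = 0,
    image_tokens: int = 0,
    image_count: int = 0,
) -> int:
    """Recompute text_tokens when it is absent or overlaps cached tokens."""
    total_details = (
        text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
    )
    has_double_counting = cache_hit > 0 and total_details > prompt_tokens
    if (text_tokens == 0 and image_count == 0) or has_double_counting:
        return (
            prompt_tokens - cache_hit - audio_tokens - cache_creation - image_tokens
        )
    return text_tokens


# Case 1: text_tokens not reported (0); derive it from the total.
case_zero = corrected_text_tokens(
    prompt_tokens=1000, text_tokens=0, cache_hit=600, cache_creation=300
)

# Case 2: text_tokens already includes the 600 cached tokens (1600 > 1000).
case_overlap = corrected_text_tokens(
    prompt_tokens=1000, text_tokens=1000, cache_hit=600, cache_creation=0
)

# Case 3: the details already sum to prompt_tokens; nothing changes.
case_consistent = corrected_text_tokens(
    prompt_tokens=1000, text_tokens=400, cache_hit=600, cache_creation=0
)

print(case_zero, case_overlap, case_consistent)  # 100 400 400
```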

Summary: The Two Paths Compared

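graph LR
    subgraph UUID path custom_pricing=True
        U1[litellm_params has pricing fields] --> U2[router_model_id=UUID]
        U2 --> U3[litellm.model_cost UUID\nholds cache_creation_input_token_cost\nholds cache_creation_input_token_cost_above_1hr]
        U3 --> U4[prices come from DB litellm_params\nmissing field → silently 0]
    end

    subgraph JSON path custom_pricing=False
        J1[litellm_params has no pricing fields] --> J2[model_name 5-step fallback]
        J2 --> J3[litellm.model_cost model_name\nholds all fields from the JSON file]
        J3 --> J4[prices come from the JSON file\nunmapped model → ValueError]
    end

Dimension	UUID path	JSON path
Trigger condition	litellm_params has any `_CUSTOM_PRICING_KEYS` field that is not None	litellm_params has no pricing fields
Price key	`router_model_id` (UUID)	`combined_model_name` and the other 5-step candidates
Missing cache price	Silently returns 0.0, no error	Read from the JSON entry; ValueError when the model is not mapped at all
Update trigger	UI save (immediate) or DB change (after 30s)	Restart or hot reload
1hr cache pitfall	`cache_creation_input_token_cost_above_1hr` missing → 1hr cache writes are billed at zero	Not affected (the JSON entries carry the full set of fields)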