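04 - End-to-End Billing Flow in the Routing Layer

Overview

After each LLM request completes, LiteLLM computes the cost from the token usage in the response and the prices in the model_cost map. Billing follows one of two paths: the UUID path (custom pricing) or the JSON path (standard pricing). use_custom_pricing_for_model() decides at runtime which one applies.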


Billing Trigger Point

Function: LiteLLMLoggingObj.calculate_request_cost()
File: litellm/litellm_core_utils/litellm_logging.py:1381-1491

# litellm_logging.py:1381
def calculate_request_cost(self):
    ...
    custom_pricing = use_custom_pricing_for_model(
        litellm_params=(self.litellm_params if hasattr(self, "litellm_params") else None)
    )
    ...
    response_cost = cost_calculator.response_cost_calculator(
        response_object=self.model_response,
        model=model,
        custom_llm_provider=custom_llm_provider,
        custom_pricing=custom_pricing,
        ...
    )

Trigger timing: called from success_handler / async_success_handler after every LLM request that returns successfully.


Two Billing Paths

flowchart TD
    A[LLM request completes] --> B[calculate_request_cost\nlitellm_logging.py:1381]
    B --> C{use_custom_pricing_for_model\nlitellm_params}
    C -- any _CUSTOM_PRICING_KEYS field is not None --> D[custom_pricing = True]
    C -- no pricing fields --> E[custom_pricing = False]

    D --> F[_select_model_name_for_cost_calc\nreturns router_model_id UUID]
    F --> G{UUID in litellm.model_cost?}
    G -- yes --> H[get_model_info model=UUID\nreads model_cost UUID]
    G -- no --> I[fallback: look up by model name]

    E --> J[_select_model_name_for_cost_calc\nreturns model name]
    J --> K[_get_model_info_helper model name\n5-step fallback lookup]

    H --> L[generic_cost_per_token\nllm_cost_calc/utils.py:580]
    I --> K
    K --> L
    L --> M[_parse_prompt_tokens_details\nextracts cache token details]
    M --> N[_get_token_base_cost\nreads each price field]
    N --> O[_calculate_input_cost]
    O --> P[calculate_cache_writing_cost\n5m vs 1hr split]
    P --> Q[final prompt_cost + completion_cost]

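Actual Billing Path vs. model_map_key (standard logging)

After a request completes there are two independent call sites: one performs the actual billing, and the other writes the model_map_key / model_map_value fields into standard_logging_payload.

Call Site 1: Actual Billing

LiteLLMLoggingObj._response_cost_calculator() (litellm_logging.py:1373-1495):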

# litellm_logging.py:1416-1419
elif (
    router_model_id is None and "model_id" in hidden_params
):
    router_model_id = hidden_params["model_id"]   # ← UUID taken from completion_response._hidden_params

# litellm_logging.py:1437-1459
response_cost_calculator_kwargs = {
    "response_object": result,
    "model": litellm_model_name or self.model,
    "custom_pricing": custom_pricing,
    "router_model_id": router_model_id,           # ← the UUID is passed in
    ...
}
response_cost = litellm.response_cost_calculator(**response_cost_calculator_kwargs)

Source of router_model_id: during routing, the Router injects the deployment's model_info.id (a UUID) into completion_response._hidden_params["model_id"].

Inside _select_model_name_for_cost_calc() (cost_calculator.py:641-645):

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id          # take the UUID path
    else:
        return_model = model                    # fallback

Result: when custom_pricing=True and the UUID is registered, actual billing uses litellm.model_cost[UUID].

Call Site 2: Building model_map_key

StandardLoggingPayloadSetup.get_model_cost_information() (litellm_logging.py:4725-4760):

model_cost_name = _select_model_name_for_cost_calc(
    model=None,
    completion_response=init_response_obj,
    base_model=base_model,
    custom_pricing=custom_pricing,
    # router_model_id is not passed → defaults to None
)

router_model_id defaults to None. Inside _select_model_name_for_cost_calc():

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id
    else:
        return_model = model               # ← model=None here
# ... later falls back to completion_response_model
if return_model is None and completion_response_model is not None:
    return_model = completion_response_model

The final return_model comes from completion_response.model (for example "Vendor2/Claude-4.6-Opus"). That name is then used to look up litellm.model_cost, and the result is stored in StandardLoggingModelInformation:

model_cost_information = StandardLoggingModelInformation(
    model_map_key=model_cost_name,          # ← i.e. "Vendor2/Claude-4.6-Opus"
    model_map_value=_model_cost_information,
)

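Data Sources of the Two Call Sites Compared

| Field | Call site | Key used | router_model_id | Purpose |
|---|---|---|---|---|
| response_cost (actual billing) | _response_cost_calculator (litellm_logging.py:1373) | router_model_id (UUID, if present) | ✅ from hidden_params["model_id"] | Actual charge |
| model_map_key | get_model_cost_information (litellm_logging.py:4725) | completion_response.model | ❌ defaults to None | Written to standard_logging_payload for auditing |

The string value of model_map_key reflects the name that standard logging itself uses to query the model_cost map. It is computed independently of the router_model_id used for actual billing, so the two values can be derived from different keys.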
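
The divergence between the two call sites can be reproduced with a minimal sketch. The function below is a simplified stand-in for _select_model_name_for_cost_calc() that keeps only the branches quoted above. The MODEL_COST dict, the UUID string, and the prices are made-up values for illustration, not LiteLLM's real data.

```python
from typing import Optional

# Toy stand-in for litellm.model_cost; keys and prices are illustrative only.
MODEL_COST = {
    "3f2b9c1e-uuid": {"input_cost_per_token": 2e-06},            # custom price keyed by deployment UUID
    "Vendor2/Claude-4.6-Opus": {"input_cost_per_token": 5e-06},  # standard entry keyed by model name
}


def select_model_name(
    model: Optional[str],
    completion_response_model: Optional[str],
    custom_pricing: bool,
    router_model_id: Optional[str] = None,
) -> Optional[str]:
    """Simplified version of the selection branches quoted above (an assumption, not the full function)."""
    return_model = None
    if custom_pricing is True:
        if router_model_id is not None and router_model_id in MODEL_COST:
            return_model = router_model_id
        else:
            return_model = model
    if return_model is None and completion_response_model is not None:
        return_model = completion_response_model
    return return_model


# Call site 1: billing passes the UUID taken from hidden_params["model_id"].
billing_key = select_model_name(
    model="Vendor2/Claude-4.6-Opus",
    completion_response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
    router_model_id="3f2b9c1e-uuid",
)

# Call site 2: standard logging passes model=None and no router_model_id.
model_map_key = select_model_name(
    model=None,
    completion_response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
)

print(billing_key, model_map_key)
```

In this toy setup the billing key is the UUID while model_map_key is the model name, so the price logged under model_map_value can differ from the price actually charged.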


use_custom_pricing_for_model() Decision Logic

Definition: litellm_logging.py:4385-4410

_CUSTOM_PRICING_KEYS: frozenset = frozenset(
    CustomPricingLiteLLMParams.model_fields.keys()
)  # litellm_logging.py:215-216

def use_custom_pricing_for_model(litellm_params: Optional[dict]) -> bool:
    if litellm_params is None:
        return False

    # Check the top-level fields of litellm_params
    matching_keys = _CUSTOM_PRICING_KEYS & litellm_params.keys()
    for key in matching_keys:
        if litellm_params.get(key) is not None:
            return True

    # Check litellm_params["metadata"]["model_info"]
    metadata: dict = litellm_params.get("metadata", {}) or {}
    model_info: dict = metadata.get("model_info", {}) or {}
    if model_info:
        matching_keys = _CUSTOM_PRICING_KEYS & model_info.keys()
        for key in matching_keys:
            if model_info.get(key) is not None:
                return True

    return False

Data source: litellm_params comes from the Router deployment object, i.e. the LiteLLM_Params instance that is decrypted from the DB at startup and held in memory.
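
To make the decision rule concrete, here is a small self-contained sketch of the same two checks. CUSTOM_PRICING_KEYS is an assumed subset of the field names on CustomPricingLiteLLMParams, and uses_custom_pricing is a hypothetical name chosen for this example.

```python
from typing import Optional

# Assumed subset of CustomPricingLiteLLMParams field names, for illustration only.
CUSTOM_PRICING_KEYS = frozenset(
    {
        "input_cost_per_token",
        "output_cost_per_token",
        "cache_creation_input_token_cost",
        "cache_read_input_token_cost",
    }
)


def uses_custom_pricing(litellm_params: Optional[dict]) -> bool:
    """Same two checks as above: top-level keys first, then metadata.model_info."""
    if litellm_params is None:
        return False
    top_level = CUSTOM_PRICING_KEYS & litellm_params.keys()
    if any(litellm_params.get(key) is not None for key in top_level):
        return True
    metadata = litellm_params.get("metadata") or {}
    model_info = metadata.get("model_info") or {}
    nested = CUSTOM_PRICING_KEYS & model_info.keys()
    return any(model_info.get(key) is not None for key in nested)


samples = [
    {"model": "x"},                                                   # no pricing fields
    {"model": "x", "input_cost_per_token": None},                     # key present but None
    {"model": "x", "input_cost_per_token": 0.0},                      # 0.0 is not None
    {"metadata": {"model_info": {"output_cost_per_token": 2e-06}}},   # nested pricing field
]
for params in samples:
    print(params, uses_custom_pricing(params))
```

Because the test is "is not None", an explicit price of 0.0 still selects the UUID path, whereas a key whose value is None does not.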


UUID Path: Custom Price Lookup

Relevant function: _select_model_name_for_cost_calc() (cost_calculator.py:612-677)

if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id  # use the UUID directly
    else:
        return_model = model            # fall back to the model name

Source of router_model_id (litellm_logging.py:1417-1419):

if router_model_id is None and "model_id" in hidden_params:
    router_model_id = hidden_params["model_id"]

hidden_params["model_id"] is injected by the Router during routing and equals the deployment's UUID (model_info.id).

When the UUID is registered in litellm.model_cost: when Router._create_deployment() is called (router.py:6204-6208).


JSON Path: 5-Step Fallback on the Model Name

This path is taken when custom_pricing=False, or when the UUID is not in model_cost (a brief window after a hot reload).

Function: _get_model_info_helper() (litellm/utils.py:5430-5547)

Candidate key generation: _get_potential_model_names() (utils.py:5317-5360)

Example with model="Vendor2/Claude-4.6-Opus", custom_llm_provider="anthropic":

| Priority | Variable | Computed value |
|---|---|---|
| 1 | combined_model_name | "anthropic/Vendor2/Claude-4.6-Opus" |
| 2 | model (original) | "Vendor2/Claude-4.6-Opus" |
| 3 | combined_stripped_model_name | "anthropic/Vendor2/Claude-4.6-Opus" (same as 1; no version suffix to strip) |
| 4 | stripped_model_name | "Vendor2/Claude-4.6-Opus" (same as 2) |
| 5 | split_model | "Vendor2/Claude-4.6-Opus" (same as 2) |

Lookup flow (utils.py:5500-5547):

for candidate in [combined_model_name, model, combined_stripped_model_name,
                  stripped_model_name, split_model]:
    _matched_key = _get_model_cost_key(candidate)  # case-insensitive
    if _matched_key is not None:
        _model_info = _get_model_info_from_model_cost(key=_matched_key)
        if _check_provider_match(_model_info, custom_llm_provider):
            break  # hit
        else:
            _model_info = None  # provider mismatch, keep going

if _model_info is None:
    raise ValueError("This model isn't mapped yet.")

_check_provider_match() (utils.py:5265-5303): rejects a candidate only when model_info["litellm_provider"] != custom_llm_provider; with anthropic == anthropic the candidate passes.
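
The fallback can be sketched end to end on a toy map. This is a simplified illustration under stated assumptions: candidate_names, find_key, and lookup are hypothetical helpers, version stripping is treated as a no-op, the model name is assumed not to start with the provider prefix, and the provider check is reduced to a plain equality test.

```python
from typing import Optional

# Toy model_cost map; the entry and its price are illustrative only.
MODEL_COST = {
    "vendor2/claude-4.6-opus": {
        "litellm_provider": "anthropic",
        "input_cost_per_token": 5e-06,
    },
}


def candidate_names(model: str, provider: str) -> list:
    """Candidate keys in priority order (stripping assumed to change nothing)."""
    stripped = model  # assumption: no version suffix to strip in this example
    return [
        f"{provider}/{model}",     # 1. combined_model_name
        model,                     # 2. model
        f"{provider}/{stripped}",  # 3. combined_stripped_model_name
        stripped,                  # 4. stripped_model_name
        model,                     # 5. split_model
    ]


def find_key(candidate: str) -> Optional[str]:
    """Case-insensitive key match, standing in for _get_model_cost_key()."""
    lowered = candidate.lower()
    for key in MODEL_COST:
        if key.lower() == lowered:
            return key
    return None


def lookup(model: str, provider: str) -> dict:
    """Try each candidate; accept the first one whose provider matches."""
    for candidate in candidate_names(model, provider):
        key = find_key(candidate)
        if key is None:
            continue
        info = MODEL_COST[key]
        if info.get("litellm_provider") == provider:
            return info
        # Provider mismatch: keep trying the remaining candidates.
    raise ValueError("This model isn't mapped yet.")


print(lookup("Vendor2/Claude-4.6-Opus", "anthropic"))
```

Here candidate 1 misses, candidate 2 matches case-insensitively, and the provider check passes. Calling lookup with provider "openai" finds the same key but rejects it on every candidate, so the ValueError is raised.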


Cache Pricing Calculation in Detail

Full Call Chain

generic_cost_per_token()                    [utils.py:580]
  ├─ _parse_prompt_tokens_details(usage)    [utils.py:406]
  │    └─ extracts cache_creation_token_details (with 5m / 1hr token counts)
  ├─ _get_token_base_cost(model_info, usage) [utils.py:158]
  │    ├─ prompt_base_cost = _get_cost_per_unit(model_info, "input_cost_per_token")
  │    ├─ cache_creation_cost = _get_cost_per_unit(model_info, "cache_creation_input_token_cost")
  │    ├─ cache_creation_cost_above_1hr = _get_cost_per_unit(model_info, "cache_creation_input_token_cost_above_1hr")
  │    │    └─ ⚠️ if the field is missing → returns default_value=0.0 (silently!)
  │    └─ cache_read_cost = _get_cost_per_unit(model_info, "cache_read_input_token_cost")
  ├─ _calculate_input_cost(...)             [utils.py:513]
  └─ calculate_cache_writing_cost(...)      [utils.py:360]

How _parse_prompt_tokens_details() Extracts the 5m/1hr Counts

Definition: llm_cost_calc/utils.py:406-464

cache_creation_token_details = getattr(
    usage.prompt_tokens_details, "cache_creation_token_details", None
)
# cache_creation_token_details comes from the Anthropic response:
#   usage.cache_creation.ephemeral_5m_input_tokens
#   usage.cache_creation.ephemeral_1h_input_tokens

Anthropic response transformation (anthropic/chat/transformation.py:1590-1594):

CacheCreationTokenDetails(
    ephemeral_5m_input_tokens=_usage["cache_creation"].get("ephemeral_5m_input_tokens"),
    ephemeral_1h_input_tokens=_usage["cache_creation"].get("ephemeral_1h_input_tokens"),
)

calculate_cache_writing_cost() Core Logic

Definition: llm_cost_calc/utils.py:360-391

def calculate_cache_writing_cost(
    cache_creation_tokens: int,
    cache_creation_token_details: Optional[CacheCreationTokenDetails],
    cache_creation_cost_above_1hr: float,  # 0.0 if not configured
    cache_creation_cost: float,            # 5m price
) -> float:
    total_cost = 0.0
    if cache_creation_token_details is not None:
        # Newer Anthropic responses: split into 5m and 1hr tokens
        tokens_5m = cache_creation_token_details.ephemeral_5m_input_tokens
        tokens_1h = cache_creation_token_details.ephemeral_1h_input_tokens
        total_cost += tokens_5m * cache_creation_cost if tokens_5m is not None else 0.0
        total_cost += tokens_1h * cache_creation_cost_above_1hr if tokens_1h is not None else 0.0
    else:
        # Legacy responses: everything is billed at cache_creation_cost
        total_cost += cache_creation_tokens * cache_creation_cost
    return total_cost
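
A worked example makes the effect of a missing 1hr price visible. The sketch below repeats the same arithmetic in a self-contained form; the dataclass is a minimal stand-in for the LiteLLM type of the same name, cache_writing_cost is a hypothetical name, and the token counts and per-token prices are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheCreationTokenDetails:
    """Minimal stand-in for the LiteLLM type of the same name."""
    ephemeral_5m_input_tokens: Optional[int] = None
    ephemeral_1h_input_tokens: Optional[int] = None


def cache_writing_cost(
    cache_creation_tokens: int,
    details: Optional[CacheCreationTokenDetails],
    cost_above_1hr: float,
    cost_5m: float,
) -> float:
    """Same arithmetic as the function above, written with explicit branches."""
    if details is None:
        # No 5m/1hr breakdown: bill every cache-creation token at the 5m price.
        return cache_creation_tokens * cost_5m
    total = 0.0
    if details.ephemeral_5m_input_tokens is not None:
        total += details.ephemeral_5m_input_tokens * cost_5m
    if details.ephemeral_1h_input_tokens is not None:
        total += details.ephemeral_1h_input_tokens * cost_above_1hr
    return total


details = CacheCreationTokenDetails(
    ephemeral_5m_input_tokens=1000,
    ephemeral_1h_input_tokens=4000,
)

# Illustrative per-token prices, not taken from any real price list.
configured = cache_writing_cost(5000, details, cost_above_1hr=1e-05, cost_5m=6.25e-06)
missing_1hr = cache_writing_cost(5000, details, cost_above_1hr=0.0, cost_5m=6.25e-06)
legacy = cache_writing_cost(5000, None, cost_above_1hr=1e-05, cost_5m=6.25e-06)

print(configured, missing_1hr, legacy)
```

With the 1hr price defaulting to 0.0, only the 1000 five-minute tokens are billed and the 4000 one-hour tokens contribute nothing, which is the silent undercharge described in the call chain above.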

text_tokens Double-Counting Correction

Around lines 620-643, generic_cost_per_token() handles the case where text_tokens may already include cached tokens:

total_details = text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
has_double_counting = cache_hit > 0 and total_details > usage.prompt_tokens

if (text_tokens == 0 and prompt_tokens_details["image_count"] == 0) or has_double_counting:
    text_tokens = (
        usage.prompt_tokens
        - cache_hit
        - audio_tokens
        - cache_creation    # ← cache_creation is subtracted from text_tokens
        - image_tokens
    )

Note: when text_tokens == 0 (common in the format Anthropic returns, where text_tokens is 0), this correction is triggered and yields the correct count of plain-text tokens.
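
The correction is easiest to follow with numbers. The helper below wraps the same condition and subtraction in a standalone function; corrected_text_tokens is a hypothetical name, and the token counts are invented for illustration.

```python
def corrected_text_tokens(
    prompt_tokens: int,
    text_tokens: int,
    cache_hit: int,
    cache_creation: int,
    audio_tokens: int = 0,
    image_tokens: int = 0,
    image_count: int = 0,
) -> int:
    """Apply the double-counting correction shown above to plain integers."""
    total_details = text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
    has_double_counting = cache_hit > 0 and total_details > prompt_tokens
    if (text_tokens == 0 and image_count == 0) or has_double_counting:
        text_tokens = prompt_tokens - cache_hit - audio_tokens - cache_creation - image_tokens
    return text_tokens


# Case 1: text_tokens reported as 0, so it is derived from prompt_tokens.
case_zero = corrected_text_tokens(
    prompt_tokens=10000, text_tokens=0, cache_hit=6000, cache_creation=3000
)

# Case 2: text_tokens already includes the cached tokens (details exceed prompt_tokens).
case_double = corrected_text_tokens(
    prompt_tokens=10000, text_tokens=10000, cache_hit=6000, cache_creation=0
)

# Case 3: details are consistent, so text_tokens is left untouched.
case_clean = corrected_text_tokens(
    prompt_tokens=10000, text_tokens=4000, cache_hit=6000, cache_creation=0
)

print(case_zero, case_double, case_clean)  # 1000 4000 4000
```

In all three cases the text portion ends up disjoint from the cache-hit and cache-creation portions, so no token is billed at two different rates.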


Summary: The Two Paths Compared

graph LR
    subgraph UUID path custom_pricing=True
        U1[litellm_params has pricing fields] --> U2[router_model_id=UUID]
        U2 --> U3[litellm.model_cost UUID\nincludes cache_creation_input_token_cost\nincludes cache_creation_input_token_cost_above_1hr]
        U3 --> U4[prices come from DB litellm_params\nmissing field → silently 0]
    end

    subgraph JSON path custom_pricing=False
        J1[litellm_params has no pricing fields] --> J2[model_name 5-step fallback]
        J2 --> J3[litellm.model_cost model_name\nincludes all fields from the JSON file]
        J3 --> J4[prices come from the JSON file\nmissing field → raises ValueError]
    end
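
| Dimension | UUID path | JSON path |
|---|---|---|
| Trigger condition | litellm_params has any _CUSTOM_PRICING_KEYS field that is not None | litellm_params has no pricing fields |
| Price key | router_model_id (UUID) | combined_model_name and the rest of the 5-step fallback |
| Missing cache price | Silently returns 0.0, no error | Read from the JSON; ValueError if absent from the JSON |
| Update trigger | Saving in the UI (immediate) or a DB change (after 30s) | Restart or hot reload |
| 1hr cache trap | cache_creation_input_token_cost_above_1hr missing → 1hr cache writes are free | Not affected (the JSON has the full set of fields) |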
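
One way to catch the 1hr cache trap before it reaches billing is to scan the price map for entries that set a 5m cache-write price but no 1hr price. The helper below is a hypothetical audit sketch, not part of LiteLLM; it assumes the map has the same shape as litellm.model_cost (key to dict of price fields), and the sample entries are invented.

```python
def find_free_1hr_cache_entries(model_cost: dict) -> list:
    """Return keys that price 5m cache writes but have no 1hr cache-write price."""
    flagged = []
    for key, info in model_cost.items():
        has_5m = info.get("cache_creation_input_token_cost") is not None
        has_1hr = info.get("cache_creation_input_token_cost_above_1hr") is not None
        if has_5m and not has_1hr:
            flagged.append(key)
    return sorted(flagged)


# Invented sample entries: two UUID-keyed deployments and one entry without cache pricing.
sample_model_cost = {
    "uuid-complete": {
        "input_cost_per_token": 5e-06,
        "cache_creation_input_token_cost": 6.25e-06,
        "cache_creation_input_token_cost_above_1hr": 1e-05,
    },
    "uuid-missing-1hr": {
        "input_cost_per_token": 5e-06,
        "cache_creation_input_token_cost": 6.25e-06,
    },
    "no-cache-pricing": {"input_cost_per_token": 1e-06},
}

print(find_free_1hr_cache_entries(sample_model_cost))  # ['uuid-missing-1hr']
```

Running such a check against the in-memory map after deployments are registered would surface custom-priced deployments whose 1hr cache writes would otherwise be billed at 0.0.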