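# 04: Routing-Layer Billing, End-to-End Flow
## Overview
After each LLM request completes, LiteLLM computes its cost from the token usage in the response and the prices in the `model_cost` map. Billing follows one of two paths: the UUID path (custom prices) or the JSON path (standard prices). `use_custom_pricing_for_model()` decides at runtime which path a request takes.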
## Billing Trigger Point
Function: `LiteLLMLoggingObj.calculate_request_cost()`
File: `litellm/litellm_core_utils/litellm_logging.py:1381-1491`
```python
# litellm_logging.py:1381
def calculate_request_cost(self):
    ...
    custom_pricing = use_custom_pricing_for_model(
        litellm_params=(self.litellm_params if hasattr(self, "litellm_params") else None)
    )
    ...
    response_cost = cost_calculator.response_cost_calculator(
        response_object=self.model_response,
        model=model,
        custom_llm_provider=custom_llm_provider,
        custom_pricing=custom_pricing,
        # ... (remaining arguments elided)
    )
```
When it runs: from `success_handler` and `async_success_handler`, after every LLM request that returns successfully.
## The Two Billing Paths
```mermaid
flowchart TD
    A[LLM request completes] --> B[calculate_request_cost\nlitellm_logging.py:1381]
    B --> C{use_custom_pricing_for_model\nlitellm_params}
    C -- any _CUSTOM_PRICING_KEYS field is not None --> D[custom_pricing = True]
    C -- no pricing fields at all --> E[custom_pricing = False]
    D --> F[_select_model_name_for_cost_calc\nreturns router_model_id UUID]
    F --> G{UUID in litellm.model_cost?}
    G -- yes --> H[get_model_info model=UUID\nreads model_cost UUID]
    G -- no --> I[fallback: look up by model name]
    E --> J[_select_model_name_for_cost_calc\nreturns model name]
    J --> K[_get_model_info_helper model name\n5-step fallback lookup]
    H --> L[generic_cost_per_token\nllm_cost_calc/utils.py:580]
    I --> K
    K --> L
    L --> M[_parse_prompt_tokens_details\nextracts cache token details]
    M --> N[_get_token_base_cost\nreads each price field]
    N --> O[_calculate_input_cost]
    O --> P[calculate_cache_writing_cost\n5m vs 1hr computed separately]
    P --> Q[final prompt_cost + completion_cost]
```
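## Actual Billing Path vs. model_map_key (Standard Logging)
After a request completes there are two independent call sites: one computes the actual charge, the other fills the `model_map_key` / `model_map_value` fields of `standard_logging_payload`.
### Call Site 1: Actual Billing
`LiteLLMLoggingObj._response_cost_calculator()` (`litellm_logging.py:1373-1495`):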
```python
# litellm_logging.py:1416-1419
# ... (preceding if branch elided)
elif (
    router_model_id is None and "model_id" in hidden_params
):
    router_model_id = hidden_params["model_id"]  # ← UUID taken from completion_response._hidden_params

# litellm_logging.py:1437-1459
response_cost_calculator_kwargs = {
    "response_object": result,
    "model": litellm_model_name or self.model,
    "custom_pricing": custom_pricing,
    "router_model_id": router_model_id,  # ← the UUID is passed in
    # ... (remaining keys elided)
}
response_cost = litellm.response_cost_calculator(**response_cost_calculator_kwargs)
```
Where `router_model_id` comes from: during routing, the Router injects the deployment's `model_info.id` (a UUID) into `completion_response._hidden_params["model_id"]`.
Inside `_select_model_name_for_cost_calc()` (`cost_calculator.py:641-645`):
```python
if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id  # take the UUID path
    else:
        return_model = model  # fallback
```
Result: when `custom_pricing=True` and the UUID is registered, actual billing uses `litellm.model_cost[UUID]`.
### Call Site 2: Building model_map_key
`StandardLoggingPayloadSetup.get_model_cost_information()` (`litellm_logging.py:4725-4760`):
```python
model_cost_name = _select_model_name_for_cost_calc(
    model=None,
    completion_response=init_response_obj,
    base_model=base_model,
    custom_pricing=custom_pricing,
    # router_model_id is not passed, so it defaults to None
)
```
Here `router_model_id` keeps its default of `None`. Inside `_select_model_name_for_cost_calc()`:
```python
if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id
    else:
        return_model = model  # ← model is None at this call site
# ... later falls back to completion_response_model
if return_model is None and completion_response_model is not None:
    return_model = completion_response_model
```
So `return_model` ends up as `completion_response.model` (for example `"Vendor2/Claude-4.6-Opus"`). That name is then looked up in `litellm.model_cost`, and the result is stored in `StandardLoggingModelInformation`:
```python
model_cost_information = StandardLoggingModelInformation(
    model_map_key=model_cost_name,  # ← i.e. "Vendor2/Claude-4.6-Opus"
    model_map_value=_model_cost_information,
)
```
### Data Sources of the Two Call Sites
| Field | Call site | Key used | Passes router_model_id | Purpose |
|---|---|---|---|---|
| Actual billing `response_cost` | `_response_cost_calculator` (`litellm_logging.py:1373`) | `router_model_id` (UUID, if present) | ✅ taken from `hidden_params["model_id"]` | Actual charge |
| `model_map_key` | `get_model_cost_information` (`litellm_logging.py:4725`) | `completion_response.model`, etc. | ❌ defaults to `None` | Written to `standard_logging_payload` for auditing |
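The `model_map_key` string records the name that standard logging used for its own `model_cost` lookup. It and the `router_model_id` used for actual billing are produced by separate computations from different inputs, so the logged key does not necessarily match the key that was billed.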
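A minimal sketch can make the divergence concrete. `select_model_name` below is a hypothetical reduction of `_select_model_name_for_cost_calc()`: it keeps only the `custom_pricing` / `router_model_id` / `completion_response.model` branches excerpted above and, as an assumption, collapses every other branch into "use the `model` argument". The UUID `"3f2c-uuid"` and the prices are made up.

```python
from typing import Optional


def select_model_name(
    model: Optional[str],
    completion_response_model: Optional[str],
    custom_pricing: bool,
    model_cost: dict,
    router_model_id: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical reduction of _select_model_name_for_cost_calc()."""
    # Assumption: all non-UUID branches reduce to "use the model argument".
    return_model: Optional[str] = model
    if custom_pricing and router_model_id is not None and router_model_id in model_cost:
        return_model = router_model_id  # UUID path
    # Last resort: the model name reported in the response.
    if return_model is None and completion_response_model is not None:
        return_model = completion_response_model
    return return_model


# Made-up price map: one deployment UUID, one name-keyed entry.
model_cost = {
    "3f2c-uuid": {"input_cost_per_token": 3e-6},
    "Vendor2/Claude-4.6-Opus": {"input_cost_per_token": 15e-6},
}

# Call site 1 (billing): router_model_id comes from hidden_params["model_id"].
billing_key = select_model_name(
    model="Vendor2/Claude-4.6-Opus",
    completion_response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
    model_cost=model_cost,
    router_model_id="3f2c-uuid",
)

# Call site 2 (standard logging): model=None and router_model_id omitted.
map_key = select_model_name(
    model=None,
    completion_response_model="Vendor2/Claude-4.6-Opus",
    custom_pricing=True,
    model_cost=model_cost,
)

print(billing_key)  # 3f2c-uuid
print(map_key)  # Vendor2/Claude-4.6-Opus
```

Given the same response, the billing call resolves to the deployment UUID while the logging call resolves to the response's model name, so `model_map_value` can describe a different price entry from the one actually charged.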
## How use_custom_pricing_for_model() Decides
Definition: `litellm_logging.py:4385-4410`
```python
_CUSTOM_PRICING_KEYS: frozenset = frozenset(
    CustomPricingLiteLLMParams.model_fields.keys()
)  # litellm_logging.py:215-216


def use_custom_pricing_for_model(litellm_params: Optional[dict]) -> bool:
    if litellm_params is None:
        return False
    # Check fields set directly on litellm_params
    matching_keys = _CUSTOM_PRICING_KEYS & litellm_params.keys()
    for key in matching_keys:
        if litellm_params.get(key) is not None:
            return True
    # Check litellm_params["metadata"]["model_info"]
    metadata: dict = litellm_params.get("metadata", {}) or {}
    model_info: dict = metadata.get("model_info", {}) or {}
    if model_info:
        matching_keys = _CUSTOM_PRICING_KEYS & model_info.keys()
        for key in matching_keys:
            if model_info.get(key) is not None:
                return True
    return False
```
Data source: `litellm_params` comes from the Router deployment object, i.e. the `LiteLLM_Params` instance that is decrypted from the DB at startup and held in memory.
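To see the two-level check in action, here is a standalone sketch of the same logic. `CUSTOM_PRICING_KEYS` is an assumed three-field subset standing in for `_CUSTOM_PRICING_KEYS` (the real set is every field of `CustomPricingLiteLLMParams`), and `uses_custom_pricing` is a hypothetical name. Note that a pricing key that is present but set to `None` does not count.

```python
from typing import Optional

# Assumption: a three-field subset of CustomPricingLiteLLMParams, standing in
# for the real _CUSTOM_PRICING_KEYS.
CUSTOM_PRICING_KEYS = frozenset(
    {"input_cost_per_token", "output_cost_per_token", "input_cost_per_second"}
)


def uses_custom_pricing(litellm_params: Optional[dict]) -> bool:
    """Standalone mirror of the two-level check in use_custom_pricing_for_model()."""
    if litellm_params is None:
        return False
    # Level 1: pricing keys set directly on litellm_params (None does not count).
    for key in CUSTOM_PRICING_KEYS & litellm_params.keys():
        if litellm_params.get(key) is not None:
            return True
    # Level 2: pricing keys under metadata["model_info"].
    metadata = litellm_params.get("metadata", {}) or {}
    model_info = metadata.get("model_info", {}) or {}
    for key in CUSTOM_PRICING_KEYS & model_info.keys():
        if model_info.get(key) is not None:
            return True
    return False


print(uses_custom_pricing({"model": "x", "input_cost_per_token": None}))  # False
print(uses_custom_pricing({"input_cost_per_token": 3e-6}))  # True
print(uses_custom_pricing({"metadata": {"model_info": {"output_cost_per_token": 1.5e-5}}}))  # True
```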
## UUID Path: Custom Price Lookup
Relevant function: `_select_model_name_for_cost_calc()` (`cost_calculator.py:612-677`)
```python
if custom_pricing is True:
    if router_model_id is not None and router_model_id in litellm.model_cost:
        return_model = router_model_id  # use the UUID directly
    else:
        return_model = model  # fall back to the model name
```
Source of `router_model_id` (`litellm_logging.py:1417-1419`):
```python
if router_model_id is None and "model_id" in hidden_params:
    router_model_id = hidden_params["model_id"]
```
`hidden_params["model_id"]` is injected by the Router during routing and equals the deployment's UUID (`model_info.id`).
When the UUID is registered in `litellm.model_cost`: each time `Router._create_deployment()` is called (`router.py:6204-6208`).
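A sketch of what registration under the UUID amounts to, assuming the deployment's non-None pricing fields are copied into the entry. `register_deployment_pricing` and `PRICING_FIELDS` are hypothetical; the real `Router._create_deployment()` may store more or different fields. The point it illustrates is that a price never configured on the deployment is simply absent from `litellm.model_cost[UUID]`.

```python
from typing import Any, Dict

# Assumption: illustrative pricing field names for this sketch.
PRICING_FIELDS = (
    "input_cost_per_token",
    "output_cost_per_token",
    "cache_creation_input_token_cost",
    "cache_creation_input_token_cost_above_1hr",
    "cache_read_input_token_cost",
)


def register_deployment_pricing(
    model_cost: Dict[str, Dict[str, Any]],
    deployment_id: str,
    litellm_params: Dict[str, Any],
) -> None:
    """Hypothetical: store a deployment's configured (non-None) prices under its UUID."""
    model_cost[deployment_id] = {
        field: litellm_params[field]
        for field in PRICING_FIELDS
        if litellm_params.get(field) is not None
    }


model_cost: Dict[str, Dict[str, Any]] = {}
register_deployment_pricing(
    model_cost,
    "3f2c-uuid",  # made-up deployment UUID (model_info.id)
    {
        "model": "anthropic/Vendor2/Claude-4.6-Opus",
        "input_cost_per_token": 3e-6,
        "cache_creation_input_token_cost": 3.75e-6,
        # cache_creation_input_token_cost_above_1hr deliberately not configured
    },
)
print(sorted(model_cost["3f2c-uuid"]))
# ['cache_creation_input_token_cost', 'input_cost_per_token']
```

The resulting entry has no `cache_creation_input_token_cost_above_1hr` key at all, which is the gap the cache-pricing code later reads as a zero price.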
## JSON Path: 5-Step Model-Name Fallback
This path is taken when `custom_pricing=False`, or when the UUID is not in `model_cost` (a short window after a hot reload).
Function: `_get_model_info_helper()` (`litellm/utils.py:5430-5547`)
Candidate key generation (`_get_potential_model_names()`, `utils.py:5317-5360`), using `model="Vendor2/Claude-4.6-Opus"`, `custom_llm_provider="anthropic"` as the example:

| Priority | Variable | Computed value |
|---|---|---|
| 1 | `combined_model_name` | `"anthropic/Vendor2/Claude-4.6-Opus"` |
| 2 | `model` (original) | `"Vendor2/Claude-4.6-Opus"` |
| 3 | `combined_stripped_model_name` | `"anthropic/Vendor2/Claude-4.6-Opus"` (same as 1; no version suffix to strip) |
| 4 | `stripped_model_name` | `"Vendor2/Claude-4.6-Opus"` (same as 2) |
| 5 | `split_model` | `"Vendor2/Claude-4.6-Opus"` (same as 2) |
Lookup flow (`utils.py:5500-5547`):
```python
_model_info = None  # stays None if no candidate matches
for candidate in [combined_model_name, model, combined_stripped_model_name,
                  stripped_model_name, split_model]:
    _matched_key = _get_model_cost_key(candidate)  # case-insensitive
    if _matched_key is not None:
        _model_info = _get_model_info_from_model_cost(key=_matched_key)
        if _check_provider_match(_model_info, custom_llm_provider):
            break  # hit
        else:
            _model_info = None  # provider mismatch, keep going
if _model_info is None:
    raise ValueError("This model isn't mapped yet.")
```
`_check_provider_match()` (`utils.py:5265-5303`): a candidate is rejected only when `model_info["litellm_provider"] != custom_llm_provider`, so an `anthropic` entry looked up with provider `anthropic` passes.
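The candidate table and the lookup loop can be combined into one runnable sketch over a plain dict. All four functions are hypothetical stand-ins: `potential_model_names` for `_get_potential_model_names()`, `get_model_cost_key` for `_get_model_cost_key()`, `provider_matches` for `_check_provider_match()`, and `lookup_model_info` for the loop in `_get_model_info_helper()`. The stripping rule (drop a trailing `-YYYYMMDD`), the prefix-splitting rule, and accepting entries whose provider is unknown are assumptions chosen to reproduce the table above for this example, not LiteLLM's exact behavior.

```python
import re
from typing import Dict, List, Optional


def potential_model_names(model: str, custom_llm_provider: Optional[str]) -> List[str]:
    """Hypothetical simplification of _get_potential_model_names(), priority order 1-5."""
    # Assumption: "stripping" removes a trailing -YYYYMMDD date suffix.
    stripped_model_name = re.sub(r"-\d{8}$", "", model)
    if custom_llm_provider and model.startswith(custom_llm_provider + "/"):
        # Assumption: a provider-prefixed model is split once at the first "/".
        split_model = model.split("/", 1)[1]
        combined_model_name = model
        combined_stripped_model_name = stripped_model_name
    else:
        split_model = model
        prefix = f"{custom_llm_provider}/" if custom_llm_provider else ""
        combined_model_name = prefix + model
        combined_stripped_model_name = prefix + stripped_model_name
    return [combined_model_name, model, combined_stripped_model_name,
            stripped_model_name, split_model]


def get_model_cost_key(model_cost: Dict[str, dict], name: str) -> Optional[str]:
    """Stand-in for _get_model_cost_key(): exact match, then case-insensitive."""
    if name in model_cost:
        return name
    lowered = name.lower()
    return next((key for key in model_cost if key.lower() == lowered), None)


def provider_matches(model_info: dict, custom_llm_provider: Optional[str]) -> bool:
    """Stand-in for _check_provider_match(): reject only a known, different provider."""
    entry_provider = model_info.get("litellm_provider")
    if custom_llm_provider is None or entry_provider is None:
        return True  # assumption: unknown on either side is accepted
    return entry_provider == custom_llm_provider


def lookup_model_info(
    model_cost: Dict[str, dict], model: str, custom_llm_provider: Optional[str]
) -> dict:
    model_info: Optional[dict] = None
    for candidate in potential_model_names(model, custom_llm_provider):
        matched_key = get_model_cost_key(model_cost, candidate)
        if matched_key is None:
            continue
        model_info = model_cost[matched_key]
        if provider_matches(model_info, custom_llm_provider):
            break  # hit
        model_info = None  # provider mismatch: try the next candidate
    if model_info is None:
        raise ValueError("This model isn't mapped yet.")
    return model_info


# Made-up JSON-style entry, stored in lower case to exercise the case-insensitive match.
model_cost = {
    "vendor2/claude-4.6-opus": {"litellm_provider": "anthropic", "input_cost_per_token": 1.5e-5},
}
print(potential_model_names("Vendor2/Claude-4.6-Opus", "anthropic"))
print(lookup_model_info(model_cost, "Vendor2/Claude-4.6-Opus", "anthropic")["input_cost_per_token"])  # 1.5e-05
```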
## Cache Pricing in Detail
### Full Call Chain
```text
generic_cost_per_token() [utils.py:580]
├─ _parse_prompt_tokens_details(usage) [utils.py:406]
│  └─ extracts cache_creation_token_details (including 5m / 1hr token counts)
├─ _get_token_base_cost(model_info, usage) [utils.py:158]
│  ├─ prompt_base_cost = _get_cost_per_unit(model_info, "input_cost_per_token")
│  ├─ cache_creation_cost = _get_cost_per_unit(model_info, "cache_creation_input_token_cost")
│  ├─ cache_creation_cost_above_1hr = _get_cost_per_unit(model_info, "cache_creation_input_token_cost_above_1hr")
│  │  └─ ⚠️ if the field is missing → returns default_value=0.0 (silently!)
│  └─ cache_read_cost = _get_cost_per_unit(model_info, "cache_read_input_token_cost")
├─ _calculate_input_cost(...) [utils.py:513]
└─ calculate_cache_writing_cost(...) [utils.py:360]
```
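The silent zero comes from the default value. A hypothetical stand-in for `_get_cost_per_unit()` that models only the missing-or-None case (the real helper may also do type conversion or other handling) shows why a UUID entry without `cache_creation_input_token_cost_above_1hr` produces 0.0 instead of an error:

```python
from typing import Any, Dict


def get_cost_per_unit(
    model_info: Dict[str, Any], key: str, default_value: float = 0.0
) -> float:
    """Hypothetical stand-in for _get_cost_per_unit(): missing or None yields the default."""
    value = model_info.get(key)
    if value is None:
        return default_value  # no warning, no exception
    return float(value)


# Made-up UUID-path entry: 5m cache-write price set, 1hr price never configured.
uuid_entry = {"input_cost_per_token": 3e-6, "cache_creation_input_token_cost": 3.75e-6}

cache_creation_cost = get_cost_per_unit(uuid_entry, "cache_creation_input_token_cost")
cache_creation_cost_above_1hr = get_cost_per_unit(
    uuid_entry, "cache_creation_input_token_cost_above_1hr"
)
print(cache_creation_cost, cache_creation_cost_above_1hr)  # 3.75e-06 0.0
```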
### How _parse_prompt_tokens_details() Extracts 5m/1hr
Definition: `llm_cost_calc/utils.py:406-464`
```python
cache_creation_token_details = getattr(
    usage.prompt_tokens_details, "cache_creation_token_details", None
)
# cache_creation_token_details originates from these Anthropic response fields:
#   usage.cache_creation.ephemeral_5m_input_tokens
#   usage.cache_creation.ephemeral_1h_input_tokens
```
Anthropic response transformation (`anthropic/chat/transformation.py:1590-1594`):
```python
CacheCreationTokenDetails(
    ephemeral_5m_input_tokens=_usage["cache_creation"].get("ephemeral_5m_input_tokens"),
    ephemeral_1h_input_tokens=_usage["cache_creation"].get("ephemeral_1h_input_tokens"),
)
```
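A sketch of the same mapping applied to a raw Anthropic `usage` dict. `CacheCreationDetails` and `parse_cache_creation` are hypothetical stand-ins for `CacheCreationTokenDetails` and the transformation code; the `cache_creation` object with `ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens` is the shape quoted above, and returning `None` when it is absent corresponds to the legacy branch of `calculate_cache_writing_cost()`. The token counts are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheCreationDetails:
    """Hypothetical stand-in for CacheCreationTokenDetails."""
    ephemeral_5m_input_tokens: Optional[int] = None
    ephemeral_1h_input_tokens: Optional[int] = None


def parse_cache_creation(usage: Dict[str, Any]) -> Optional[CacheCreationDetails]:
    """Map Anthropic's usage.cache_creation object onto the 5m/1hr split."""
    cache_creation = usage.get("cache_creation")
    if not cache_creation:
        # No breakdown: callers fall back to the flat cache_creation_input_tokens total.
        return None
    return CacheCreationDetails(
        ephemeral_5m_input_tokens=cache_creation.get("ephemeral_5m_input_tokens"),
        ephemeral_1h_input_tokens=cache_creation.get("ephemeral_1h_input_tokens"),
    )


# Made-up Anthropic usage block.
usage = {
    "input_tokens": 50,
    "cache_creation_input_tokens": 3000,
    "cache_read_input_tokens": 0,
    "output_tokens": 200,
    "cache_creation": {"ephemeral_5m_input_tokens": 1000, "ephemeral_1h_input_tokens": 2000},
}
print(parse_cache_creation(usage))
# CacheCreationDetails(ephemeral_5m_input_tokens=1000, ephemeral_1h_input_tokens=2000)
```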
### calculate_cache_writing_cost() Core Logic
Definition: `llm_cost_calc/utils.py:360-391`
```python
def calculate_cache_writing_cost(
    cache_creation_tokens: int,
    cache_creation_token_details: Optional[CacheCreationTokenDetails],
    cache_creation_cost_above_1hr: float,  # 0.0 if not configured
    cache_creation_cost: float,  # 5m price
) -> float:
    total_cost = 0.0
    if cache_creation_token_details is not None:
        # Newer Anthropic responses: split into 5m and 1hr
        tokens_5m = cache_creation_token_details.ephemeral_5m_input_tokens
        tokens_1h = cache_creation_token_details.ephemeral_1h_input_tokens
        total_cost += tokens_5m * cache_creation_cost if tokens_5m is not None else 0.0
        total_cost += tokens_1h * cache_creation_cost_above_1hr if tokens_1h is not None else 0.0
    else:
        # Legacy: everything is billed at cache_creation_cost
        total_cost += cache_creation_tokens * cache_creation_cost
    return total_cost
```
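A worked example of the 1hr trap, reusing the arithmetic above in a standalone function (`cache_writing_cost` and `CacheCreationDetails` are hypothetical stand-ins). The prices are illustrative USD-per-token values, not taken from any published price list: 1,000 tokens are written to the 5m cache and 2,000 to the 1hr cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheCreationDetails:
    """Hypothetical stand-in for CacheCreationTokenDetails."""
    ephemeral_5m_input_tokens: Optional[int] = None
    ephemeral_1h_input_tokens: Optional[int] = None


def cache_writing_cost(
    cache_creation_tokens: int,
    details: Optional[CacheCreationDetails],
    cost_above_1hr: float,
    cost_5m: float,
) -> float:
    """Same arithmetic as calculate_cache_writing_cost() above."""
    total = 0.0
    if details is not None:
        if details.ephemeral_5m_input_tokens is not None:
            total += details.ephemeral_5m_input_tokens * cost_5m
        if details.ephemeral_1h_input_tokens is not None:
            total += details.ephemeral_1h_input_tokens * cost_above_1hr
    else:
        total += cache_creation_tokens * cost_5m
    return total


details = CacheCreationDetails(ephemeral_5m_input_tokens=1000, ephemeral_1h_input_tokens=2000)

# Illustrative prices in USD per token.
configured = cache_writing_cost(3000, details, cost_above_1hr=6e-6, cost_5m=3.75e-6)
missing_1hr = cache_writing_cost(3000, details, cost_above_1hr=0.0, cost_5m=3.75e-6)

print(f"{configured:.5f}")   # 0.01575
print(f"{missing_1hr:.5f}")  # 0.00375
```

With a configured 1hr price the write costs 1000 × 3.75e-6 + 2000 × 6e-6 = 0.01575; with the field missing, only the 5m portion (0.00375) is charged and the 2,000 1hr tokens are billed at zero.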
### text_tokens Double-Counting Correction
Around lines 620-643, `generic_cost_per_token()` handles the case where `text_tokens` may already include cached tokens:
```python
total_details = text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
has_double_counting = cache_hit > 0 and total_details > usage.prompt_tokens
if (text_tokens == 0 and prompt_tokens_details["image_count"] == 0) or has_double_counting:
    text_tokens = (
        usage.prompt_tokens
        - cache_hit
        - audio_tokens
        - cache_creation  # ← cache_creation is subtracted from text_tokens
        - image_tokens
    )
```
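Note: Anthropic-format responses commonly report `text_tokens` as 0. In that case, provided the prompt has no images (`image_count == 0`), the correction fires and derives the plain-text token count from `usage.prompt_tokens`.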
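A worked example of the correction, wrapping the snippet above in a hypothetical function with made-up counts for a prompt of 50 text tokens, 1,000 cache-hit tokens and 3,000 cache-creation tokens (4,050 in total). The last call follows directly from the condition as written: an over-reported `text_tokens` is only corrected when `cache_hit > 0`.

```python
def corrected_text_tokens(
    prompt_tokens: int,
    text_tokens: int,
    cache_hit: int,
    cache_creation: int,
    audio_tokens: int = 0,
    image_tokens: int = 0,
    image_count: int = 0,
) -> int:
    """Hypothetical wrapper around the double-counting correction shown above."""
    total_details = text_tokens + cache_hit + audio_tokens + cache_creation + image_tokens
    has_double_counting = cache_hit > 0 and total_details > prompt_tokens
    if (text_tokens == 0 and image_count == 0) or has_double_counting:
        text_tokens = prompt_tokens - cache_hit - audio_tokens - cache_creation - image_tokens
    return text_tokens


# text_tokens reported as 0: recomputed from prompt_tokens.
print(corrected_text_tokens(4050, text_tokens=0, cache_hit=1000, cache_creation=3000))     # 50
# text_tokens includes cached tokens and cache_hit > 0: double counting detected.
print(corrected_text_tokens(4050, text_tokens=4050, cache_hit=1000, cache_creation=3000))  # 50
# Consistent breakdown: left unchanged.
print(corrected_text_tokens(4050, text_tokens=50, cache_hit=1000, cache_creation=3000))    # 50
# Over-reported text_tokens but no cache hit: the condition does not fire.
print(corrected_text_tokens(3050, text_tokens=3050, cache_hit=0, cache_creation=3000))     # 3050
```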
## Comparing the Two Paths
```mermaid
graph LR
    subgraph UUID["UUID path: custom_pricing=True"]
        U1[litellm_params has pricing fields] --> U2[router_model_id=UUID]
        U2 --> U3[litellm.model_cost UUID\nincludes cache_creation_input_token_cost\nincludes cache_creation_input_token_cost_above_1hr]
        U3 --> U4[prices come from DB litellm_params\nmissing field → silently 0]
    end
    subgraph JSON["JSON path: custom_pricing=False"]
        J1[litellm_params has no pricing fields] --> J2[model_name 5-step fallback]
        J2 --> J3[litellm.model_cost model_name\nall fields from the JSON file]
        J3 --> J4[prices come from the JSON file\nmissing field → ValueError]
    end
```

| Dimension | UUID path | JSON path |
|---|---|---|
| Trigger | any `_CUSTOM_PRICING_KEYS` field in `litellm_params` is not None | `litellm_params` has no pricing fields at all |
| Price key | `router_model_id` (UUID) | `combined_model_name` and the rest of the 5 steps |
| Missing cache price | silently returns 0.0, no error | read from the JSON; `ValueError` if absent there |
| Update trigger | UI save (immediate) or DB change (after 30 s) | restart or hot reload |
| 1hr cache trap | missing `cache_creation_input_token_cost_above_1hr` → 1hr cache writes are free | not affected (the JSON has the full set of fields) |
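One way to catch the 1hr trap before it shows up in bills is to audit the in-memory price map for entries that set a 5m cache-write price but leave the 1hr price unset. `find_missing_1hr_price` is a hypothetical helper, not part of LiteLLM, and the sample map is made up; where to run such a check (at startup, or in a CI job against a DB export) is left open.

```python
from typing import Any, Dict, List


def find_missing_1hr_price(model_cost: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical audit: keys with a 5m cache-write price but no 1hr price."""
    return sorted(
        key
        for key, info in model_cost.items()
        if info.get("cache_creation_input_token_cost") is not None
        and info.get("cache_creation_input_token_cost_above_1hr") is None
    )


# Made-up in-memory map: two deployment UUIDs and one name-keyed entry.
model_cost = {
    "3f2c-uuid": {"cache_creation_input_token_cost": 3.75e-6},
    "9a7d-uuid": {
        "cache_creation_input_token_cost": 3.75e-6,
        "cache_creation_input_token_cost_above_1hr": 6e-6,
    },
    "Vendor2/Claude-4.6-Opus": {"input_cost_per_token": 1.5e-5},
}
print(find_missing_1hr_price(model_cost))  # ['3f2c-uuid']
```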