02 - Pre-call check chain
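After a request arrives and before the upstream LLM is called, LiteLLM runs two rate-limiting stages in order: the budget check in the auth stage, and the pre_call_hook chain of ProxyLogging. Either stage can reject the request (the auth-stage budget checks raise BudgetExceededError, which surfaces as HTTP 400; the hooks raise HTTP 429), and both must pass before the request continues.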
Stage 1: Auth stage (auth_checks)
Entry point: litellm/proxy/auth/user_api_key_auth.py (a FastAPI dependency).
user_api_key_auth() reads the token from the header and looks it up in the DB/cache to obtain a UserAPIKeyAuth object.
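⚠️ Bypass switch ahead of the budget checks: before invoking the series of budget checks below, user_api_key_auth.py:1086-1096 first runs _is_model_cost_zero(model, llm_router). When it returns True, the whole block below (key / team_member / team / user / end_user / org / project / model_max_budget) is skipped: even if spend is far above max_budget, no BudgetExceededError is raised. This bypass is the actual root cause of rate limiting failing for models such as GLM/MiniMax. See 05-skip-budget-checks-bug.md for the detailed mechanism. The checks below run only when skip_budget_checks=False: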
_virtual_key_max_budget_check # auth_checks.py:2669 → key-level spend
_virtual_key_soft_budget_check # auth_checks.py:2720 → soft budget alert
team_member_budget_check # auth_checks.py:2846 → team_member spend
team_max_budget_check # auth_checks.py:2890 → team spend
user_max_budget_check # auth_checks.py:343 → user spend
end_user_max_budget_check # auth_checks.py:366 → end_user spend
project_max_budget_check # auth_checks.py:3020 → project spend
organization_max_budget_check # auth_checks.py:3202 → org spend
Each check follows the same template:
if subject.spend >= subject.max_budget:
    raise litellm.BudgetExceededError(
        current_cost=subject.spend,
        max_budget=subject.max_budget,
    )
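Key point: all of these checks read only the spend field, not any other statistic. spend must be written back by the later _PROXY_track_cost_callback at the end of each request; otherwise it stays at 0 or a stale value. See 03-spend-update-flow.md.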
In the outer layer, BudgetExceededError is mapped to HTTP 400 + {"error": {"code": "BudgetExceeded", ...}}.
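The stage-one behaviour described above can be sketched as a small self-contained model. This is not LiteLLM's code: `Subject`, `run_budget_checks`, and `to_http_error` are hypothetical names, the exception class is a simplified stand-in for litellm.BudgetExceededError, and the `None` guard on max_budget is an assumption about how an unconfigured budget is treated.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(Exception):
    """Simplified stand-in for litellm.BudgetExceededError."""

    def __init__(self, current_cost: float, max_budget: float) -> None:
        self.current_cost = current_cost
        self.max_budget = max_budget
        super().__init__(f"spend={current_cost} >= max_budget={max_budget}")


@dataclass
class Subject:
    """Any budgeted entity: key, team, user, ... (hypothetical shape)."""

    name: str
    spend: float
    max_budget: Optional[float] = None  # assumption: None means no budget set


def run_budget_checks(subjects: list[Subject], skip_budget_checks: bool) -> None:
    """Apply the shared template to each subject in order; the first failure raises."""
    if skip_budget_checks:
        return  # bypass: nothing below runs, however large spend is
    for s in subjects:
        if s.max_budget is not None and s.spend >= s.max_budget:
            raise BudgetExceededError(current_cost=s.spend, max_budget=s.max_budget)


def to_http_error(exc: BudgetExceededError) -> tuple[int, dict]:
    """Map the exception to HTTP 400; other body fields are omitted in this sketch."""
    return 400, {"error": {"code": "BudgetExceeded"}}
```

With a key whose spend is 12.0 against a max_budget of 10.0, `run_budget_checks(..., skip_budget_checks=False)` raises, while the same call with `skip_budget_checks=True` returns silently, which is the failure mode the bypass note warns about.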
Stage 2: ProxyLogging.pre_call_hook chain
Once auth passes, the route code enters proxy_logging_obj.pre_call_hook(...), which calls async_pre_call_hook on every registered CustomLogger in turn.
# proxy/utils.py: ProxyLogging.__init__()
self.max_budget_limiter = _PROXY_MaxBudgetLimiter() # line ~313
self.max_parallel_request_limiter = _PROXY_MaxParallelRequestsHandler_v3() # ~310
# proxy/utils.py: _add_proxy_hooks()
litellm.callbacks.append(self.max_budget_limiter)
litellm.callbacks.append(self.max_parallel_request_limiter)
Call chain: litellm/proxy/utils.py:1328-1380 (the main pre_call_hook loop)
for callback in litellm.callbacks:
    if isinstance(callback, CustomLogger):
        await callback.async_pre_call_hook(
            user_api_key_dict=user_api_key_dict,
            cache=self.call_details["user_api_key_cache"],
            data=data,
            call_type=call_type,
        )
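To make the ordering concrete, here is a minimal runnable sketch of such a loop. It assumes nothing about LiteLLM internals beyond the loop shape shown above: `CustomLogger` is a simplified stand-in for the real base class, and `HTTPError`, `AllowHook`, `RejectHook`, and `run_pre_call_hooks` are hypothetical names. It shows that the first hook to raise stops the chain, so later hooks never run.

```python
import asyncio


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException, to keep the sketch dependency-free."""

    def __init__(self, status_code: int, detail: str) -> None:
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


class CustomLogger:
    """Simplified stand-in for LiteLLM's CustomLogger base class."""

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        return None


class AllowHook(CustomLogger):
    def __init__(self, calls: list) -> None:
        self.calls = calls

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        self.calls.append("allow")


class RejectHook(CustomLogger):
    def __init__(self, calls: list) -> None:
        self.calls = calls

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        self.calls.append("reject")
        raise HTTPError(429, "limit reached")


async def run_pre_call_hooks(callbacks, user_api_key_dict, cache, data, call_type):
    """Call each CustomLogger hook in registration order; an exception aborts the chain."""
    for callback in callbacks:
        if isinstance(callback, CustomLogger):  # non-logger callbacks are skipped
            await callback.async_pre_call_hook(
                user_api_key_dict=user_api_key_dict,
                cache=cache,
                data=data,
                call_type=call_type,
            )
    return data
```

Registration order therefore matters: a hook placed after a rejecting hook is never consulted for that request.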
The two core limiters registered here:
2.1 _PROXY_MaxBudgetLimiter
Reads spend / max_budget from the {user_id}_user_api_key_user_id row in DualCache:
# hooks/max_budget_limiter.py:24-41
user_row = await cache.async_get_cache(cache_key, ...)
max_budget = user_row["max_budget"]
curr_spend = user_row["spend"]
...
if curr_spend >= max_budget:
    raise HTTPException(status_code=429, detail="Max budget limit reached.")
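Note: the spend in the cache is refreshed asynchronously (see 03-spend-update-flow.md), so it can be briefly out of sync with PG (on the order of seconds). A cost reaches the cache only after response_cost itself has been accumulated correctly. If response_cost = 0, neither PG nor the cache grows, and the limiter keeps seeing the old value.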
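The cache-based check can be illustrated with a small stand-alone sketch. `InMemoryCache` is a hypothetical stand-in that exposes only an `async_get_cache` method, `HTTPError` replaces FastAPI's HTTPException, and the early returns for a missing row or an unset budget are assumptions about the elided part of the snippet above, not a statement of what the real hook does.

```python
import asyncio
from typing import Any, Optional


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


class InMemoryCache:
    """Hypothetical stand-in for DualCache with just the method used here."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    async def async_get_cache(self, key: str) -> Optional[Any]:
        return self._store.get(key)


async def check_user_budget(cache: InMemoryCache, user_id: str) -> None:
    """Raise 429 when the cached user row shows spend at or above max_budget."""
    cache_key = f"{user_id}_user_api_key_user_id"
    user_row = await cache.async_get_cache(cache_key)
    if user_row is None:
        return  # assumption: no cached row means nothing to enforce
    max_budget = user_row.get("max_budget")
    curr_spend = user_row.get("spend")
    if max_budget is None or curr_spend is None:
        return  # assumption: an unset budget is not enforced
    if curr_spend >= max_budget:
        raise HTTPError(429, "Max budget limit reached.")
```

Because the function only compares whatever the cached row holds, a row whose spend was never updated passes this check indefinitely.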
2.2 _PROXY_MaxParallelRequestsHandler_v3
async_pre_call_hook, starting at litellm/proxy/hooks/parallel_request_limiter_v3.py:181:
# Conceptual sketch; the actual code is at lines 181-446
checks = [
    (key.tpm_limit, key.current_tpm),
    (key.rpm_limit, key.current_rpm),
    (key.max_parallel_requests, key.current_parallel),
    (user.tpm_limit, user.current_tpm),
    (user.rpm_limit, user.current_rpm),
    (team.tpm_limit, team.current_tpm),
    ...
    (key.model_tpm_limit[requested_model], key.current_model_tpm[requested_model]),
]
for limit, current in checks:
    if limit is not None and current >= limit:
        raise HTTPException(status_code=429, detail=...)
The checks short-circuit layer by layer. All counts come from Redis counters and are completely independent of spend.
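A runnable version of the conceptual loop might look like the following. Plain dicts stand in for the Redis counters, and the `"scope:metric"` names, `build_checks`, and `first_exceeded` are hypothetical; the real handler's data layout is not shown in this document.

```python
from typing import Optional

Check = tuple[str, Optional[int], int]  # (name, limit, current counter value)


def build_checks(limits: dict[str, Optional[int]], counters: dict[str, int]) -> list[Check]:
    """Pair every configured limit with its current counter (0 if never incremented)."""
    return [(name, limit, counters.get(name, 0)) for name, limit in limits.items()]


def first_exceeded(checks: list[Check]) -> Optional[str]:
    """Return the name of the first limit that is reached, or None if all pass."""
    for name, limit, current in checks:
        if limit is not None and current >= limit:
            return name  # short-circuit: later checks are not evaluated
    return None


# Limits are evaluated in insertion order: key, then user, then team.
limits = {"key:tpm": 1000, "key:rpm": None, "user:rpm": 10, "team:rpm": 5}
counters = {"key:tpm": 900, "key:rpm": 3, "user:rpm": 10, "team:rpm": 7}

blocked_by = first_exceeded(build_checks(limits, counters))
```

Here `blocked_by` is `"user:rpm"`: the key-level TPM is under its limit, the unset key-level RPM is skipped, and the user-level RPM is the first one reached, so the team-level check is never evaluated. No spend value appears anywhere in the inputs.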
What the pre-call stage does not do
- It does not recompute cost; cost is computed only after the response comes back
- It does not read model_prices_and_context_window.json
- It does not call any LLM provider
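Therefore: any rate-limiting problem caused by a misconfigured price table cannot originate in the pre-call stage; its root cause lies in the post-call stage (cost calculation → spend accumulation).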
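The argument can be checked with a toy simulation of the pre-call / post-call loop. Everything here is illustrative: the model names and prices are made up, `response_cost` and `admitted_requests` are hypothetical helpers, and only the per-token field names follow the convention of model_prices_and_context_window.json.

```python
# Hypothetical price table with one correctly priced and one zero-priced model.
PRICE_TABLE = {
    "priced-model": {"input_cost_per_token": 0.001, "output_cost_per_token": 0.001},
    "zero-priced-model": {"input_cost_per_token": 0.0, "output_cost_per_token": 0.0},
}


def response_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Post-call step: cost is derived from the price table and token counts."""
    entry = PRICE_TABLE[model]
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


def admitted_requests(model: str, n_requests: int, max_budget: float) -> int:
    """Count how many of n_requests pass a pre-call check that reads only spend."""
    spend = 0.0
    admitted = 0
    for _ in range(n_requests):
        if spend >= max_budget:  # pre-call: no price lookup, just the stored spend
            continue  # rejected
        admitted += 1
        spend += response_cost(model, 500, 500)  # post-call: roughly 1.0 per request
    return admitted
```

With a budget of 2.5, `admitted_requests("priced-model", 5, 2.5)` admits 3 requests before spend crosses the budget, while `admitted_requests("zero-priced-model", 5, 2.5)` admits all 5: the pre-call comparison is identical in both runs, and only the post-call cost differs.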
Overall flow diagram
sequenceDiagram
participant C as Client
participant A as user_api_key_auth
participant Auth as auth_checks
participant PL as ProxyLogging.pre_call_hook
participant Hooks as Limiter Hooks
participant R as Router
C->>A: POST /chat/completions
A->>Auth: _virtual_key_max_budget_check(token)
Auth-->>A: ok or BudgetExceededError
A->>Auth: team / user / org budget checks (×6)
Auth-->>A: ok
A->>PL: pre_call_hook(token, data)
PL->>Hooks: MaxBudgetLimiter.async_pre_call_hook
Hooks-->>PL: ok (or 429)
PL->>Hooks: ParallelRequestHandler_v3.async_pre_call_hook
Hooks-->>PL: ok (or 429 TPM/RPM)
PL-->>A: ok
A->>R: route to upstream LLM
R-->>C: response (post-call spend / TPM accumulation is triggered asynchronously)