
02: Pre-call check chain

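After a request arrives and before the upstream LLM is called, LiteLLM runs two gating stages in order: the budget checks in the auth stage, and ProxyLogging's pre_call_hook chain. Either stage can reject the request (stage one raises BudgetExceededError, which surfaces as HTTP 400; the stage-two limiters raise HTTP 429), and a request must pass both to proceed.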


Stage one: the auth stage (auth_checks)

Entry point: litellm/proxy/auth/user_api_key_auth.py (a FastAPI dependency).

user_api_key_auth() takes the token from the header and looks it up in the DB / cache to obtain a UserAPIKeyAuth object.

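⚠️ Bypass switch ahead of the budget checks: before the chain of budget checks below is invoked, user_api_key_auth.py:1086-1096 first runs _is_model_cost_zero(model, llm_router). If it returns True, the entire chain (key / team_member / team / user / end_user / org / project / model_max_budget) is skipped: even when spend is far above max_budget, no BudgetExceededError is raised. This bypass is the actual root cause of budget limiting failing for models such as GLM and MiniMax. See 05-skip-budget-checks-bug.md for the detailed mechanism.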

The following checks run only when skip_budget_checks=False:

_virtual_key_max_budget_check          # auth_checks.py:2669  → key-level spend
_virtual_key_soft_budget_check         # auth_checks.py:2720  → soft budget alert
team_member_budget_check               # auth_checks.py:2846  → team_member spend
team_max_budget_check                  # auth_checks.py:2890  → team spend
user_max_budget_check                  # auth_checks.py:343   → user spend
end_user_max_budget_check              # auth_checks.py:366   → end_user spend
project_max_budget_check               # auth_checks.py:3020  → project spend
organization_max_budget_check          # auth_checks.py:3202  → org spend

Each check follows the same template:

if subject.spend >= subject.max_budget:
    raise litellm.BudgetExceededError(
        current_cost=subject.spend,
        max_budget=subject.max_budget,
    )

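Key point: all of these checks read only the spend field and no other statistic. spend has to be written back by the later _PROXY_track_cost_callback at the end of every request; otherwise it stays at 0 or at a stale value forever. See 03-spend-update-flow.md.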

BudgetExceededError is mapped by the outer layer to HTTP 400 + {"error": {"code": "BudgetExceeded", ...}}.
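The stage-one behaviour described above can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions, not LiteLLM's code: `Subject`, `run_budget_checks`, `to_http_error`, and the local `BudgetExceededError` class are hypothetical stand-ins, and the `message` field in the error body is assumed (the text above only specifies `code`).

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(Exception):
    """Local stand-in for litellm.BudgetExceededError (hypothetical)."""

    def __init__(self, current_cost: float, max_budget: float):
        self.current_cost = current_cost
        self.max_budget = max_budget
        super().__init__(
            f"Budget exceeded: spend={current_cost}, max_budget={max_budget}"
        )


@dataclass
class Subject:
    """A key, team, user, etc. Only spend and max_budget matter here."""

    name: str
    spend: float
    max_budget: Optional[float] = None  # None means no budget configured


def run_budget_checks(subjects, skip_budget_checks: bool) -> None:
    """Apply the shared template to each subject unless the bypass is on."""
    if skip_budget_checks:
        return  # the whole chain is skipped, whatever the spend is
    for subject in subjects:
        if subject.max_budget is not None and subject.spend >= subject.max_budget:
            raise BudgetExceededError(subject.spend, subject.max_budget)


def to_http_error(err: BudgetExceededError) -> tuple:
    """Map the error to (status, body); the message field is an assumption."""
    return 400, {"error": {"code": "BudgetExceeded", "message": str(err)}}


subjects = [Subject("key", spend=12.5, max_budget=10.0)]

try:
    run_budget_checks(subjects, skip_budget_checks=False)
    status, body = 200, None
except BudgetExceededError as err:
    status, body = to_http_error(err)

# With the bypass on, the same over-budget key passes silently.
bypassed = run_budget_checks(subjects, skip_budget_checks=True)
```

The last line is the point of the sketch: once the skip flag is set, how far spend exceeds max_budget no longer matters, because the comparison is never reached.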


Stage two: the ProxyLogging.pre_call_hook chain

Once auth passes, the route code enters proxy_logging_obj.pre_call_hook(...), which calls async_pre_call_hook on every registered CustomLogger in turn.

Registration: litellm/proxy/utils.py

# proxy/utils.py: ProxyLogging.__init__()
self.max_budget_limiter = _PROXY_MaxBudgetLimiter()           # line ~313
self.max_parallel_request_limiter = _PROXY_MaxParallelRequestsHandler_v3()  # ~310

# proxy/utils.py: _add_proxy_hooks()
litellm.callbacks.append(self.max_budget_limiter)
litellm.callbacks.append(self.max_parallel_request_limiter)

Call chain: litellm/proxy/utils.py:1328-1380 (the pre_call_hook main loop)

for callback in litellm.callbacks:
    if isinstance(callback, CustomLogger):
        await callback.async_pre_call_hook(
            user_api_key_dict=user_api_key_dict,
            cache=self.call_details["user_api_key_cache"],
            data=data,
            call_type=call_type,
        )
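The loop above can be exercised in isolation. The following is a rough, stdlib-only sketch: `HTTPException` and `CustomLogger` are local stand-ins for the FastAPI and LiteLLM classes, while `AllowAll`, `AlwaysReject`, and `run_chain` are hypothetical names introduced only for this illustration.

```python
import asyncio


class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException (keeps the sketch stdlib-only)."""

    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


class CustomLogger:
    """Stand-in base class: the default hook accepts every request."""

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        return None


class AllowAll(CustomLogger):
    pass


class AlwaysReject(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        raise HTTPException(status_code=429, detail="Max budget limit reached.")


async def run_chain(callbacks):
    """Run hooks in registration order; the first exception aborts the chain."""
    called = []
    try:
        for callback in callbacks:
            if isinstance(callback, CustomLogger):  # other entries are skipped
                called.append(type(callback).__name__)
                await callback.async_pre_call_hook(
                    user_api_key_dict={}, cache=None, data={}, call_type="completion"
                )
    except HTTPException as exc:
        return called, exc.status_code
    return called, 200


# A non-CustomLogger entry, one passing hook, one rejecting hook, one never reached.
result = asyncio.run(
    run_chain(["not_a_logger", AllowAll(), AlwaysReject(), AllowAll()])
)
```

In this sketch the hook registered after `AlwaysReject` is never invoked, which is why registration order decides which limiter's error the client sees.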

The two core limiters registered:

2.1 _PROXY_MaxBudgetLimiter

Reads spend / max_budget from the {user_id}_user_api_key_user_id row in DualCache.

# hooks/max_budget_limiter.py:24-41
user_row = await cache.async_get_cache(cache_key, ...)
max_budget = user_row["max_budget"]
curr_spend = user_row["spend"]
...
if curr_spend >= max_budget:
    raise HTTPException(status_code=429, detail="Max budget limit reached.")

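Note: the spend held in the cache is refreshed asynchronously (see 03-spend-update-flow.md), so it can lag Postgres (PG) by a few seconds. It only grows, however, once response_cost itself has been accumulated correctly. If response_cost = 0, neither PG nor the cache increases, and the limiter keeps seeing the old value.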
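The consequence of a zero response_cost can be shown with a small simulation. This is a simplified sketch, not the real hook: the cache is a plain dict, the write-back is synchronous, and `max_budget_pre_call`, `track_cost`, and `requests_until_blocked` are hypothetical helpers.

```python
class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


def max_budget_pre_call(user_row: dict) -> None:
    """Same comparison as the limiter: reject once spend reaches max_budget."""
    max_budget = user_row.get("max_budget")
    curr_spend = user_row.get("spend")
    if max_budget is None or curr_spend is None:
        return
    if curr_spend >= max_budget:
        raise HTTPException(status_code=429, detail="Max budget limit reached.")


def track_cost(user_row: dict, response_cost: float) -> None:
    """Post-call write-back, simplified to a synchronous in-place update."""
    user_row["spend"] += response_cost


def requests_until_blocked(response_cost: float, max_budget: float, limit: int = 1000):
    """Return how many requests were served before a 429, or None if never blocked."""
    user_row = {"spend": 0.0, "max_budget": max_budget}
    for served in range(limit):
        try:
            max_budget_pre_call(user_row)
        except HTTPException:
            return served
        track_cost(user_row, response_cost)
    return None


priced = requests_until_blocked(response_cost=0.25, max_budget=1.0)
zero_cost = requests_until_blocked(response_cost=0.0, max_budget=1.0)
```

With a non-zero cost the user is blocked after four requests; with a zero cost the loop runs to its limit without ever hitting the 429 branch.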

2.2 _PROXY_MaxParallelRequestsHandler_v3

The async_pre_call_hook starts at litellm/proxy/hooks/parallel_request_limiter_v3.py:181.

# Conceptual sketch; the actual code spans lines 181-446
checks = [
    (key.tpm_limit,   key.current_tpm),
    (key.rpm_limit,   key.current_rpm),
    (key.max_parallel_requests, key.current_parallel),
    (user.tpm_limit,  user.current_tpm),
    (user.rpm_limit,  user.current_rpm),
    (team.tpm_limit,  team.current_tpm),
    ...
    (key.model_tpm_limit[requested_model], key.current_model_tpm[requested_model]),
]
for limit, current in checks:
    if limit is not None and current >= limit:
        raise HTTPException(status_code=429, detail=...)

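The checks short-circuit layer by layer: the first limit that is hit raises, and the remaining ones are not evaluated. All counts come from Redis counters and are completely independent of spend.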
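A runnable version of the conceptual check list might look like the following. It is only a sketch: plain dicts stand in for the Redis counters and the configured limits, and the labels, numbers, and `check_limits` helper are made up for illustration.

```python
class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


def check_limits(checks) -> None:
    """checks is a list of (label, limit, current); the first breach raises."""
    for label, limit, current in checks:
        if limit is not None and current >= limit:
            raise HTTPException(
                status_code=429, detail=f"Rate limit exceeded: {label}"
            )


# In-memory stand-ins for the Redis counters and the configured limits.
counters = {"key:rpm": 3, "key:tpm": 900, "user:rpm": 50, "team:tpm": 20000}
limits = {"key:rpm": 10, "key:tpm": None, "user:rpm": 50, "team:tpm": 10000}

# Dicts preserve insertion order, so the checks run key → user → team.
checks = [(label, limits[label], counters[label]) for label in counters]

try:
    check_limits(checks)
    breached = None
except HTTPException as exc:
    breached = exc.detail
```

Here `key:tpm` is skipped because its limit is None, `user:rpm` is the first breached limit, and the team-level check is never evaluated even though it is also over its limit. None of the inputs involve spend.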


What the pre-call stage does not do

  • It does not recompute cost; cost is calculated only after the response comes back
  • It does not read model_prices_and_context_window.json
  • It does not call any LLM provider

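Therefore: any limiting problem caused by a misconfigured price table cannot have its root cause in the pre-call stage; it lies in the post-call stage (cost calculation → spend accumulation).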


Overall flow diagram

sequenceDiagram
    participant C as Client
    participant A as user_api_key_auth
    participant Auth as auth_checks
    participant PL as ProxyLogging.pre_call_hook
    participant Hooks as Limiter Hooks
    participant R as Router

    C->>A: POST /chat/completions
    A->>Auth: _virtual_key_max_budget_check(token)
    Auth-->>A: ok or BudgetExceededError
    A->>Auth: team / user / org budget checks (×6)
    Auth-->>A: ok
    A->>PL: pre_call_hook(token, data)
    PL->>Hooks: MaxBudgetLimiter.async_pre_call_hook
    Hooks-->>PL: ok (or 429)
    PL->>Hooks: ParallelRequestHandler_v3.async_pre_call_hook
    Hooks-->>PL: ok (or 429 TPM/RPM)
    PL-->>A: ok
    A->>R: route to upstream LLM
    R-->>C: response, then (async) post-call accumulation of spend / TPM