02 - Pre-call check chain

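After a request arrives and before the upstream LLM is called, LiteLLM runs two gating stages in order: the budget checks in the auth stage, then the ProxyLogging pre_call_hook chain. Either stage can reject the request (the first by raising BudgetExceededError, the second with HTTP 429), and both must pass before the request proceeds.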

Stage 1: auth stage (auth_checks)

Entry point: litellm/proxy/auth/user_api_key_auth.py (a FastAPI dependency).

user_api_key_auth() reads the token from the request header and looks it up in the DB/cache to build a UserAPIKeyAuth object.

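⚠️ Bypass switch ahead of the budget checks: before the chain of budget checks below is invoked, user_api_key_auth.py:1086-1096 first runs _is_model_cost_zero(model, llm_router). If it returns True, the entire block below (key / team_member / team / user / end_user / org / project / model_max_budget) is skipped, so no BudgetExceededError is raised even when spend is far above max_budget. This bypass is the actual root cause of budget limits not taking effect for models such as GLM and MiniMax. See 05-skip-budget-checks-bug.md for the detailed mechanism.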
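The effect of the bypass can be shown with a minimal sketch. It is not the LiteLLM implementation: `Key`, `BudgetExceeded`, and `run_budget_checks` are hypothetical stand-ins, and the only behavior taken from the text is that a zero-cost model causes the budget checks to be skipped.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Hypothetical stand-in for litellm.BudgetExceededError."""


@dataclass
class Key:
    spend: float
    max_budget: Optional[float]


def run_budget_checks(key: Key, model_cost_is_zero: bool) -> str:
    # Assumption for illustration: the zero-cost result directly becomes the skip flag.
    skip_budget_checks = model_cost_is_zero
    if skip_budget_checks:
        return "skipped"
    if key.max_budget is not None and key.spend >= key.max_budget:
        raise BudgetExceeded(f"spend {key.spend} >= max_budget {key.max_budget}")
    return "checked"


over_budget = Key(spend=50.0, max_budget=10.0)
# With a zero-cost model the over-budget key passes without any error.
print(run_budget_checks(over_budget, model_cost_is_zero=True))
```

With `model_cost_is_zero=False` the same key raises `BudgetExceeded`, which is the behavior the bypass suppresses.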

The checks below run only when skip_budget_checks=False:

_virtual_key_max_budget_check          # auth_checks.py:2669  → key-level spend
_virtual_key_soft_budget_check         # auth_checks.py:2720  → soft budget alert
team_member_budget_check               # auth_checks.py:2846  → team_member spend
team_max_budget_check                  # auth_checks.py:2890  → team spend
user_max_budget_check                  # auth_checks.py:343   → user spend
end_user_max_budget_check              # auth_checks.py:366   → end_user spend
project_max_budget_check               # auth_checks.py:3020  → project spend
organization_max_budget_check          # auth_checks.py:3202  → org spend

Each one follows the same template:

if subject.spend >= subject.max_budget:
    raise litellm.BudgetExceededError(
        current_cost=subject.spend,
        max_budget=subject.max_budget,
    )

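Key point: all of these checks read only the spend field and no other statistic. spend must be written back by _PROXY_track_cost_callback at the end of each request; otherwise it stays at 0 or a stale value. See 03-spend-update-flow.md.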
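As a rough illustration of how one template can serve several scopes, the sketch below applies it to an ordered list of subjects and stops at the first one that is at or over budget. `Subject`, `BudgetExceeded`, `check_budget`, and `run_chain` are hypothetical names, and the `None` guard for "no budget configured" is an assumption of this sketch rather than something stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


class BudgetExceeded(Exception):
    """Hypothetical stand-in for litellm.BudgetExceededError."""

    def __init__(self, scope: str, current_cost: float, max_budget: float):
        super().__init__(f"{scope}: {current_cost} >= {max_budget}")
        self.scope = scope
        self.current_cost = current_cost
        self.max_budget = max_budget


@dataclass
class Subject:
    scope: str
    spend: float = 0.0
    max_budget: Optional[float] = None  # assumption: None means no budget is set


def check_budget(subject: Subject) -> None:
    # Same shape as the template: compare spend against max_budget, nothing else.
    if subject.max_budget is not None and subject.spend >= subject.max_budget:
        raise BudgetExceeded(subject.scope, subject.spend, subject.max_budget)


def run_chain(subjects: List[Subject]) -> None:
    # Checks run in order; the first failing scope aborts the chain.
    for subject in subjects:
        check_budget(subject)


chain = [Subject("key", 1.0, 5.0), Subject("team", 9.0, 9.0), Subject("user", 99.0, 1.0)]
try:
    run_chain(chain)
except BudgetExceeded as err:
    print(err.scope)  # the first scope in the list that reached its budget
```

Because the chain stops at the first failure, the caller only learns about one exceeded scope per request, even if later scopes are also over budget.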

In the outer layer, BudgetExceededError is mapped to HTTP 400 + {"error": {"code": "BudgetExceeded", ...}}.

Stage 2: ProxyLogging.pre_call_hook chain

Once auth passes, the route code enters proxy_logging_obj.pre_call_hook(...), which calls async_pre_call_hook on every registered CustomLogger in turn.

Registration: litellm/proxy/utils.py

# proxy/utils.py: ProxyLogging.__init__()
self.max_budget_limiter = _PROXY_MaxBudgetLimiter()           # line ~313
self.max_parallel_request_limiter = _PROXY_MaxParallelRequestsHandler_v3()  # line ~310

# proxy/utils.py: _add_proxy_hooks()
litellm.callbacks.append(self.max_budget_limiter)
litellm.callbacks.append(self.max_parallel_request_limiter)

Call chain: litellm/proxy/utils.py:1328-1380 (main loop of pre_call_hook)

for callback in litellm.callbacks:
    if isinstance(callback, CustomLogger):
        await callback.async_pre_call_hook(
            user_api_key_dict=user_api_key_dict,
            cache=self.call_details["user_api_key_cache"],
            data=data,
            call_type=call_type,
        )
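The loop above can be mimicked with a small self-contained sketch. `Hook` is a hypothetical stand-in for a CustomLogger subclass, `RuntimeError` stands in for whatever exception a real limiter raises, and the simplified signatures are assumptions. It shows two properties of such a chain: entries of the wrong type are skipped, and the first hook that raises stops the remaining hooks from running.

```python
import asyncio


class Hook:
    """Hypothetical stand-in for a CustomLogger subclass."""

    def __init__(self, name, reject=False):
        self.name = name
        self.reject = reject

    async def async_pre_call_hook(self, data, call_log):
        call_log.append(self.name)
        if self.reject:
            raise RuntimeError(f"{self.name} rejected the request")


async def pre_call_hook(callbacks, data):
    """Run hooks in registration order; return (names called, passed?)."""
    call_log = []
    try:
        for cb in callbacks:
            if isinstance(cb, Hook):  # entries that are not hooks are skipped
                await cb.async_pre_call_hook(data, call_log)
    except RuntimeError:
        return call_log, False
    return call_log, True


callbacks = [Hook("budget"), "not-a-hook", Hook("parallel", reject=True), Hook("later")]
log, ok = asyncio.run(pre_call_hook(callbacks, {"model": "demo"}))
print(log, ok)
```

Registration order therefore matters: a hook placed after a rejecting limiter never sees the request.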

The two core limiters registered here:

2.1 `_PROXY_MaxBudgetLimiter`

Reads spend / max_budget from the {user_id}_user_api_key_user_id row in DualCache:

# hooks/max_budget_limiter.py:24-41
user_row = await cache.async_get_cache(cache_key, ...)
max_budget = user_row["max_budget"]
curr_spend = user_row["spend"]
...
if curr_spend >= max_budget:
    raise HTTPException(status_code=429, detail="Max budget limit reached.")

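Note: the spend in the cache is refreshed asynchronously (see 03-spend-update-flow.md), so it can be briefly out of sync with PG (on the order of seconds). More importantly, a cost reaches the cache only after response_cost has been accumulated correctly. If response_cost = 0, neither PG nor the cache grows, and the limiter keeps seeing the old value indefinitely.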
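The consequence of a zero response_cost can be made concrete with a toy model. This is a sketch under simplifying assumptions: a plain dict stands in for the cached user row, the write-back is synchronous, and `is_blocked`, `track_cost`, and `requests_until_blocked` are hypothetical helper names.

```python
from typing import Optional


def is_blocked(row: dict) -> bool:
    # Same comparison the limiter performs on the cached row.
    max_budget = row.get("max_budget")
    return max_budget is not None and row["spend"] >= max_budget


def track_cost(row: dict, response_cost: float) -> None:
    # Stand-in for the post-call write-back of spend.
    row["spend"] += response_cost


def requests_until_blocked(
    response_cost: float, max_budget: float, attempts: int = 1000
) -> Optional[int]:
    """Return how many requests pass before the limiter blocks, or None if it never does."""
    row = {"spend": 0.0, "max_budget": max_budget}
    for n in range(attempts):
        if is_blocked(row):
            return n
        track_cost(row, response_cost)
    return None


print(requests_until_blocked(0.5, 1.0))  # priced model: blocked after a few requests
print(requests_until_blocked(0.0, 1.0))  # zero-cost model: never blocked
```

However many requests are sent, a model whose computed cost is zero never moves spend, so the comparison in the limiter can never become true.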

2.2 `_PROXY_MaxParallelRequestsHandler_v3`

async_pre_call_hook, starting at litellm/proxy/hooks/parallel_request_limiter_v3.py:181:

# Conceptual sketch; the actual code spans lines 181-446
checks = [
    (key.tpm_limit,   key.current_tpm),
    (key.rpm_limit,   key.current_rpm),
    (key.max_parallel_requests, key.current_parallel),
    (user.tpm_limit,  user.current_tpm),
    (user.rpm_limit,  user.current_rpm),
    (team.tpm_limit,  team.current_tpm),
    ...
    (key.model_tpm_limit[requested_model], key.current_model_tpm[requested_model]),
]
for limit, current in checks:
    if limit is not None and current >= limit:
        raise HTTPException(status_code=429, detail=...)

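The checks short-circuit level by level: the first limit that is reached rejects the request. All counters come from Redis counters and are fully independent of spend.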
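A runnable version of the conceptual loop might look like the sketch below. The check names, the in-memory tuples in place of Redis counters, and the `first_violation` helper are all assumptions made for illustration; only the `limit is not None and current >= limit` rule and the short-circuit order come from the text.

```python
from typing import List, Optional, Tuple

# (check name, configured limit or None, current counter value)
Check = Tuple[str, Optional[int], int]


def first_violation(checks: List[Check]) -> Optional[str]:
    """Return the name of the first limit that has been reached, or None."""
    for name, limit, current in checks:
        if limit is not None and current >= limit:
            return name  # short-circuit: later checks are not evaluated
    return None


checks = [
    ("key.tpm", 1000, 200),
    ("key.rpm", None, 10_000),  # unset limits never trigger
    ("user.rpm", 60, 60),       # reached: this one rejects the request
    ("team.tpm", 500, 900),     # also over, but never evaluated
]
print(first_violation(checks))
```

In a real handler the returned name would feed the 429 detail message, so the client sees only the first limit it hit.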

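What the pre-call stage does not do

It does not recompute cost; cost is computed only after the response comes back.
It does not read model_prices_and_context_window.json.
It does not call any LLM provider.

Therefore, any rate-limiting problem caused by a misconfigured price table cannot have its root cause in the pre-call stage; it lies in the post-call stage (cost calculation → spend accumulation).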

Overall flow diagram

sequenceDiagram
    participant C as Client
    participant A as user_api_key_auth
    participant Auth as auth_checks
    participant PL as ProxyLogging.pre_call_hook
    participant Hooks as Limiter Hooks
    participant R as Router

    C->>A: POST /chat/completions
    A->>Auth: _virtual_key_max_budget_check(token)
    Auth-->>A: ok or BudgetExceededError
    A->>Auth: team / user / org budget checks (×6)
    Auth-->>A: ok
    A->>PL: pre_call_hook(token, data)
    PL->>Hooks: MaxBudgetLimiter.async_pre_call_hook
    Hooks-->>PL: ok (or 429)
    PL->>Hooks: ParallelRequestHandler_v3.async_pre_call_hook
    Hooks-->>PL: ok (or 429 TPM/RPM)
    PL-->>A: ok
    A->>R: route to upstream LLM
    R-->>C: response (async) triggers post-call accumulation of spend / TPM