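03 · Router behavior: what each exception triggers

This article starts at the point where the exception has already been raised and explains what Router does after catching it: retry / cooldown / fallback / raise straight to the client. After reading it you should be able to answer:

I set num_retries=3, so why was a 401 error not retried?
BadRequestError and RateLimitError are both failures; how does Router react differently to each?
If context_window_fallbacks and fallbacks are both set, which one takes effect first?
Which failure does the status_code that the client finally sees come from?
Why is "No deployments available" a 429 rather than a 500?

📌 Scope relative to the docs/cooldown/ series:
- The cooldown series covers how deployment health management works (DualCache, Redis, V1/V2 trigger paths).
- This article covers what Router does when it sees an exception. Cooldown is only one step; the main line is retry + fallback + the final raise.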

1. The five-step decision path

The full decision path once an exception enters Router:

flowchart TB
    Excep["LiteLLM exception raised<br/>(already mapped by exception_type)"]

    Step0["① Compute num_retries<br/>5-layer priority"]
    Step1["② _should_retry(status_code)<br/>utils.py:6526<br/>408/409/429/5xx → True"]
    Step2["③ should_retry_this_error()<br/>router.py:5394<br/>8 special rules"]
    Step3["④ Retry num_retries times<br/>pick an available deployment each time"]
    Step4["⑤ All failed → three-tier fallback chain<br/>context_window / content_policy / generic"]
    Cooldown["Failure callback:<br/>deployment_callback_on_failure<br/>→ cooldown flow<br/>(see docs/cooldown/)"]
    Final["Return to the client<br/>(response or final error)"]

    Excep --> Step0
    Step0 --> Step1
    Step1 -- "False (4xx other than 408/409/429)" --> Step2
    Step1 -- True --> Step2
    Step2 -- "any raise rule matches → raise" --> Step4
    Step2 -- "passes" --> Step3
    Step3 -- success --> Final
    Step3 -- all failed --> Step4
    Step4 -- fallback available --> NewModelGroup["Switch to a new model_group<br/>(recurse back into Router)"]
    Step4 -- "no fallback, or all fallbacks fail" --> Final
    Step3 -.record failure stats.-> Cooldown
    NewModelGroup -.on failure.-> Cooldown

    style Step0 fill:#cef
    style Step1 fill:#ffd
    style Step2 fill:#ffd
    style Step3 fill:#dfd
    style Step4 fill:#fdd
    style Cooldown fill:#ddd

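⚠️ Key facts:
1. Cooldown is a side effect, not part of the main line. The cooldown write is triggered asynchronously by the failure callback deployment_callback_on_failure and has no influence on whether this request is retried or falls back.
2. Retry and fallback are two independent layers: retry switches deployment within the same model_group; fallback switches to a different model_group.
3. The status_code the client sees comes from the last failure: once every retry and fallback has failed, that error is wrapped in ProxyException and raised. If any fallback succeeds along the way, the client gets a 200.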
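The two layers can be pictured as nested loops. The following is a minimal sketch under stated assumptions, not Router's actual code: UpstreamError, call_group, and call_with_fallbacks are hypothetical names, deployment selection is a naive rotation, and the retry gates and cooldown side effects are left out.

```python
from typing import Callable, Dict, List


class UpstreamError(Exception):
    """Hypothetical stand-in for an already-mapped LiteLLM exception."""

    def __init__(self, status_code: int):
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code


def call_group(call: Callable[[str, str], str], group: str,
               deployments: List[str], num_retries: int) -> str:
    """Retry layer: 1 initial attempt plus num_retries retries in one model_group."""
    last_error = None
    for attempt in range(num_retries + 1):
        deployment = deployments[attempt % len(deployments)]  # naive rotation (assumption)
        try:
            return call(group, deployment)
        except UpstreamError as err:
            last_error = err
    raise last_error


def call_with_fallbacks(call: Callable[[str, str], str], model_group: str,
                        deployments_by_group: Dict[str, List[str]],
                        fallbacks: Dict[str, List[str]], num_retries: int) -> str:
    """Fallback layer: the original model_group first, then each fallback group in order."""
    last_error = None
    for group in [model_group] + fallbacks.get(model_group, []):
        try:
            return call_group(call, group, deployments_by_group[group], num_retries)
        except UpstreamError as err:
            last_error = err  # the caller ends up seeing the last failure
    raise last_error
```

With num_retries=2, a model_group that always fails absorbs 3 calls before the fallback group is tried; if every group fails, the caller sees the error from the very last attempt.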

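2. Step 0: the 5-layer priority of `num_retries`

router.py:5183 reads num_retries = kwargs.pop("num_retries"), and the value is then overwritten by two later blocks at lines 5213-5218 and 5241-5249. The effective value comes from five configurable layers plus a built-in default (highest priority first):

Layer	Source	When it applies
1	`deployment.num_retries`, passed through via `exception.num_retries`	The exception object's `num_retries` field is not None
2	`RetryPolicy`, looked up by exception type	A `retry_policy` is configured and matches the current exception type
3	per-request `kwargs["num_retries"]`	Passed when calling `completion()`
4	`Router.num_retries`	`num_retries` was passed when initializing Router
5	global `litellm.num_retries`	Set externally via `import litellm; litellm.num_retries = N`
6	`openai.DEFAULT_MAX_RETRIES = 2` (constants.py:32)	Fallback default

⚠️ Override-order pitfall: layer 1 runs first at lines 5213-5218 and layer 2 runs afterwards at lines 5241-5249, so RetryPolicy overrides the deployment's own num_retries. In other words, whenever a RetryPolicy entry matches, layers 1 and 2 effectively swap places relative to the table. If you want one deployment to disable retries entirely (num_retries=0) while the global RetryPolicy configures N retries for the current exception, the final value is the RetryPolicy's N.

⚠️ The per-request "num_retries" is read with pop, not get: line 5183 calls kwargs.pop("num_retries"), which removes the key from kwargs, so the downstream make_call never sees it. If you expect the per-request value to be passed through to the upstream SDK, it will not be.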
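The override order described above can be sketched as a small function. This is an illustration only: resolve_num_retries and its parameters are hypothetical names, and it assumes the base value is chosen as per-request, then Router, then global, then default, before the two overwrites are applied in source order.

```python
from typing import Optional

DEFAULT_MAX_RETRIES = 2  # the fallback default quoted in the table above


def resolve_num_retries(
    kwargs: dict,
    router_num_retries: Optional[int] = None,
    global_num_retries: Optional[int] = None,
    exception_num_retries: Optional[int] = None,
    policy_num_retries: Optional[int] = None,
) -> int:
    # Base value: per-request, else Router, else global, else default.
    value = kwargs.pop("num_retries", None)  # pop: the key is removed from kwargs
    for candidate in (router_num_retries, global_num_retries, DEFAULT_MAX_RETRIES):
        if value is None:
            value = candidate
    # First overwrite: the deployment value carried on the exception.
    if exception_num_retries is not None:
        value = exception_num_retries
    # Second overwrite: a matching RetryPolicy entry, which therefore wins.
    if policy_num_retries is not None:
        value = policy_num_retries
    return value
```

For example, resolve_num_retries({}, router_num_retries=4, exception_num_retries=0, policy_num_retries=5) yields 5: the deployment asked for no retries, but the policy entry is applied last.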

3. Gate 1: the `_should_retry(status_code)` allowlist

litellm/utils.py:6526-6552:

def _should_retry(status_code: int):
    """Retries on 408, 409, 429 and 500 errors."""
    if status_code == 408: return True
    if status_code == 409: return True   # ← Conflict (rare in practice)
    if status_code == 429: return True
    if status_code >= 500: return True
    return False

status_code	_should_retry	Matching exception
400	❌	`BadRequestError` and subclasses
401	❌	`AuthenticationError`
403	❌	`PermissionDeniedError`
404	❌	`NotFoundError`
408	✅	`Timeout`
409	✅	—
422	❌	`UnprocessableEntityError`
429	✅	`RateLimitError`
500	✅	`InternalServerError` / `APIConnectionError` (hardcoded 500) / `APIResponseValidationError`
502	✅	`BadGatewayError`
503	✅	`ServiceUnavailableError` / `MidStreamFallbackError`
504	✅	`Timeout`(exception_status_code=504)
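As a quick check of the table, the allowlist can be replayed over the same status codes. should_retry below is a local re-implementation for illustration, not an import from litellm.

```python
def should_retry(status_code: int) -> bool:
    """Local copy of the allowlist above: 408, 409, 429, and any 5xx."""
    return status_code in (408, 409, 429) or status_code >= 500


codes = [400, 401, 403, 404, 408, 409, 422, 429, 500, 502, 503, 504]
retryable = [code for code in codes if should_retry(code)]
print(retryable)
```

Every 4xx other than 408, 409, and 429 drops out at this gate; what happens to 401 and 403 afterwards is decided by the second gate.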

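⚠️ APIConnectionError has two quirks. exceptions.py:713 hardcodes status_code=500, so it should be retried. However:
1. It is also skipped by the cooldown string allowlist (cooldown_handlers.py:57-63), so no matter how often a deployment is retried, this error never pushes it into cooldown.
2. The upper-level should_retry_this_error (§4) filters further, but it has no special case for APIConnectionError; the error is treated as status_code=500 and ends up being retried.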

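4. Gate 2: the 8 special rules of `should_retry_this_error`

router.py:5394-5464. This is the second check and is finer-grained than _should_retry: for the same 5xx exception, it uses the fallback configuration and the number of healthy deployments to decide whether to retry or to raise straight to the fallback layer.

Evaluation order:

#	Condition	Behavior
1	`ContextWindowExceededError` and `context_window_fallbacks` is configured	`raise error` ← hand off to the fallback layer
2	`ContentPolicyViolationError` and `content_policy_fallbacks` is configured	`raise error`
3	`status_code` is not on the `_should_retry` allowlist and not in (401, 403)	`raise error` ← no retry
4	`NotFoundError`	`raise error` ← 404 is not retried; rule 3 has already filtered it out, so this is a second safety net
5	`RateLimitError` with no healthy deployment and fallbacks configured	`raise error` ← let fallback take over
6	`AuthenticationError` and the model_group has only 1 deployment	`raise error` ← retrying a lone deployment is pointless
7	`_num_healthy_deployments <= 0`	`raise error` ← nobody can serve the request, raise directly
8	All checks pass	`return True`, enter the retry loop

Details:

Rule 3 exception: 401/403 can still be retried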

if status_code is not None and not litellm._should_retry(status_code):
    # 401/403 are special cases - allow retry if multiple deployments exist (handled below)
    if status_code not in (401, 403):
        raise error

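Per _should_retry, neither 401 nor 403 should be retried. This branch lets them through anyway, so that rule 6 (AuthenticationError with a single deployment is not retried) and rule 7 (no healthy deployment means no retry) act as the backstop.

Net effect: in a model_group with multiple deployments a 401 is retried (another deployment may have a working key); with a single deployment it is not.

The subtle difference between rule 5 and rule 7

Scenario	Rule triggered	Result
RateLimitError + fallbacks + 0 healthy deployments	Rule 5	raise; fallback takes over
RateLimitError + no fallbacks + 0 healthy deployments	Rule 7	raise; no fallback and no retry, straight to the client
RateLimitError + fallbacks + healthy deployments	All pass	retry (within the current model_group)
RateLimitError + no fallbacks + healthy deployments	All pass	retry

⚠️ Key insight: rule 5 exists so that, when a fallback is available and the current model_group is entirely down, the fallback takes over as soon as possible instead of retrying blindly. Whether rule 5 fires depends on whether fallbacks are configured; without them the same situation is caught two checks later by rule 7 and the error goes straight to the client. The behavior is reasonable (a fallback is a better bet than a retry), but operators who are unaware of it may find it confusing.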

Rule 6: the "lone deployment" check

if isinstance(error, openai.AuthenticationError):
    if _num_all_deployments <= 1:
        raise error

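The check uses _num_all_deployments (all deployments, healthy or not), not _num_healthy_deployments. The reasoning: a 401 is a key-configuration problem, and only with multiple deployments is there a different key to switch to. With just 1 deployment, a retry hits the same key.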
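The eight rules can be condensed into one function for experimentation. This is a sketch of the table, not the code in router.py: SketchError is a hypothetical stand-in that carries the exception class name as a string, and healthy / total are assumed to be precomputed deployment counts.

```python
from typing import Optional


class SketchError(Exception):
    """Hypothetical stand-in: `kind` names the LiteLLM exception class."""

    def __init__(self, kind: str, status_code: Optional[int]):
        super().__init__(kind)
        self.kind = kind
        self.status_code = status_code


def should_retry_this_error(error: SketchError, *, healthy: int, total: int,
                            fallbacks=None, context_window_fallbacks=None,
                            content_policy_fallbacks=None) -> bool:
    sc = error.status_code
    on_allowlist = sc is not None and (sc in (408, 409, 429) or sc >= 500)
    if error.kind == "ContextWindowExceededError" and context_window_fallbacks:
        raise error  # rule 1: hand off to the fallback layer
    if error.kind == "ContentPolicyViolationError" and content_policy_fallbacks:
        raise error  # rule 2
    if sc is not None and not on_allowlist and sc not in (401, 403):
        raise error  # rule 3: not retryable, and not the 401/403 exception
    if error.kind == "NotFoundError":
        raise error  # rule 4: second safety net for 404
    if error.kind == "RateLimitError" and healthy <= 0 and fallbacks:
        raise error  # rule 5: let the fallback take over
    if error.kind == "AuthenticationError" and total <= 1:
        raise error  # rule 6: a lone deployment has no other key to try
    if healthy <= 0:
        raise error  # rule 7: nobody can serve the request
    return True  # rule 8: enter the retry loop
```

Calling it with SketchError("AuthenticationError", 401) returns True for healthy=1, total=2, and raises for total=1, which matches the multi-deployment versus single-deployment behavior described for 401.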

5. `RetryPolicy`: per-exception-type retry counts

types/router.py:548-561:

class RetryPolicy(BaseModel):
    BadRequestErrorRetries: Optional[int] = None
    AuthenticationErrorRetries: Optional[int] = None
    TimeoutErrorRetries: Optional[int] = None
    RateLimitErrorRetries: Optional[int] = None
    ContentPolicyViolationErrorRetries: Optional[int] = None
    InternalServerErrorRetries: Optional[int] = None

However, the actual matching code in get_retry_from_policy.py:19-67 checks only 5 types:

Exception type	RetryPolicy field	Takes effect
`AuthenticationError`	`AuthenticationErrorRetries`	✅
`Timeout`	`TimeoutErrorRetries`	✅
`RateLimitError`	`RateLimitErrorRetries`	✅
`ContentPolicyViolationError`	`ContentPolicyViolationErrorRetries`	✅
`BadRequestError`	`BadRequestErrorRetries`	✅
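`InternalServerError`	`InternalServerErrorRetries`	❌ declared but never used

⚠️ InternalServerErrorRetries is dead config: the model declares the field, but nothing in get_retry_from_policy.py references it. Setting it has no effect.

Precedence (inside get_num_retries_from_retry_policy):

per-model-group retry_policy > global retry_policy

router.py:5237-5249 passes both model_group_retry_policy and self.retry_policy; the matching function checks the model_group-specific config first and uses the global one only if nothing matched.

⚠️ Side effect of the check order: get_retry_from_policy.py:47-67 uses an isinstance chain: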

if isinstance(exception, AuthenticationError) and ... : return ...
if isinstance(exception, Timeout) and ...: return ...
if isinstance(exception, RateLimitError) and ...: return ...
if isinstance(exception, ContentPolicyViolationError) and ...: return ...
if isinstance(exception, BadRequestError) and ...: return ...

→ ContentPolicyViolationError is a subclass of BadRequestError, but its check comes first, so it is routed correctly.
→ ContextWindowExceededError is also a subclass of BadRequestError, but RetryPolicy has no field for it, so it falls into BadRequestErrorRetries. The unintended side effect: you set BadRequestErrorRetries=5 to retry ordinary 400s, and ContextWindowExceededError ends up with 5 retries as well.
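The fall-through is easy to reproduce with plain Python classes. The classes below are hypothetical local stand-ins that mirror only the inheritance described above, and retries_from_policy is an illustrative reduction of the chain to the two relevant branches, with the policy represented as a dict.

```python
from typing import Optional


# Hypothetical local classes mirroring the inheritance described above.
class BadRequestError(Exception): ...
class ContentPolicyViolationError(BadRequestError): ...
class ContextWindowExceededError(BadRequestError): ...


def retries_from_policy(exc: Exception, policy: dict) -> Optional[int]:
    # The subclass is checked first, the base class last, as in the chain above.
    if (isinstance(exc, ContentPolicyViolationError)
            and policy.get("ContentPolicyViolationErrorRetries") is not None):
        return policy["ContentPolicyViolationErrorRetries"]
    if isinstance(exc, BadRequestError) and policy.get("BadRequestErrorRetries") is not None:
        return policy["BadRequestErrorRetries"]
    return None  # no policy entry matched


policy = {"ContentPolicyViolationErrorRetries": 0, "BadRequestErrorRetries": 5}
print(retries_from_policy(ContentPolicyViolationError(), policy))
print(retries_from_policy(ContextWindowExceededError(), policy))
```

The second call lands in the BadRequestError branch because no earlier branch names ContextWindowExceededError, which is exactly the side effect described above.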

6. The three-tier fallback chain

router.py:4876-5046 async_function_with_fallbacks_common_utils。

Evaluation order:

① if ContextWindowExceededError + context_window_fallbacks is configured:
     switch to context_window_fallbacks[model_group]
② elif ContentPolicyViolationError + content_policy_fallbacks is configured:
     switch to content_policy_fallbacks[model_group]
③ else (or nothing above matched):
     switch to fallbacks[model_group], or the {"*": [default_list]} entry in fallbacks

6.1 `context_window_fallbacks`

YAML example:

router_settings:
  context_window_fallbacks:
    - gpt-3.5-turbo: [gpt-4-turbo, gpt-4]

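Trigger condition: router.py:4933-4955, isinstance(e, litellm.ContextWindowExceededError).

Note rule 1 of should_retry_this_error: when context_window_fallbacks is configured, ContextWindowExceededError is not retried and goes straight to fallback. This is reasonable: switching deployments within the same model_group does not make the context window any larger.

If context_window_fallbacks is not configured, the error falls through to the generic fallbacks (see the log hint at lines 4957-4968).

6.2 `content_policy_fallbacks`

The logic mirrors §6.1 exactly, applied to ContentPolicyViolationError.

YAML example: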

router_settings:
  content_policy_fallbacks:
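    - claude-3-sonnet: [gpt-4o]   # Claude refused; try GPT

⚠️ Design intent: providers apply different content-moderation standards, so a request rejected by A may be accepted by B. Not every request should be routed around moderation, though; that is an ops/security decision and is out of scope here.

6.3 generic `fallbacks` + `default_fallbacks` + `{"*": ...}`

router.py:5004-5037: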

if fallbacks is not None and model_group is not None:
    (fallback_model_group, generic_fallback_idx) = get_fallback_model_group(
        fallbacks=fallbacks,
        model_group=model_group,
    )
    if fallback_model_group is None and generic_fallback_idx is not None:
        fallback_model_group = fallbacks[generic_fallback_idx]["*"]
    ...

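The fallbacks array holds two kinds of entries:
1. Named: {"gpt-3.5-turbo": ["gpt-4-turbo", "gpt-4"]}, triggered only when model="gpt-3.5-turbo" fails.
2. Wildcard: {"*": ["claude-3-haiku"]}, triggered when any model fails (provided no named entry was found).

The default_fallbacks parameter (router.py:523-528) is just syntactic sugar: it appends {"*": [default_fallbacks]} to the end of self.fallbacks.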
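Putting the three tiers and the named/wildcard lookup together, the selection can be sketched as follows. All function names are hypothetical, and two behaviors are assumptions of this sketch rather than confirmed Router behavior: a specialized list with no entry for the model_group falls through to the generic list, and the wildcard is consulted only in the generic list.

```python
from typing import Dict, List, Optional

FallbackList = List[Dict[str, List[str]]]


def named_entry(entries: Optional[FallbackList], model_group: str) -> Optional[List[str]]:
    """Return the first entry keyed by model_group, if any."""
    for entry in entries or []:
        if model_group in entry:
            return entry[model_group]
    return None


def wildcard_entry(entries: Optional[FallbackList]) -> Optional[List[str]]:
    """Return the first {"*": [...]} entry, if any."""
    for entry in entries or []:
        if "*" in entry:
            return entry["*"]
    return None


def pick_fallbacks(error_kind: str, model_group: str,
                   fallbacks: Optional[FallbackList] = None,
                   context_window_fallbacks: Optional[FallbackList] = None,
                   content_policy_fallbacks: Optional[FallbackList] = None) -> Optional[List[str]]:
    specialized = None
    if error_kind == "ContextWindowExceededError":
        specialized = named_entry(context_window_fallbacks, model_group)  # tier 1
    elif error_kind == "ContentPolicyViolationError":
        specialized = named_entry(content_policy_fallbacks, model_group)  # tier 2
    if specialized is not None:
        return specialized
    named = named_entry(fallbacks, model_group)  # tier 3: named entry first
    return named if named is not None else wildcard_entry(fallbacks)  # then wildcard
```

With fallbacks = [{"gpt-3.5-turbo": ["gpt-4-turbo", "gpt-4"]}, {"*": ["claude-3-haiku"]}], a RateLimitError on "gpt-3.5-turbo" selects the named list, while the same error on any other model_group selects the wildcard list.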

6.4 `max_fallbacks` and `fallback_depth`

router.py:4905-4908 adds max_fallbacks and fallback_depth to input_kwargs to stop the fallback chain from recursing indefinitely.

Setting	Default	Source
`Router.max_fallbacks`	5	`constants.py:13` `ROUTER_MAX_FALLBACKS = int(os.getenv("ROUTER_MAX_FALLBACKS", 5))`

The limit can be raised through the environment variable ROUTER_MAX_FALLBACKS=10.
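A depth counter of this kind is what keeps a cyclic configuration such as a → b → a from looping forever. The sketch below is an assumption about the general shape of such a guard, not the logic in router.py: run_with_fallbacks and try_group are hypothetical names, and failures are modeled as RuntimeError.

```python
import os
from typing import Callable, Dict, List

# Same default expression as the constant quoted in the table above.
ROUTER_MAX_FALLBACKS = int(os.getenv("ROUTER_MAX_FALLBACKS", 5))


def run_with_fallbacks(model_group: str, fallbacks: Dict[str, List[str]],
                       try_group: Callable[[str], str], fallback_depth: int = 0,
                       max_fallbacks: int = ROUTER_MAX_FALLBACKS) -> str:
    """Try model_group; on failure recurse into its fallbacks until the depth budget is spent."""
    try:
        return try_group(model_group)
    except RuntimeError as err:
        if fallback_depth >= max_fallbacks:
            raise  # depth budget spent: stop recursing
        for next_group in fallbacks.get(model_group, []):
            try:
                return run_with_fallbacks(next_group, fallbacks, try_group,
                                          fallback_depth + 1, max_fallbacks)
            except RuntimeError:
                continue  # this fallback failed too; try the next one
        raise err
```

With fallbacks = {"a": ["b"], "b": ["a"]} and a try_group that always fails, the recursion stops after the initial call plus max_fallbacks further hops instead of running until the interpreter's recursion limit.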

7. Sleeping between retries: `_time_to_sleep_before_retry`

router.py:5512-5570:

if all_deployments is not None and len(all_deployments) == 1:
    pass   # single deployment: fall through to backoff
elif _num_healthy_deployments > 0:
    return 0   # healthy alternatives exist: retry immediately

# otherwise use the Retry-After header or backoff
response_headers = ...  # from e.response.headers or e.litellm_response_headers
if response_headers is not None:
    timeout = litellm._calculate_retry_after(
        remaining_retries=remaining_retries,
        max_retries=num_retries,
        response_headers=response_headers,
        min_timeout=self.retry_after,
    )

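Key facts:
1. Multiple deployments with a healthy alternative → retry after 0 seconds. This is why LiteLLM's failover looks so fast.
2. Single deployment, or everything in cooldown → backoff. min_timeout = Router.retry_after, default 0; an upstream Retry-After header takes precedence.
3. The backoff algorithm lives in utils.py:_calculate_retry_after (inherited from openai-python): exponential backoff by default, overridden by the header.

⚠️ Where Retry-After comes from: e.response.headers first (taken directly from the upstream response), then e.litellm_response_headers (exception_mapping_utils.py:2397 uses setattr to attach the raw pre-mapping headers to the exception object). If neither is present, only the backoff algorithm is left.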
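The three cases above can be sketched as one function. This is an approximation for illustration: sleep_before_retry is a hypothetical name, headers are assumed to be a dict with lowercase keys, the initial_delay and max_delay values are illustrative rather than taken from LiteLLM, and jitter and HTTP-date parsing are omitted.

```python
from typing import Mapping, Optional


def sleep_before_retry(num_all: int, num_healthy: int, attempt: int,
                       headers: Optional[Mapping[str, str]] = None,
                       min_timeout: float = 0.0,
                       initial_delay: float = 0.5, max_delay: float = 8.0) -> float:
    # Several deployments and at least one healthy: retry immediately.
    if num_all != 1 and num_healthy > 0:
        return 0.0
    # A numeric Retry-After header wins over computed backoff.
    retry_after = (headers or {}).get("retry-after")
    if retry_after is not None:
        try:
            return max(float(retry_after), min_timeout)
        except ValueError:
            pass  # e.g. an HTTP-date value; this sketch ignores that form
    # Exponential backoff without jitter, floored at min_timeout.
    return max(min(initial_delay * 2 ** attempt, max_delay), min_timeout)
```

For instance, sleep_before_retry(3, 2, 0) returns 0.0 because a healthy alternative exists, while sleep_before_retry(1, 1, 0) falls through to the first backoff step.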

8. `RouterErrors`: 4 special strings

types/router.py:516-528:

class RouterErrors(enum.Enum):
    user_defined_ratelimit_error = "Deployment over user-defined ratelimit."
    no_deployments_available = "No deployments available for selected model"
    no_deployments_with_tag_routing = "Not allowed to access model due to tags configuration"
    no_deployments_with_provider_budget_routing = "No deployments available - crossed budget"

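These 4 string values are the message prefixes (or substrings) Router uses when raising. ProxyException matches on some of them to rewrite status_code, as described in §8.1 and §8.2.

8.1 `no_deployments_available` → forced 429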

proxy/_types.py:3219-3225:

if (
    "No healthy deployment available" in self.message
    or "No deployments available" in self.message
):
    self.code = "429"

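Trigger scenarios:
- every deployment configured for the model_group is in cooldown
- no deployment configuration matches the model

Example raise sites: lowest_tpm_rpm_v2.py:556 / 669 / types/router.py:759.

Consequence: the client sees HTTP 429, but this is not upstream rate limiting; LiteLLM simply has no available deployment locally. When 429s start climbing, an SRE should not go straight to the upstream rate limits: first run redis-cli KEYS 'deployment:*:cooldown' to inspect the cooldown list.

8.2 `no_deployments_with_tag_routing` → forced 401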

proxy/_types.py:3226-3227:

elif RouterErrors.no_deployments_with_tag_routing.value in self.message:
    self.code = "401"

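Trigger scenario: under tag-based routing, the tags attached to the request do not allow access to any deployment. The code is 401 rather than 403 because, conceptually, the tags on this key/team lack permission.

Raise site: tag_based_routing.py:112.

8.3 The other two do not change status_code

user_defined_ratelimit_error: already a RateLimitError, handled as 429
no_deployments_with_provider_budget_routing: no dedicated rewrite branch, so it keeps the original exception's status_code. Note, however, that its message starts with "No deployments available", so the substring check quoted in §8.1 would also match it; verify against the actual ProxyException code before relying on the original status_code.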
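The two rewrite branches quoted in §8.1 and §8.2 can be combined into a small pure function to test messages against. rewrite_code is a hypothetical helper built from those two snippets, not the ProxyException implementation; the string constant is the enum value listed above.

```python
NO_TAG_ROUTING = "Not allowed to access model due to tags configuration"


def rewrite_code(message: str, code: str) -> str:
    """Mimic the two message-based rewrites; otherwise keep the original code."""
    if "No healthy deployment available" in message or "No deployments available" in message:
        return "429"
    if NO_TAG_ROUTING in message:
        return "401"
    return code


print(rewrite_code("No deployments available for selected model", "500"))
print(rewrite_code("Deployment over user-defined ratelimit.", "429"))
```

Because the first branch is a plain substring test, any message that contains "No deployments available" maps to 429 in this sketch, including one that continues with "- crossed budget".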

9. Quick reference: the full reaction chain per exception

Combining all of the logic above, this table traces each major exception from the point it is raised to what the client sees:

Exception	status	_should_retry	should_retry_this_error	Default retry	RetryPolicy	Enters cooldown	Fallback chain	Client finally sees
`BadRequestError`	400	❌	rule 3 raise	❌	`BadRequestErrorRetries`	❌	generic `fallbacks`	400
`ContextWindowExceededError`	400	❌	rule 1 raise (if configured)	❌	falls into the BR field	❌	`context_window_fallbacks` first, else generic	400
`ContentPolicyViolationError`	400	❌	rule 2 raise (if configured)	❌	`ContentPolicyViolationErrorRetries`	❌	`content_policy_fallbacks` first, else generic	400
`AuthenticationError`	401	❌	rule 3 exception lets it through + rule 6 blocks a single dep	✅ multi-dep	`AuthenticationErrorRetries`	✅ path 1.4, immediate cooldown	generic	401
`PermissionDeniedError`	403	❌	rule 3 exception lets it through + rule 7 blocks when none healthy	✅ healthy alternative exists	—	❌ (403 not on the allowlist)	generic	403
`NotFoundError`	404	❌	rule 3 raise + rule 4 as second safety net	❌	—	✅ path 1.4, immediate cooldown	generic	404
`Timeout`	408	✅	passes	✅	`TimeoutErrorRetries`	✅	generic	408
`UnprocessableEntityError`	422	❌	rule 3 raise	❌	—	❌	generic	422
`RateLimitError`	429	✅	rule 5 (raise when none healthy + fallbacks)	✅	`RateLimitErrorRetries`	✅ path 1.1, immediate cooldown with multiple deps	generic	429 (or 200 after a successful fallback)
`InternalServerError`	500	✅	passes	✅	`InternalServerErrorRetries` ⚠️ dead	✅	generic	500
`APIConnectionError`	500 (hardcoded)	✅	passes	✅	not matched	❌ skipped by string allowlist	generic	500
`BadGatewayError`	502	✅	passes	✅	not matched	✅	generic	502
`ServiceUnavailableError`	503	✅	passes	✅	not matched	✅	generic	503
`MidStreamFallbackError`	503	✅	passes	✅	same as above	✅	generic (keeps partial content)	503, or 200 after a successful fallback
`APIError(any sc)`	as passed in	per sc	per sc	per sc	⚠️ check inheritance	per sc	generic	the sc passed in
No healthy deployment	500 (internal)	—	—	—	—	—	—	rewritten to 429 by `ProxyException`
No tag routing	—	—	—	—	—	—	—	rewritten to 401 by `ProxyException`

⚠️ About the RetryPolicy column: "falls into the BR field" is the side effect from §5 and applies only to genuine subclasses of BadRequestError, such as ContextWindowExceededError; setting BadRequestErrorRetries affects those exceptions unintentionally. BadGatewayError is not a BadRequestError subclass (it inherits openai.APIStatusError), so RetryPolicy does not cover it at all and it is retried according to the global num_retries. The same holds for ServiceUnavailableError and APIConnectionError. For any other exception, check the inheritance chain case by case; the full inheritance table is in 01-exception-catalog.md, §0 overview.

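10. Practical configurations

10.1 Defaults (the current state of most prod setups)

# nothing set
# num_retries = 2 (openai DEFAULT_MAX_RETRIES)
# max_fallbacks = 5 (ROUTER_MAX_FALLBACKS env)
# no RetryPolicy / no fallbacks

Effective behavior:
- any retryable exception is retried 2 times
- non-retryable exceptions are raised directly
- no fallback: when every deployment is down, the error goes back to the client immediately

Suitable for: small-scale services with a single model_group.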

10.2 Availability-focused (recommended with multiple model_groups)

router_settings:
  num_retries: 3
  retry_policy:
    AuthenticationErrorRetries: 1  # retry 401 sparingly to limit abuse
    TimeoutErrorRetries: 3
    RateLimitErrorRetries: 2
    ContentPolicyViolationErrorRetries: 0  # retrying a content violation is pointless
    BadRequestErrorRetries: 0  # retrying a user error is pointless
  fallbacks:
    - gpt-4o: [claude-3-sonnet, gemini-1.5-pro]
    - gpt-3.5-turbo: [claude-3-haiku]
  context_window_fallbacks:
    - gpt-4o: [gemini-1.5-pro]  # 2M context
  content_policy_fallbacks:
    - claude-3-sonnet: [gpt-4o]  # Claude refused; switch to GPT

10.3 Maximum availability (multi-region, high traffic)

router_settings:
  num_retries: 5
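  retry_after: 0       # already 0 by default; retry immediately when a healthy alternative exists
  max_fallbacks: 10    # tolerate a longer fallback chain
  cooldown_time: 60    # raised from the default 5s to 60s
  fallbacks:
    - "*": [emergency-backup-model]  # catch-all
  default_fallbacks: [emergency-backup-model]  # sugar equivalent to the entry above

⚠️ num_retries=5 with multiple deployments: every retry is immediate (0 seconds), so each model_group can absorb up to 6 calls (1 initial + 5 retries), and a chain that walks through 5 model_groups reaches about 30 backend calls. If every call fails, the client can wait a long time, especially when a single-deployment group falls back to the backoff algorithm.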
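The upper bound is simple arithmetic once one assumption is made explicit: every model_group visited, original or fallback, spends a full retry budget of 1 initial attempt plus num_retries retries. max_backend_calls is a hypothetical helper that encodes only that assumption.

```python
def max_backend_calls(num_retries: int, model_groups: int) -> int:
    """Upper bound when each visited model_group makes 1 + num_retries calls."""
    return (1 + num_retries) * model_groups


print(max_backend_calls(num_retries=5, model_groups=5))
print(max_backend_calls(num_retries=5, model_groups=2))
```

The second call corresponds to a configuration with one original model_group and a single catch-all fallback group.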

11. Execution order of a single request (recap in one diagram)

sequenceDiagram
    participant Client
    participant Router
    participant Dep1 as Deployment 1
    participant Dep2 as Deployment 2
    participant FB as Fallback model_group

    Client->>Router: completion(model="gpt-4o")
    Router->>Router: ① compute num_retries (5-layer priority)

    rect rgb(255, 240, 220)
    note over Router: Attempt 1
    Router->>Dep1: call
    Dep1-->>Router: 502 BadGatewayError
    Router->>Router: ② _should_retry(502)=True
    Router->>Router: ③ should_retry_this_error passes
    end

    rect rgb(220, 240, 255)
    note over Router: ④ retry 1 (sleep 0s, healthy dep available)
    Router->>Dep2: call
    Dep2-->>Router: 429 RateLimitError
    Router->>Router: should_retry_this_error passes (healthy dep available)
    end

    rect rgb(220, 240, 255)
    note over Router: retry 2
    Router->>Dep1: call (is Dep1 still in cooldown?)
    Dep1-->>Router: 502 again
    end

    Router->>Router: num_retries exhausted
    Router->>FB: ⑤ fallback: gpt-4o → claude-3-sonnet
    FB-->>Router: 200 OK
    Router-->>Client: 200 + response

    note over Router,Dep2: failure callbacks trigger cooldown asynchronously

Next steps

The message JSON and proxy log formats the client actually sees → 04-where-to-see.md
Working backwards from symptoms → 05-troubleshooting-by-symptom.md
The cooldown mechanism in detail (DualCache / Redis / V1/V2) → docs/cooldown/01-mechanism.md
