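04 - LiteLLM-specific fields

Beyond the standard OpenAI fields, LiteLLM accepts a set of fields in the request body that it consumes itself and does not forward upstream. Downstream callers can pass them directly at the top level of the /chat/completions request body.

Source: the all_litellm_params list at litellm/types/utils.py:2907; pre-call handling at litellm/proxy/litellm_pre_call_utils.py:810.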

Observability / Attribution

`metadata`

Type	dict
Purpose	Arbitrary key-value labels; written to spend logs, structured logs, and callbacks such as Langfuse / Helicone / Promptlayer
Handling	`metadata.user_api_key_user_id` / `user_api_key_team_id` are injected by LiteLLM automatically; all other keys are kept as-is
Example	`"metadata": {"customer_id": "biz-xxx", "feature": "summary"}`

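`litellm_metadata`

Nearly identical to metadata, with two differences:
- For the Assistants / Threads API, use litellm_metadata, because metadata is already a field in the upstream protocol.
- For /chat/completions, either works; litellm_metadata is merged into the metadata variable.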

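`tags`

{"tags": ["prod", "team-a"]}

- A list of tags, used to aggregate spend tracking by tag.
- Can also be passed via the header x-litellm-tags: prod,team-a.
- Tags defined in the Team/Key configuration are merged with the request tags automatically (they are not overwritten).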
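The header form and the merge rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not LiteLLM's code: the helper names `parse_tags_header` and `merge_tags` are hypothetical, and the ordering of the merged list is an assumption.

```python
def parse_tags_header(value):
    """Split a comma-separated x-litellm-tags header value into a list of tags."""
    return [t.strip() for t in value.split(",") if t.strip()]


def merge_tags(config_tags, request_tags):
    """Union of both lists: nothing is overwritten, duplicates are dropped.

    Assumption: config tags come first and first-seen order is preserved.
    """
    merged = []
    for tag in list(config_tags) + list(request_tags):
        if tag not in merged:
            merged.append(tag)
    return merged


# Tags sent by the caller via the header, plus tags assumed to be on the team/key.
header_tags = parse_tags_header("prod, team-a")
all_tags = merge_tags(["team-a", "billing"], header_tags)
```

Under these assumptions the request ends up attributed to every tag from both sources, so spend can be aggregated by any of them.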

`litellm_trace_id` / `litellm_session_id`

Field	Meaning
`litellm_trace_id`	Request-level trace id; written to spend logs and callbacks
`litellm_session_id`	Cross-request session id, used to group multiple turns into one session (Langfuse session, Arize session, etc.)

Equivalent headers: x-litellm-trace-id, x-litellm-session-id, x-litellm-agent-id.
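A minimal sketch of how a client might keep one session id across turns while giving each request its own trace id. The helper `new_turn`, the id formats, and the model name are illustrative assumptions; only the two field names come from the table above.

```python
import uuid


def new_turn(session_id, model, messages):
    """Build one request body: shared session id, fresh trace id per request."""
    return {
        "model": model,
        "messages": messages,
        "litellm_session_id": session_id,
        "litellm_trace_id": str(uuid.uuid4()),
    }


# One session id is generated once and reused for the whole conversation.
session_id = f"sess-{uuid.uuid4()}"

history = [{"role": "user", "content": "Hi"}]
turn1 = new_turn(session_id, "gpt-4o", history)

history = history + [
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me more"},
]
turn2 = new_turn(session_id, "gpt-4o", history)
```

Both bodies carry the same `litellm_session_id`, so a session-aware callback can group them, while the distinct `litellm_trace_id` values keep the individual requests separable.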

`user`

A native OpenAI field. LiteLLM additionally treats it as end_user_id:
- It backfills the user_id dimension in spend logs.
- It counts toward end-user budget limits (if configured).

`no-log`

{"no-log": true}

Disables callback logging for this request (nothing is written to Langfuse, DB spend logs, Helicone, etc.). Note that the key is the hyphenated "no-log".

Caching

`caching` (bool)

{"caching": true}

Enables LiteLLM-side caching (Redis / in-memory / S3, depending on cache_params). Defaults to false.

`cache` (dict)

Finer-grained cache control:

{
  "cache": {
    "no-cache": true,   // skip the cache read for this request
    "no-store": true,   // do not write this response to the cache
    "ttl": 3600,
    "s-maxage": 600
  }
}

Equivalent headers: Cache-Control: no-cache; s-maxage=600 is parsed by add_litellm_data_to_request into data["ttl"].
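To make the header-to-body mapping concrete, here is a rough sketch of how a Cache-Control value could be translated into the fields above. It is not the code of add_litellm_data_to_request; the function name `cache_control_to_fields` and the exact parsing rules are assumptions made for illustration.

```python
def cache_control_to_fields(header_value):
    """Translate a Cache-Control header into (cache dict, ttl). Illustrative only."""
    cache = {}
    ttl = None
    for directive in header_value.split(","):
        directive = directive.strip().lower()
        if directive in ("no-cache", "no-store"):
            # Flag-style directives map to boolean keys in the cache dict.
            cache[directive] = True
        elif directive.startswith("s-maxage="):
            # s-maxage carries a number of seconds, assumed to become the ttl.
            ttl = int(directive.split("=", 1)[1])
    return cache, ttl


cache, ttl = cache_control_to_fields("no-cache, s-maxage=600")
```

With this input the sketch yields a cache dict that skips the read and a ttl of 600 seconds, mirroring the body fields shown earlier.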

`ttl` / `cache_key` / `preset_cache_key`

- ttl: TTL in seconds for this request's cache write
- cache_key: an explicit cache key (otherwise LiteLLM hashes the messages)
- preset_cache_key: a precomputed cache key
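The "explicit key, otherwise a hash" rule can be pictured as follows. The text does not specify which inputs LiteLLM actually hashes or which hash it uses, so `derive_cache_key` (SHA-256 over model plus messages) is purely an assumption, as is the helper name `resolve_cache_key`.

```python
import hashlib
import json


def derive_cache_key(model, messages):
    """Deterministic key from a canonical JSON encoding (assumed inputs and hash)."""
    canonical = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_cache_key(body):
    """Prefer an explicit cache_key; otherwise fall back to the derived hash."""
    return body.get("cache_key") or derive_cache_key(body["model"], body["messages"])


body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}
auto_key = resolve_cache_key(body)
pinned_key = resolve_cache_key({**body, "cache_key": "faq-greeting-v1"})
```

Pinning a key is useful when differently worded requests should share one cached answer; the derived key only matches when the hashed inputs are identical.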

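Fault tolerance / Retries / Routing

`fallbacks`

{"fallbacks": ["gpt-4o-mini", "claude-3-5-haiku"]}

When the current model fails, these models are tried in order. Combined with context_window_fallback_dict, a fallback can also be triggered when the context window is exceeded.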

`context_window_fallback_dict`

{"context_window_fallback_dict": {"gpt-4o": "claude-3-5-sonnet"}}

When the context window is exceeded, the request switches automatically to the model given as the value.
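The interplay between fallbacks and context_window_fallback_dict can be modeled with a small simulation. This is a sketch of the semantics described above, not the Router's implementation: `call_with_fallbacks`, `ContextWindowExceeded`, and `fake_call` are hypothetical names, and giving the context-window mapping priority over the ordinary chain is an assumption.

```python
class ContextWindowExceeded(Exception):
    """Stand-in for a provider error raised when the prompt is too long."""


def call_with_fallbacks(call, model, fallbacks=(), context_window_fallback_dict=None):
    """Try `model`, then walk the fallback chain; returns (model_used, result)."""
    context_map = context_window_fallback_dict or {}
    queue = [model] + list(fallbacks)
    tried = []
    last_error = None
    while queue:
        current = queue.pop(0)
        if current in tried:
            continue  # never retry a model that already failed
        tried.append(current)
        try:
            return current, call(current)
        except ContextWindowExceeded as exc:
            last_error = exc
            if current in context_map:
                # Assumed priority: the context-window mapping is tried next.
                queue.insert(0, context_map[current])
        except Exception as exc:
            last_error = exc
    raise last_error


def fake_call(name):
    """Fake provider: one model overflows, one is down, the rest succeed."""
    if name == "gpt-4o":
        raise ContextWindowExceeded("prompt too long")
    if name == "gpt-4o-mini":
        raise RuntimeError("provider down")
    return f"ok from {name}"


used, result = call_with_fallbacks(
    fake_call,
    "gpt-4o",
    fallbacks=["gpt-4o-mini", "claude-3-5-haiku"],
    context_window_fallback_dict={"gpt-4o": "claude-3-5-sonnet"},
)
```

In this run the overflow on gpt-4o is routed to claude-3-5-sonnet; without the mapping, the same call would walk the ordinary chain and land on claude-3-5-haiku.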

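`num_retries` / `max_retries`

- num_retries: business-level retries by the LiteLLM Router (switches deployment)
- max_retries: HTTP-level retries (same deployment)

`cooldown_time`

Cooldown time, in seconds, during which the Router deprioritizes a deployment after a failure.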

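`timeout` / `stream_timeout` / `request_timeout`

Timeouts at different layers:
- timeout: timeout for the whole completion() call
- stream_timeout: timeout for reading events in streaming mode
- request_timeout: timeout at the HTTP request level

`retry_policy` / `retry_strategy`

Fine-grained retry policies; see the Router documentation.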

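Security / Content filtering

`guardrails`

{"guardrails": ["pii_masker", "prompt_injection_detector"]}

Enables specific guardrail pipelines (each must already be defined in config.yaml).

`disable_global_guardrails`

Can be set to true in Team/Key metadata to skip global guardrails. It is rarely passed in the request body; the team configuration normally decides it.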

Prompt management

`prompt_id` / `prompt_variables` / `prompt_version` / `prompt_label`

Used by the LiteLLM Prompt Management feature:

{
  "prompt_id": "summary-v3",
  "prompt_variables": {"name": "Alice"},
  "prompt_label": "production"
}

The request pulls the prompt template from Humanloop / Langfuse / the built-in prompt registry, renders it, and uses the result in place of messages.

`litellm_system_prompt`

Forces an override of the system prompt (it does not replace messages; it appends or overwrites the system role).

Debugging / mock

`mock_response`

{"mock_response": "This is a canned response"}

Skips the provider and returns this content directly. Very commonly used for frontend integration and unit tests.
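As a sketch of the unit-test use case, the snippet below checks a response-parsing helper against a reply shaped the way a mocked call is assumed to come back. The reply structure (a standard OpenAI-style chat completion whose content is the mock string) is an assumption here, and `extract_text` is a hypothetical helper.

```python
def extract_text(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


# Body a test would send; no provider is contacted because of mock_response.
request_body = {
    "model": "any",
    "messages": [{"role": "user", "content": "ping"}],
    "mock_response": "pong",
}

# Assumed shape of the reply: a normal chat completion carrying the mock string.
assumed_reply = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": request_body["mock_response"],
            },
            "finish_reason": "stop",
        }
    ]
}

text = extract_text(assumed_reply)
```

Because the content is fixed by the caller, assertions in client tests stay deterministic and cost nothing to run.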

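`mock_tool_calls`

{"mock_tool_calls": [{"name": "f", "args": {"x": 1}}]}

Simulates a tool-call response.

`mock_timeout`

When true, simulates a provider timeout.

`litellm_request_debug`

When enabled, internal debugging information is attached to the response (note: it includes the provider request structure, so do not enable it in production).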

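Field overrides / drops

Field	Type	Meaning
`drop_params`	bool	Fields the provider does not support are dropped silently instead of raising an error
`allowed_openai_params`	array[string]	Allowlist; the support check is skipped for these fields and they are passed through unchanged
`additional_drop_params`	array[string]	Denylist; these fields are dropped whether or not the provider supports them

See 05-provider-passthrough-and-drop-params for details.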
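How the three switches combine can be modeled as a small filter. This sketch only encodes the table's wording; it is not LiteLLM's implementation. The function `filter_params`, the `supported` set, and the choice to let the denylist win over the allowlist are all assumptions.

```python
def filter_params(
    params,
    supported,
    drop_params=False,
    allowed_openai_params=(),
    additional_drop_params=(),
):
    """Return the params that would be forwarded to the provider (illustrative)."""
    kept = {}
    for name, value in params.items():
        if name in additional_drop_params:
            continue  # denylist: dropped regardless of provider support
        if name in supported or name in allowed_openai_params:
            kept[name] = value  # supported, or allowlisted past the check
        elif drop_params:
            continue  # unsupported, silently dropped
        else:
            raise ValueError(f"unsupported parameter: {name}")
    return kept


forwarded = filter_params(
    {"temperature": 0.2, "logit_bias": {"50256": -100}, "reasoning_effort": "low", "seed": 1},
    supported={"temperature", "seed"},
    drop_params=True,
    allowed_openai_params=["reasoning_effort"],
    additional_drop_params=["seed"],
)
```

Here `logit_bias` disappears because the provider does not support it, `seed` is removed by the denylist even though it is supported, and `reasoning_effort` survives only because it was allowlisted.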

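Routing / Model selection

`model` / `model_list`

- model: required
- model_list: only used for direct SDK calls; in proxy mode, use the yaml configuration instead

`deployment_id`

Forces a specific deployment in Azure scenarios.

`api_base` / `api_key` / `api_version` / `organization`

Request-level overrides (see 01-openai-standard-params § Route-level overrides).

`use_in_pass_through` / `use_litellm_proxy`

Internal routing flags; callers generally do not need to pass them explicitly.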

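Billing (CustomPricing)

Passing these fields in the body enables per-request dynamic pricing (the Router reads them into the deployment's litellm_params):

Field	Type	Meaning
`input_cost_per_token`	float	Price per input token
`output_cost_per_token`	float	Price per output token
`input_cost_per_second`	float	Per-second billing (audio / video)
`output_cost_per_second`	float	Same as above, for the output side
`base_model`	string	Looks up model_prices_and_context_window.json by the real base model name (commonly needed for custom Azure deployment names)

Production environments generally do not let downstream callers change prices directly. See billing-and-pricing/04-billing-flow.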
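For token-based pricing, the cost of one request is assumed to be the usual linear formula: input tokens times the input price plus output tokens times the output price. The sketch below works one example; the prices and token counts are made-up numbers, and `request_cost` is a hypothetical helper rather than a LiteLLM function.

```python
def request_cost(prompt_tokens, completion_tokens,
                 input_cost_per_token, output_cost_per_token):
    """Linear per-token cost: input side plus output side."""
    return (prompt_tokens * input_cost_per_token
            + completion_tokens * output_cost_per_token)


# Illustrative numbers only: 1200 input tokens and 300 output tokens.
cost = request_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    input_cost_per_token=0.0000025,
    output_cost_per_token=0.00001,
)
```

With these inputs each side contributes about 0.003, so the request costs roughly 0.006 in whatever currency the per-token prices are expressed in.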

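Budgets / Rate limits (rarely used)

Field	Description
`max_budget` / `budget_duration`	Request-level budget (rare)
`rpm` / `tpm`	Request-level rate overrides
`max_parallel_requests`	Request-level concurrency
`fallback_depth` / `max_fallbacks`	Depth control for the fallback chain

These should normally be set at the Key/Team/User level; values passed in the body are generally ignored or only take effect for direct SDK calls.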

Common combination templates

Plain call with observability tags

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "..."}],
  "metadata": {"feature": "doc-summary"},
  "tags": ["prod", "team-search"],
  "litellm_session_id": "sess-abc-123"
}

Caching + fallback

{
  "model": "claude-3-5-sonnet",
  "messages": [...],
  "caching": true,
  "fallbacks": ["gpt-4o"],
  "num_retries": 2
}

Debugging / integration testing

{
  "model": "any",
  "messages": [{"role":"user","content":"ping"}],
  "mock_response": "pong"
}
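Any of these templates can be posted with nothing but the Python standard library. The sketch below only builds the request object; the base URL `http://localhost:4000`, the placeholder key, and the helper name `build_request` are assumptions to adapt to your own deployment.

```python
import json
import urllib.request


def build_request(base_url, api_key, body, extra_headers=None):
    """Prepare a POST to /chat/completions; nothing is sent until urlopen is called."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request(
    "http://localhost:4000",   # assumed proxy address
    "sk-placeholder",          # placeholder, not a real key
    {
        "model": "any",
        "messages": [{"role": "user", "content": "ping"}],
        "mock_response": "pong",
    },
    extra_headers={"x-litellm-tags": "prod,team-search"},
)

# To actually send it against a running proxy:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     reply = json.load(resp)
```

Keeping request construction separate from sending makes the body and headers easy to inspect in tests before any network call happens.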
