Concepts & Recipes

Policies

How routing decides: filter first, then rank the survivors, with no silent downgrade. Also includes reusable policy presets grouped by goal.

Routing semantics

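Routing is deterministic and ordered. Cost ceilings and quality floors live in the filter, not in the score, so a cheap model can never outscore its way past your rules. A model that fails the minimum bar is simply not a candidate.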

Filter first

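Candidates that lack a required capability, fall below the quality floor, or exceed the price ceiling are eliminated. They are never quietly substituted.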

Rank the survivors

Among the models that pass the filter, the scorer ranks them, typically cheapest first.

Pick the top-ranked model

One model is selected deterministically. The same policy, the same input, and the same catalog always yield the same model.

Fall back in order

If the selected model times out or errors, the router moves to the next passing candidate (in ascending cost order) and logs each switch.
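The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the router's actual implementation: the `Model` fields, the `call` callback, the catalog entries, and the 0.5 floor and 5.00 ceiling are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Model:
    id: str
    price_out: float      # USD per million output tokens
    intelligence: float   # benchmark score
    tools: bool           # supports tool calling

def route(catalog, call: Callable[[Model], str], floor=0.5, ceiling=5.0):
    # 1. Filter: hard constraints remove candidates outright.
    survivors = [m for m in catalog
                 if m.tools and m.intelligence >= floor and m.price_out <= ceiling]
    if not survivors:
        raise LookupError("no_candidates")
    # 2. Rank: cheapest first; the id breaks ties so the order is deterministic.
    ranked = sorted(survivors, key=lambda m: (m.price_out, m.id))
    # 3 and 4. Try the top pick, then fall back in order, logging each switch.
    log = []
    for m in ranked:
        try:
            return call(m), m.id, log
        except (TimeoutError, RuntimeError) as exc:
            log.append(f"{m.id} failed ({exc}); trying next candidate")
    raise RuntimeError("all candidates failed")

# Hypothetical catalog: model-a is below the floor, model-d is above the ceiling.
catalog = [
    Model("model-a", 0.30, 0.45, True),
    Model("model-b", 1.00, 0.55, True),
    Model("model-c", 2.00, 0.60, True),
    Model("model-d", 9.00, 0.70, True),
]

def flaky(m: Model) -> str:
    # Simulate the cheapest survivor timing out.
    if m.id == "model-b":
        raise TimeoutError("timed out")
    return f"reply from {m.id}"

result, chosen, log = route(catalog, flaky)
```

Here `model-b` is ranked first but times out, so the sketch falls through to `model-c` and records one switch in `log`. `model-a` and `model-d` are never tried, because they were filtered out before ranking.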

Step-by-step decision example

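One policy (requires tool capability, bench_intelligence ge 0.5, cheapest wins) runs against the live catalog (price_out is the USD price per million output tokens). The filter eliminates the two models below the floor, and the cheapest survivor wins: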

model              price_out  intel  verdict
deepseek-v4-flash  $0.40      0.465  below floor
minimax-m2.7       $0.50      0.496  below floor
deepseek-v4-pro    $1.50      0.515  wins
glm-5.1            $2.00      0.514  above floor
gpt-5.5            $10.00     0.602  above floor

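The floor is a constraint, not a suggestion. Routing optimizes cost within your floor and never works around it. If no model meets the requirements, the request fails explicitly (no_candidates); you never get a silent downgrade you did not ask for. Terms are validated and fingerprinted before they run, so a malformed policy is rejected (invalid_policy) rather than misrouted.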
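To make "validated and fingerprinted before they run" concrete, here is one way such a check could look. Everything in it is an assumption for illustration: the page does not say how the fingerprint is computed, so this sketch uses SHA-256 over a canonical JSON serialization, and its validation only checks that each term is a non-empty list headed by an operator name.

```python
import hashlib
import json

class InvalidPolicy(ValueError):
    """Raised before routing when a policy term is malformed."""

def check_term(term):
    # Assumed shape: every term is a non-empty list whose head is an operator name.
    if not isinstance(term, list) or not term or not isinstance(term[0], str):
        raise InvalidPolicy(f"invalid_policy: malformed term {term!r}")
    for arg in term[1:]:
        if isinstance(arg, list):
            check_term(arg)

def fingerprint(policy):
    check_term(policy)
    if policy[0] != "policy":
        raise InvalidPolicy("invalid_policy: top-level term must be 'policy'")
    # Canonical serialization (assumed): no whitespace, sorted object keys.
    canonical = json.dumps(policy, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

policy = ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
          ["field", "bench_intelligence"],
          ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
fp = fingerprint(policy)
```

The point of the sketch is the ordering: a term that fails `check_term` raises before any routing happens, and two structurally identical policies always map to the same fingerprint.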

Preview a decision without incurring inference cost: POST /x/rank returns the ranked survivors, along with the rule that eliminated each rejected model.
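The kind of answer a rank preview gives can be imitated locally with the five catalog rows from the example above. This is only a sketch of the idea: the function name, the return shape, and the rule label are hypothetical, the real response format of POST /x/rank is not shown on this page, and all five models are assumed to support tools.

```python
CATALOG = [
    # (model, price_out in USD per 1M output tokens, bench_intelligence)
    ("deepseek-v4-flash", 0.40, 0.465),
    ("minimax-m2.7", 0.50, 0.496),
    ("deepseek-v4-pro", 1.50, 0.515),
    ("glm-5.1", 2.00, 0.514),
    ("gpt-5.5", 10.00, 0.602),
]

def preview(catalog, floor):
    """Return (ranked survivor ids, {rejected id: rule that eliminated it})."""
    survivors, rejected = [], {}
    for model, price, intel in catalog:
        if intel < floor:
            rejected[model] = f"bench_intelligence ge {floor}"
        else:
            survivors.append((model, price))
    # Cheapest first; the id breaks ties so the order is deterministic.
    survivors.sort(key=lambda row: (row[1], row[0]))
    return [model for model, _ in survivors], rejected

ranked, rejected = preview(CATALOG, floor=0.5)
```

With a floor of 0.5, `ranked` starts with `deepseek-v4-pro`, followed by `glm-5.1` and `gpt-5.5`, and `rejected` names the floor rule for the two models that fell below it. No model is ever called.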

Watch a policy decide

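The same principle applies inside a workflow: each node declares its own policy, which is a filter plus a ranking rule. The diagram below shows how each step eliminates the candidates that miss the bar and locks in the cheapest survivor (the output node declares no model and simply returns the previous step's result). The model IDs and costs in the diagram come from real catalog data.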

Policy presets

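Reusable routing patterns grouped by goal. Each preset is a plain policy_ir: paste it in, then adjust the floors and ceilings. It is worth rehearsing with POST /x/rank first.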

Smart balance

Balances capability and price without elaborate trade-offs. Tags: cost · quality.

smart-balance.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
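The weighted score in this preset can be worked through by hand. The sketch below assumes that `normalize` means min-max scaling across the current candidates, which this page does not confirm, and it reuses the five catalog rows from the decision example while assuming all of them pass the filter.

```python
def normalize(values):
    # Assumed semantics: min-max scaling to [0, 1] across the candidate set.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

MODELS = [
    # (model, price_out, bench_intelligence)
    ("deepseek-v4-flash", 0.40, 0.465),
    ("minimax-m2.7", 0.50, 0.496),
    ("deepseek-v4-pro", 1.50, 0.515),
    ("glm-5.1", 2.00, 0.514),
    ("gpt-5.5", 10.00, 0.602),
]

def smart_balance(models, w_intel=0.6, w_price=0.4):
    intel = normalize([m[2] for m in models])
    price = normalize([m[1] for m in models])
    # Mirrors ["add", ["scale", 0.6, intel], ["scale", 0.4, ["neg", price]]].
    scores = {m[0]: w_intel * i - w_price * p
              for m, i, p in zip(models, intel, price)}
    winner = max(scores, key=scores.get)   # ["argmax"]
    return winner, scores

winner, scores = smart_balance(MODELS)
```

Under these assumptions `gpt-5.5` scores about 0.20 and edges out `deepseek-v4-pro` at about 0.17, even though it is the most expensive row. A weighted score trades price against quality; it does not cap price. If a hard ceiling matters, it belongs in the filter.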

Cheapest decent

Lowers cost within an intelligence floor; the core mechanism of unhardcoded. Tags: cost.

cheapest-decent.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "bench_intelligence", "ge", 0.5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
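The filter term of this preset can be read as a small boolean expression over a model record. The evaluator below is a hypothetical sketch covering only the operators that appear in this filter; in particular, treating `meets_req` as a precomputed flag on the record is an assumption.

```python
import operator

CMP = {"ge": operator.ge, "le": operator.le}

def passes(term, model):
    head, *args = term
    if head == "and":
        return all(passes(t, model) for t in args)
    if head == "not":
        return not passes(args[0], model)
    if head == "is":         # boolean flag on the model record
        return bool(model.get(args[0], False))
    if head == "cmp":        # ["cmp", field, op, threshold]
        field, op, threshold = args
        return CMP[op](model[field], threshold)
    if head == "meets_req":  # assumed: request requirements, precomputed
        return bool(model.get("meets_req", True))
    raise ValueError(f"unsupported operator: {head}")

FILTER = ["and", ["meets_req"], ["not", ["is", "disabled"]],
          ["cmp", "bench_intelligence", "ge", 0.5]]

flash = {"id": "deepseek-v4-flash", "bench_intelligence": 0.465, "price_out": 0.40}
pro = {"id": "deepseek-v4-pro", "bench_intelligence": 0.515, "price_out": 1.50}

flash_ok = passes(FILTER, flash)
pro_ok = passes(FILTER, pro)
```

`flash_ok` is false because 0.465 is below the 0.5 floor, and `pro_ok` is true. Price never enters this decision; it only matters later, when the survivors are ranked.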

Free only

Restricts routing to models with zero output cost. Tags: cost.

free-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["cmp", "price_out", "le", 0]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Best intelligence

No cost ceiling; picks the more capable model. Suited to critical tasks. Tags: quality.

best-intelligence.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reasoning only

Restricts routing to models with reasoning capability. Tags: quality.

reasoning-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_reasoning"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Vision, cheapest first

Requires image input, then picks the lower-cost model. Tags: capability.

vision-cheapest.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "in_image"]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Long-context RAG

Requires a large context window, then picks the lower-cost model among those that qualify. Tags: capability.

long-context-rag.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "context", "ge", 200000]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Structured output

Filters on capability, then balances capability against cost. Tags: capability.

structured-output.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_json_mode"]],
  ["add",
    ["scale", 0.5, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.5, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Agentic fleet

Stronger tool calling, drawn from the top 5 agentic models. Tags: agentic.

agentic-fleet.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_agentic_rank", "le", 5]],
  ["field", "bench_agentic"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Coding under a cap

Top 5 coding models with tool calling and a hard output-price ceiling. Tags: coding · cost.

coding-under-a-cap.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_coding_rank", "le", 5],
         ["cmp", "price_out", "le", 5]],
  ["field", "bench_coding"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reproducible sample

Reproducible random sampling for ensemble diversity. Tags: agentic.

reproducible-sample.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["sample", 0.3], ["id"], ["always", {"action": "next_candidate"}]]
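One way to get sampling that is random yet reproducible is to seed the generator from something stable in the request. The sketch below is an assumption on both counts: it treats the 0.3 in `["sample", 0.3]` as a softmax temperature and derives the seed from a hypothetical request key, neither of which is specified on this page. The scores are made up.

```python
import hashlib
import math
import random

def seeded_sample(scores, temperature, seed_text):
    # Derive a stable seed; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(seed_text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = sorted(scores)                  # fixed iteration order
    top = max(scores.values())
    # Softmax weights; subtracting the max keeps exp() numerically stable.
    weights = [math.exp((scores[i] - top) / temperature) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical scores for three surviving candidates.
SCORES = {"model-a": 0.52, "model-b": 0.60, "model-c": 0.55}

first = seeded_sample(SCORES, 0.3, "request-123")
again = seeded_sample(SCORES, 0.3, "request-123")
```

`first` and `again` are always the same model because the seed is identical, while different request keys can land on different models. That combination is what makes a sampled ensemble replayable.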

Low-latency chat

Filters on latency, then balances speed against capability. Tags: latency.

low-latency-chat.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "latency_ms", "le", 2000]],
  ["add",
    ["scale", 0.7, ["neg", ["normalize", ["field", "latency_ms"]]]],
    ["scale", 0.3, ["normalize", ["field", "bench_intelligence"]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Private / compliant

TEE-only with no logging, ranked by capability. Tags: compliance.

private-compliant.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["is", "has_tee"], ["is", "no_log"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Resilience cascade

Keeps the top 3 by combined score (intelligence + reliability) to form a fallback cascade. Tags: reliability.

resilience-cascade.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["normalize", ["field", "success_rate"]]]],
  ["top_k", 3, ["argmax"]], ["id"], ["always", {"action": "next_candidate"}]]
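The cascade this preset builds can be sketched as "score, sort, keep three". As before, `normalize` is assumed to be min-max scaling, and the four models and their `success_rate` values are invented for the example.

```python
def normalize(values):
    # Assumed semantics: min-max scaling to [0, 1] across the candidate set.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def cascade(models, k=3):
    intel = normalize([m["bench_intelligence"] for m in models])
    rel = normalize([m["success_rate"] for m in models])
    scored = [(0.6 * i + 0.4 * r, m["id"]) for m, i, r in zip(models, intel, rel)]
    # Highest score first; the id breaks ties so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [model_id for _, model_id in scored[:k]]

# Hypothetical catalog rows.
MODELS = [
    {"id": "model-a", "bench_intelligence": 0.50, "success_rate": 0.99},
    {"id": "model-b", "bench_intelligence": 0.60, "success_rate": 0.90},
    {"id": "model-c", "bench_intelligence": 0.55, "success_rate": 0.97},
    {"id": "model-d", "bench_intelligence": 0.45, "success_rate": 0.80},
]

order = cascade(MODELS)
```

`order` is the fallback sequence: `model-b` first, then `model-c`, then `model-a`. `model-d` ranks fourth and is dropped, so a failure of all three ends the cascade rather than reaching a weaker model.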
