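Concepts & Recipes

Policies

How routing decides: filter first, then rank the survivors, with no silent downgrade. Copyable policy presets, grouped by goal.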

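Routing semantics

Routing is deterministic and ordered: cost ceilings and quality floors live in the filter, not in the score, so a cheap model cannot score its way past your rules. A model that fails a gate never becomes a candidate at all.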

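Filter first

Candidates that lack a required capability, fall below the quality floor, or exceed the price ceiling are excluded. They are never quietly substituted.

Rank the survivors

Among the models that pass the gates, the scorer puts them in order, typically cheapest first.

Pick the top-ranked model

One model is selected deterministically. The same policy, the same input, and the same catalog always select the same model.

Fall back in order

If the selected model times out or returns an error, the router switches to the next candidate that passed the filter, in rank order (cheapest first), and logs every switch.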
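The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the router's actual implementation: the `Model` fields mirror names used on this page, while `rank`, `route`, `NoCandidates`, the tie-break on `id`, and the behaviour once every candidate has failed are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class NoCandidates(Exception):
    """Raised when the filter leaves no model standing."""


@dataclass(frozen=True)
class Model:
    id: str
    price_out: float           # USD per million output tokens
    bench_intelligence: float
    supports_tools: bool


def rank(models: Iterable[Model], min_intel: float, max_price: float) -> list:
    # Step 1: filter. Gates are hard constraints applied before any scoring.
    survivors = [
        m for m in models
        if m.supports_tools
        and m.bench_intelligence >= min_intel
        and m.price_out <= max_price
    ]
    if not survivors:
        raise NoCandidates("no_candidates")
    # Step 2: rank the survivors, cheapest first. The id tie-break (an
    # assumption) keeps the order identical for identical inputs.
    return sorted(survivors, key=lambda m: (m.price_out, m.id))


def route(models, min_intel, max_price, call: Callable[[Model], str]):
    # Steps 3 and 4: try the top-ranked model, then fall back in rank order,
    # recording every switch as (failed_id, next_id, error_name).
    ordered = rank(models, min_intel, max_price)
    switches = []
    for i, model in enumerate(ordered):
        try:
            return call(model), model.id, switches
        except (TimeoutError, RuntimeError) as exc:
            next_id = ordered[i + 1].id if i + 1 < len(ordered) else None
            switches.append((model.id, next_id, type(exc).__name__))
    # Assumed behaviour once the list is exhausted; the page does not specify it.
    raise RuntimeError(f"all candidates failed: {switches}")
```

Because the fallback loop walks the same filtered list, a switch can only land on a model that already passed every gate.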

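Worked decision example

A policy (tool capability required, bench_intelligence ge 0.5, cheapest wins) applied to the live catalog, where price_out is the USD price per million output tokens. The filter excludes the two models that fall below the floor, and the cheapest survivor wins: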

model              price_out  intel  verdict
deepseek-v4-flash  $0.40      0.465  below floor
minimax-m2.7       $0.50      0.496  below floor
deepseek-v4-pro    $1.50      0.515  wins
glm-5.1            $2.00      0.514  above floor
gpt-5.5            $10.00     0.602  above floor
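The table can be reproduced with a short script. The prices and scores are copied from the rows above; the sketch assumes all five models support tools (the table has no column for it), and `verdicts` is a hypothetical helper written for this example.

```python
# (model, price_out in USD per million output tokens, bench_intelligence)
CATALOG = [
    ("deepseek-v4-flash", 0.40, 0.465),
    ("minimax-m2.7", 0.50, 0.496),
    ("deepseek-v4-pro", 1.50, 0.515),
    ("glm-5.1", 2.00, 0.514),
    ("gpt-5.5", 10.00, 0.602),
]
FLOOR = 0.5  # bench_intelligence ge 0.5


def verdicts(catalog, floor):
    # Filter first: only models at or above the floor survive.
    survivors = [row for row in catalog if row[2] >= floor]
    # Then rank: the cheapest survivor wins (name breaks exact price ties).
    winner = min(survivors, key=lambda row: (row[1], row[0]))[0] if survivors else None
    result = {}
    for name, _price, intel in catalog:
        if intel < floor:
            result[name] = "below floor"
        elif name == winner:
            result[name] = "wins"
        else:
            result[name] = "above floor"
    return result


VERDICTS = verdicts(CATALOG, FLOOR)
```

deepseek-v4-flash is the cheapest model in the catalog, but it never competes on price because the floor removes it before ranking begins.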

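Floors are constraints, not suggestions. Routing optimizes cost within your floor and never goes around it. If no model qualifies, the request fails explicitly (no_candidates); you never receive a silently downgraded result that you did not ask for. Terms are validated and fingerprinted before execution, so a malformed policy is rejected (invalid_policy) rather than misrouted.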
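To make "validated and fingerprinted" concrete, here is one way such a check could look. Everything in it is an assumption: the six-part shape is inferred only from the presets on this page, and the SHA-256 over canonical JSON is a stand-in for whatever fingerprint scheme the router really uses. `InvalidPolicy`, `validate`, and `fingerprint` are hypothetical names.

```python
import hashlib
import json


class InvalidPolicy(ValueError):
    """Stands in for the invalid_policy rejection described above."""


def validate(policy):
    # Assumed shape, inferred from the presets on this page:
    # ["policy", filter, score, select, ["id"], fallback]
    if not (isinstance(policy, list) and len(policy) == 6 and policy[0] == "policy"):
        raise InvalidPolicy("invalid_policy")
    for term in policy[1:]:
        # Every term is itself a non-empty list headed by an operator name.
        if not (isinstance(term, list) and term and isinstance(term[0], str)):
            raise InvalidPolicy("invalid_policy")
    return policy


def fingerprint(policy):
    # Assumed scheme: hash a canonical JSON encoding so that the same policy
    # always yields the same fingerprint, whatever its whitespace or key order.
    canonical = json.dumps(validate(policy), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Rejecting a malformed policy before routing is what keeps a typo from turning into a quietly different decision.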

Preview a decision without incurring inference cost: POST /x/rank returns the ranked survivors, along with the rule that excluded each rejected model.
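A dry run could be prepared along these lines. Only the method and path (POST /x/rank) come from this page; the base URL, the `{"policy": ...}` body shape, and the absence of an auth header are placeholders to replace with whatever your deployment expects. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://router.example.com"  # hypothetical host, not from this page


def build_rank_request(policy, base_url=BASE_URL):
    # Assumed payload shape: the policy_ir wrapped under a "policy" key.
    body = json.dumps({"policy": policy}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/x/rank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response format is not specified on this page, so parsing is left out.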

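Watching a policy decide

The same principle applies in workflows: each node declares its own policy, a filter plus a ranking rule. The diagram below shows how each step excludes the candidates that miss the bar and locks in the cheapest survivor (the output node declares no model and simply returns the previous step's result).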

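Policy presets

Copyable routing patterns, grouped by goal. Every preset is an ordinary policy_ir: paste it in, adjust the floors and ceilings, and it is ready to use. Dry-run it with POST /x/rank first.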

Smart balance

Balances capability and price without complicated trade-offs. Tags: cost · quality

smart-balance.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
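The scoring term in this preset is a weighted sum: 0.6 times normalized intelligence minus 0.4 times normalized output price. The sketch below evaluates it in Python under the assumption that `normalize` means min-max scaling over the surviving candidates; the page does not define the operator, so treat that as a guess. `normalize` and `smart_balance` are hypothetical helpers.

```python
def normalize(values):
    # Assumed semantics: min-max scaling to [0, 1] across the candidates.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def smart_balance(models, w_intel=0.6, w_price=0.4):
    intel = normalize([m["bench_intelligence"] for m in models])
    price = normalize([m["price_out"] for m in models])
    # add(scale(0.6, intel), scale(0.4, neg(price)))
    scores = {
        m["id"]: w_intel * i - w_price * p
        for m, i, p in zip(models, intel, price)
    }
    # argmax; iterating ids in sorted order makes ties deterministic.
    winner = max(sorted(scores), key=scores.get)
    return winner, scores
```

Unlike a floor, a weight is a trade-off: with no ceiling in the filter, a sufficiently capable model can win here despite a much higher price.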

Cheapest decent

Minimizes cost within an intelligence floor; the core mechanism of unhardcoded. Tags: cost

cheapest-decent.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "bench_intelligence", "ge", 0.5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
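Since a preset is plain nested JSON, adjusting a floor is a small tree edit. The helper below is a hypothetical convenience (`with_threshold` is not part of any API named on this page) that returns a copy of a policy with a given `cmp` term's value replaced.

```python
import copy
import json

CHEAPEST_DECENT = json.loads("""
["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "bench_intelligence", "ge", 0.5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
""")


def with_threshold(policy, field, op, value):
    """Return a copy of policy with every ["cmp", field, op, _] set to value."""
    def walk(term):
        if isinstance(term, list):
            if len(term) == 4 and term[0] == "cmp" and term[1] == field and term[2] == op:
                return ["cmp", field, op, value]
            return [walk(t) for t in term]
        return term  # strings, numbers, and dicts pass through unchanged

    # Deep-copy first so the caller's preset is never mutated.
    return walk(copy.deepcopy(policy))


STRICTER = with_threshold(CHEAPEST_DECENT, "bench_intelligence", "ge", 0.55)
```

Raising the floor from 0.5 to 0.55 changes only the filter; the cost-ordered ranking is untouched.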

Free only

Only models with zero output cost. Tags: cost

free-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["cmp", "price_out", "le", 0]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Best intelligence

No cost ceiling; picks the most capable model, for mission-critical tasks. Tags: quality

best-intelligence.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reasoning only

Only models with reasoning capability. Tags: quality

reasoning-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_reasoning"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Vision · cheapest

Requires image input support, then picks the cheapest model. Tags: capability

vision-cheapest.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "in_image"]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Long-context RAG

Requires a large context window, then picks the cheapest of the qualifying models. Tags: capability

long-context-rag.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "context", "ge", 200000]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Structured output

Filters on JSON-mode support, then weighs intelligence and cost equally. Tags: capability

structured-output.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_json_mode"]],
  ["add",
    ["scale", 0.5, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.5, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Agentic fleet

Restricts to tool-calling models ranked in the top 5 for agentic work, then picks the strongest. Tags: agentic

agentic-fleet.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_agentic_rank", "le", 5]],
  ["field", "bench_agentic"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Coding under a cap

Top-5 coding models that support tool calling, under a hard output-price ceiling. Tags: coding · cost

coding-under-a-cap.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_coding_rank", "le", 5],
         ["cmp", "price_out", "le", 5]],
  ["field", "bench_coding"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reproducible sample

Reproducible random sampling, for ensemble diversity. Tags: agentic

reproducible-sample.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["sample", 0.3], ["id"], ["always", {"action": "next_candidate"}]]
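The page does not spell out what `["sample", 0.3]` does. One plausible reading, used here purely for illustration, is softmax sampling at temperature 0.3 with a seed derived from a stable request key, so the draw is random across keys but repeatable for any one key. `sample_model` and `seed_key` are hypothetical names.

```python
import hashlib
import math
import random


def sample_model(scores, temperature, seed_key):
    # Assumed semantics: softmax over scores at the given temperature (> 0).
    ids = sorted(scores)  # fixed iteration order, independent of dict order
    top = max(scores.values())
    weights = [math.exp((scores[i] - top) / temperature) for i in ids]
    # Derive the seed from a stable key (an assumption) so that the same key
    # always reproduces the same draw.
    digest = hashlib.sha256(seed_key.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).choices(ids, weights=weights, k=1)[0]
```

A lower temperature concentrates the draws on the highest-scoring models; a higher one spreads them out, which is where ensemble diversity would come from under this reading.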

Low-latency chat

Filters by latency, then balances speed against capability. Tags: latency

low-latency-chat.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "latency_ms", "le", 2000]],
  ["add",
    ["scale", 0.7, ["neg", ["normalize", ["field", "latency_ms"]]]],
    ["scale", 0.3, ["normalize", ["field", "bench_intelligence"]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Private / compliant

TEE-only with no logging, ranked by capability. Tags: compliance

private-compliant.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["is", "has_tee"], ["is", "no_log"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Resilience cascade

Keeps the top 3 by composite score (intelligence + reliability) to form a fallback cascade. Tags: reliability

resilience-cascade.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["normalize", ["field", "success_rate"]]]],
  ["top_k", 3, ["argmax"]], ["id"], ["always", {"action": "next_candidate"}]]
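The cascade differs from the other presets in that it keeps an ordered shortlist rather than a single winner. A rough sketch, again assuming `normalize` is min-max scaling and that `top_k` keeps the k highest-scoring survivors in descending order; `top_k_cascade` is a hypothetical helper.

```python
def normalize(values):
    # Assumed semantics: min-max scaling to [0, 1] across the candidates.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_k_cascade(models, k=3, w_intel=0.6, w_reliability=0.4):
    intel = normalize([m["bench_intelligence"] for m in models])
    reliability = normalize([m["success_rate"] for m in models])
    scored = [
        (w_intel * i + w_reliability * r, m["id"])
        for m, i, r in zip(models, intel, reliability)
    ]
    # Highest composite score first; ids break ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # The first id is tried first; the rest are the fallbacks, in order.
    return [model_id for _score, model_id in scored[:k]]
```

With next_candidate as the fallback action, a failure on the first entry would move on to the second, so the shortlist doubles as the retry order.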
