Concepts & Recipes

Policies

How routing decides: filter first, rank the survivors, never downgrade silently. Includes copy-ready presets grouped by goal.

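Routing semantics

Filter first. Rank survivors. No silent downgrade. Routing is deterministic and ordered. Cost caps and quality floors live in the filter, not in the score, so a cheap model can never outscore a rule. A model that does not clear the bar is not a candidate at all.

Filter first

Candidates that lack a required capability, fall below the quality bar, or exceed the price cap are removed; nothing is silently substituted.

Rank survivors

The scorer ranks the models that cleared the bar. Usually the cheapest comes first.

Pick the top

Exactly one model is selected, deterministically. The same policy, inputs, and catalog always select the same model.

Fall back in order

If the selected model times out or errors, the router moves to the next candidate that cleared the bar (cheapest first) and logs every move.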
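The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the router's actual implementation: the `Model` record, `route`, `call_with_fallback`, and the sample catalog are all hypothetical names introduced here, and tie-breaking on the model id is an assumption made only to keep the sketch deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical catalog record; field names mirror the policy fields."""
    id: str
    price_out: float           # USD per million output tokens
    bench_intelligence: float


def route(catalog, passes, score):
    """Filter first, then rank survivors; return ids in fallback order."""
    survivors = [m for m in catalog if passes(m)]
    if not survivors:
        # Fail loudly instead of substituting a weaker model.
        raise LookupError("no_candidates")
    # Highest score first; tie-break on id (assumption) for determinism.
    ranked = sorted(survivors, key=lambda m: (-score(m), m.id))
    return [m.id for m in ranked]


def call_with_fallback(order, invoke):
    """Try each surviving candidate in order and record every move."""
    moves = []
    for model_id in order:
        try:
            return invoke(model_id), moves
        except (TimeoutError, RuntimeError) as exc:
            moves.append((model_id, type(exc).__name__))
    raise RuntimeError(f"all candidates failed: {moves}")


# Hypothetical catalog: "b" is cheapest but sits below the quality floor.
catalog = [Model("a", 1.0, 0.6), Model("b", 0.5, 0.4), Model("c", 2.0, 0.7)]
order = route(
    catalog,
    passes=lambda m: m.bench_intelligence >= 0.5,  # floor lives in the filter
    score=lambda m: -m.price_out,                  # cheapest survivor wins
)


def flaky_invoke(model_id):
    """Simulated backend: the first choice times out."""
    if model_id == "a":
        raise TimeoutError("simulated timeout")
    return f"response from {model_id}"


result, moves = call_with_fallback(order, flaky_invoke)
```

Because the floor is applied before scoring, the cheapest model `b` never enters the ranking, and the fallback only ever walks candidates that cleared the bar.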

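Step-by-step decision

A single policy (requires tools, bench_intelligence ge 0.5, lowest cost) is applied to the live catalog (price_out is USD per million output tokens). The filter eliminates the two models below the bar, and the cheapest survivor is selected: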

model               price_out   intel   verdict
deepseek-v4-flash   $0.40       0.465   below bar
minimax-m2.7        $0.50       0.496   below bar
deepseek-v4-pro     $1.50       0.515   selected
glm-5.1             $2.00       0.514   above bar
gpt-5.5             $10.00      0.602   above bar
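The decision in the table can be reproduced by hand. The sketch below hard-codes the five rows from the table and assumes that all five models support tools, since the table does not show that column; the variable names are illustrative only.

```python
# (model id, price_out in USD per million output tokens, bench_intelligence)
CATALOG = [
    ("deepseek-v4-flash", 0.40, 0.465),
    ("minimax-m2.7", 0.50, 0.496),
    ("deepseek-v4-pro", 1.50, 0.515),
    ("glm-5.1", 2.00, 0.514),
    ("gpt-5.5", 10.00, 0.602),
]
FLOOR = 0.5  # the policy's bench_intelligence ge 0.5 condition

# Step 1: filter. Models below the floor are removed, not down-weighted.
rejected = [mid for mid, _, intel in CATALOG if intel < FLOOR]
survivors = [(mid, price) for mid, price, intel in CATALOG if intel >= FLOOR]

# Step 2: rank survivors by price, cheapest first (id breaks ties).
fallback_order = [mid for mid, _ in sorted(survivors, key=lambda r: (r[1], r[0]))]

# Step 3: the top of the ranking is the selected model.
selected = fallback_order[0]
```

Note that the two cheapest models are never compared on price at all: they are gone before ranking starts, which is why the $1.50 model wins over the $0.40 one.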

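The bar is a guarantee, not a suggestion. Routing optimizes cost within the bar and never works around it. If no model meets the requirements, the request fails explicitly (no_candidates). A silent downgrade you did not ask for never happens. Terms are validated and fingerprinted before execution, so a malformed policy is rejected (invalid_policy) rather than misrouted.

Preview a decision without paying for inference: POST /x/rank returns the ranked survivors and the rule that eliminated each rejected model.

See a policy decide

The same principle applies inside a workflow: each node declares its own policy (a filter and a ranking criterion). In the diagram below, each step drops the entries that fall below its bar and commits to the cheapest survivor (the output node declares no model and returns the result of the last step). The model IDs and costs in the diagram are real catalog data.

Policy presets

Copy-ready routing patterns grouped by goal. Each one is an ordinary policy_ir: paste it, adjust the floors and caps, then deploy. Dry-run it with POST /x/rank first.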
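One way to picture "validated and fingerprinted before execution" is the sketch below. Everything in it is an assumption made for illustration: the operator list is simply collected from the presets on this page, the structural check is far weaker than a real validator would be, and hashing canonical JSON with SHA-256 is a common technique, not necessarily the scheme the router uses.

```python
import hashlib
import json

# Operators seen in the presets on this page (assumed to be incomplete).
KNOWN_OPS = {
    "policy", "and", "not", "is", "cmp", "has_cap", "meets_req",
    "field", "normalize", "neg", "scale", "add",
    "argmax", "sample", "top_k", "id", "always",
}


class InvalidPolicy(ValueError):
    """Raised before any routing happens, mirroring invalid_policy."""


def validate(term):
    """Reject any nested term whose head is not a known operator."""
    if not isinstance(term, list) or not term or term[0] not in KNOWN_OPS:
        raise InvalidPolicy(f"invalid_policy: bad term {term!r}")
    for arg in term[1:]:
        if isinstance(arg, list):
            validate(arg)


def fingerprint(policy):
    """Validate, then hash a canonical JSON encoding of the policy."""
    validate(policy)
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cheapest_decent = [
    "policy",
    ["and", ["meets_req"], ["not", ["is", "disabled"]],
     ["cmp", "bench_intelligence", "ge", 0.5]],
    ["neg", ["normalize", ["field", "price_out"]]],
    ["argmax"], ["id"], ["always", {"action": "next_candidate"}],
]
digest = fingerprint(cheapest_decent)
```

Validating first means a typo in an operator name stops the request up front, and a stable fingerprint lets two decisions be compared by policy identity rather than by text.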

Smart balance

Balances capability and price; no complicated thinking required. Tags: cost · quality

smart-balance.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]
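To see what the 0.6 / 0.4 weighting does in practice, the sketch below scores the five models from the step-by-step table. It assumes that `normalize` means min-max scaling across the surviving candidates and that all five pass `meets_req` and are not disabled; neither assumption is confirmed by this page.

```python
def normalize(values):
    """Min-max scale to [0, 1] (assumed meaning of the normalize term)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# (model id, price_out, bench_intelligence) from the table on this page.
CANDIDATES = [
    ("deepseek-v4-flash", 0.40, 0.465),
    ("minimax-m2.7", 0.50, 0.496),
    ("deepseek-v4-pro", 1.50, 0.515),
    ("glm-5.1", 2.00, 0.514),
    ("gpt-5.5", 10.00, 0.602),
]

ids = [c[0] for c in CANDIDATES]
price_n = normalize([c[1] for c in CANDIDATES])
intel_n = normalize([c[2] for c in CANDIDATES])

# score = 0.6 * normalized intelligence - 0.4 * normalized price
scores = {mid: 0.6 * i - 0.4 * p for mid, i, p in zip(ids, intel_n, price_n)}

# argmax, iterating ids in sorted order so ties resolve deterministically.
best = max(sorted(scores), key=scores.get)
```

Under these assumptions the most expensive model wins, because the score has no cap and the intelligence weight outweighs the price penalty. If a price ceiling matters, it belongs in the filter as a `cmp` on `price_out`, not in the weights.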

Cheapest decent

Drives cost down subject to an intelligence floor. The core hook of unhardcoded. Tags: cost

cheapest-decent.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "bench_intelligence", "ge", 0.5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Free only

Only models whose output cost is zero. Tags: cost

free-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["cmp", "price_out", "le", 0]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Best intelligence

The most capable model with no cost cap, for work that matters. Tags: quality

best-intelligence.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reasoning only

Only models with reasoning capability. Tags: quality

reasoning-only.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_reasoning"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Vision · cheapest

Image input, cheapest model. Tags: capability

vision-cheapest.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "in_image"]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Long-context RAG

Requires a large context window, then picks the cheapest model. Tags: capability

long-context-rag.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "context", "ge", 200000]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Structured output

Filters by capability, then balances ability and cost. Tags: capability

structured-output.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_json_mode"]],
  ["add",
    ["scale", 0.5, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.5, ["neg", ["normalize", ["field", "price_out"]]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Agentic fleet

Strongest tool use, top 5 on the agentic benchmark. Tags: agent

agentic-fleet.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_agentic_rank", "le", 5]],
  ["field", "bench_agentic"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Coding under a cap

Top 5 on the coding benchmark, with tools and a hard cap on output price. Tags: coding · cost

coding-under-a-cap.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["has_cap", "supports_tools"],
         ["cmp", "bench_coding_rank", "le", 5],
         ["cmp", "price_out", "le", 5]],
  ["field", "bench_coding"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Reproducible sample

Reproducible stochastic selection for ensemble diversity. Tags: agent

reproducible-sample.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["field", "bench_intelligence"],
  ["sample", 0.3], ["id"], ["always", {"action": "next_candidate"}]]
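"Reproducible stochastic selection" can be achieved by seeding the random draw from something stable. The sketch below is one possible reading, not the documented behavior: it assumes the `0.3` in the `sample` term is a softmax temperature, and it derives the seed from a hypothetical `seed_key` string (for example a request or ensemble-member id). The scores are made up for illustration.

```python
import hashlib
import math
import random


def seeded_sample(scores, temperature, seed_key):
    """Pick one id with probability proportional to exp(score / temperature)."""
    # Derive a stable integer seed from the key; the same key gives the same draw.
    seed = int.from_bytes(hashlib.sha256(seed_key.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    ids = sorted(scores)                  # fixed iteration order
    top = max(scores.values())            # subtract the max for numerical stability
    weights = [math.exp((scores[i] - top) / temperature) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# Hypothetical bench_intelligence scores for three surviving models.
SCORES = {"model-a": 0.60, "model-b": 0.52, "model-c": 0.47}

first = seeded_sample(SCORES, 0.3, "ensemble-member-1")
again = seeded_sample(SCORES, 0.3, "ensemble-member-1")
```

Different keys can land on different models, which gives an ensemble its diversity, while replaying the same key always reproduces the same choice.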

Low-latency chat

Filters by latency, then balances speed and capability. Tags: latency

low-latency-chat.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["cmp", "latency_ms", "le", 2000]],
  ["add",
    ["scale", 0.7, ["neg", ["normalize", ["field", "latency_ms"]]]],
    ["scale", 0.3, ["normalize", ["field", "bench_intelligence"]]]],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Private / compliant

TEE only and no logging, capability first. Tags: compliance

private-compliant.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]],
         ["is", "has_tee"], ["is", "no_log"]],
  ["field", "bench_intelligence"],
  ["argmax"], ["id"], ["always", {"action": "next_candidate"}]]

Resilience cascade

Keeps the top 3 (intelligence + reliability) as a fallback cascade. Tags: reliability

resilience-cascade.json

["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
  ["add",
    ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
    ["scale", 0.4, ["normalize", ["field", "success_rate"]]]],
  ["top_k", 3, ["argmax"]], ["id"], ["always", {"action": "next_candidate"}]]
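The selection step of this preset differs from the others: instead of a single `argmax`, it keeps three candidates. A rough sketch follows, assuming `top_k` means "the three highest-scoring survivors, best first" and using made-up combined scores; the helper name `top_k_chain` is hypothetical.

```python
import heapq


def top_k_chain(scores, k):
    """Return the k highest-scoring ids, best first (id breaks ties)."""
    best = heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))
    return [model_id for model_id, _ in best]


# Hypothetical combined scores (0.6 * intelligence + 0.4 * success rate).
COMBINED = {"m1": 0.90, "m2": 0.40, "m3": 0.70, "m4": 0.80}

chain = top_k_chain(COMBINED, 3)
```

The resulting list is the cascade: the first entry serves the request, and the remaining entries are the ordered fallbacks if it fails.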
