Where model choice shouldn’t live in app code.
Use unhardcoded wherever model choice should not be buried in application code. The policy travels with the request, the cheapest model that passes answers, and the trace shows why — the mechanism is the same everywhere. Pick the place it bites.
Same policy, same trace. Pick where it hurts.
It’s the same policy and the same kind of trace every time — only the decision inside changes. Switch tabs to see it re-shot for each place.
Easy requests should not hit expensive models by default.
Set a quality floor and a price ceiling; the router takes the cheapest model that clears them — here gemini-3.5-flash at $0.018, not a pinned premium model.
The cheapest model that clears the floor wins — the trace shows the spread you’d otherwise overpay.
Provider failures should not become application logic.
Fallback is part of the policy, not retry code you maintain. When a model errors or times out, the router advances to the next one that still passes.
See the reliability use caseNo retry branch, no redeploy — the standby cascade is part of the policy, and every hop is in the trace.
Different users, plans, regions, and workloads need different model policies.
Your backend builds the policy per request, so each tenant carries its own floor, ceiling, and region rules — the same endpoint resolves to a different model.
See the tenant-rules use caseOne endpoint, one policy your backend builds per request — different rule, different model, no redeploy.
Agentic systems need routing rules that can change without redeploying the app.
A workflow is a bounded graph where each step carries its own policy, so every step routes — and fails over — on its own. Change the rules in the policy you send, not in code you ship.
See the workflow visualizerEach step carries its own policy and fails over on its own — stitched into one replayable trace.
Teams need to know why a model was used, not just what it returned.
Every run leaves a receipt: which models were accepted, rejected, used, or skipped, the rule that filtered each one, and a fingerprint to replay it.
See the audit use caseReplay it months later — same rule, same reasons, same outcome. That’s what makes it auditable.
The workload changes; the primitive does not. Policy is the thing you send — and the same trace proves every decision.
Move model choice out of app code, then send a policy.
Cost, reliability, tenant rules, workflows, audit — one control layer sits under all of them. Point your existing SDK at one endpoint, send a policy with the call, keep your provider keys, and get a trace for every decision.