Fallback is policy, not retry code.
Providers fail, rate-limit, and spike latency. When yours does, unhardcoded routes to the next model that still passes your policy — cheapest-first, no redeploy, and the user never sees it.
One pinned model is one point of failure.
If your app hardcodes a single model, every provider hiccup is a user-facing outage — and the only lever you have is a deploy.
It errors
A 500 or a 429 under load. The model is fine — you just can't reach it right now.
It hangs
The call times out at 504. To the user that is indistinguishable from a crash.
It degrades
Quality or availability dips for a region or a window. Nothing errors; responses just get worse.
Fall back to the next model that still passes the policy.
The usual fix is a try/except and a hardcoded "if the primary fails, try this other model" branch — backup logic that rots in a different file and downgrades silently. unhardcoded already has a better answer: your policy defines a set of survivors, every model that clears the hard requirements, ranked by cost. Fallback isn't a separate code path; it's the next survivor in that same list. Same rules, no redeploy, decided at request time.
filter → rank → select
The router drops models that fail a hard requirement — context, tools, region, price ceiling, or the quality floor — then ranks the survivors by cost and selects the cheapest.
fallback advances in order
If the pick errors or times out, the router advances to the next-cheapest survivor — a model the same policy already approved. No new model is invented mid-request.
per step in workflows
In a workflow, fallback is per node. If one step's model fails, that step falls back without restarting the graph — and the whole run still writes one stitched trace.
The primary timed out. The user never knew.
When a fallback fires, the trace records it — a 504 becomes a line in the receipt instead of a support ticket. Figures below are illustrative.
Fallback never silently drops below the floor.
A fallback hop is held to the same rules as the primary, so a weaker-but-cheaper model like deepseek-v4-flash stays filtered out — even in an outage — because it misses the quality floor (bench_intelligence, a 0..1 score).
If no survivor passes, the request fails loud instead of quietly answering with something worse. You can dry-run a policy to see which models would survive an incident before real traffic depends on it.
Make the outage a line in the trace.
Stop hand-wiring retry loops and stale backup-model branches. Let your policy carry the fallback order, keep it above the quality floor, and prove what happened on every run.