use case · reliability

Fallback is policy, not retry code.

Providers fail, rate-limit, and spike latency. When one does, unhardcoded routes the request to the next model that still passes your policy: cheapest first, no redeploy, and the user never sees it.

the problem

One pinned model is one point of failure.

If your app hardcodes a single model, every provider hiccup is a user-facing outage — and the only lever you have is a deploy.

It errors

A 500 or a 429 under load. The model is fine — you just can't reach it right now.

It hangs

The call times out with a 504. To the user, that is indistinguishable from a crash.

It degrades

Quality or availability dips for a region or a window. Nothing errors; responses just get worse.

the unhardcoded way

Fall back to the next model that still passes the policy.

The usual fix is a try/except plus a hardcoded "if the primary fails, try this other model" branch: backup logic that lives in a different file, rots over time, and downgrades silently. unhardcoded already has a better answer. Your policy defines a set of survivors, every model that clears the hard requirements, ranked by cost. Fallback isn't a separate code path; it's the next survivor in that same list. Same rules, no redeploy, decided at request time.

filter → rank → select

The router drops models that fail a hard requirement — context, tools, region, price ceiling, or the quality floor — then ranks the survivors by cost and selects the cheapest.
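A minimal sketch of filter → rank → select, assuming a hypothetical `Model` record and `Policy` whose fields mirror the hard requirements listed above. The names `Model`, `Policy`, `passes`, `survivors`, and `select` are illustrative only; this is not unhardcoded's API or policy schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    # Hypothetical model record; field names are illustrative, not unhardcoded's schema.
    name: str
    context_tokens: int
    supports_tools: bool
    regions: frozenset[str]
    cost_per_run: float        # assumed unit: USD per run
    bench_intelligence: float  # quality score in [0, 1]


@dataclass(frozen=True)
class Policy:
    # Hypothetical hard requirements mirroring the list above.
    min_context: int
    needs_tools: bool
    region: str
    max_cost: float
    quality_floor: float


def passes(m: Model, p: Policy) -> bool:
    """Filter: True only if the model clears every hard requirement."""
    return (
        m.context_tokens >= p.min_context
        and (m.supports_tools or not p.needs_tools)
        and p.region in m.regions
        and m.cost_per_run <= p.max_cost
        and m.bench_intelligence >= p.quality_floor
    )


def survivors(models: list[Model], p: Policy) -> list[Model]:
    """Filter, then rank the survivors cheapest-first."""
    return sorted((m for m in models if passes(m, p)), key=lambda m: m.cost_per_run)


def select(models: list[Model], p: Policy) -> Optional[Model]:
    """Select the cheapest survivor, or None if nothing passes."""
    ranked = survivors(models, p)
    return ranked[0] if ranked else None
```

In this sketch the fallback order is simply `survivors(...)` read past index 0, so the backup list is derived from the policy rather than written by hand.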

fallback advances in order

If the pick errors or times out, the router advances to the next-cheapest survivor — a model the same policy already approved. No new model is invented mid-request.
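A sketch of advancing through an already-ranked survivor list, assuming a failed call surfaces as Python's built-in `TimeoutError` or as a hypothetical `ProviderError` carrying an HTTP status (429, 500, 504). `call_with_fallback`, `ProviderError`, and `AllSurvivorsFailed` are illustrative names, not part of unhardcoded.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error for a failed provider call (e.g. HTTP 429, 500, 504)."""

    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


class AllSurvivorsFailed(Exception):
    """Raised when every approved survivor failed: fail loud, never downgrade."""


def call_with_fallback(
    ranked: list[str],
    call: Callable[[str], str],
) -> tuple[str, str, list[tuple[str, str]]]:
    """Try survivors cheapest-first, advancing only within the approved list.

    Returns (model_used, response, failures); each failure is (model, reason)
    so the skipped hops can be written to a trace.
    """
    failures: list[tuple[str, str]] = []
    for name in ranked:
        try:
            return name, call(name), failures
        except TimeoutError:
            failures.append((name, "timed out"))
        except ProviderError as e:
            failures.append((name, str(e.status)))
    raise AllSurvivorsFailed(f"no survivor answered: {failures}")
```

The loop only iterates the list it was given, which is the design point: a fallback hop cannot reach a model the policy did not already approve.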

per step in workflows

In a workflow, fallback is per node. If one step's model fails, that step falls back without restarting the graph — and the whole run still writes one stitched trace.
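A sketch of per-node fallback, simplifying the graph to a linear list of steps. It assumes each hypothetical `Node` carries its own cheapest-first survivor list and that every attempt is appended to one shared `Trace` for the run. `Node`, `Trace`, `NodeFailed`, and `run_workflow` are illustrative names, not unhardcoded's workflow API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    # Hypothetical shape: one trace per workflow run, shared by every node.
    run_id: str
    events: list[dict] = field(default_factory=list)


@dataclass
class Node:
    name: str
    ranked: list[str]  # this step's survivors, cheapest-first


class NodeFailed(Exception):
    """Raised when every survivor for one node failed; carries the trace so far."""

    def __init__(self, node: str, trace: Trace):
        super().__init__(f"node {node!r}: no survivor answered")
        self.trace = trace


def run_workflow(
    run_id: str,
    nodes: list[Node],
    call: Callable[[str, str], str],  # (model, input) -> output
    initial_input: str,
) -> tuple[str, Trace]:
    trace = Trace(run_id)
    data = initial_input
    for node in nodes:
        answered = False
        for model in node.ranked:
            try:
                data = call(model, data)
            except (TimeoutError, ConnectionError) as e:
                # Only this node advances; outputs of earlier nodes are kept.
                trace.events.append(
                    {"node": node.name, "model": model, "status": f"fallback: {type(e).__name__}"}
                )
                continue
            trace.events.append({"node": node.name, "model": model, "status": "ok"})
            answered = True
            break
        if not answered:
            raise NodeFailed(node.name, trace)
    return data, trace
```

Earlier steps are never re-run when a later step falls back, and every attempt, successful or not, lands in the same `Trace`.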


the trace

The primary timed out. The user never knew.

When a fallback fires, the trace records it — a 504 becomes a line in the receipt instead of a support ticket. Figures below are illustrative.

trace · req-209174 · 200 OK

fallback: gemini-3.5-flash → claude-sonnet-4-6 · 504 timed out
selected: claude-sonnet-4-6 · next eligible survivor
reason: tools ✓, score 0.57 ≥ floor, next passing survivor
latency: 438 ms · incl. one failed primary
cost: $0.041 · per run, not per token
policy: support · fingerprint 301140696-1054914287 · sigma-pol/v1


Fallback never silently drops below the floor.

A fallback hop is held to the same rules as the primary. A cheaper but weaker model such as deepseek-v4-flash stays filtered out, even during an outage, because it misses the quality floor (bench_intelligence, a 0..1 score).

If no survivor passes, the request fails loud instead of quietly answering with something worse. You can dry-run a policy to see which models would survive an incident before real traffic depends on it.
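A local stand-in for a dry run, assuming models are described only by a hypothetical `(cost_per_run, bench_intelligence)` pair. It simulates outages by marking models as down and reports either the resulting fallback order or a loud failure. `eligible`, `dry_run`, and `NoSurvivor` are illustrative; unhardcoded's actual dry-run interface is not described on this page.

```python
from typing import Optional


class NoSurvivor(Exception):
    """Fail loud: nothing clears the policy, so nothing answers."""


def eligible(
    models: dict[str, tuple[float, float]],
    floor: float,
    down: frozenset[str] = frozenset(),
) -> list[str]:
    """Survivors at or above the quality floor, cheapest-first, skipping down models.

    models maps name -> (cost_per_run, bench_intelligence in [0, 1]).
    """
    ranked = sorted(
        (cost, name)
        for name, (cost, score) in models.items()
        if score >= floor and name not in down
    )
    if not ranked:
        raise NoSurvivor(f"no model clears floor {floor} with {sorted(down)} down")
    return [name for _, name in ranked]


def dry_run(
    models: dict[str, tuple[float, float]],
    floor: float,
    incidents: list[frozenset[str]],
) -> dict[frozenset[str], Optional[list[str]]]:
    """For each simulated outage, report the fallback order, or None if it would fail loud."""
    report: dict[frozenset[str], Optional[list[str]]] = {}
    for down in incidents:
        try:
            report[down] = eligible(models, floor, down)
        except NoSurvivor:
            report[down] = None
    return report
```

Note that a below-floor model never appears in any report, even when every model above it is down; the sketch reports `None` instead of downgrading.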

Make the outage a line in the trace.

Stop hand-wiring retry loops and stale backup-model branches. Let your policy carry the fallback order, keep it above the quality floor, and prove what happened on every run.


No SDK rewrite · Your provider keys · Open policy_ir · Every run traced