use case · reliability

Fallback is policy, not retry code.

Providers fail, rate-limit, and spike latency. When yours does, unhardcoded routes to the next model that still passes your policy — cheapest-first, no redeploy, and the user never sees it.

the problem

One pinned model is one point of failure.

If your app hardcodes a single model, every provider hiccup is a user-facing outage — and the only lever you have is a deploy.

It errors

A 500 or a 429 under load. The model is fine — you just can't reach it right now.

It hangs

The call times out and comes back as a 504. To the user, that is indistinguishable from a crash.

It degrades

Quality or availability dips for a region or a time window. Nothing errors; responses just get worse.

the unhardcoded way

Fall back to the next model that still passes the policy.

The usual fix is a try/except and a hardcoded "if the primary fails, try this other model" branch — backup logic that rots in a different file and downgrades silently. unhardcoded already has a better answer: your policy defines a set of survivors, every model that clears the hard requirements, ranked by cost. Fallback isn't a separate code path; it's the next survivor in that same list. Same rules, no redeploy, decided at request time.

1

filter → rank → select

The router drops models that fail a hard requirement — context, tools, region, price ceiling, or the quality floor — then ranks the survivors by cost and selects the cheapest.

2

fallback advances in order

If the pick errors or times out, the router advances to the next-cheapest survivor — a model the same policy already approved. No new model is invented mid-request.

3

per step in workflows

In a workflow, fallback is per node. If one step's model fails, that step falls back without restarting the graph — and the whole run still writes one stitched trace.
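The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not unhardcoded's implementation: the Model and Policy fields, the route and run_workflow helpers, the model names, and every cost and quality figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:              # hypothetical catalog entry
    name: str
    cost: float           # illustrative cost per run
    quality: float        # 0..1 score
    tools: bool


@dataclass(frozen=True)
class Policy:             # hypothetical hard requirements
    quality_floor: float
    price_ceiling: float
    needs_tools: bool


def survivors(models, policy):
    """Filter on hard requirements, then rank cheapest-first."""
    passing = [
        m for m in models
        if m.quality >= policy.quality_floor
        and m.cost <= policy.price_ceiling
        and (m.tools or not policy.needs_tools)
    ]
    return sorted(passing, key=lambda m: m.cost)


def route(models, policy, call):
    """Select the cheapest survivor; on failure advance to the next one."""
    trace = []
    for model in survivors(models, policy):
        try:
            result = call(model)
        except (TimeoutError, ConnectionError) as exc:
            trace.append((model.name, f"failed: {exc}"))
            continue
        trace.append((model.name, "selected"))
        return result, trace
    # Fail loud: nothing outside the survivor list is ever tried.
    raise RuntimeError(f"no survivor answered: {trace}")


def run_workflow(nodes, models, call):
    """Route each node on its own and stitch the hops into one trace."""
    outputs, stitched = {}, []
    for node, policy in nodes:
        result, trace = route(models, policy, lambda m, n=node: call(n, m))
        outputs[node] = result
        stitched += [(node, name, status) for name, status in trace]
    return outputs, stitched


# Illustrative catalog: names and numbers are made up for this sketch.
CATALOG = [
    Model("cheap-weak", 0.002, 0.30, True),
    Model("primary", 0.010, 0.60, True),
    Model("backup", 0.040, 0.70, True),
]
SUPPORT = Policy(quality_floor=0.50, price_ceiling=0.10, needs_tools=True)


def flaky_call(node, model):
    """Simulated provider: the primary times out on the 'draft' node only."""
    if node == "draft" and model.name == "primary":
        raise TimeoutError("504")
    return f"{node} via {model.name}"


outputs, stitched = run_workflow(
    [("classify", SUPPORT), ("draft", SUPPORT)], CATALOG, flaky_call
)
```

In this sketch the "draft" node falls back to the next survivor while the "classify" node is untouched, and "cheap-weak" is never tried because it sits below the quality floor.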

See how the same survivors drive cost savings
the trace

The primary timed out. The user never knew.

When a fallback fires, the trace records it — a 504 becomes a line in the receipt instead of a support ticket. Figures below are illustrative.

trace · req-209174 200 OK
fallback   gemini-3.5-flash → claude-sonnet-4-6 · 504 timed out
selected   claude-sonnet-4-6 · next eligible survivor
reason     tools ✓, score 0.57 ≥ floor, next passing survivor
latency    438 ms · incl. one failed primary
cost       $0.041 · per run, not per token
policy     support · fingerprint 301140696-1054914287 · sigma-pol/v1

See how the trace records every fallback →

Fallback never silently drops below the floor.

A fallback hop is held to the same rules as the primary, so a weaker-but-cheaper model like deepseek-v4-flash stays filtered out — even in an outage — because it misses the quality floor (bench_intelligence, a 0..1 score).

If no survivor passes, the request fails loud instead of quietly answering with something worse. You can dry-run a policy to see which models would survive an incident before real traffic depends on it.
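As a rough picture of what a dry run answers, the sketch below takes a catalog, a quality floor, and a set of models assumed to be down, and returns the fallback order that would remain. The dry_run function, the catalog shape, and all names and numbers are hypothetical; the product's actual dry-run interface is not shown here.

```python
def dry_run(catalog, quality_floor, down=()):
    """Return the cheapest-first fallback order with `down` models removed."""
    order = sorted(
        (m for m in catalog
         if m["quality"] >= quality_floor and m["name"] not in down),
        key=lambda m: m["cost"],
    )
    if not order:
        # Fail loud: never answer with a model below the floor.
        raise LookupError("no survivor passes the policy")
    return [m["name"] for m in order]


# Illustrative catalog: names and numbers are made up for this sketch.
CATALOG = [
    {"name": "cheap-weak", "cost": 0.002, "quality": 0.30},
    {"name": "primary", "cost": 0.010, "quality": 0.60},
    {"name": "backup", "cost": 0.040, "quality": 0.70},
]

normal = dry_run(CATALOG, quality_floor=0.50)
incident = dry_run(CATALOG, quality_floor=0.50, down={"primary"})
```

If both "primary" and "backup" are marked down, dry_run raises instead of returning "cheap-weak", which mirrors the fail-loud behavior described above.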

Make the outage a line in the trace.

Stop hand-wiring retry loops and stale backup-model branches. Let your policy carry the fallback order, keep it above the quality floor, and prove what happened on every run.

No SDK rewrite · Your provider keys · Open policy_ir · Every run traced