use case · routing traces

Know why every model decision happened.

Logs tell you which model answered. The trace tells you why it was chosen — the policy sent, every candidate considered, which models were rejected and by which rule, and the fingerprint to replay it months later.

the problem

Your logs show what answered, not why it was chosen.

A line in your logs says gemini-3.5-flash returned a reply. It doesn't say which models were in the running, which rule dropped the cheaper one, whether a fallback fired, or what the call would have cost on your old baseline. When a finance lead asks "why did this run cost what it did," or a customer disputes an answer, you're reconstructing the decision from memory and scattered config.

the unhardcoded way

Every run writes a replayable receipt.

The router applies your policy — filter → rank → select → fallback — and records the whole decision: the policy sent, the candidates it considered, every model it rejected and the exact rule that rejected it, the winner and why it won, any fallback hop, cost versus your baseline, latency, and the fingerprint to replay it. Nothing is implicit. The trace is the proof that the rules were followed.

trace · req_8f41c2 200 OK

selected: gemini-3.5-flash · score 0.54

reason: tools ✓, over quality floor, price ceiling passed, cheapest survivor

rejected: deepseek-v4-flash · filtered: below quality floor (score 0.42 < 0.5)

rejected: mistral-small-4 · filtered: no tools

fallback: claude-sonnet-4-6 → gpt-5.5 · standby cascade, not triggered this run

latency: 412 ms

cost: $0.018 · ↓71% vs gpt-5.5 baseline ($0.063) · illustrative

policy: support · fingerprint 1734059821-988143307 · sigma-pol/v1

Every rule on the receipt points at a catalog field you can inspect — the quality floor is bench_intelligence, not a black box. The same receipt is written whether the run was one call or a 5-node workflow that stitches one trace across every step. Read the trace schema →
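
To make the filter → rank → select → fallback flow concrete, here is a minimal Python sketch that produces a receipt shaped like the one above. It is an illustration under stated assumptions, not the router's actual implementation: the names Model, Trace, and route are hypothetical, and so are the price ceiling, the per-run costs, and every score other than the 0.54 and 0.42 shown on the receipt. Only bench_intelligence is a field name taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    name: str
    bench_intelligence: float  # catalog field behind the quality floor
    tools: bool                # whether the model supports tool calls
    cost: float                # hypothetical estimated cost per run, USD


@dataclass
class Trace:
    selected: Optional[str] = None
    reason: str = ""
    rejected: list = field(default_factory=list)  # (model name, rule) pairs
    fallback: list = field(default_factory=list)  # standby cascade, in order


def route(catalog, quality_floor, price_ceiling, need_tools, fallback):
    """Filter, rank by cost, select the cheapest survivor, record everything."""
    trace = Trace(fallback=list(fallback))
    survivors = []
    # Filter: every rejection is written down with the rule that caused it.
    for m in catalog:
        if need_tools and not m.tools:
            trace.rejected.append((m.name, "no tools"))
        elif m.bench_intelligence < quality_floor:
            rule = f"below quality floor (score {m.bench_intelligence} < {quality_floor})"
            trace.rejected.append((m.name, rule))
        elif m.cost > price_ceiling:
            trace.rejected.append((m.name, "over price ceiling"))
        else:
            survivors.append(m)
    # Rank and select: the cheapest model that passed every rule wins.
    survivors.sort(key=lambda m: m.cost)
    if survivors:
        trace.selected = survivors[0].name
        trace.reason = "cheapest survivor"
    return trace


# Illustrative catalog; values not on the receipt are invented for the sketch.
catalog = [
    Model("gemini-3.5-flash", 0.54, True, 0.018),
    Model("deepseek-v4-flash", 0.42, True, 0.009),
    Model("mistral-small-4", 0.55, False, 0.012),
    Model("gpt-5.5", 0.80, True, 0.063),
]
trace = route(
    catalog,
    quality_floor=0.5,
    price_ceiling=0.10,  # hypothetical ceiling
    need_tools=True,
    fallback=["claude-sonnet-4-6", "gpt-5.5"],
)
print(trace.selected, trace.rejected)
```

The point of the design is that rejections are appended to the trace at the moment a rule fires, so the receipt cannot drift from the decision it describes.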

who uses it

One receipt, four different questions.

The trace is the same object for everyone who needs to answer for a model decision — they just read different rows of it.

Engineering · debug a bad answer

A reply looks wrong. Open its trace and see which model produced it, which rule let that model through, and whether a fallback quietly changed the pick — instead of guessing from logs.

Finance · cost review

Every run records its cost against the baseline it's measured on. Roll the traces up to see what routing to the cheapest passing model actually saved — per workload, with the rejected pricier candidates listed.

Enterprise · governance & audit

Show an auditor the rule that was enforced on a request, the fingerprint of the policy that produced it, and the models that were excluded by region or capability — with no after-the-fact reconstruction.

Support · investigate a reply

A customer disputes a specific answer. Pull that one request's trace by its id and walk the decision: policy sent, winner, why each alternative was filtered, and what it cost.

replayable

Reconstruct the decision months later.

Because the policy is a fingerprinted object, a trace isn't just a log line — it's enough to rebuild the run. The fingerprint 1734059821-988143307 (sigma-pol/v1) pins the exact rule that was sent, so you can re-evaluate it against the catalog as it stood, dry-run it with POST /x/rank and GET /x/fields, and confirm the same model still passes — no inference, no production traffic. The decision is auditable because it's reproducible.
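
One way to picture why a fingerprint makes replay trustworthy: if the fingerprint is derived from the policy's content, any edit to the policy changes it, so a matching fingerprint shows that you are re-evaluating the same rule that was sent. The sketch below is an assumption-heavy illustration. It uses a truncated SHA-256 over canonical JSON, which is not the sigma-pol/v1 scheme (the text does not describe how that fingerprint is computed), and the policy fields and function names are hypothetical.

```python
import hashlib
import json


def fingerprint(policy: dict) -> str:
    """Stable content hash of a policy (illustrative, not sigma-pol/v1)."""
    # Sorted keys and fixed separators make the JSON form canonical, so the
    # same policy always hashes to the same value regardless of key order.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def can_replay(trace: dict, policy: dict) -> bool:
    """True only if the stored policy is the one the trace was produced with."""
    return trace["fingerprint"] == fingerprint(policy)


# Hypothetical policy and trace records.
policy = {"name": "support", "quality_floor": 0.5, "need_tools": True}
trace = {"id": "req_8f41c2", "fingerprint": fingerprint(policy)}

same_rule = {"need_tools": True, "name": "support", "quality_floor": 0.5}
edited_rule = {"name": "support", "quality_floor": 0.4, "need_tools": True}

print(can_replay(trace, same_rule))    # key order does not matter
print(can_replay(trace, edited_rule))  # a changed floor no longer matches
```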

Pair traces with cost control to prove the savings are real, not estimated — every run carries the baseline it beat.
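
Because each trace carries both its cost and its baseline, a savings figure is a sum over receipts rather than an estimate. A minimal sketch of that rollup, assuming each trace is a dict with hypothetical keys policy, cost, and baseline_cost, and using the illustrative $0.018 and $0.063 figures from the receipt:

```python
from collections import defaultdict


def rollup(traces):
    """Sum cost and baseline per policy, then compute percent saved."""
    totals = defaultdict(lambda: {"cost": 0.0, "baseline": 0.0})
    for t in traces:
        totals[t["policy"]]["cost"] += t["cost"]
        totals[t["policy"]]["baseline"] += t["baseline_cost"]
    report = {}
    for policy, v in totals.items():
        # Skip the percentage when there is no baseline to compare against.
        saved = None
        if v["baseline"] > 0:
            saved = round(100 * (1 - v["cost"] / v["baseline"]))
        report[policy] = {**v, "saved_pct": saved}
    return report


# Hypothetical trace records; field names are assumptions for the sketch.
traces = [
    {"id": "req_8f41c2", "policy": "support", "cost": 0.018, "baseline_cost": 0.063},
]
report = rollup(traces)
print(report["support"]["saved_pct"])
```

With the single illustrative receipt, the rollup reproduces the 71% shown on the trace (1 − 0.018 / 0.063 ≈ 0.714).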

Make every model decision answer for itself.

A replayable trace on every run — the policy sent, the rejected candidates and the rule, the winner and its cost versus baseline. Join the waitlist and put the decision on the record.

No SDK rewrite · Your provider keys · Every request traced