how it works

Your policy decides which model runs.

Q: Do I need to rewrite my app?

No. unhardcoded is OpenAI-compatible. Point the SDK's baseURL at the endpoint, replace the model name with a policy:* name, and send the policy_ir with the call. Your messages and parameters pass through unchanged.

Q: Can I self-host?

Yes. The policy_ir and flow_ir language, its canonical encoding, and the reference interpreter are open source, so you can run the decision layer yourself. The managed host adds provider modules, the live catalog, key management, traces, and failover.

Q: What happens when a model fails?

The fallback step in your policy decides. By default the router advances to the next survivor under the floor, cheapest-first, without a redeploy. Every hop — with latency, cost, and reason — is written to the trace.

Most routers hide the interesting part: why a model was chosen. unhardcoded makes that decision explicit. Each request carries a policy; the router applies it at runtime, selects the cheapest model that passes, and records the decision as a replayable trace.

the mechanism

One request, one explicit decision.

unhardcoded is not a black box that decides for you. The policy travels with the request, the router applies it at runtime, the call runs on your provider keys, and the decision is written down. Five beats, in order.

The policy travels with the request

Your backend builds the routing rule and sends it on one OpenAI-compatible call — no dashboard, no redeploy. The router validates it against the grammar and fingerprints it. In the wire format that rule is a policy_ir (sigma-pol/v1).

The router filters and ranks the catalog

It evaluates the rule over the live catalog and routes to the cheapest model that passes. Hard requirements drop the rest; survivors are ranked by what you optimize for, usually cost.

filterrankselectfallback

Execution runs on your provider keys

The selected model is called with your own keys. unhardcoded never resells tokens — you pay providers directly for inference and pay us per run. If the pick errors or times out, the router advances to the next survivor under the same rule.

The trace records the decision

Every run writes a receipt: the policy that was sent, the candidates, accept and reject reasons, the fallback path, the model used, latency, and cost. The decision is explicit, not inferred. See it below.

Replay makes the decision auditable

The trace carries the policy fingerprint, so any run replays months later — same rule, same reasons, same outcome. That is what makes routing here auditable rather than opaque.

Read the policy_ir grammar in the docs

when a request is a graph

Each step makes its own decision.

Some tasks are one call; some are a bounded graph of steps — triage, draft, verify, synthesize. The mechanism is the same at each node: a step carries its own policy and resolves to the cheapest model that passes. Cheap long context here, a stronger model where it matters.

lineartriage → draft → guard · cheap where it can be, a strict final gate before send

Ticket

triageclassify · extract account id · cheapest JSONgemini-3.5-flash

draftreply from ticket + triage · quality floorclaude-sonnet-4-6

guardbrand voice · PII · refund limits · no-logclaude-opus-4-8can abort

Each step routes independentlyEvery node carries its own policy and resolves over the live catalog on its own.

Failover is per stepIf a node's model errors or times out, that node falls back without restarting the workflow.

One stitched traceThe whole graph writes a single replayable record — every node's decision, in order.

billing A workflow counts as one billable run, up to 5 nodes.

Read the flow_ir docs

the proof object

This is the decision, written down.

Here is the receipt from a single run. The selected model, why it won, the fallback path that stood ready, the cost per run, and the policy fingerprint that replays it. Nothing about the choice is hidden.

trace · req-204815 200 OK

selectedgemini-3.5-flash

reasontools ✓, price ceiling passed, cheapest survivor

fallbackclaude-sonnet-4-6 → gpt-5.5 · standby cascade, not triggered this run

latency412 ms

cost$0.018 · per run, not per token

policysupport · fingerprint 301140696-1054914287 · sigma-pol/v1

A workflow stitches one trace across every node — so a replay shows each step's decision, in order. Read the trace schema →

FAQbefore you wire it in

Questions, answered plainly.

Do I need to rewrite my app?

No. unhardcoded is OpenAI-compatible. Point the SDK's baseURL at the endpoint, replace the model name with a policy:* name, and send the policy_ir with the call. Your messages and parameters pass through unchanged.

Do I need to give you provider keys?

You bring your own provider keys and pay your providers directly for inference. The managed host uses them to execute the call the policy selects; we never resell tokens and price the routing per run.

Can I self-host?

Yes. The policy_ir and flow_ir language, its canonical encoding, and the reference interpreter are open source, so you can run the decision layer yourself. The managed host adds provider modules, the live catalog, key management, traces, and failover.

What happens when a model fails?

The fallback step in your policy decides. By default the router advances to the next survivor under the floor, cheapest-first, without a redeploy. Every hop — with latency, cost, and reason — is written to the trace.

How is this different from a model router or orchestrator?

A normal router hides the decision. unhardcoded makes the decision explicit. You send the policy, unhardcoded executes it, and the trace shows why each model was accepted, rejected, or used.

How do I know why a model was chosen?

Every run produces a trace. The trace shows the policy, candidate models, accept and reject reasons, the fallback path, and the final model used.

More on pricing, self-hosting, and the open core in the docs →

open core

A decision layer can't be a black box if you can read it. The policy_ir and flow_ir language, its canonical encoding, and the reference interpreter are open source — the language is yours, the upkeep is ours. The managed host runs the provider modules, live catalog, key management, traces, and failover, priced per run.

See the open core

That's the mechanism. Now wire it in.

Point your existing SDK at one endpoint, send a policy with the call, keep your provider keys, and get a trace for every decision. The docs take it from here — quickstart, the OpenAI-compatible API, the full grammar, and copyable presets.

Read the docs

No SDK rewriteYour provider keysOpen policy_irEvery run traced