Model decision layer · OpenAI-compatible

Stop hardcoding model decisions.

Send a policy with each LLM call. unhardcoded routes the call to the cheapest model that passes your rules, over your own provider keys, and records why it chose that model.

No SDK rewrite · Your provider keys · Automatic failover · Auditable trace
your app → POST /v1: an OpenAI-compatible request
carries policy_ir: cheapest · tools · score ≥ 0.5, sent with the call
routes: unhardcoded filters, ranks, and selects, over your keys
selects: gemini-3.5-flash, the cheapest model that passes · $0.018
records: trace req-204815, replayable

One request carries the rule, the route, and the receipt.
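The routing step (filter, rank, select) can be sketched in a few lines. This is an illustration under assumptions, not unhardcoded's router: the candidate fields (`score`, `supportsTools`, `cost`) are hypothetical names, the numbers are copied from the demo trace further down this page, and tool support is assumed for every model except mistral-small-4, whose score the demo does not show.

```javascript
// Demo candidates. `cost` is the per-request USD figure shown in the demo.
// supportsTools is ASSUMED true except for mistral-small-4; its score is
// null because the demo omits it.
const candidates = [
  { id: "deepseek-v4-flash", score: 0.42, supportsTools: true, cost: 0.011 },
  { id: "mistral-small-4", score: null, supportsTools: false, cost: 0.014 },
  { id: "gemini-3.5-flash", score: 0.54, supportsTools: true, cost: 0.018 },
  { id: "claude-sonnet-4-6", score: 0.57, supportsTools: true, cost: 0.041 },
  { id: "gpt-5.5", score: 0.6, supportsTools: true, cost: 0.063 },
  { id: "claude-opus-4-8", score: 0.59, supportsTools: true, cost: 0.121 },
];

// Filter: keep models with tools and a score at or above the floor.
// Rank: cheapest first. Select: element 0; the rest are standby candidates.
function route(models, floor) {
  return models
    .filter((m) => m.supportsTools && m.score !== null && m.score >= floor)
    .sort((a, b) => a.cost - b.cost);
}

const [pick, ...standby] = route(candidates, 0.5);
console.log(pick.id); // gemini-3.5-flash
console.log(standby.map((m) => m.id));
```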

the problem

Model choice is frozen in code.

A pinned model name is a decision baked into your application — it can't adapt to the request in front of it, and no one can explain it later. It shows up as three production problems.

Cost

Easy requests still hit your most expensive model, every time.

Reliability

A provider outage turns into fallback logic you hand-write in app code.

Control

No one can say why a model was used for a given request, after the fact.

how it works

The policy travels with the request.

Four steps, the same every time.

No dashboard config and no redeploy. Your backend builds the rule and sends it with the call; the router filters the candidates, ranks them, selects one, and writes a record you can replay.

  • Built in your backend, per request — from your own tenants and tiers.
  • Validated and fingerprinted before it runs — unknown ops are rejected.
  • Replayable from the trace — reconstruct any decision months later.
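Validation and fingerprinting could be as simple as an op allowlist plus a hash of a canonical serialization. A sketch with stated assumptions: the allowlist holds only the ops used in the example request on this page, and 32-bit FNV-1a stands in for whatever hash the router really uses; a production fingerprint would more plausibly be a cryptographic digest such as SHA-256.

```javascript
// ASSUMED allowlist: only the ops that appear in this page's example policy_ir.
const KNOWN_OPS = new Set([
  "policy", "ev_zero", "and", "meets_req", "has_cap", "cmp",
  "neg", "field", "argmax", "id", "always",
]);

// Reject any array node whose first element is not a known op name.
// Strings, numbers, and option objects are treated as leaf arguments.
function validate(node) {
  if (!Array.isArray(node)) return;
  const [op, ...args] = node;
  if (typeof op !== "string" || !KNOWN_OPS.has(op)) {
    throw new Error(`unknown op: ${JSON.stringify(op)}`);
  }
  args.forEach((arg) => validate(arg));
}

// Serialize with object keys sorted, so key order cannot change the fingerprint.
function canonical(node) {
  if (Array.isArray(node)) return `[${node.map((n) => canonical(n)).join(",")}]`;
  if (node !== null && typeof node === "object") {
    const keys = Object.keys(node).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(node[k])}`).join(",")}}`;
  }
  return JSON.stringify(node);
}

// 32-bit FNV-1a over UTF-16 code units: fast, NON-cryptographic stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Validate first (throws on unknown ops), then hash the canonical form.
function fingerprint(policy) {
  validate(policy);
  return fnv1a(canonical(policy));
}

const examplePolicy = ["policy", ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],
  ["neg", ["field", "price_out"]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];

console.log(fingerprint(examplePolicy));
```

Hashing the canonical form rather than the raw JSON text means two backends that build the same rule with differently ordered object keys still get the same fingerprint.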
1. Generate a policy_ir from your tenants & tiers · your backend
2. Send it with the call · one OpenAI-compatible request · .create()
3. Route by rules · cheapest model that passes · your keys
4. Trace the decision · every candidate, the pick, any fallback · replayable
policy_ir: the whole decision, in one JSON rule
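Step 1 runs in your own backend. One way to generate the rule from a customer's tier is sketched below. The tier names and the free/enterprise thresholds are invented for illustration; the op vocabulary and the pro thresholds are copied from the example request on this page, and `buildPolicyIr` is a hypothetical helper, not part of any SDK.

```javascript
// HYPOTHETICAL per-tier thresholds; pick your own values.
// The "pro" row reproduces the example request's floor (0.5) and cap (5).
const TIER_RULES = {
  free: { minScore: 0.4, maxPriceOut: 1 },
  pro: { minScore: 0.5, maxPriceOut: 5 },
  enterprise: { minScore: 0.55, maxPriceOut: 20 },
};

// Build a policy_ir with the same shape as the example request,
// swapping in the thresholds for this customer's tier.
function buildPolicyIr(tier, needsTools) {
  if (!Object.prototype.hasOwnProperty.call(TIER_RULES, tier)) {
    throw new Error(`unknown tier: ${tier}`);
  }
  const rule = TIER_RULES[tier];
  const filters = [["meets_req"]];
  if (needsTools) filters.push(["has_cap", "supports_tools"]);
  filters.push(["cmp", "bench_intelligence", "ge", rule.minScore]);
  filters.push(["cmp", "price_out", "le", rule.maxPriceOut]);
  return [
    "policy", ["ev_zero"],
    ["and", ...filters],
    ["neg", ["field", "price_out"]], // argmax of -price_out picks the cheapest
    ["argmax"], ["id"],
    ["always", { action: "next_candidate" }],
  ];
}

// A pro tenant whose request uses tools gets the example rule.
const policy_ir = buildPolicyIr("pro", true);
console.log(JSON.stringify(policy_ir));
```

The result is passed as `policy_ir` in the `create()` call, exactly as in the request example.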
what you get

One decision layer, three production wins.

The same primitive — a policy sent with the call — answers each of the three problems above.

Spend less

Use the cheapest model that clears the floor, instead of your priciest model for every easy call. Cut spend →
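The saving on a single request is a ratio against the model you would otherwise have pinned. Using the per-request costs from the demo trace on this page, the 71% figure works out as follows (the helper name is illustrative):

```javascript
// Share of spend avoided by routing to a model costing `chosenCost` instead of
// always calling the model costing `baselineCost` for the same request.
function percentSaved(chosenCost, baselineCost) {
  if (!(baselineCost > 0)) throw new Error("baseline cost must be positive");
  return (1 - chosenCost / baselineCost) * 100;
}

// Demo trace figures: gemini-3.5-flash at $0.018 vs gpt-5.5 at $0.063.
console.log(`${Math.round(percentSaved(0.018, 0.063))}% saved`); // 71% saved
```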

Fail over safely

When a provider errors, advance to the next passing model — no retry code, no redeploy. Reliability →
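Conceptually, the `next_candidate` fallback is a loop over the ranked survivors. The sketch below shows that loop with a simulated provider; `cascade`, `callModel`, and the hop fields are hypothetical names, and in unhardcoded this logic runs in the router rather than in your app code.

```javascript
// HYPOTHETICAL next_candidate cascade: walk the ranked candidates in order,
// stop at the first success, advance on any error, and record every hop
// (model, outcome, elapsed ms) so a trace can show what happened.
async function cascade(ranked, callModel) {
  const hops = [];
  for (const model of ranked) {
    const started = Date.now();
    try {
      const output = await callModel(model);
      hops.push({ model, ok: true, ms: Date.now() - started });
      return { model, output, hops };
    } catch (err) {
      hops.push({ model, ok: false, ms: Date.now() - started, error: String(err) });
    }
  }
  // Every candidate failed: surface the hops alongside the error.
  const failure = new Error(`all ${ranked.length} candidates failed`);
  failure.hops = hops;
  throw failure;
}

// Simulated provider call: the cheapest model is down, the others answer.
async function fakeProvider(model) {
  if (model === "gemini-3.5-flash") throw new Error("provider returned 503");
  return `reply from ${model}`;
}

cascade(["gemini-3.5-flash", "claude-sonnet-4-6", "gpt-5.5"], fakeProvider)
  .then(({ model, hops }) => console.log(model, hops.map((h) => h.ok)));
```

Because the ranking already encodes the rule, the fallback needs no separate model list: it walks the same ordered survivors.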

Know why

Every run records which models passed, which were rejected and why, and what it cost. Traces →
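What "know why" can look like in practice: one stored record per run, plus a function that turns it into a one-line explanation. The record shape is a guess for illustration, not unhardcoded's trace schema; the values are the demo run's, with "passed" marking models that cleared the rule but were not selected.

```javascript
// GUESSED minimal trace shape (field names are hypothetical), filled in
// with the demo run's figures. Costs are per-request USD.
const demoTrace = {
  traceId: "req-204815",
  costUsd: 0.018,
  latencyMs: 412,
  candidates: [
    { model: "deepseek-v4-flash", verdict: "rejected", reason: "score 0.42 below floor 0.5", costUsd: 0.011 },
    { model: "mistral-small-4", verdict: "rejected", reason: "no tool support", costUsd: 0.014 },
    { model: "gemini-3.5-flash", verdict: "selected", costUsd: 0.018 },
    { model: "claude-sonnet-4-6", verdict: "passed", costUsd: 0.041 },
    { model: "gpt-5.5", verdict: "passed", costUsd: 0.063 },
    { model: "claude-opus-4-8", verdict: "passed", costUsd: 0.121 },
  ],
};

// Answer "why this model?" from a stored trace.
function explain(trace) {
  const pick = trace.candidates.find((c) => c.verdict === "selected");
  const rejected = trace.candidates
    .filter((c) => c.verdict === "rejected")
    .map((c) => `${c.model} (${c.reason ?? "no reason recorded"})`);
  const alsoPassed = trace.candidates.filter((c) => c.verdict === "passed").length;
  return `${trace.traceId}: picked ${pick ? pick.model : "nothing"} at $${trace.costUsd}; ` +
    `rejected ${rejected.join(", ") || "none"}; ${alsoPassed} other models also passed`;
}

console.log(explain(demoTrace));
```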

Per-customer rules and multi-step workflows run on the same mechanism. See all use cases →

the proof

Every run leaves a receipt.

The trace is the receipt: the rule that was sent, every candidate it considered, the model that won, and why. Replay it any time.

POST api.unhardcoded.com/v1/chat/completions
Request · generated at runtime, sent with the call · policy_ir
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.unhardcoded.com/v1",
  apiKey: process.env.UNHARDCODED_KEY,
});
// the rule your backend builds for this call
const policy_ir = ["policy", ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],
  ["neg", ["field", "price_out"]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];
// reads as: of models that meet the request, support tools,
// score ≥ 0.5, and have price_out ≤ 5, pick the cheapest;
// if it fails, try the next.
// example conversation (placeholder content)
const messages = [{ role: "user", content: "Summarize this support ticket." }];
const res = await client.chat.completions.create({
  model: "policy:support", // free-form trace label
  policy_ir,               // ← sent with the request
  messages,
});
policy_ir fingerprinted & validated before it runs
Response · routing decision · 200 OK
deepseek-v4-flash · score 0.42 · $0.011 · below floor
mistral-small-4 · no tools · $0.014 · filtered
gemini-3.5-flash · score 0.54 · $0.018 · selected
claude-sonnet-4-6 · score 0.57 · $0.041 · above floor, not selected
gpt-5.5 · score 0.60 · $0.063 · above floor, not selected
claude-opus-4-8 · score 0.59 · $0.121 · above floor, not selected
cost: $0.018 · ↓71% vs gpt-5.5
trace: req-204815
latency: 412 ms
failover: claude-sonnet-4-6 → gpt-5.5 · standby cascade, not triggered this run
Demo trace from a fixed example request; no inference runs in your browser.
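To make the rule's semantics concrete, here is a toy evaluator for just the ops the example uses, run against the demo's candidates. The meaning assigned to each op is an assumed reading of the request's inline comment, not a published spec; `passes`, `objective`, and `rankByPolicy` are hypothetical names, tool support is assumed as in the demo, and per-request cost stands in for `price_out`.

```javascript
// TOY evaluator. Assumed semantics (not a spec): element 2 is a boolean
// filter, element 3 a numeric objective, element 4 the selector (only
// "argmax" handled). "meets_req" is treated as always true; "ev_zero",
// "id", and "always" are not interpreted here.
function passes(expr, model) {
  const [op, ...args] = expr;
  switch (op) {
    case "and":
      return args.every((sub) => passes(sub, model));
    case "meets_req":
      return true; // assumption: request requirements are not modeled
    case "has_cap":
      return model.caps.includes(args[0]);
    case "cmp": {
      const [field, cmpOp, value] = args;
      const x = model.fields[field];
      if (x === undefined) return false; // a missing field never passes
      if (cmpOp === "ge") return x >= value;
      if (cmpOp === "le") return x <= value;
      throw new Error(`unsupported comparison: ${cmpOp}`);
    }
    default:
      throw new Error(`unsupported filter op: ${op}`);
  }
}

// Numeric objective: only "neg" and "field" are modeled.
function objective(expr, model) {
  const [op, arg] = expr;
  if (op === "neg") return -objective(arg, model);
  if (op === "field") return model.fields[arg];
  throw new Error(`unsupported objective op: ${op}`);
}

// Filter, then order survivors best-first; index 0 is the pick and the
// rest form the next_candidate cascade.
function rankByPolicy(policy, models) {
  const [, , filter, score, [select]] = policy;
  if (select !== "argmax") throw new Error(`unsupported selector: ${select}`);
  return models
    .filter((m) => passes(filter, m))
    .sort((a, b) => objective(score, b) - objective(score, a))
    .map((m) => m.id);
}

// Demo models: bench_intelligence = demo score; price_out = demo per-request
// cost as a STAND-IN. mistral-small-4 has no score in the demo.
const models = [
  { id: "deepseek-v4-flash", caps: ["supports_tools"], fields: { bench_intelligence: 0.42, price_out: 0.011 } },
  { id: "mistral-small-4", caps: [], fields: { price_out: 0.014 } },
  { id: "gemini-3.5-flash", caps: ["supports_tools"], fields: { bench_intelligence: 0.54, price_out: 0.018 } },
  { id: "claude-sonnet-4-6", caps: ["supports_tools"], fields: { bench_intelligence: 0.57, price_out: 0.041 } },
  { id: "gpt-5.5", caps: ["supports_tools"], fields: { bench_intelligence: 0.6, price_out: 0.063 } },
  { id: "claude-opus-4-8", caps: ["supports_tools"], fields: { bench_intelligence: 0.59, price_out: 0.121 } },
];

const policyIr = ["policy", ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],
  ["neg", ["field", "price_out"]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];

console.log(rankByPolicy(policyIr, models)[0]); // gemini-3.5-flash
```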
FAQ · before you wire it in

Questions, answered plainly.

Do I have to rewrite my app?
No. unhardcoded is OpenAI-compatible. Point the SDK's baseURL at the endpoint, replace the model name with a policy:* name, and send the policy_ir with the call. Your messages and parameters pass through unchanged.
Are you reselling tokens?
No. You bring your own provider keys and pay your providers directly for inference. We price the routing per run, never the tokens.
What happens when a model fails?
The fallback step in your policy decides. By default, the router moves to the next candidate that passes your rules, cheapest first. Every hop, with its latency, cost, and reason, is written to the trace.

More on pricing, self-hosting, and the open core in the docs →

Stop hardcoding. Send the decision.

Point your SDK at one endpoint, keep your keys, and send your first policy. We route it through your providers and trace every decision.

No credit card · Free to start · Open core

Prefer to dig in first? Read the docs →