Model decision layer · OpenAI-compatible

Stop hardcoding model decisions.

Send a policy with each LLM call. unhardcoded routes to the cheapest model that passes your rules — over your own provider keys — and records why it chose.

Join the waitlist · View a trace

No SDK rewrite · Your provider keys · Automatic failover · Auditable trace

your app: POST /v1 (an OpenAI-compatible request)
carries: policy_ir (cheapest · tools · score ≥ 0.5), sent with the call
routes: unhardcoded (filter · rank · select), over your keys
selects: gemini-3.5-flash (cheapest that passes), $0.018
records: trace req-204815 (replayable)

One request carries the rule, the route, and the receipt.

the problem

Model choice is frozen in code.

A pinned model name is a decision baked into your application — it can't adapt to the request in front of it, and no one can explain it later. It shows up as three production problems.

Cost

Easy requests still hit your most expensive model, every time.

Reliability

A provider outage turns into fallback logic you hand-write in app code.

Control

No one can say why a model was used for a given request, after the fact.

how it works

The policy travels with the request.

Four steps, the same every time.

No dashboard config and no redeploy. Your backend builds the rule, sends it with the call, and the router does the rest — the same way every time, with a record you can replay.

Built in your backend, per request — from your own tenants and tiers.
Validated and fingerprinted before it runs — unknown ops are rejected.
Replayable from the trace — reconstruct any decision months later.
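
As a rough illustration of "validated and fingerprinted", the sketch below walks a policy, rejects any op it does not recognize, and hashes the serialized rule. The allowlist is only the set of op names that appear in the example policy on this page, and `unknownOps`, `fingerprint`, and `checkPolicy` are hypothetical helper names; the router's real validation and hashing scheme are not documented here, so treat SHA-256 over `JSON.stringify` as an assumption.

```typescript
import { createHash } from "node:crypto";

// Op names seen in the example policy on this page. Assumption: the router's
// real allowlist may be larger or different.
const KNOWN_OPS = new Set([
  "policy", "ev_zero", "and", "meets_req", "has_cap", "cmp",
  "neg", "field", "argmax", "id", "always",
]);

// Collect every op (the head of each nested array) that is not on the allowlist.
function unknownOps(node: unknown, found: string[] = []): string[] {
  if (!Array.isArray(node)) return found; // plain arguments are not ops
  const [head, ...rest] = node;
  if (typeof head !== "string" || !KNOWN_OPS.has(head)) found.push(String(head));
  for (const child of rest) unknownOps(child, found);
  return found;
}

// Hash the serialized policy. JSON.stringify is stable for arrays; object keys
// keep insertion order, so a real implementation would likely canonicalize first.
function fingerprint(policy: unknown): string {
  return createHash("sha256").update(JSON.stringify(policy)).digest("hex");
}

// Validate first, and only fingerprint a policy that passed validation.
function checkPolicy(policy: unknown): { ok: boolean; rejected: string[]; fingerprint: string | null } {
  const rejected = unknownOps(policy);
  return rejected.length === 0
    ? { ok: true, rejected, fingerprint: fingerprint(policy) }
    : { ok: false, rejected, fingerprint: null };
}

// The example rule from this page, used here as test input.
const examplePolicy = ["policy", ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],
  ["neg", ["field", "price_out"]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];

console.log(checkPolicy(examplePolicy).ok);
console.log(checkPolicy(["policy", ["drop_everything"]]).rejected);
```

Because the fingerprint depends only on the serialized rule, two requests that send the same policy produce the same hash, which is what would let a trace refer back to the exact rule that ran.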

1. Generate a policy_ir from your tenants & tiers (your backend)

2. Send with the call: one OpenAI-compatible request, create()

3. Route by rules: the cheapest model that passes, over your keys

4. Trace the decision: every candidate, the pick, any fallback (replayable)

policy_ir: the whole decision, in one JSON rule
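
Step 1 could look something like the sketch below, where a backend derives the rule from a customer's tier at request time. The tier names, thresholds, and the `buildPolicy` helper are hypothetical; only the shape of the returned rule is taken from the example policy on this page, and reading `neg` plus `argmax` as "cheapest first" follows that example's own comment.

```typescript
// Hypothetical tiers and thresholds; only the policy shape comes from this page.
type Tier = "free" | "pro";

interface TierRule {
  minScore: number;    // floor on bench_intelligence
  maxPriceOut: number; // ceiling on price_out
}

const TIER_RULES: Record<Tier, TierRule> = {
  free: { minScore: 0.5, maxPriceOut: 5 },  // same values as the example policy
  pro: { minScore: 0.55, maxPriceOut: 20 }, // illustrative values only
};

// Build the rule for one request from the caller's tier and needs.
function buildPolicy(tier: Tier, needsTools: boolean): unknown[] {
  const { minScore, maxPriceOut } = TIER_RULES[tier];
  const filters: unknown[] = [["meets_req"]];
  if (needsTools) filters.push(["has_cap", "supports_tools"]);
  filters.push(["cmp", "bench_intelligence", "ge", minScore]);
  filters.push(["cmp", "price_out", "le", maxPriceOut]);
  return ["policy", ["ev_zero"],
    ["and", ...filters],
    ["neg", ["field", "price_out"]], // rank by negated price, so argmax picks the cheapest
    ["argmax"], ["id"],
    ["always", { action: "next_candidate" }]];
}

console.log(JSON.stringify(buildPolicy("free", true)));
```

Keeping the thresholds in one table means a tier change is a data change in your backend rather than an edit to every call site.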

what you get

One decision layer, three production wins.

The same primitive — a policy sent with the call — answers each of the three problems above.

Spend less

Use the cheapest model that clears the floor, instead of your priciest model for every easy call. Cut spend →

Fail over safely

When a provider errors, advance to the next passing model — no retry code, no redeploy. Reliability →

Know why

Every run records which models passed, which were rejected and why, and what it cost. Traces →

Per-customer rules and multi-step workflows run on the same mechanism. See all use cases →

the proof

Every run leaves a receipt.

The trace is the receipt: the rule that was sent, every candidate it considered, the model that won, and why. Replay it any time.

POST api.unhardcoded.com/v1/chat/completions

Request · generated at runtime, sent with the call · policy_ir

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.unhardcoded.com/v1",
  apiKey: process.env.UNHARDCODED_KEY,
});

// the rule your backend builds for this call
const policy_ir = ["policy", ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],
  ["neg", ["field", "price_out"]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];

// reads as: of models with tools & score ≥ 0.5,
// pick the cheapest; if it fails, try the next.
const res = await client.chat.completions.create({
  model: "policy:support", // free-form trace label
  policy_ir,               // ← sent with the request
  messages,
});

policy_ir fingerprinted & validated before it runs

Response · routing decision · 200 OK

deepseek-v4-flash    score 0.42 · $0.011    below floor
mistral-small-4      no tools · $0.014      filtered
gemini-3.5-flash     score 0.54 · $0.018    selected
claude-sonnet-4-6    score 0.57 · $0.041    over floor
gpt-5.5              score 0.60 · $0.063    over floor
claude-opus-4-8      score 0.59 · $0.121    over floor

cost: $0.018 · ↓71% vs gpt-5.5

trace: req-204815

latency: 412 ms

failover: claude-sonnet-4-6 → gpt-5.5 · standby cascade, not triggered this run
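
The decision in this trace can be reproduced with a few lines of filter, rank, and select. The sketch below is an illustration, not the router's code: the scores and costs are copied from the trace, but the trace shows no score for mistral-small-4 and no tool support for the other models, so those fields are assumptions, as are the `Candidate` type and the `route` helper.

```typescript
interface Candidate {
  id: string;
  score: number | null; // null where the trace does not show a score
  cost: number;         // dollars for this request, as shown in the trace
  tools: boolean;       // assumed true unless the trace says "no tools"
}

const candidates: Candidate[] = [
  { id: "deepseek-v4-flash", score: 0.42, cost: 0.011, tools: true },
  { id: "mistral-small-4", score: null, cost: 0.014, tools: false },
  { id: "gemini-3.5-flash", score: 0.54, cost: 0.018, tools: true },
  { id: "claude-sonnet-4-6", score: 0.57, cost: 0.041, tools: true },
  { id: "gpt-5.5", score: 0.60, cost: 0.063, tools: true },
  { id: "claude-opus-4-8", score: 0.59, cost: 0.121, tools: true },
];

const FLOOR = 0.5; // score ≥ 0.5, from the example policy

function route(all: Candidate[]) {
  const rejected: { id: string; reason: string }[] = [];
  const passing: Candidate[] = [];
  for (const c of all) {
    if (!c.tools) rejected.push({ id: c.id, reason: "filtered: no tools" });
    else if (c.score === null || c.score < FLOOR) rejected.push({ id: c.id, reason: "below floor" });
    else passing.push(c);
  }
  passing.sort((a, b) => a.cost - b.cost); // rank: cheapest first
  const selected = passing[0];
  if (selected === undefined) throw new Error("no candidate passes the policy");
  return { selected, standby: passing.slice(1), rejected };
}

const decision = route(candidates);
const baseline = 0.063; // gpt-5.5 cost in the trace
const savings = Math.round((1 - decision.selected.cost / baseline) * 100);

console.log(decision.selected.id, `${savings}% cheaper than gpt-5.5`);
console.log(decision.standby.map((c) => c.id).join(" → "));
```

The savings figure is 1 − 0.018 / 0.063 ≈ 0.714, which rounds to the 71% shown in the trace, and the first two standby models match the failover cascade listed there.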

The policy travels with the request. Demo trace using a fixed example request; no inference runs in your browser. Replay it any time.

Priced per run, your keys · Open core, self-host free · Where the decision lives, vs gateways

FAQ · before you wire it in

Questions, answered plainly.

Do I have to rewrite my app?

No. unhardcoded is OpenAI-compatible. Point the SDK's baseURL at the endpoint, replace the model name with a policy:* name, and send the policy_ir with the call. Your messages and parameters pass through unchanged.

Are you reselling tokens?

No. You bring your own provider keys and pay your providers directly for inference. We price the routing per run, never the tokens.

What happens when a model fails?

The fallback step in your policy decides. By default the router moves to the next candidate that clears the floor, cheapest first. Every hop is written to the trace with its latency, cost, and reason.
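
For intuition, the default behaviour described above amounts to a loop over the passing candidates in cheapest-first order. This is a local simulation, not the router's implementation: `runWithFailover`, the `Hop` shape, and the `flaky` stand-in provider are all hypothetical, and the hop record here keeps only latency and reason because the simulated call has no real cost.

```typescript
interface Hop {
  model: string;
  ok: boolean;
  latencyMs: number;
  reason: string;
}

// Stand-in for a provider call; a real one would hit the provider's API.
type CallModel = (model: string) => Promise<string>;

// Try each passing candidate in cheapest-first order, recording every hop.
async function runWithFailover(ordered: string[], call: CallModel) {
  const hops: Hop[] = [];
  for (const model of ordered) {
    const started = Date.now();
    try {
      const output = await call(model);
      hops.push({ model, ok: true, latencyMs: Date.now() - started, reason: "selected" });
      return { output, hops };
    } catch (err) {
      // Record why this hop failed, then advance to the next candidate.
      const reason = err instanceof Error ? err.message : String(err);
      hops.push({ model, ok: false, latencyMs: Date.now() - started, reason });
    }
  }
  return { output: null, hops }; // every candidate failed
}

// Simulated outage: the cheapest model errors, the next one answers.
const flaky: CallModel = async (model) => {
  if (model === "gemini-3.5-flash") throw new Error("provider 503");
  return `answer from ${model}`;
};

const pending = runWithFailover(["gemini-3.5-flash", "claude-sonnet-4-6", "gpt-5.5"], flaky);
pending.then((r) => console.log(r.output, r.hops.map((h) => `${h.model}:${h.reason}`)));
```

In this run the hop list holds two entries, the failed gemini-3.5-flash attempt and the successful claude-sonnet-4-6 call, and gpt-5.5 is never tried.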

More on pricing, self-hosting, and the open core in the docs →

Stop hardcoding. Send the decision.

Point your SDK at one endpoint, keep your keys, and send your first policy. We route it through your providers and trace every decision.

No credit card · Free to start · Open core

Prefer to dig in first? Read the docs →