Ship a policy in five minutes.
unhardcoded is OpenAI-compatible. Keep your SDK, change the baseURL, generate a policy in your backend, and send it with the call.
Three steps to your first routed call.
No app rewrite, no server config. Install the SDK, repoint the client, build a policy in your backend, and attach it to a single create() call. The host admits the term, resolves it over the live catalog, and traces the decision.
BEFORE YOU SEND TRAFFIC
- Create or receive your unhardcoded workspace.
- Add your provider keys (OpenAI, Anthropic, Gemini, …) during onboarding — inference always runs over your own accounts.
- Create an unhardcoded API key.
- Use that key as the apiKey in the OpenAI-compatible client below.
During early access, provider keys are configured with you during onboarding.
Install
Keep the OpenAI SDK you already use. The policy_ir is plain JSON — no extra package required.
$ npm i openai
Point the client at the endpoint
Same SDK, one new baseURL. Your messages and parameters pass through unchanged.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.unhardcoded.com/v1",
  apiKey: process.env.UNHARDCODED_KEY,
});
Build a policy and send it with the call
Generate the policy_ir term at request time, then attach it to create(). The router picks the cheapest model that passes your rules. The model field is a free-form label used only to group traces — routing comes from the attached policy_ir, so any string works.
// built in your backend, at request time — a plain JSON term
const policy_ir = [
  "policy",
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_tools"],
    ["cmp", "bench_intelligence", "ge", 0.9]], // filter
  ["neg", ["normalize", ["field", "price_out"]]], // cheapest survivor
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }],
];
const res = await client.chat.completions.create({
  model: "policy:support", // free-form trace label, not a route
  policy_ir,
  messages,
});
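In TypeScript, the openai package's request types may not declare a policy_ir field, so the object literal above can trip the compiler. One way around that, sketched here under assumptions, is to assemble the same request body yourself and send it with fetch. The Term alias, ChatMessage, PreparedRequest, and buildRoutedRequest are illustrative names, not part of any unhardcoded package.

```typescript
// Illustrative alias for a JSON term; not an official unhardcoded type.
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the HTTP request without sending it.
function buildRoutedRequest(
  apiKey: string,
  policy: Term[],
  messages: ChatMessage[],
  label: string = "policy:support", // free-form trace label, not a route
): PreparedRequest {
  return {
    url: "https://api.unhardcoded.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // policy_ir rides in the body next to the standard fields.
      body: JSON.stringify({ model: label, policy_ir: policy, messages }),
    },
  };
}

const examplePolicy: Term[] = [
  "policy",
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_tools"],
    ["cmp", "bench_intelligence", "ge", 0.9]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }],
];

const prepared = buildRoutedRequest("example-key", examplePolicy, [
  { role: "user", content: "Where is my order?" },
]);
// To send it: const res = await fetch(prepared.url, prepared.init);
```

Keeping the builder separate from the network call makes the exact body easy to log or snapshot-test before any traffic is sent.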
The response carries the decision — which model was selected, why, the cost, and a replayable trace id:
{
  "selected": "gemini-3.5-flash",
  "reason": "cheapest passing candidate",
  "policy": "ir_8f41",
  "cost": "$0.018",
  "trace": "req_8f41"
}
One OpenAI-compatible surface.
An OpenAI-compatible completions endpoint, dry-run helpers, a field-schema lookup, and one header. Carry a policy_ir or flow_ir in the request body; inference runs over the provider keys you configured on the host.
- POST /v1/chat/completions
- OpenAI-compatible completions. Carry policy_ir or flow_ir in the body; everything else is the standard request shape. The router resolves the model over the live catalog and writes the decision to the trace.
- POST /x/rank
- Dry-run. Returns the candidate ranking and per-model verdicts without running inference. Use it to preview which models clear the floor and why before a single token is spent.
- POST /x/policy/normalize
- Dry-run. Admits a policy_ir and returns its canonical form, content fingerprint, and grammar version, so you can identify and cache a term without running it.
- POST /x/flow/normalize
- Dry-run. Admits and identifies a flow_ir (the bounded DAG and each node's policy) before use, so a malformed workflow fails fast instead of mid-run.
- GET /x/fields
- Returns the live field vocabulary (the core fields plus this host's registered extensions) that policies gate on with cmp / is and score on with field. The source of truth for valid field names (e.g. price_out, context, bench_intelligence).
- Authorization: Bearer <key>
- Authenticate every request with your unhardcoded key. The key identifies your workspace and its trace history, never a provider account.
Note: BYO provider keys are configured on the host. unhardcoded routes through your own OpenAI, Anthropic, Gemini, and other accounts — it does not resell tokens or mark up inference, and it never bills the model spend.
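Since GET /x/fields is the source of truth for field names, a term can be checked locally before it is sent. The sketch below walks a policy_ir, collects the names used by cmp and field, and reports any that are missing from a vocabulary list. The shape of the /x/fields response is not shown in this document, so the plain string array here is an assumption, and referencedFields / unknownFields are hypothetical helper names.

```typescript
// Illustrative alias for a JSON term; not an official unhardcoded type.
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Collects names referenced by ["cmp", <field>, ...] and ["field", <name>].
// Flags used with ["is", ...] are deliberately left out of this check.
function referencedFields(term: Term, found: string[] = []): string[] {
  if (!Array.isArray(term)) return found;
  const op = term[0];
  const arg = term[1];
  if ((op === "cmp" || op === "field") && typeof arg === "string" && found.indexOf(arg) === -1) {
    found.push(arg);
  }
  for (const child of term) referencedFields(child, found);
  return found;
}

// Returns the referenced names that the vocabulary does not contain.
function unknownFields(term: Term, vocabulary: string[]): string[] {
  return referencedFields(term).filter((field) => vocabulary.indexOf(field) === -1);
}

// ASSUMPTION: a stand-in for whatever GET /x/fields returns on your host.
const vocabulary: string[] = ["price_out", "context", "bench_intelligence"];

const draftPolicy: Term[] = [
  "policy",
  ["and", ["meets_req"], ["cmp", "bench_intelligence", "ge", 0.5], ["cmp", "price", "le", 5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }],
];

const misspelled = unknownFields(draftPolicy, vocabulary); // the "price" bound is not a served field
```

A local check like this only catches names; admission on the host remains the authority on whether a term is valid.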
The raw policy_ir term.
The real interface is the term itself: a plain JSON array you inspect, hash, log, and replay. A policy is a six-element array: ["policy", filter, rank, select, mutate, fallback]. You author the filter, rank, and select; keep mutate (["id"]) and fallback (["always", {"action":"next_candidate"}]) as-is.
[
  "policy",
  // filter: the gate; narrows the host floor, never widens it
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5], ["cmp", "price_out", "le", 5]],
  // rank: score the survivors, cheapest wins
  ["neg", ["normalize", ["field", "price_out"]]],
  // select: take the single top of the ranked list
  ["argmax"],
  // mutate: pass the request through unchanged
  ["id"],
  // fallback: on any failure, next candidate
  ["always", {"action": "next_candidate"}]
]
The term is just data — built in your backend, sent with the call. A typed builder (@unhardcoded/policy / buildPolicy(...)) is planned convenience sugar over this language — it will lower a compact spec to the same array. It is not shipping yet; today the raw term above is the interface, and every preset below is one of these arrays you can copy as-is.
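Because the term is plain data, a few lines of backend code are enough to generate it per request. The helper below is a minimal sketch of that idea. It is a hypothetical local function, not the planned @unhardcoded/policy builder, and the Bound type and the cheapestPolicy name are invented for illustration. It only emits operators that appear in the term above.

```typescript
// Illustrative alias for a JSON term; not an official unhardcoded type.
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Only the two comparison ops named in this document are typed here.
interface Bound {
  field: string;
  op: "ge" | "le";
  value: number;
}

// Hypothetical helper: a cheapest-survivor policy with the given flags and bounds.
function cheapestPolicy(flags: string[], bounds: Bound[]): Term[] {
  const filter: Term[] = [
    "and",
    ["meets_req"],
    ["not", ["is", "disabled"]],
    ...flags.map((flag): Term => ["is", flag]),
    ...bounds.map((b): Term => ["cmp", b.field, b.op, b.value]),
  ];
  return [
    "policy",
    filter,
    ["neg", ["normalize", ["field", "price_out"]]], // rank: cheapest first
    ["argmax"], // select: single top
    ["id"], // mutate: unchanged
    ["always", { action: "next_candidate" }], // fallback
  ];
}

// Reproduces the raw term shown above.
const supportPolicy = cheapestPolicy(["cap_tools"], [
  { field: "bench_intelligence", op: "ge", value: 0.5 },
  { field: "price_out", op: "le", value: 5 },
]);
```

The output is the same six-element array, so it can be logged, hashed, or dry-run exactly like a hand-written term.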
The full sigma-pol/v2 operator vocabulary, grouped by position. These are the only ops — there is no filter/rank/select/fallback wrapper.
- filter (predicates)
- and / or / not: boolean combinators · is <flag>: a model capability flag (e.g. cap_tools) · cmp <field> <op> <value>: a numeric bound (ge, le, …) · meets_req: the request's own implied requirements. Returns the surviving candidates.
- rank (scorers)
- field <name>: a raw catalog field · normalize: scale a sub-score to 0–1 · neg: invert (cheapest, lowest-latency) · scale <w>: weight a sub-score · add: sum weighted sub-scores. Produces one score per survivor.
- select (selectors)
- argmax: the single highest-scoring model · top_k <n> <sel>: keep the top n as an ordered cascade · sample <t>: a reproducible stochastic pick.
- mutate (request mutators)
- id: pass the request through unchanged (the default). Request mutators such as clamp_param also exist for shaping parameters on the chosen call.
- fallback
- always {"action":"next_candidate"}: on a provider failure, advance to the next survivor in the cascade.
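The rank operators above compose into blended scores. The sketch below builds one from a list of weighted fields. The exact argument layout of scale and add is not spelled out in this document, so ["scale", w, sub] and ["add", a, b, …] are assumptions modeled on how cmp and top_k are written. Confirm the shape with POST /x/policy/normalize before relying on it. WeightedField and weightedRank are hypothetical names.

```typescript
// Illustrative alias for a JSON term; not an official unhardcoded type.
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

interface WeightedField {
  field: string;
  weight: number;
  lowerIsBetter?: boolean; // wrap in neg so that smaller raw values score higher
}

// ASSUMPTION: scale takes (weight, sub-score) and add takes any number of sub-scores.
function weightedRank(parts: WeightedField[]): Term[] {
  const scored = parts.map((part): Term => {
    const normalized: Term = ["normalize", ["field", part.field]];
    const oriented: Term = part.lowerIsBetter ? ["neg", normalized] : normalized;
    return ["scale", part.weight, oriented];
  });
  return ["add", ...scored];
}

// 70% weight on low output price, 30% on benchmark quality.
const blendedRank = weightedRank([
  { field: "price_out", weight: 0.7, lowerIsBetter: true },
  { field: "bench_intelligence", weight: 0.3 },
]);
```

Hard limits still belong in the filter; a blended score only orders the models that already cleared it.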
Filter first. Rank survivors. No silent downgrade.
Routing is deterministic and ordered. Spend ceilings and quality minimums live in the filter, not the score — a cheap model can never win on points against your rules. If it does not clear the floor, it is not a candidate at all.
Filter first
Candidates that lack a required capability, miss the quality floor, or exceed the price ceiling are eliminated — removed, never silently substituted.
Rank the survivors
Among the models that cleared the floor, the scorer orders them — usually cheapest-first.
Select the top
One model, chosen deterministically — the cheapest that passed.
Fall back in order — and the floor is a guarantee, not a suggestion.
If the selected model times out or errors, the router moves to the next passing candidate, cheapest-first, and every hop is recorded. It optimizes cost among the models that clear your floor, never around the floor. If no model meets the requirements, the request fails loudly: you never get a silent downgrade you didn't ask for.
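The ordering described above (filter, rank, select, then fall back in order) can be mimicked locally to build intuition. This is a toy simulation, not the host's implementation. The model ids and the first three prices are borrowed from the dry-run example elsewhere on this page; every other number, and the names CatalogModel, Rule, decide, and runCascade, are made up for illustration.

```typescript
interface CatalogModel {
  id: string;
  cap_tools: boolean;
  bench_intelligence: number;
  price_out: number;
}

interface Rule {
  label: string;
  passes: (model: CatalogModel) => boolean;
}

interface Rejection {
  model: string;
  dropped_by: string;
}

// Filter first, then rank the survivors cheapest-first.
function decide(catalog: CatalogModel[], rules: Rule[]): { cascade: string[]; rejected: Rejection[] } {
  const survivors: CatalogModel[] = [];
  const rejected: Rejection[] = [];
  for (const model of catalog) {
    const failed = rules.filter((rule) => !rule.passes(model))[0];
    if (failed) rejected.push({ model: model.id, dropped_by: failed.label });
    else survivors.push(model);
  }
  // No survivor means a loud failure, never a quiet substitute.
  if (survivors.length === 0) throw new Error("no model passes the floor");
  survivors.sort((a, b) => a.price_out - b.price_out);
  return { cascade: survivors.map((model) => model.id), rejected };
}

// Tries candidates in order and records a hop each time it advances.
function runCascade(cascade: string[], attempt: (model: string) => boolean) {
  const hops: { from: string; to: string }[] = [];
  let previous: string | null = null;
  for (const model of cascade) {
    if (previous !== null) hops.push({ from: previous, to: model });
    if (attempt(model)) return { selected: model, hops };
    previous = model;
  }
  throw new Error("every passing candidate failed");
}

// Illustrative catalog: capability flags and benchmark scores are invented.
const toyCatalog: CatalogModel[] = [
  { id: "claude-sonnet-4-6", cap_tools: true, bench_intelligence: 0.9, price_out: 3.0 },
  { id: "gemini-3.5-flash", cap_tools: true, bench_intelligence: 0.8, price_out: 0.3 },
  { id: "mistral-small-4", cap_tools: true, bench_intelligence: 0.6, price_out: 0.35 },
  { id: "gemini-3.1-flash-lite", cap_tools: false, bench_intelligence: 0.6, price_out: 0.1 },
  { id: "tiny-draft-1", cap_tools: true, bench_intelligence: 0.3, price_out: 0.05 },
];

const floorRules: Rule[] = [
  { label: "is cap_tools", passes: (m) => m.cap_tools },
  { label: "cmp bench_intelligence ge 0.5", passes: (m) => m.bench_intelligence >= 0.5 },
];

const decision = decide(toyCatalog, floorRules);
// Simulate the cheapest survivor timing out, so the cascade advances one hop.
const outcome = runCascade(decision.cascade, (model) => model !== "gemini-3.5-flash");
```

Note that the two cheapest models in the toy catalog never reach the ranking step: price cannot buy a way past the filter.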
Change the rule. Watch the winner move.
The router filters the live catalog, then ranks the survivors cheapest-first: the cheapest model that clears every rule wins. Tighten the floor past what any model can meet and the call fails loudly rather than downgrading.
Against the live catalog, POST /x/rank returns the per-model verdicts without running inference.
The trace object.
Every completion carries a trace: a structured record of how the model was chosen. The trace is what makes every routing decision replayable and auditable — the same term, the same catalog snapshot, the same verdicts.
- trace
- The replayable identifier/fingerprint for the run, e.g. "req_8f41c2".
- policy.fingerprint
- Content hash of the normalized policy_ir. Identical terms share a fingerprint.
- policy.version
- The grammar version: "sigma-pol/v2".
- selected
- The chosen model id.
- candidates[]
- One entry per considered model: { model, status: "winner" | "passed" | "rejected", dropped_by: "<rule that eliminated it>" | null }. A rejected model carries the exact rule that dropped it; a passing model carries null.
- reason
- Human-readable why-this-model.
- fallback[]
- Ordered hops taken on failure: { from, to, cause }. An empty array if the first pick succeeded.
- latency_ms
- Total routing + inference latency.
- cost
- USD billed for the run.
- usage
- { prompt_tokens, completion_tokens }. Informational only; billing is per run, not per token.
- created
- ISO timestamp.
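The field list above maps naturally onto a TypeScript interface, which makes audit code easier to write. The sketch below is one possible typing and should be read as an assumption: the nesting of policy.fingerprint and policy.version under a policy object is inferred from the dotted names, cost is typed loosely because the quick-start response shows it as a string, and every value in sampleTrace other than the ids quoted on this page is a placeholder.

```typescript
interface TraceCandidate {
  model: string;
  status: "winner" | "passed" | "rejected";
  dropped_by: string | null;
}

interface TraceHop {
  from: string;
  to: string;
  cause: string;
}

// ASSUMPTION: field nesting and types are inferred from the list above.
interface RoutingTrace {
  trace: string;
  policy: { fingerprint: string; version: string };
  selected: string;
  candidates: TraceCandidate[];
  reason: string;
  fallback: TraceHop[];
  latency_ms: number;
  cost: string | number;
  usage: { prompt_tokens: number; completion_tokens: number };
  created: string;
}

// Groups rejected models by the rule that eliminated them.
function rejectionsByRule(trace: RoutingTrace): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const candidate of trace.candidates) {
    if (candidate.status !== "rejected" || candidate.dropped_by === null) continue;
    const bucket = groups[candidate.dropped_by] || [];
    bucket.push(candidate.model);
    groups[candidate.dropped_by] = bucket;
  }
  return groups;
}

// True when the first pick failed and a later candidate served the request.
function servedByFallback(trace: RoutingTrace): boolean {
  return trace.fallback.length > 0;
}

// Placeholder values for illustration only.
const sampleTrace: RoutingTrace = {
  trace: "req_8f41c2",
  policy: { fingerprint: "ir_8f41", version: "sigma-pol/v2" },
  selected: "gemini-3.5-flash",
  candidates: [
    { model: "gemini-3.5-flash", status: "winner", dropped_by: null },
    { model: "mistral-small-4", status: "passed", dropped_by: null },
    { model: "gemini-3.1-flash-lite", status: "rejected", dropped_by: "is cap_tools" },
    { model: "tiny-draft-1", status: "rejected", dropped_by: "cmp bench_intelligence ge 0.5" },
  ],
  reason: "cheapest passing candidate",
  fallback: [],
  latency_ms: 0,
  cost: "$0.018",
  usage: { prompt_tokens: 0, completion_tokens: 0 },
  created: "1970-01-01T00:00:00Z",
};

const grouped = rejectionsByRule(sampleTrace);
```

Grouping by dropped_by is a quick way to see which rule is doing most of the eliminating across many traces.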
To preview the verdicts before spending a token, dry-run with POST /x/rank. The router filters first, then ranks the survivors — a failing model is dropped, never silently substituted. Here a policy requires tool use and a quality floor (cap_tools AND bench_intelligence ge 0.5) and ranks cheapest:
{
  "selected": "gemini-3.5-flash",
  "candidates": [
    { "model": "gemini-3.5-flash", "passed": true, "status": "winner", "price_out": 0.30, "dropped_by": null,
      "note": "meets floor · cheapest survivor" },
    { "model": "mistral-small-4", "passed": true, "status": "passed", "price_out": 0.35, "dropped_by": null,
      "note": "meets floor" },
    { "model": "claude-sonnet-4-6", "passed": true, "status": "passed", "price_out": 3.00, "dropped_by": null,
      "note": "meets floor · not cheapest" },
    { "model": "gemini-3.1-flash-lite", "passed": false, "status": "rejected", "dropped_by": "is cap_tools",
      "note": "no tool support" },
    { "model": "tiny-draft-1", "passed": false, "status": "rejected", "dropped_by": "cmp bench_intelligence ge 0.5",
      "note": "below quality floor" }
  ]
}
Filter first, then rank survivors. Each rejected model names the exact rule that eliminated it in dropped_by — no silent substitution, no hidden downgrade.
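A dry-run response like the one above is also useful in a pre-deploy check. The sketch below reads it to preview the failover order and to refuse a policy that leaves too few survivors. It assumes the response has exactly the shape shown in the example and that failover follows price_out ascending; RankCandidate, RankResponse, previewCascade, and requireSurvivors are hypothetical names.

```typescript
interface RankCandidate {
  model: string;
  passed: boolean;
  status: "winner" | "passed" | "rejected";
  price_out?: number;
  dropped_by: string | null;
}

interface RankResponse {
  selected: string;
  candidates: RankCandidate[];
}

// Passing candidates, cheapest first; entries without a price sort last.
function previewCascade(rank: RankResponse): string[] {
  const price = (c: RankCandidate): number =>
    c.price_out === undefined ? Number.MAX_VALUE : c.price_out;
  return rank.candidates
    .filter((c) => c.passed)
    .sort((a, b) => price(a) - price(b))
    .map((c) => c.model);
}

// Throws when fewer than `minimum` models clear the filter.
function requireSurvivors(rank: RankResponse, minimum: number): void {
  const count = previewCascade(rank).length;
  if (count < minimum) {
    throw new Error(`only ${count} of the required ${minimum} candidates pass the filter`);
  }
}

// The dry-run example from above, minus the free-text notes.
const dryRun: RankResponse = {
  selected: "gemini-3.5-flash",
  candidates: [
    { model: "gemini-3.5-flash", passed: true, status: "winner", price_out: 0.3, dropped_by: null },
    { model: "mistral-small-4", passed: true, status: "passed", price_out: 0.35, dropped_by: null },
    { model: "claude-sonnet-4-6", passed: true, status: "passed", price_out: 3.0, dropped_by: null },
    { model: "gemini-3.1-flash-lite", passed: false, status: "rejected", dropped_by: "is cap_tools" },
    { model: "tiny-draft-1", passed: false, status: "rejected", dropped_by: "cmp bench_intelligence ge 0.5" },
  ],
};

const failoverOrder = previewCascade(dryRun);
requireSurvivors(dryRun, 2); // passes: three models clear the floor
```

Requiring at least two survivors is a simple guard against a policy whose only passing model is a single point of failure.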
Compose policies into a flow.
Not every task is one call. A flow is a bounded, acyclic graph of LLM steps — ["flow", { id: node, … }] — with exactly one input node and one output node. Each llm node carries its own system prompt and a full policy_ir term, so every step routes independently at request time over the live catalog. You send the whole graph as flow_ir on a single call, and it writes one stitched, replayable trace.
Edges are pull-model: a node's inputs list names the nodes it consumes, so "b": { inputs: ["a"] } means a → b. A node with two or more inputs is a fusion step, and there are no loops or conditionals.
That graph is just data. Below is a flow written out as flow_ir — the same input / llm / output nodes, the same inputs edges, each llm node carrying its own policy:
[
  "flow",
  {
    "u": { "kind": "input" },
    "draft": { "kind": "llm",
      "system": "Draft an answer.",
      "policy": [ "policy", … ], // a full policy_ir term, cheapest survivor
      "inputs": [ "u" ] },
    "critique": { "kind": "llm",
      "system": "List the concrete flaws in the draft.",
      "policy": [ "policy", … ], // strongest model, no cost ceiling
      "inputs": [ "draft" ] },
    "revise": { "kind": "llm",
      "system": "Rewrite the answer, fixing every point.",
      "policy": [ "policy", … ], // strongest model
      "inputs": [ "u", "draft", "critique" ], // fan-in: three predecessors
      "template": "Q:\n$1\n\nDraft:\n$2\n\nCritique:\n$3" },
    "out": { "kind": "output", "inputs": [ "revise" ] }
  }
]
Each policy is a full policy_ir term — the same six-element array from The SDK above, elided here as …. Copyable end-to-end flows are in Presets below.
A flow has three node kinds. Author the llm nodes; input and output are the single entry and exit.
- input
- The single entry node: { "kind": "input" }. The call's messages enter the graph here. Exactly one per flow.
- llm
- A routed step. system: its prompt · policy: a full policy_ir term, resolved over the live catalog at runtime · inputs: the node ids it consumes · template (optional): joins multiple inputs with $1, $2, … placeholders in input order. Each node routes, and fails over, on its own.
- output
- The single exit node: { "kind": "output", "inputs": ["…"] }. Its inputs name the node whose result is returned to the caller. Exactly one per flow.
- edges & shape
- Pull-model: inputs defines the DAG ("b": { inputs: ["a"] } is a → b). Fan-out is one node feeding several; fan-in is several feeding one fusion node. Acyclic, bounded, no loops or conditionals: every node runs once, so cost and latency are knowable up front.
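To see what a fan-in node receives, the template placeholders can be expanded locally. The function below is a rough sketch of that substitution, assuming $n refers to the n-th entry of the node's inputs (1-based) and that unmatched placeholders are left untouched. The host's exact rules for escaping or multi-digit placeholders are not documented here, and renderTemplate is a hypothetical name.

```typescript
// Replaces $1, $2, … with the corresponding upstream result, in input order.
function renderTemplate(template: string, inputs: string[]): string {
  return template.replace(/\$(\d+)/g, (match: string, digits: string): string => {
    const index = parseInt(digits, 10) - 1;
    // Leave the placeholder as-is when no input exists at that position.
    return index >= 0 && index < inputs.length ? inputs[index] : match;
  });
}

// The template from the revise node in the flow above.
const reviseTemplate = "Q:\n$1\n\nDraft:\n$2\n\nCritique:\n$3";

const revisePrompt = renderTemplate(reviseTemplate, [
  "What is a DAG?",
  "A DAG is a graph.",
  "Too vague: say directed and acyclic.",
]);
```

Previewing the rendered prompt this way helps catch an inputs list that is in the wrong order before the flow runs.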
Send a flow as flow_ir on any completion; dry-run with POST /x/flow/normalize to admit the DAG and each node's policy before a single token is spent. Limits: ≤ 256 nodes, in-degree ≤ 32. Billing counts a flow as one run up to 5 nodes, and the whole graph writes one stitched trace — see the trace object. For the picture, the workflow diagrams on the product page show three flow shapes end to end.
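The structural rules above (one input, one output, known inputs, no cycles, at most 256 nodes, in-degree at most 32) are simple enough to pre-check in your own backend. The sketch below is a local approximation and does not replace POST /x/flow/normalize, which also admits each node's policy. FlowNode, FlowGraph, and checkFlow are hypothetical names.

```typescript
interface FlowNode {
  kind: "input" | "llm" | "output";
  inputs?: string[];
  system?: string;
  policy?: unknown;
  template?: string;
}

type FlowGraph = Record<string, FlowNode>;

// Returns a list of problems; an empty list means the shape looks admissible.
function checkFlow(nodes: FlowGraph): string[] {
  const problems: string[] = [];
  const ids = Object.keys(nodes);
  const has = (id: string): boolean => Object.prototype.hasOwnProperty.call(nodes, id);
  const countKind = (kind: string): number => ids.filter((id) => nodes[id].kind === kind).length;

  if (ids.length > 256) problems.push(`too many nodes: ${ids.length} > 256`);
  if (countKind("input") !== 1) problems.push("expected exactly one input node");
  if (countKind("output") !== 1) problems.push("expected exactly one output node");

  for (const id of ids) {
    const inputs = nodes[id].inputs || [];
    if (inputs.length > 32) problems.push(`${id}: in-degree ${inputs.length} > 32`);
    for (const dep of inputs) {
      if (!has(dep)) problems.push(`${id}: consumes unknown node ${dep}`);
    }
  }

  // Repeatedly resolve nodes whose inputs are all resolved; leftovers sit on a cycle
  // or depend on a node that does not exist.
  const done: Record<string, boolean> = {};
  let remaining = ids.slice();
  let progressed = true;
  while (progressed) {
    progressed = false;
    remaining = remaining.filter((id) => {
      const ready = (nodes[id].inputs || []).every((dep) => done[dep] === true);
      if (ready) {
        done[id] = true;
        progressed = true;
      }
      return !ready;
    });
  }
  if (remaining.length > 0) problems.push("cycle or unresolved input among: " + remaining.join(", "));
  return problems;
}

// The draft / critique / revise flow from above, with prompts and policies omitted.
const exampleFlow: FlowGraph = {
  u: { kind: "input" },
  draft: { kind: "llm", inputs: ["u"] },
  critique: { kind: "llm", inputs: ["draft"] },
  revise: { kind: "llm", inputs: ["u", "draft", "critique"] },
  out: { kind: "output", inputs: ["revise"] },
};

const flowProblems = checkFlow(exampleFlow);
```

Returning every problem at once, instead of throwing on the first, tends to make generated flows quicker to debug.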
Copy a term, send it.
Ready-to-send policies and workflows. Each is a piece of data — drop a policy into policy_ir or a flow into flow_ir on any OpenAI-compatible call, and the host admits it, hashes it, and interprets it deterministically over the live catalog. Dry-run first with POST /x/rank for policies or POST /x/flow/normalize for flows.
A policy is a six-element array — ["policy", filter, rank, select, mutate, fallback] — where you author the filter, rank, and select, and keep mutate (["id"]) and fallback (["always", {"action":"next_candidate"}]) as-is. Survivors are scored over raw catalog fields with field / normalize / neg. Spend ceilings ride as a cmp over price_* in the filter — never trusted to the score. Every term can only narrow the host floor, never widen it.
More starting points
Workflows · flow_ir
Six ready-to-send flows, each a bounded DAG where every node carries its own policy. The full flow_ir grammar is in Flows above. Dry-run with POST /x/flow/normalize first.
Common failures.
What each one means and what to change. Dry-run with POST /x/rank to see the decision before it costs anything.
- No model passes the floor
- Your filter excluded every candidate, so the request fails loudly instead of silently downgrading. Loosen a cmp bound or a hard ["is", …], and run /x/rank to see which rule dropped each model.
- Unknown field or operator
- Admission rejects a term that names a field the host doesn't serve or an op the interpreter doesn't know (invalid_policy). Use a real field from GET /x/fields, e.g. price_out, not price.
- Provider key missing
- The chosen model's provider isn't configured on your workspace, so the upstream call can't authenticate. Add the provider key during onboarding; inference always runs over your own accounts.
- Provider timeout or error
- The selected model errored or timed out; the router fails over to the next passing candidate and writes the hop to the trace. If every candidate fails, the request errors; widen the cascade with ["top_k", N, ["argmax"]].
- Flow exceeds the limits
- A flow is admitted only within bounds (≤ 256 nodes, in-degree ≤ 32); past that it's rejected before running, so split the workflow. (Billing counts a flow as one billable run up to 5 nodes; the structural cap is separate.)
- Dry run passes, live call fails
- /x/rank and /x/*/normalize admit and evaluate the term but run no inference, so a live failure is a provider/runtime issue (rate limit, timeout, auth), not a policy error. Read the trace's decision path for the failing hop.
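Widening the cascade, as suggested for provider failures above, is a one-slot change to the term. The sketch below swaps the select element of an existing policy for ["top_k", N, ["argmax"]] and leaves the original term untouched. It relies only on the six-element layout described earlier; withCascade is a hypothetical helper name.

```typescript
// Illustrative alias for a JSON term; not an official unhardcoded type.
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Returns a copy of the policy whose select slot keeps the top n as a cascade.
function withCascade(policy: Term[], n: number): Term[] {
  if (policy.length !== 6 || policy[0] !== "policy") {
    throw new Error("expected a six-element policy term");
  }
  if (!(n >= 1) || Math.floor(n) !== n) {
    throw new Error("n must be a positive integer");
  }
  const widened = policy.slice();
  // Layout: ["policy", filter, rank, select, mutate, fallback], so select is index 3.
  widened[3] = ["top_k", n, ["argmax"]];
  return widened;
}

const basePolicy: Term[] = [
  "policy",
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_tools"]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"],
  ["id"],
  ["always", { action: "next_candidate" }],
];

const resilientPolicy = withCascade(basePolicy, 3);
```

Because the filter is unchanged, every model in the wider cascade still has to clear the same floor.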
Stop hardcoding. Send the decision.
Point your traffic at one endpoint, generate a policy or workflow at runtime, and send it with the call. unhardcoded routes through your provider keys and traces every decision.