Ship a policy in five minutes.
unhardcoded is OpenAI-compatible. Keep your SDK, change the baseURL, generate a policy in your backend, and send it with the call. It routes to the cheapest model that passes your rules, over your own provider keys, and writes a replayable trace of every decision.
From your SDK to a traced call.
No app rewrite, no server config. Point the OpenAI SDK at one endpoint, build a policy in your backend, send it with the call, and read the trace it returns.
You need an unhardcoded API key for the apiKey below, and your provider keys (OpenAI, Anthropic, Gemini, …) configured on the workspace — inference always runs over your own accounts. During early access, both are set up with you during onboarding.
Install
Keep the OpenAI SDK you already use. The policy_ir is plain JSON — no extra package required.
$ npm i openai
Point the client at the endpoint
Same SDK, one new baseURL. Your messages and parameters pass through unchanged.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.unhardcoded.com/v1",
  apiKey: process.env.UNHARDCODED_KEY,
});
Build a policy and send it with the call
Generate the policy_ir term at request time, then attach it to create(). The router picks the cheapest survivor that clears your rules. The model field is a free-form label used only to group traces — routing comes from the attached policy_ir, so any string works.
// built in your backend, at request time: a plain JSON term
const policy_ir = [
  "policy",
  ["ev_zero"],
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5]],      // filter
  ["neg", ["normalize", ["field", "price_out"]]],   // cheapest survivor
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }],
];

// the standard OpenAI message array, passed through unchanged
const messages = [{ role: "user", content: "…" }];

const res = await client.chat.completions.create({
  model: "policy:support", // free-form trace label, not a route
  policy_ir,
  messages,
});
Not on the OpenAI SDK? It is a plain HTTP call — policy_ir is a top-level sibling of model and messages in the JSON body:
$ curl https://api.unhardcoded.com/v1/chat/completions \
-H "Authorization: Bearer $UNHARDCODED_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "policy:support",
"messages": [{ "role": "user", "content": "…" }],
"policy_ir": ["policy", ["ev_zero"], …]
}'
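For runtimes where neither the OpenAI SDK nor curl fits, the same request can be sketched with fetch. This is a minimal sketch: `buildChatRequest` is a hypothetical helper name, and only the URL, headers, and body shape come from the curl call above. It assembles the request without sending it, so it can be inspected or logged first.

```typescript
// Hypothetical helper: assembles the documented HTTP call without sending it.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(apiKey: string, policyIr: Json[], messages: ChatMessage[]) {
  return {
    url: "https://api.unhardcoded.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // policy_ir is a top-level sibling of model and messages
      body: JSON.stringify({ model: "policy:support", messages, policy_ir: policyIr }),
    },
  };
}

// Usage (sends the request, so it needs a real key):
// const { url, init } = buildChatRequest(process.env.UNHARDCODED_KEY!, policy_ir, messages);
// const res = await fetch(url, init);
```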
The response carries the decision — which model was selected, why, the cost, and a replayable trace id (one illustrative run; the live pick depends on your catalog at request time):
{
"selected": "gemini-3.5-flash",
"reason": "cheapest passing candidate",
"policy": "301140696-1054914287",
"cost": "$0.018",
"trace": "req-204815"
}
One OpenAI-compatible surface.
An OpenAI-compatible completions endpoint, dry-run helpers, a field-schema lookup, and one header. Carry a policy_ir or flow_ir in the request body; inference runs over the provider keys you configured on the host.
- POST /v1/chat/completions
- OpenAI-compatible completions. Carry policy_ir or flow_ir in the body; everything else is the standard request shape. The router resolves the model over the live catalog and writes the decision to the trace.
- POST /x/rank
- Dry-run. Returns the candidate ranking and per-model verdicts without running inference. Use it to preview which models clear the floor, and why, before a single token is spent.
- POST /x/policy/normalize
- Dry-run. Admits a policy_ir and returns its canonical form, content fingerprint, and grammar version, so you can identify and cache a term without running it.
- POST /x/flow/normalize
- Dry-run. Admits and identifies a flow_ir (the bounded graph and each node's policy) before use, so a malformed workflow fails fast instead of mid-run.
- GET /x/fields
- Returns the live field vocabulary (the core fields plus this host's registered extensions) that policies gate on with cmp / is and score on with field. The source of truth for valid field names (e.g. price_out, context, bench_intelligence).
- Authorization: Bearer <key>
- Authenticate every request with your unhardcoded key. The key identifies your workspace and its trace history, never a provider account.
Note: BYO provider keys are configured on the host. unhardcoded routes through your own OpenAI, Anthropic, Gemini, and other accounts — it does not resell tokens or mark up inference, and it never bills the model spend.
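Because GET /x/fields is the source of truth for field names, a term can be checked locally before it is sent. The sketch below walks a policy_ir and lists every name used by cmp, is, or field that is missing from a vocabulary list. It assumes you have already turned the /x/fields response into a plain string array (the response shape is not described here), and `collectFields` and `unknownFields` are hypothetical names.

```typescript
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Collect every name referenced by ["cmp", name, ...], ["is", name], or ["field", name].
function collectFields(term: Term | undefined, found: Set<string> = new Set()): Set<string> {
  if (!Array.isArray(term)) return found;
  const [op, arg] = term;
  if ((op === "cmp" || op === "is" || op === "field") && typeof arg === "string") {
    found.add(arg);
  }
  for (const child of term) collectFields(child, found);
  return found;
}

// Names used by the policy that the vocabulary does not contain.
// ASSUMPTION: vocabulary is a flat list of names extracted from GET /x/fields.
function unknownFields(policy: Term, vocabulary: string[]): string[] {
  const known = new Set(vocabulary);
  return Array.from(collectFields(policy))
    .filter((name) => !known.has(name))
    .sort();
}
```

An empty result only means the names are known; admission on the host remains the real check.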
The raw policy_ir term.
The real interface is the term itself: a plain JSON array you inspect, hash, log, and replay. A policy is a seven-element array: ["policy", evidence, filter, rank, select, mutate, fallback]. You author the filter, rank, and select; keep evidence (["ev_zero"]), mutate (["id"]), and fallback (["always", {"action":"next_candidate"}]) as-is. The evidence slot reserves room for future provenance-weighting of candidates; today every policy leaves it as ["ev_zero"] (none attached).
[
  "policy",
  // evidence: the fixed evidence slot, kept as-is
  ["ev_zero"],
  // filter: the gate; narrows the host floor, never widens it
  ["and", ["meets_req"], ["not", ["is", "disabled"]], ["has_cap", "supports_tools"],
    ["cmp", "bench_intelligence", "ge", 0.5], ["cmp", "price_out", "le", 5]],
  // rank: score the survivors, cheapest wins
  ["neg", ["normalize", ["field", "price_out"]]],
  // select: take the single top of the ranked list
  ["argmax"],
  // mutate: pass the request through unchanged
  ["id"],
  // fallback: on any failure, next candidate
  ["always", {"action": "next_candidate"}]
]
The term is just data — built in your backend, sent with the call. A typed builder (@unhardcoded/policy / buildPolicy(...)) is planned convenience sugar over this language — it will lower a compact spec to the same array. It is not shipping yet; today the raw term above is the interface, and every preset below is one of these arrays you can copy as-is.
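Until a typed builder ships, a few lines of backend code can generate the raw array. The function below is a local sketch, not the planned @unhardcoded/policy API: `cheapestPassing` and its `CheapestSpec` options are hypothetical names, and the output uses only operators shown in the term above.

```typescript
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Hypothetical spec for a "cheapest model that clears the floor" policy.
interface CheapestSpec {
  minIntelligence: number; // lower bound on bench_intelligence
  maxPriceOut?: number;    // optional ceiling on price_out
  caps?: string[];         // capability flags, e.g. "supports_tools"
}

function cheapestPassing(spec: CheapestSpec): Term[] {
  const filter: Term[] = ["and", ["meets_req"], ["not", ["is", "disabled"]]];
  for (const cap of spec.caps ?? []) filter.push(["has_cap", cap]);
  filter.push(["cmp", "bench_intelligence", "ge", spec.minIntelligence]);
  if (spec.maxPriceOut !== undefined) filter.push(["cmp", "price_out", "le", spec.maxPriceOut]);

  return [
    "policy",
    ["ev_zero"],                                     // evidence, kept as-is
    filter,                                          // filter
    ["neg", ["normalize", ["field", "price_out"]]],  // rank: cheapest first
    ["argmax"],                                      // select
    ["id"],                                          // mutate, kept as-is
    ["always", { action: "next_candidate" }],        // fallback, kept as-is
  ];
}
```

Calling `cheapestPassing({ minIntelligence: 0.5, maxPriceOut: 5, caps: ["supports_tools"] })` yields the same seven-element array as the term written out above.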
The full sigma-pol/v1 operator vocabulary, grouped by position. These are the core ops — there is no filter/rank/select/fallback wrapper.
- filter (predicates). Returns the surviving candidates.
  - and / or / not: boolean combinators
  - has_cap <name>: a candidate capability flag (e.g. supports_tools, supports_json_mode)
  - is <flag>: a boolean model field (e.g. cap_tools, cap_reasoning, in_image, no_log, has_tee, disabled); image and reasoning gates use is (in_image / cap_reasoning), not has_cap
  - cmp <field> <op> <value>: a numeric bound (ge, le, …)
  - meets_req: the request's own implied requirements, auto-derived from the call (tool calls require tool support, image inputs require vision, a JSON response format requires structured output)
  - family_eq <name>: keep only candidates in one model family (pin a node to a single provider or model line)
- rank (scorers). Produces one score per survivor.
  - field <name>: a raw catalog field
  - normalize: scale a sub-score to 0–1
  - neg: invert (cheapest, lowest-latency)
  - scale <w>: weight a sub-score
  - add: sum weighted sub-scores
  - zero: a constant score (when the filter already leaves a single survivor)
- select (selectors)
  - argmax: the single highest-scoring model
  - top_k <n> <sel>: keep the top n as an ordered cascade
  - sample <t>: a reproducible stochastic pick
- mutate (request mutators)
  - id: pass the request through unchanged (the default). Request mutators such as clamp_param also exist for shaping parameters on the chosen call.
- fallback
  - always {"action":"next_candidate"}: on a provider failure, advance to the next survivor in the cascade
Filter first. Rank survivors. No silent downgrade.
Routing is deterministic and ordered. Spend ceilings and quality minimums live in the filter, not the score — a cheap model can never win on points against your rules. If it does not clear the floor, it is not a candidate at all.
Filter first
Candidates that lack a required capability, miss the quality floor, or exceed the price ceiling are eliminated — removed, never silently substituted.
Rank the survivors
Among the models that cleared the floor, the scorer orders them — usually cheapest-first.
Select the top
One model, chosen deterministically — the cheapest that passed.
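The three steps can be modeled locally in a few dozen lines. This is an illustrative sketch, not the host's interpreter: it handles only and, or, not, has_cap, is, cmp (ge / le), field, neg, and normalize, it assumes normalize means min-max scaling over the survivors, and the `Candidate` shape and catalog entries are made up. meets_req is left out because it depends on the request.

```typescript
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

interface Candidate {
  id: string;
  caps: string[];                  // capability flags, e.g. "supports_tools"
  flags: Record<string, boolean>;  // boolean fields, e.g. disabled
  fields: Record<string, number>;  // numeric fields, e.g. price_out
}

// Filter: true when the candidate clears the predicate.
function passes(pred: Term | undefined, c: Candidate): boolean {
  if (!Array.isArray(pred)) throw new Error("predicate must be an array");
  const [op, ...args] = pred;
  switch (op) {
    case "and": return args.every((p) => passes(p, c));
    case "or": return args.some((p) => passes(p, c));
    case "not": return !passes(args[0], c);
    case "has_cap": return c.caps.includes(String(args[0]));
    case "is": return c.flags[String(args[0])] === true;
    case "cmp": {
      const value = c.fields[String(args[0])];
      const bound = args[2];
      if (value === undefined || typeof bound !== "number") return false;
      if (args[1] === "ge") return value >= bound;
      if (args[1] === "le") return value <= bound;
      throw new Error("unsupported comparison");
    }
    default: throw new Error("unsupported predicate: " + String(op));
  }
}

// Rank: one score per survivor. normalize is ASSUMED to be min-max over the survivors.
function scores(rank: Term | undefined, pool: Candidate[]): number[] {
  if (!Array.isArray(rank)) throw new Error("scorer must be an array");
  const [op, arg] = rank;
  switch (op) {
    case "field": return pool.map((c) => c.fields[String(arg)] as number);
    case "neg": return scores(arg, pool).map((s) => -s);
    case "normalize": {
      const raw = scores(arg, pool);
      const lo = Math.min(...raw);
      const span = Math.max(...raw) - lo;
      return raw.map((s) => (span === 0 ? 0 : (s - lo) / span));
    }
    default: throw new Error("unsupported scorer: " + String(op));
  }
}

// Select: survivors ordered best score first; the first id is the argmax winner.
function rankSurvivors(policy: Term[], catalog: Candidate[]): string[] {
  const survivors = catalog.filter((c) => passes(policy[2], c)); // slot 2 is filter
  if (survivors.length === 0) throw new Error("no model passes the floor");
  const s = scores(policy[3], survivors); // slot 3 is rank
  return survivors
    .map((c, i) => ({ id: c.id, score: s[i] as number }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.id);
}

// Made-up catalog, for illustration only.
const catalog: Candidate[] = [
  { id: "model-a", caps: ["supports_tools"], flags: { disabled: false }, fields: { bench_intelligence: 0.4, price_out: 0.2 } },
  { id: "model-b", caps: ["supports_tools"], flags: { disabled: false }, fields: { bench_intelligence: 0.6, price_out: 1.0 } },
  { id: "model-c", caps: ["supports_tools"], flags: { disabled: false }, fields: { bench_intelligence: 0.9, price_out: 8.0 } },
];
const policy: Term[] = [
  "policy", ["ev_zero"],
  ["and", ["not", ["is", "disabled"]], ["has_cap", "supports_tools"], ["cmp", "bench_intelligence", "ge", 0.5]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"], ["id"], ["always", { action: "next_candidate" }],
];
console.log(rankSurvivors(policy, catalog));
```

Here model-a is the cheapest entry but never becomes a candidate, because it misses the 0.5 floor; model-b ranks first among the two survivors.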
Fall back in order — and the floor is a guarantee, not a suggestion.
If the selected model times out or errors, the router moves to the next passing candidate, cheapest-first, and every hop is recorded. It optimizes cost among the models that clear your floor, never around it. If no model meets the requirements, the request fails loudly: you never get a silent downgrade you didn't ask for.
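The fallback order can be simulated without any provider. In this sketch, `runWithFallback` and the `Attempt` callback are hypothetical; the callback stands in for the upstream call, and each recorded hop uses the { from, to, cause } shape of the trace's fallback[] entries.

```typescript
interface Hop {
  from: string;
  to: string;
  cause: string;
}

// Stand-in for an upstream provider call: succeeds, or fails with a cause.
type Attempt = (model: string) => { ok: true } | { ok: false; cause: string };

// Walk the passing candidates in ranked order and record every hop taken.
function runWithFallback(ranked: string[], attempt: Attempt): { selected: string; fallback: Hop[] } {
  const fallback: Hop[] = [];
  let previous: { model: string; cause: string } | undefined;

  for (const model of ranked) {
    if (previous !== undefined) {
      fallback.push({ from: previous.model, to: model, cause: previous.cause });
    }
    const result = attempt(model);
    if (result.ok) return { selected: model, fallback };
    previous = { model, cause: result.cause };
  }
  // Only candidates that passed the filter are ever tried, so there is no downgrade path.
  throw new Error("every passing candidate failed");
}
```

The input list contains only models that already cleared the filter, which is why exhausting it is an error and not a reason to try something weaker.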
Change the rule. Watch the winner move.
The same filter-then-rank semantics, made tangible. Toggle the rules and the winner moves; tighten the floor past what any model can meet and the call fails loudly instead.
Illustrative catalog and figures. Against the live catalog, POST /x/rank returns the same per-model verdicts without running inference.
The trace object.
Every completion carries a trace: a structured record of how the model was chosen. The trace is what makes every routing decision replayable and auditable — the same term, the same catalog snapshot, the same verdicts.
- trace
- The replayable identifier/fingerprint for the run, e.g. "req-204815".
- policy.fingerprint
- Content fingerprint of the normalized policy_ir. Identical terms share a fingerprint.
- policy.version
- The grammar version: "sigma-pol/v1".
- selected
- The chosen model id.
- candidates[]
- One entry per considered model: { model, status: "winner" | "passed" | "rejected", dropped_by: "<rule that eliminated it>" | null }. A rejected model carries the exact rule that dropped it; a passing model carries null.
- reason
- Human-readable why-this-model.
- fallback[]
- Ordered hops taken on failure: { from, to, cause }. An empty array if the first pick succeeded.
- latency_ms
- Total routing + inference latency.
- cost
- USD billed for the run.
- usage
- { prompt_tokens, completion_tokens }. Informational only; billing is per run, not per token.
- created
- ISO timestamp.
Every rejected model names the exact rule that eliminated it in dropped_by, so the trace reads as a complete account of the decision. To see the same per-model verdicts before spending a token, dry-run with POST /x/rank — or try the playground above.
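A trace can be turned into a short audit log with a few lines of code. The sketch below types only the fields it reads, using the names from the list above; where the trace sits inside the HTTP response is not specified here, so the object is passed in directly, and `describeDecision` is a hypothetical name.

```typescript
interface TraceCandidate {
  model: string;
  status: "winner" | "passed" | "rejected";
  dropped_by: string | null; // the rule that eliminated a rejected model
}

interface TraceHop {
  from: string;
  to: string;
  cause: string;
}

// Subset of the documented trace object used by this sketch.
interface Trace {
  trace: string;
  selected: string;
  reason: string;
  candidates: TraceCandidate[];
  fallback: TraceHop[];
}

// One line for the decision, one per rejected model, one per fallback hop.
function describeDecision(t: Trace): string[] {
  const lines = [`${t.trace}: selected ${t.selected} (${t.reason})`];
  for (const c of t.candidates) {
    if (c.status === "rejected") {
      lines.push(`rejected ${c.model}: ${c.dropped_by ?? "unknown rule"}`);
    }
  }
  for (const hop of t.fallback) {
    lines.push(`fallback ${hop.from} -> ${hop.to}: ${hop.cause}`);
  }
  return lines;
}
```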
Compose policies into a workflow.
Not every task is one call. A workflow is a bounded (up to 256 nodes), acyclic graph of LLM steps — ["flow", { id: node, … }] — with exactly one input node and one output node. Each llm node carries its own system prompt and a full policy_ir term, so every step routes independently at request time over the live catalog. You send the whole graph as flow_ir on a single call, and it writes one stitched, replayable trace.
Edges are pull-model: a node's inputs list names the nodes it consumes, so "b": { inputs: ["a"] } means a → b. A node with two or more inputs is a fusion step, and there are no loops or conditionals.
That graph is just data. Below is a workflow written out as flow_ir — the same input / llm / output nodes, the same inputs edges, each llm node carrying its own policy:
[
  "flow",
  {
    "u":        { "kind": "input" },
    "draft":    { "kind": "llm",
                  "system": "Draft an answer.",
                  "policy": [ "policy", … ],   // a full policy_ir term: cheapest survivor
                  "inputs": [ "u" ] },
    "critique": { "kind": "llm",
                  "system": "List the concrete flaws in the draft.",
                  "policy": [ "policy", … ],   // strongest model, no cost ceiling
                  "inputs": [ "draft" ] },
    "revise":   { "kind": "llm",
                  "system": "Rewrite the answer, fixing every point.",
                  "policy": [ "policy", … ],   // strongest model
                  "inputs": [ "u", "draft", "critique" ],   // fan-in: three predecessors
                  "template": "Q:\n$1\n\nDraft:\n$2\n\nCritique:\n$3" },
    "out":      { "kind": "output", "inputs": [ "revise" ] }
  }
]
Each policy is a full policy_ir term — the same seven-element array from The SDK above, elided here as …. Copyable end-to-end workflows are in Presets below.
A workflow has three node kinds. Author the llm nodes; input and output are the single entry and exit.
- input
- The single entry node: { "kind": "input" }. The call's messages enter the graph here. Exactly one per workflow.
- llm
- A routed step. system: its prompt · policy: a full policy_ir term, resolved over the live catalog at runtime · inputs: the node ids it consumes · template (optional): joins multiple inputs with $1, $2, … placeholders in input order. Each node routes, and fails over, on its own.
- output
- The single exit node: { "kind": "output", "inputs": ["…"] }. Its inputs name the node whose result is returned to the caller. Exactly one per workflow.
- edges & shape
- Pull-model: inputs defines the DAG ("b": { inputs: ["a"] } is a → b). Fan-out is one node feeding several; fan-in is several feeding one fusion node. Acyclic, bounded, no loops or conditionals: every node runs once, so cost and latency are knowable up front.
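The template rule for llm nodes is easy to preview locally. The function below is a sketch of the documented convention only: $1 is the first entry in inputs, $2 the second, and so on. Leaving an out-of-range placeholder untouched is an assumption made here, not documented host behavior, and `renderTemplate` is a hypothetical name.

```typescript
// Fill $1, $2, ... with the upstream results, in the order the node lists its inputs.
function renderTemplate(template: string, inputs: string[]): string {
  return template.replace(/\$(\d+)/g, (match: string, digits: string) => {
    const value = inputs[Number(digits) - 1];
    // ASSUMPTION: a placeholder with no matching input is left as written.
    return value === undefined ? match : value;
  });
}

// Example with the three-input template from the revise node:
const prompt = renderTemplate("Q:\n$1\n\nDraft:\n$2\n\nCritique:\n$3", [
  "Why is the sky blue?",
  "Because of the ocean.",
  "The draft confuses reflection with scattering.",
]);
console.log(prompt);
```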
Send a workflow as flow_ir on any completion; dry-run with POST /x/flow/normalize to admit the DAG and each node's policy before a single token is spent. Limits: ≤ 256 nodes, in-degree ≤ 32. Billing counts a workflow as one run up to 5 nodes, and the whole graph writes one stitched trace — see the trace object. For the picture, the workflow diagrams on the product page show three workflow shapes end to end.
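The structural rules above (one input, one output, known edges, no cycles, ≤ 256 nodes, in-degree ≤ 32) can be pre-checked in the backend before the graph is sent. This is a local sketch under those stated rules, not a substitute for POST /x/flow/normalize; `checkFlow` and the `FlowNode` type are hypothetical names, and each policy is treated as opaque.

```typescript
type FlowNode =
  | { kind: "input" }
  | { kind: "llm"; system: string; policy: unknown[]; inputs: string[]; template?: string }
  | { kind: "output"; inputs: string[] };

type Flow = ["flow", Record<string, FlowNode>];

// Returns a list of problems; an empty list means the local checks passed.
function checkFlow(flow: Flow): string[] {
  const entries = Object.entries(flow[1]);
  const problems: string[] = [];
  const inputsOf = new Map<string, string[]>();
  for (const [id, node] of entries) inputsOf.set(id, node.kind === "input" ? [] : node.inputs);

  if (entries.length > 256) problems.push("more than 256 nodes");
  const count = (kind: string) => entries.filter(([, n]) => n.kind === kind).length;
  if (count("input") !== 1) problems.push("expected exactly one input node");
  if (count("output") !== 1) problems.push("expected exactly one output node");

  inputsOf.forEach((inputs, id) => {
    if (inputs.length > 32) problems.push(`${id}: in-degree above 32`);
    for (const src of inputs) {
      if (!inputsOf.has(src)) problems.push(`${id}: unknown input "${src}"`);
    }
  });

  // Depth-first search along the inputs edges; revisiting an in-progress node is a cycle.
  const state = new Map<string, "visiting" | "done">();
  const hasCycle = (id: string): boolean => {
    if (state.get(id) === "done") return false;
    if (state.get(id) === "visiting") return true;
    state.set(id, "visiting");
    const found = (inputsOf.get(id) ?? []).some((src) => inputsOf.has(src) && hasCycle(src));
    state.set(id, "done");
    return found;
  };
  if (entries.some(([id]) => hasCycle(id))) problems.push("graph has a cycle");

  return problems;
}
```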
Copy a term, send it.
Ready-to-send policies and workflows. Each is a piece of data — drop a policy into policy_ir or a workflow into flow_ir on any OpenAI-compatible call, and the host admits it, fingerprints it, and interprets it deterministically over the live catalog. Dry-run first with POST /x/rank for policies or POST /x/flow/normalize for workflows.
Each is a complete policy_ir term — the seven-element array from above, with the filter, rank, and select filled in for one job. Copy one as your starting point and edit the rules.
Workflows · flow_ir
Six ready-to-send workflows, each a bounded graph where every node carries its own policy. The full flow_ir grammar is in Workflows above. Dry-run with POST /x/flow/normalize first.
Common failures.
What each one means and what to change. Dry-run with POST /x/rank to see the decision before it costs anything.
- No model passes the floor
- Your filter excluded every candidate, so the request fails loudly instead of silently downgrading. Loosen a cmp bound or a hard ["is", …], and run /x/rank to see which rule dropped each model.
- Unknown field or operator
- Admission rejects a term that names a field the host doesn't serve or an op the interpreter doesn't know (invalid_policy). Use a real field from GET /x/fields, e.g. price_out, not price.
- Provider key missing
- The chosen model's provider isn't configured on your workspace, so the upstream call can't authenticate. Add the provider key during onboarding; inference always runs over your own accounts.
- Provider timeout or error
- The selected model errored or timed out; the router fails over to the next passing candidate and writes the hop to the trace. If every candidate fails, the request errors; widen the cascade with ["top_k", N, ["argmax"]].
- Workflow exceeds the limits
- A workflow is admitted only within bounds (≤ 256 nodes, in-degree ≤ 32); past that it's rejected before running, so split the workflow. (Billing counts a workflow as one billable run up to 5 nodes; the structural cap is separate.)
- Dry run passes, live call fails
- /x/rank and /x/*/normalize admit and evaluate the term but run no inference, so a live failure is a provider/runtime issue (rate limit, timeout, auth), not a policy error. Read the trace's decision path for the failing hop.
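The cascade fix for provider failures is a one-slot change to the term. As a sketch, the helper below swaps the select slot (index 4 of the seven-element array) for ["top_k", n, ["argmax"]] and leaves the original term untouched; `withCascade` is a hypothetical name.

```typescript
type Term = string | number | boolean | null | Term[] | { [key: string]: Term };

// Return a copy of the policy whose select slot keeps the top n as an ordered cascade.
function withCascade(policy: Term[], n: number): Term[] {
  if (policy.length !== 7 || policy[0] !== "policy") {
    throw new Error("expected a seven-element policy term");
  }
  if (!Number.isInteger(n) || n < 1) throw new Error("n must be a positive integer");
  const widened = policy.slice();
  widened[4] = ["top_k", n, ["argmax"]]; // slots: tag, evidence, filter, rank, select, mutate, fallback
  return widened;
}
```

Dry-running the widened term with POST /x/rank before sending it live shows whether enough candidates survive the filter for the cascade to matter.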
Stop hardcoding. Send the decision.
Point your traffic at one endpoint, generate a policy or workflow at runtime, and send it with the call. unhardcoded routes through your provider keys and traces every decision.