To send your first policy with a call, do four things: install the openai package, point the OpenAI SDK's baseURL at https://api.unhardcoded.com/v1, build a policy_ir term in your backend (a plain JSON array), and pass it as a policy_ir field on chat.completions.create(). The router admits the policy, evaluates it over the live model catalog, routes to the cheapest model that passes your rules, fails over to the next candidate if the chosen model errors or times out, and writes an auditable trace. No app rewrite, no preregistered routes, no server config: just one new field on a call you already make.

This is the whole loop in about five minutes. Below, each step has the exact code, plus how to dry-run the decision before you spend a token and how to read the trace afterward.

The five-minute version

If you have used the OpenAI SDK before, you already know 90% of this. Here is the shape of the change, end to end, before we break it down:

  • Install the OpenAI SDK you already use.
  • Repoint the client's baseURL and pass your unhardcoded key.
  • Build a policy_ir term: the rules a model must satisfy and how to pick among the survivors.
  • Send it on the call by setting a policy:* model string (a free-form trace label) and adding the policy_ir field.
  • Read the trace to see which model won, what it cost, and every candidate it beat.

That sequence is the core idea behind policy-based LLM routing: instead of pinning a model name in your code, you ship the decision rule with the request and let the router resolve it against what is actually available right now.

Step 1 — Install the OpenAI SDK

Install the OpenAI SDK, or skip this step if it is already in your project. Nothing here replaces your existing client library; you keep it.

$ npm i openai

A policy_ir is just a plain JSON array — you build it with the language and data you already have, no extra dependency required. You can inspect it, hash it, log it, and replay it. You need the managed host only to execute one over a live catalog with traces and failover. (A higher-level buildPolicy(...) helper that compiles a short spec down to the term is planned convenience sugar; it is not a shipping package yet, so this guide builds the raw term directly.)
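Because the term is plain data, you can compute a stable fingerprint for your own logs without any service call. The sketch below shows one way to do that, assuming a hypothetical fingerprint helper built on 32-bit FNV-1a over the serialized term. It is not the hash the router computes, which this guide does not specify.

```typescript
// Hypothetical local helper: a 32-bit FNV-1a style fingerprint of a policy term.
// NOT the router's hash; it only gives you a stable key for logs and replays.
type PolicyTerm = unknown[];

function fingerprint(term: PolicyTerm): string {
  const text = JSON.stringify(term); // arrays serialize in element order
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i); // mixes UTF-16 code units, fine for ASCII terms
    // Multiply by the FNV prime 16777619 using shifts, kept as unsigned 32-bit.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

const loggedPolicy: PolicyTerm = [
  "policy",
  ["ev_zero"],
  ["and", ["meets_req"], ["not", ["is", "disabled"]],
          ["has_cap", "tools"], ["cmp", "price_out", "le", 6.0]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"],
  ["id"],
  ["always", { action: "next_candidate" }],
];

// A term restored from a log line fingerprints the same as the original.
const replayedPolicy: PolicyTerm = JSON.parse(JSON.stringify(loggedPolicy));
console.log(fingerprint(loggedPolicy) === fingerprint(replayedPolicy)); // true
```

Storing the fingerprint next to the serialized term lets you group log lines by policy and replay the exact term later.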

Step 2 — Point the client at one endpoint

Same SDK, one new baseURL. Authenticate with your unhardcoded key, which identifies your workspace and its trace history — never a provider account. Your provider keys (OpenAI, Anthropic, Gemini, and the rest) are configured on the host, so the router runs inference through your accounts and never resells tokens.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.unhardcoded.com/v1",
  apiKey: process.env.UNHARDCODED_KEY,
});

This is the only configuration step, and it is reversible in one line. Point baseURL back at OpenAI and your app behaves exactly as before. There is no dashboard you have to populate first, and no routes to register — the policy travels with the request.
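One way to keep that rollback a single switch is to derive the client options from the environment. This is a minimal sketch, assuming a hypothetical USE_ROUTER flag and a hypothetical clientOptions helper; UNHARDCODED_KEY and the router URL come from this guide, and https://api.openai.com/v1 is the OpenAI API's own base URL.

```typescript
// Hypothetical helper: pick client options from an env-style record.
// USE_ROUTER is an invented flag name, not something the router defines.
type EnvVars = { [key: string]: string | undefined };
type ClientOptions = { baseURL: string; apiKey: string | undefined };

function clientOptions(env: EnvVars): ClientOptions {
  if (env["USE_ROUTER"] === "0") {
    // Rollback path: talk to OpenAI directly with your provider key.
    return { baseURL: "https://api.openai.com/v1", apiKey: env["OPENAI_API_KEY"] };
  }
  // Default path: the router endpoint with your unhardcoded key.
  return { baseURL: "https://api.unhardcoded.com/v1", apiKey: env["UNHARDCODED_KEY"] };
}

console.log(clientOptions({ UNHARDCODED_KEY: "uh_example" }).baseURL);
```

In an app you would pass the result straight to the constructor, for example new OpenAI(clientOptions(process.env)).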

Step 3 — Build a policy_ir in your backend

Generate the policy_ir at request time. A policy is a small contract: which capabilities a model must have, what quality floor it has to clear, what it is allowed to cost, how to pick among the models that pass, and what to do if the winner fails.

// built in your backend, at request time — the raw policy_ir term
const policy = [
  "policy",
  ["ev_zero"],   // evidence: reserved slot, none attached
  // filter: meets reqs, not disabled, supports tools, under the ceiling
  ["and", ["meets_req"], ["not", ["is", "disabled"]],
          ["has_cap", "tools"], ["cmp", "price_out", "le", 6.0]],
  ["neg", ["normalize", ["field", "price_out"]]],  // rank: cheapest
  ["argmax"],   // select: the single best survivor
  ["id"],       // mutate: no-op
  ["always", { action: "next_candidate" }],  // fallback
];

Read that term out loud and it is exactly the sentence a product owner would write: "Use any model that meets the request's requirements, is not disabled, supports tool calling, and costs no more than $6 per million output tokens; among those, take the cheapest; if it fails, fall through to the next one." Each part maps cleanly onto a position:

  • The and predicate is the filter — the hard floor. ["meets_req"] applies the request's requirements, ["not", ["is", "disabled"]] drops disabled models, ["has_cap", "tools"] demands tool support, and ["cmp", "price_out", "le", 6.0] caps the price. A model that misses any of these is never a candidate.
  • ["neg", ["normalize", ["field", "price_out"]]] is the rank scorer — higher is better, so negating normalized price makes the cheapest survivor score highest — and ["argmax"] is the select that takes that single best one.
  • ["id"] is the mutate no-op, and ["always", { action: "next_candidate" }] is the fallback — the runtime behavior when the chosen model errors or times out.

The result is a policy_ir: the term ["policy", evidence, filter, rank, select, mutate, fallback] — the "policy" tag, an evidence slot, plus the five working verbs. It is a plain JSON array, so you can hand-write or generate it directly. If you want the full structure, read what a policy_ir is and how a routing decision is laid out.
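The seven positions described above can be checked and named locally before a term is sent. The following is a sketch only: splitPolicy and PolicyParts are hypothetical names, and the check covers just the outer shape stated in this guide, not the router's actual admission rules.

```typescript
// Hypothetical local helper; the router's real admission may check much more.
type PolicyParts = {
  evidence: unknown;
  filter: unknown;
  rank: unknown;
  select: unknown;
  mutate: unknown;
  fallback: unknown;
};

function splitPolicy(term: unknown): PolicyParts {
  // Outer shape from this guide: the "policy" tag plus six positional slots.
  if (!Array.isArray(term) || term.length !== 7 || term[0] !== "policy") {
    throw new Error('expected ["policy", evidence, filter, rank, select, mutate, fallback]');
  }
  return {
    evidence: term[1],
    filter: term[2],
    rank: term[3],
    select: term[4],
    mutate: term[5],
    fallback: term[6],
  };
}

const shapedPolicy = [
  "policy",
  ["ev_zero"],
  ["and", ["meets_req"], ["has_cap", "tools"]],
  ["neg", ["normalize", ["field", "price_out"]]],
  ["argmax"],
  ["id"],
  ["always", { action: "next_candidate" }],
];

console.log(JSON.stringify(splitPolicy(shapedPolicy).select)); // ["argmax"]
```

Naming the slots this way makes it harder to swap two positions by accident when a term is generated rather than hand-written.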

Why the floor is sacred. Spend ceilings and quality minimums live in the filter, not the score. A cheap model can never "win on points" against your rules — if it does not clear the floor, it is not a candidate at all. The cheapest model only wins after everything that disqualifies it has been applied.
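To make the filter-before-rank ordering concrete, here is a toy model of it over an invented three-model catalog. Everything in it is an assumption for illustration: the catalog entries, the field names other than price_out, and the pickCheapest helper. It leaves out meets_req and normalization, and it is not the router's implementation.

```typescript
// Toy model of "filter first, then rank". Hypothetical catalog and helper.
type CatalogModel = {
  id: string;
  price_out: number; // USD per million output tokens
  tools: boolean;
  disabled: boolean;
};

function pickCheapest(catalog: CatalogModel[], ceiling: number): CatalogModel | null {
  // Filter: the hard floor. Failing any rule removes the model entirely.
  const survivors = catalog.filter(
    (m) => !m.disabled && m.tools && m.price_out <= ceiling
  );
  // Rank and select: order survivors by price and take the cheapest.
  const ranked = survivors.slice().sort((a, b) => a.price_out - b.price_out);
  const best = ranked[0];
  return best === undefined ? null : best;
}

const toyCatalog: CatalogModel[] = [
  { id: "tiny-no-tools", price_out: 0.2, tools: false, disabled: false },
  { id: "mid-tools", price_out: 2.5, tools: true, disabled: false },
  { id: "big-tools", price_out: 9.0, tools: true, disabled: false },
];

const toyWinner = pickCheapest(toyCatalog, 6.0);
console.log(toyWinner ? toyWinner.id : "no candidates clear the floor"); // mid-tools
```

The cheapest entry never reaches the ranking step because it lacks tool support, which is the behavior the paragraph above describes.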

Step 4 — Send the policy with the call

Now attach the policy to the call you already make. Set a policy:* model string and add the policy_ir field. The model string is a free-form label that your trace history is grouped under, not a routing instruction: any string works, and routing is driven entirely by the attached policy_ir, never parsed from the model name. Everything else (messages, tools, temperature, streaming) is the standard OpenAI request shape and passes through untouched.

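// example conversation; replace with your own messages
const messages = [{ role: "user", content: "Where is my order?" }];

const res = await client.chat.completions.create({
  model: "policy:support",   // free-form trace label
  policy_ir: policy,         // the term built in Step 3
  messages,
});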

When this request lands, the router hashes the policy_ir, admits it (rejecting unknown ops or undeclared fields), evaluates it over the live catalog, and routes to the cheapest model that clears your floor. Because the policy rides in the body, the same endpoint can carry a different policy on the next call — nothing is pinned to a server config. This is the difference between a router that reads a static config file and one that is genuinely runtime; see what an LLM policy router is for the full mental model.
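You can run a rough version of the "unknown ops" check locally to catch typos before a request leaves your backend. The sketch below assumes a hypothetical unknownOps helper and an allow-list made up only of the ops that appear in this guide; the router's real vocabulary and admission logic are not documented here and may be larger or stricter.

```typescript
// Assumption: only the ops used in this guide. Not the router's full vocabulary.
const KNOWN_OPS = [
  "policy", "ev_zero", "and", "not", "is", "meets_req", "has_cap",
  "cmp", "neg", "normalize", "field", "argmax", "id", "always",
];

// Walk a term and collect every array head that is not a known op.
function unknownOps(term: unknown, found: string[] = []): string[] {
  if (!Array.isArray(term)) return found; // strings, numbers, objects carry no op
  const head = term[0];
  if (typeof head !== "string" || KNOWN_OPS.indexOf(head) === -1) {
    found.push(String(head));
  }
  for (let i = 1; i < term.length; i++) {
    unknownOps(term[i], found); // recurse into nested sub-terms
  }
  return found;
}

const typoFilter = ["and", ["meets_req"], ["has_capp", "tools"]];
console.log(unknownOps(typoFilter)); // [ 'has_capp' ]
```

A check like this is only a pre-flight convenience; the router's own admission step remains the authority on what a valid term is.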

Step 5 — Read the trace

Every routed request produces a replayable, auditable trace. It is the answer to the question every hardcoded setup leaves you guessing at: which model actually ran, and why? A trace records the resolved policy hash, the candidate set, the per-model verdicts against your floor, the model that was selected, what it cost, and any failover hops.

{
  "policy": "policy:support",
  "resolved": "gemini-3.5-flash",
  "reason": "cheapest passing candidate",
  "candidates": 9,
  "filter": ["and", ["meets_req"], ["not", ["is", "disabled"]], ["has_cap", "tools"], ["cmp", "price_out", "le", 6.0]],
  "hops": [
    { "model": "gemini-3.5-flash", "status": "ok", "latency_ms": 740 }
  ]
}

Here the router landed on gemini-3.5-flash: it cleared the tool-calling requirement, came in under the price ceiling, and was the cheapest survivor. The day a cheaper model qualifies, the same policy picks it up automatically and your code does not change. That is the point. The old way was to write model: "gpt-5.5" in source and revisit it by hand every quarter; the policy states the rule once, declaratively, and the router re-derives the decision on every call.
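If you pull traces into your own tooling, a small reader keeps the interesting fields in one line. The types below are inferred only from the sample trace in this guide, and summarizeTrace is a hypothetical helper; it also assumes that every hop after the first is a failover, which the sample does not confirm.

```typescript
// Field names taken from the sample trace; other trace fields are ignored here.
type TraceHop = { model: string; status: string; latency_ms: number };
type RouteTrace = {
  policy: string;
  resolved: string;
  reason: string;
  candidates: number;
  hops: TraceHop[];
};

function summarizeTrace(t: RouteTrace): string {
  // Assumption: each hop beyond the first represents one failover.
  const failovers = Math.max(t.hops.length - 1, 0);
  let totalMs = 0;
  for (const hop of t.hops) {
    totalMs += hop.latency_ms;
  }
  return `${t.policy} -> ${t.resolved} (${t.reason}; ${t.candidates} candidates, ${failovers} failover hops, ${totalMs} ms)`;
}

const sampleTrace: RouteTrace = {
  policy: "policy:support",
  resolved: "gemini-3.5-flash",
  reason: "cheapest passing candidate",
  candidates: 9,
  hops: [{ model: "gemini-3.5-flash", status: "ok", latency_ms: 740 }],
};

console.log(summarizeTrace(sampleTrace));
// policy:support -> gemini-3.5-flash (cheapest passing candidate; 9 candidates, 0 failover hops, 740 ms)
```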

Dry-run first with POST /x/rank

Before you wire a new policy into production traffic, preview it. POST /x/rank takes the same policy_ir and returns the candidate ranking and per-model verdicts without running inference — so you can confirm which model would win, and which ones your floor excludes, before a single token is spent.

$ curl https://api.unhardcoded.com/v1/x/rank \
    -H "Authorization: Bearer $UNHARDCODED_KEY" \
    -H "Content-Type: application/json" \
    -d '{ "policy_ir": [ ...your term... ] }'

Use it in CI to assert that a policy still resolves the way you expect, or in review to show a teammate exactly how a rule change shifts the ranking. A dry-run that returns "no candidates clear the floor" is a far better failure than discovering it in a production 500.
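For a CI script, the curl call above translates directly into a request you can build in code. This sketch mirrors the same URL, headers, and body; buildRankRequest and RankRequest are hypothetical names, and the response shape is deliberately left out because this guide does not document it.

```typescript
// Mirrors the curl example: same URL, headers, and JSON body.
type RankRequest = {
  url: string;
  init: { method: string; headers: { [name: string]: string }; body: string };
};

function buildRankRequest(policyTerm: unknown[], key: string): RankRequest {
  return {
    url: "https://api.unhardcoded.com/v1/x/rank",
    init: {
      method: "POST",
      headers: {
        "Authorization": "Bearer " + key,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ policy_ir: policyTerm }),
    },
  };
}

const dryRun = buildRankRequest(
  ["policy", ["ev_zero"], ["meets_req"], ["neg", ["normalize", ["field", "price_out"]]],
   ["argmax"], ["id"], ["always", { action: "next_candidate" }]],
  "uh_example"
);
console.log(dryRun.init.method, dryRun.url);
```

From there, fetch(dryRun.url, dryRun.init) sends the dry-run. Log the response once and inspect it before writing assertions against specific fields.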

That is the full loop: install, repoint, build, send, trace — with a dry-run whenever you want a preview. If you are weighing this against the alternatives, the hidden cost of hardcoding model decisions spells out what you stop paying for, and the docs have the complete API reference and a library of copyable presets.