A policy_ir is a runtime object: a small JSON term your application builds and sends with each LLM call to tell the router how to choose a model. It is the machine-readable form of a routing rule — "pick the cheapest model that supports tools, has at least a 128k context, and comes in under a price ceiling; if it errors, fail over to the next one." The router reads that term, evaluates it against the live model catalog, and runs the winner. The name is short for policy intermediate representation: it sits between the rule you wrote and the model that actually executes.

The whole thing is one canonical structure — the "policy" tag, an evidence slot, plus five working verbs: ["policy", evidence, filter, rank, select, mutate, fallback] — and once you understand those verbs you understand exactly how a routing decision gets made. This post walks through each one with a tiny example, then shows a complete term and how it rides along with your request.

What a policy_ir actually is

Three things, precisely:

  • It is data, not code. A policy_ir is a JSON value with a fixed grammar. It does not run; it gets evaluated. There is no LLM in the loop deciding the route, and there is no arbitrary script the router executes on your behalf.
  • It is a decision, expressed declaratively. You describe the constraints and preferences — what must be true, what to prefer, how to pick, what to do on failure — and the interpreter computes the result. You never name a single fixed model and hope it stays the right choice.
  • It is portable. Because it is a canonical term, the same policy_ir produces the same hash, can be stored in a trace, and can be re-evaluated against any snapshot of the catalog. That is what makes a decision auditable — you can replay exactly why a model was chosen.

If you have read what an LLM policy router is, the policy_ir is the thing the router consumes. This post zooms all the way in on that object.

A runtime object, not server config

The single most important distinction: a policy_ir is built and sent per request, in your application code, at the moment of the call. It is not a YAML file checked into a config repo, not a setting in a dashboard, not a rule pinned to your account.

That difference is the whole point. Server-side config is global and static: it applies to every request until you redeploy or click "save," and the request body has no say in it. A policy_ir inverts that. Your backend decides — for this call, given this user, this tenant, this feature flag, this budget — what the rules should be, and ships them in the request.

Two calls a millisecond apart can carry completely different policies. A free-tier user's request might carry a strict cost ceiling; the same endpoint, same code path, can hand a paying customer's request a policy that prefers a stronger model. Nothing changed on the server. The rule travelled with the call.

Because the rule is a value your code computes, you get to use your own language, your own variables, and your own logic to build it. The router never needs to know about your tenants or your pricing tiers — it only needs the finished term. This is what "send the policy with the call" means in practice, and it is the difference between configuring a router and programming one inline.
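As a rough illustration of building the term in application code, the sketch below computes a different policy per pricing tier. The `Tier` type, the `policyFor` helper, and the ceiling values are hypothetical; only the operator names come from this post, and only the price ceiling varies between tiers here.

```typescript
// Hypothetical helper: nothing here is part of a real SDK.
type Term = unknown[];
type Tier = "free" | "paid";

// Build the policy_ir for one call. The ceilings are made-up numbers.
function policyFor(tier: Tier): Term {
  const ceiling = tier === "free" ? 1.0 : 6.0;
  const filter: Term = ["and",
    ["meets_req"],
    ["not", ["is", "disabled"]],
    ["cmp", "price_out", "le", ceiling]];
  const rank: Term = ["neg", ["normalize", ["field", "price_out"]]];
  return ["policy", ["ev_zero"], filter, rank,
    ["top_k", 3, ["argmax"]],
    ["id"],
    ["always", { action: "next_candidate" }]];
}
```

Calling `policyFor("free")` and `policyFor("paid")` on the same code path yields two different terms, which is all the router would need to see.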

The shape of the term

Every policy_ir has the same shape: a JSON array with the "policy" tag, an evidence slot, and five working verbs, in a fixed order.

["policy", ["ev_zero"], filter, rank, select, mutate, fallback]

Position one is the literal tag "policy" — it marks the term as a policy_ir and pins the grammar. Position two is a reserved evidence slot (["ev_zero"] when you attach none). The five verbs that follow are the working parts of a routing decision, and they run in order, like a tiny pipeline over the catalog:

  1. filter — narrow the catalog to candidates that qualify.
  2. rank — order the survivors by what you prefer.
  3. select — pick how many to take and in what mode.
  4. mutate — adjust the call parameters for the chosen model.
  5. fallback — say what happens when a model fails.

Read left to right, it is a sentence: filter to who's allowed, rank them by preference, select the winner, mutate the request to fit it, and define a fallback for when reality disagrees. Any of the five verb slots can be a no-op when you don't need it, but the slot is always there, so the structure is identical for a one-line policy and a complex one. That uniformity is what makes the term cheap to validate and stable to hash.
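The fixed seven-slot shape can be written down as a type. This is a minimal sketch, assuming each slot is itself a prefix array whose first element is an operator name; `PolicyIR` and `hasPolicyShape` are illustrative names, and the check covers only the outer shape, not the router's actual grammar.

```typescript
// Illustrative types: tag, evidence, then the five working verbs.
type Term = unknown[];
type PolicyIR = ["policy", Term, Term, Term, Term, Term, Term];

// Outer-shape check only: seven slots, the "policy" tag first,
// and every other slot a non-empty array led by a string operator.
function hasPolicyShape(value: unknown): value is PolicyIR {
  return Array.isArray(value)
    && value.length === 7
    && value[0] === "policy"
    && value.slice(1).every(
      (slot) => Array.isArray(slot) && typeof slot[0] === "string");
}
```

Because the slot count never changes, a check like this is the same for a trivial policy and an elaborate one.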

The five verbs, one at a time

filter — who qualifies

The filter is a predicate over catalog entries. It keeps only the models whose attributes satisfy your hard requirements: capabilities, context window, modality, a price ceiling. Anything that fails is dropped before ranking. This is where you encode the non-negotiables. A predicate is a prefix-array term (an operator name first, then its arguments), and the and operator combines several predicates into one floor.

["and",
  ["meets_req"],
  ["not", ["is", "disabled"]],
  ["has_cap", "tools"],
  ["cmp", "context", "ge", 128000]]

Reads as: keep models that meet the request's requirements, are not disabled, support tools, and have at least a 128k context window. Each leaf is its own little term — ["has_cap", "tools"] requires the tools capability, ["is", "no_log"] would test a boolean field like no_log or has_tee, and ["cmp", "context", "ge", 128000] compares a numeric field.
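To make the evaluation concrete, here is a minimal sketch of a predicate evaluator for the operators shown above. The `CatalogEntry` shape is an assumption made for the example, not the router's real catalog schema, and `meets_req` is left out because it depends on the request rather than on the entry alone.

```typescript
// Hypothetical catalog entry; the real schema may differ.
interface CatalogEntry {
  id: string;
  caps: string[];                  // e.g. "tools"
  flags: Record<string, boolean>;  // e.g. disabled, no_log
  nums: Record<string, number>;    // e.g. context, price_out
}
type Pred = [string, ...unknown[]];

// Evaluate one prefix-array predicate against one entry.
function matches(pred: Pred, m: CatalogEntry): boolean {
  const [op, ...args] = pred;
  switch (op) {
    case "and": return (args as Pred[]).every((p) => matches(p, m));
    case "not": return !matches(args[0] as Pred, m);
    case "is": return m.flags[args[0] as string] === true;
    case "has_cap": return m.caps.includes(args[0] as string);
    case "cmp": {
      const [field, rel, bound] = args as [string, string, number];
      const v = m.nums[field];
      if (v === undefined) return false; // missing field never qualifies
      if (rel === "ge") return v >= bound;
      if (rel === "le") return v <= bound;
      throw new Error(`unknown comparison: ${rel}`);
    }
    default: throw new Error(`unknown operator: ${op}`);
  }
}
```

Filtering a catalog is then `catalog.filter((m) => matches(floor, m))`. Throwing on an unknown operator mirrors the idea that an unexpected term should fail loudly rather than silently pass.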

rank — what you prefer

Rank is a scorer: it assigns each survivor a number, and higher is better. Most routing rules are fundamentally "cheapest that qualifies," so the common case scores on price — but because a scorer is just an expression, you can score on latency, a benchmark, or a weighted blend.

["neg", ["normalize", ["field", "price_out"]]]

"Cheapest" is "highest score on negated price": ["field", "price_out"] reads the output price, normalize scales it into a comparable range, and neg flips it so the lowest price earns the highest score. Filter already guaranteed every candidate is good enough; rank just orders the ones that cleared the bar, so the top of the list is the cheapest acceptable model, which is not necessarily the cheapest model in the catalog.

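The rank expression above can be sketched as a small recursive scorer. This assumes normalize means min-max scaling across the surviving candidates, which the post does not specify; `Row` and `score` are illustrative names.

```typescript
type Row = Record<string, number>;
type Scorer = [string, ...unknown[]];

// Returns one score per candidate, in input order. Higher is better.
function score(s: Scorer, rows: Row[]): number[] {
  const [op, arg] = s;
  switch (op) {
    case "field": {
      const key = arg as string;
      return rows.map((r) => {
        const v = r[key];
        if (v === undefined) throw new Error(`missing field: ${key}`);
        return v;
      });
    }
    case "neg":
      return score(arg as Scorer, rows).map((x) => -x);
    case "normalize": {
      // Assumption: min-max scaling into [0, 1] over the candidates.
      const xs = score(arg as Scorer, rows);
      const lo = Math.min(...xs);
      const hi = Math.max(...xs);
      return xs.map((x) => (hi === lo ? 0 : (x - lo) / (hi - lo)));
    }
    default:
      throw new Error(`unknown scorer: ${op}`);
  }
}
```

With output prices of 2, 6, and 4, the candidate priced at 2 ends up with the highest score, which is the "cheapest wins" behaviour the text describes.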
select — how many, and how

Select turns the ranked list into the actual choice. The common case is "take the single best," but select also controls the strategy: take the one winner, or keep the top N as an ordered candidate set the fallback can walk through.

["argmax"]

Here, argmax picks the single highest-scoring candidate. With ["top_k", 3, ["argmax"]] the router keeps the top three: the winner plus two more on the bench, a three-deep failover cascade, so fallback has somewhere to go.
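A minimal sketch of the two select forms shown above, assuming candidates arrive already scored. The tie-break on id is an assumption added to keep the ordering deterministic; the post does not say how ties are resolved, and `selectCandidates` is an illustrative name.

```typescript
interface Scored { id: string; score: number; }

// ["argmax"] yields one winner; ["top_k", k, ["argmax"]] yields the
// winner followed by k-1 bench candidates, best first.
function selectCandidates(term: unknown[], scored: Scored[]): string[] {
  let k: number;
  if (term[0] === "argmax") k = 1;
  else if (term[0] === "top_k") k = term[1] as number;
  else throw new Error(`unknown select: ${String(term[0])}`);

  // Highest score first; ties broken by id (an assumption).
  const ordered = [...scored].sort((a, b) =>
    b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return ordered.slice(0, k).map((c) => c.id);
}
```

The returned list is the ordered bench that a fallback step can walk through.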

mutate — fit the request to the model

Different models want slightly different inputs. Mutate lets the policy adjust call parameters for the chosen model without touching your application code — clamp a parameter to the model's ceiling or normalize text. When you need no adjustment at all, mutate is the identity transform.

["id"]

That ["id"] is the no-op: pass the request through untouched. A real transform takes the same prefix-array shape — for example ["clamp_param", "temperature", 0, 1] bounds a parameter. Mutate is intentionally narrow — it shapes the request, it does not rewrite your prompt.
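The two transforms mentioned above can be sketched as a function over call parameters. `CallParams` and `applyMutate` are hypothetical names, and leaving a missing or non-numeric parameter untouched is a choice made for this example rather than documented behaviour.

```typescript
type CallParams = Record<string, unknown>;

// Apply one mutate term and return the (possibly new) parameters.
function applyMutate(term: unknown[], params: CallParams): CallParams {
  if (term[0] === "id") return params; // no-op: pass through untouched
  if (term[0] === "clamp_param") {
    const [, key, lo, hi] = term as [string, string, number, number];
    const v = params[key];
    if (typeof v !== "number") return params; // nothing to clamp
    return { ...params, [key]: Math.min(hi, Math.max(lo, v)) };
  }
  throw new Error(`unknown transform: ${String(term[0])}`);
}
```

For example, `applyMutate(["clamp_param", "temperature", 0, 1], { temperature: 1.7 })` returns a copy with `temperature` set to 1 and leaves the original object unchanged.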

fallback — what happens when a model fails

Fallback is the policy's failure plan. When the selected model errors, times out, or rate-limits, fallback decides what to do: move to the next candidate from select, stop, or jump to a named safe default. This is the position that turns a single point of failure into a graceful degradation path.

["always", { "action": "next_candidate" }]

The always operator applies one action on any failure from the chosen model. Here that action is next_candidate, which advances to the next model on the bench under the same constraints. Every hop is recorded, so a failover never becomes a mystery later.
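The next_candidate behaviour can be sketched as a loop over the bench that records every hop. This is only an illustration: `call` is a hypothetical stand-in for whatever actually invokes a provider, it is synchronous here for brevity, and the `Hop` record is an assumed shape, not the router's trace format.

```typescript
interface Hop { model: string; ok: boolean; error?: string; }

// Try each bench candidate in order; stop at the first success.
function runWithFallback<T>(
  bench: string[],
  call: (model: string) => T,
): { result: T; hops: Hop[] } {
  const hops: Hop[] = [];
  for (const model of bench) {
    try {
      const result = call(model);
      hops.push({ model, ok: true });
      return { result, hops };
    } catch (err) {
      // Record the failure, then advance to the next candidate.
      hops.push({ model, ok: false, error: String(err) });
    }
  }
  throw new Error(`all ${bench.length} candidates failed`);
}
```

The `hops` array is what would let a failover be explained afterwards: it lists each model tried, in order, and why the failed ones failed.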

Admission, hashing, and deterministic evaluation

When a policy_ir arrives in a request, the router does three things before a single token is spent, in this order:

  1. Admission. The router validates the term against the grammar. The tag, the evidence slot, and the five working verbs must be present and well-formed; every operator must be one the interpreter knows; every attribute referenced in a filter or rank must be a declared catalog field. Unknown operators or undeclared fields are rejected outright — a malformed or unexpected policy never silently does the wrong thing.
  2. Hashing. The admitted term is encoded canonically (a single, stable byte representation for a given term) and hashed. That content hash names this exact policy. It lets the router cache admission, dedupe identical policies, and — most importantly — pin the decision in the trace so it can be matched back to the rule that produced it. Same term in, same hash out, every time.
  3. Deterministic evaluation. The interpreter runs the working verbs over the live catalog: filter, then rank, then select, then mutate, producing a chosen model and an adjusted request. Given the same term and the same catalog snapshot, evaluation always yields the same result. There is no randomness and no model "deciding" — it is a pure function of (policy_ir, catalog).
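The hashing step can be sketched as follows. Both halves are assumptions: the canonical form here is JSON with object keys sorted and no whitespace, and the hash is 32-bit FNV-1a over UTF-16 code units, chosen only to keep the example dependency-free. A real content hash would more plausibly be a cryptographic digest over the encoded bytes.

```typescript
// Assumed canonical form: compact JSON with object keys sorted.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonical(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    const body = keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Stand-in hash: 32-bit FNV-1a, rendered as 8 hex digits.
function policyHash(term: unknown): string {
  const text = canonical(term);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}
```

Two terms that differ only in object key order produce the same canonical string and therefore the same hash, which is the "same term in, same hash out" property in miniature.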

That last property is why a routing decision is replayable. The trace stores the policy hash and the catalog snapshot, so you can re-run the exact evaluation and see why a model was chosen — months later, with no guessing. To see this admit-then-evaluate flow end to end, the five-minute quickstart walks through sending a real policy and reading the trace it produces.

A full example term

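Here is a complete policy_ir with all five working verbs filled in. The // comments are annotations for the reader; JSON has no comment syntax, so they are fine in a JavaScript array literal but should be dropped if you send the term as raw JSON. It says: among models that meet the request, are not disabled, hold at least a 128k context, and come in under a price ceiling, pick the cheapest one; keep a three-deep failover bench; pass the request through unchanged; and on failure, advance to the next candidate.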

[
  "policy",

  // evidence — reserved slot, none attached
  ["ev_zero"],

  // filter — hard requirements
  ["and",
    ["meets_req"],
    ["not", ["is", "disabled"]],
    ["cmp", "context", "ge", 128000],
    ["cmp", "price_out", "le", 6.0]],

  // rank — cheapest = negate normalized output price
  ["neg", ["normalize", ["field", "price_out"]]],

  // select — best candidate, two more on the bench
  ["top_k", 3, ["argmax"]],

  // mutate — pass the request through unchanged
  ["id"],

  // fallback — on failure, advance through the bench
  ["always", { "action": "next_candidate" }]
]

Evaluated against the 2026 catalog, that term selects gemini-3.5-flash as the live pick — the cheapest model that clears every floor — and keeps two more candidates ready behind it. The hardcoded alternative this replaces is a line like model: "gpt-5.5", frozen the day someone typed it and stale the day a cheaper-and-better model shipped. The policy_ir never goes stale, because it describes the requirement, not the answer.

How it travels with the call

The policy_ir rides in the body of an ordinary, OpenAI-compatible request. You point your SDK's baseURL at the router, set a model string such as policy:auto, and attach the term. That model string is just a free-form label your traces are grouped under: any string works, and the routing is driven entirely by the attached policy_ir, never parsed from the model name. Your messages and parameters are unchanged.

const res = await client.chat.completions.create({
  model: "policy:auto",   // a free-form trace label, not a model name
  messages,
  policy_ir: [ "policy", ["ev_zero"], filter, rank, select, mutate, fallback ]
});

From there the path is the one above: admit the term, hash it, evaluate it over the live catalog, run the chosen model over your own provider key, fail over if needed, and write a replayable, auditable trace of every step. You pay per run, not per token, and your keys are never resold.

Because the term is just a value in the request, the same endpoint serves every rule your app can express. There is no separate control plane to keep in sync — the policy is the request. If a single model isn't enough and you need to chain several LLM steps with the same per-call control, that is what the sibling structure, flow_ir (a bounded DAG of LLM nodes), is for; both are introduced in the complete guide to policy-based LLM routing.

That is the whole anatomy of a routing decision: five working verbs, evaluated deterministically over a live catalog, sent with the call and traced every time. When you are ready to wire it in, the docs have the full operator reference.