An LLM policy router is a service that sits in front of your model providers and makes the model-selection decision for you on every request, according to rules you supply. Instead of hardcoding model: "gpt-5.5" in your code, you send a policy (a small, structured set of rules) alongside the call. The router evaluates that policy against a live catalog of available models, filters out anything that fails your constraints, and routes to the cheapest model that still passes. It runs on your own provider keys, fails over when a model is unavailable, and writes an auditable trace of exactly which models it considered, why it rejected them, and which one it chose.

In one line: a policy router turns "which model do I call?" from a string literal buried in your codebase into a decision that is computed, enforced, and logged at runtime.

The short definition

A policy router has three responsibilities, and all three matter:

  • Decide. Given your rules and the current set of models that can serve the request, pick one. The selection is a function of your policy and the live catalog — not a value you typed once and forgot.
  • Enforce. A model that violates your rules is never chosen. The router does not "prefer" the right model; it structurally excludes the wrong ones before it ranks the rest.
  • Account. Every decision produces a record: the candidates, the filters that removed some of them, the ranking, the winner, the cost, the latency, and any failover hops. That record is replayable.
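To make the three responsibilities concrete, here is a minimal Python sketch. The `Model` and `Decision` types, the two rules (a context-window floor and a price ceiling), and every model name and number are hypothetical illustrations, not unhardcoded's schema or real catalog data. What it is meant to show is the shape: rules are applied before ranking, and the caller gets back a full record rather than just a model name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    ctx: int      # context window, in tokens
    price: float  # illustrative cost in invented units, not real pricing


@dataclass
class Decision:
    candidates: List[str]                                   # everything the catalog offered
    rejected: Dict[str, str] = field(default_factory=dict)  # model name -> why it was filtered out
    ranking: List[str] = field(default_factory=list)        # survivors, best first
    winner: Optional[str] = None


def decide(catalog: List[Model], min_ctx: int, max_price: float) -> Decision:
    record = Decision(candidates=[m.name for m in catalog])
    survivors = []
    # Enforce: any model that breaks a rule is removed before ranking begins.
    for m in catalog:
        if m.ctx < min_ctx:
            record.rejected[m.name] = f"ctx {m.ctx} < {min_ctx}"
        elif m.price > max_price:
            record.rejected[m.name] = f"price {m.price} > {max_price}"
        else:
            survivors.append(m)
    if not survivors:
        raise LookupError(f"no model passes the policy; rejected: {record.rejected}")
    # Decide: rank the survivors cheapest first (name breaks ties) and take the head.
    survivors.sort(key=lambda m: (m.price, m.name))
    record.ranking = [m.name for m in survivors]
    record.winner = record.ranking[0]
    # Account: return the whole record so it can be stored and inspected later.
    return record


catalog = [
    Model("long-ctx", 1_000_000, 1.0),
    Model("short-ctx", 32_000, 0.5),
    Model("pricey", 400_000, 9.0),
]
decision = decide(catalog, min_ctx=200_000, max_price=5.0)
print(decision.winner)    # long-ctx
print(decision.rejected)  # {'short-ctx': 'ctx 32000 < 200000', 'pricey': 'price 9.0 > 5.0'}
```

Note that the cheapest model in the catalog (`short-ctx`) never reaches the ranking step at all; it is struck by a rule, and the record says which one.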

Most teams arrive at the idea of a policy router after living without one. You start with a single hardcoded model. Then you add a cheaper model for simple requests and a stronger model for hard ones, and now there is an if statement. Then a model gets deprecated, or a new one launches that is cheaper and just as good, and the if statement has to be found, edited, tested, and redeployed in every service that calls an LLM. A policy router is the recognition that model selection is a policy decision that deserves its own layer — the same way authorization or feature flagging eventually earn their own layer instead of living as scattered conditionals.

How it differs from a gateway, proxy, load balancer, or marketplace

"Router" is an overloaded word, so it helps to draw sharp lines against the things a policy router is not. The distinction is always the same: does the component reason about which model should serve the request, or does it just move bytes?

  • An API gateway / proxy forwards your request to a provider and layers on cross-cutting concerns: authentication, rate limiting, retries, caching, and logging. It is plumbing. It answers "how do I reach this endpoint reliably?" It does not decide which model is appropriate for a given request — you still tell it the model. A policy router answers a different question: "which model should run this, and why?"
  • A load balancer spreads traffic across interchangeable backends to maximize throughput and availability. Its core assumption is that the targets are equivalent — any one will do. Models are not equivalent: they differ in price, context window, capability, latency, and provider. A policy router selects on those differences instead of treating them as noise.
  • A model marketplace (or aggregator) gives you one API key and one bill across many providers, usually by reselling tokens at a markup and letting you pick a model by name from a long list. That solves access and billing. It does not solve the decision: you still choose the model, and you still pay a per-token margin. A policy router takes over the choice and, in unhardcoded's case, runs on your own keys with per-run pricing instead of token resale.

The clean test: if you still have to name the model in your request, you have a gateway, a proxy, a balancer, or a marketplace. If you describe the rules the model must satisfy and let the system choose, you have a policy router.

A policy router can sit on top of all of these: it is a higher layer of abstraction, not a competitor to the plumbing. unhardcoded happens to be OpenAI-compatible, so it behaves like a gateway at the wire level (point your SDK's base URL at it) while doing policy work the gateway never did. For a deeper comparison against named tools, see "unhardcoded vs Portkey vs LiteLLM vs OpenRouter."

What is a "policy" here?

A policy is the set of rules you send with the call. With unhardcoded, that policy is expressed as a policy_ir: a compact, structured term — a JSON array — that the router can parse, validate, hash, and evaluate deterministically. It is not English you hope the system interprets correctly; it is a small instruction language with a fixed shape.

A policy_ir consists of the "policy" tag, an evidence slot, and five working verbs, in a fixed order:

["policy", ["ev_zero"], /* evidence */
           ...,         /* filter   */
           ...,         /* rank     */
           ...,         /* select   */
           ...,         /* mutate   */
           ...]         /* fallback */

The five working verbs each do one job:

  • filter — hard constraints. Drop any model that fails (for example: context window below a threshold, missing a capability, above a price ceiling, wrong region).
  • rank — how to order what survives the filter (for example: cheapest first, or fastest first).
  • select — which ranked candidate to use, and the failover cascade behind it.
  • mutate — adjustments applied to the chosen call (for example: clamp parameters to the model's limits).
  • fallback — what to do when the chosen model fails at runtime (it errors, times out, or rate-limits). This is distinct from nothing passing the filter, which fails the request loudly.
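The five verbs can be read as a pipeline. The Python sketch below assumes a policy_ir-shaped nested list; only the "policy" tag, `ev_zero`, and the verb names come from this article, while the inner operators (`ctx_gte`, `price_lte`, `cheapest`, `clamp_max_tokens`, and so on), the catalog, and the `call` function are invented for illustration. It understands one option per verb, which is enough to keep the two failure modes apart: an empty filter result raises immediately, while a runtime error on the chosen model moves down the cascade.

```python
# Only "policy", "ev_zero", and the verb names come from the article;
# every inner operator below is hypothetical.
POLICY = [
    "policy",
    ["ev_zero"],                                           # evidence slot
    ["filter", ["ctx_gte", 200_000], ["price_lte", 2.0]],  # hard constraints
    ["rank", "cheapest"],                                  # order the survivors
    ["select", "best_then_cascade"],                       # primary + failover order
    ["mutate", "clamp_max_tokens"],                        # adjust the chosen call
    ["fallback", "next_in_cascade"],                       # runtime failure handling
]

# Invented catalog: context window, illustrative price, and max output tokens.
CATALOG = {
    "model-a": {"ctx": 1_000_000, "price": 0.4, "max_out": 8192},
    "model-b": {"ctx": 200_000, "price": 0.8, "max_out": 4096},
    "model-c": {"ctx": 128_000, "price": 0.1, "max_out": 4096},
}

FILTER_OPS = {
    "ctx_gte": lambda m, n: m["ctx"] >= n,
    "price_lte": lambda m, n: m["price"] <= n,
}


def run(policy, catalog, request, call):
    tag, _evidence, flt, rank, select, mutate, fallback = policy
    if (tag != "policy" or rank != ["rank", "cheapest"]
            or select != ["select", "best_then_cascade"]
            or mutate != ["mutate", "clamp_max_tokens"]
            or fallback != ["fallback", "next_in_cascade"]):
        raise ValueError("this sketch supports exactly one option per verb")

    # filter: drop every model that fails any constraint.
    passing = [name for name, m in catalog.items()
               if all(FILTER_OPS[op](m, arg) for op, arg in flt[1:])]
    if not passing:
        # Nothing passed the filter: fail loudly; runtime fallback does not apply.
        raise LookupError("no model satisfies the filter")

    # rank + select: cheapest first; the rest of the list is the failover cascade.
    cascade = sorted(passing, key=lambda name: catalog[name]["price"])

    failures = []
    for name in cascade:
        # mutate: clamp max_tokens to this model's output limit.
        req = dict(request, max_tokens=min(request["max_tokens"], catalog[name]["max_out"]))
        try:
            return name, call(name, req), failures
        except RuntimeError as exc:
            # fallback: the chosen model failed at runtime, so try the next one.
            failures.append((name, str(exc)))
    raise RuntimeError(f"every candidate failed at runtime: {failures}")


def flaky_call(name, req):
    # Stand-in for a provider call; model-a is "rate-limited" in this demo.
    if name == "model-a":
        raise RuntimeError("rate limited")
    return f"{name} answered with max_tokens={req['max_tokens']}"


chosen, output, failures = run(POLICY, CATALOG, {"max_tokens": 8000}, flaky_call)
print(chosen)    # model-b
print(output)    # model-b answered with max_tokens=4096
print(failures)  # [('model-a', 'rate limited')]
```

Mutate runs per candidate here, so a failover to a model with a smaller output limit gets its own clamped request rather than inheriting the primary's.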

Because the policy is structured rather than free-form, the router can admit it before running it, rejecting unknown operations or undeclared fields instead of silently doing the wrong thing. It can also hash it, so the same rules produce the same identifier in your trace every time. For the full breakdown of each position and how a decision is evaluated, read "What is a policy_ir? The anatomy of a routing decision." The policy router is the broader pattern; the policy_ir is the concrete artifact that makes the pattern enforceable and auditable. For the complete picture, see the pillar guide on policy-based LLM routing.
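Admission and hashing are both small once the policy has a fixed shape. This sketch assumes the seven-position layout described above and a hypothetical set of admitted filter operators. The hashing scheme (SHA-256 over compact JSON) is a choice made for illustration, not a description of how unhardcoded computes its identifier.

```python
import hashlib
import json

VERBS = ("filter", "rank", "select", "mutate", "fallback")
# Hypothetical vocabulary; a real router would publish its own admitted operators.
KNOWN_FILTER_OPS = {"ctx_gte", "price_lte", "region_eq"}


def admit(policy):
    """Validate the shape before anything runs; raise instead of guessing."""
    if not isinstance(policy, list) or len(policy) != 2 + len(VERBS) or policy[0] != "policy":
        raise ValueError("expected ['policy', evidence, filter, rank, select, mutate, fallback]")
    for verb, term in zip(VERBS, policy[2:]):
        if not isinstance(term, list) or not term or term[0] != verb:
            raise ValueError(f"expected a {verb!r} term, got {term!r}")
    for constraint in policy[2][1:]:
        if not isinstance(constraint, list) or not constraint or constraint[0] not in KNOWN_FILTER_OPS:
            raise ValueError(f"unknown filter operation {constraint!r}")
    return policy


def policy_hash(policy):
    """Same rules -> same bytes -> same identifier in every trace."""
    canonical = json.dumps(policy, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = ["policy", ["ev_zero"],
        ["filter", ["ctx_gte", 200_000], ["region_eq", "eu"]],
        ["rank", "cheapest"], ["select", "best"],
        ["mutate", "none"], ["fallback", "cascade"]]

policy_id = policy_hash(admit(base))
print(len(policy_id))  # 64
```

Compact separators matter: serializing the same list with default spacing would produce different bytes and therefore a different hash, so the canonical form has to be fixed once and used everywhere.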

Enforced, not advisory

This is the line that separates a policy router from a recommendation engine, and it is the property unhardcoded is built around. There are two ways a system can "respect" your rules:

  • Advisory: the system suggests a model and trusts the caller to follow the suggestion. Nothing stops a request from reaching a model that breaks the rule. The rule is a hint.
  • Enforced: the system will not return a result from a model that violates the policy. A model that fails a filter is removed from the candidate set before ranking even begins. The rule is a guarantee.

The difference is not academic. If your policy says "context window at least 200k tokens and price under a ceiling," an advisory system might still route a long document to a model that truncates it, and you would only find out from a degraded answer. An enforced router cannot do that: the undersized model is filtered out, full stop. The constraint holds whether or not anyone is watching, on the millionth request as reliably as the first.

Enforcement is also why the trace is trustworthy. Because every decision passes through the same filter-rank-select machinery, the trace is not a best-effort log written after the fact — it is the actual record of the decision the router was structurally bound to make. You can replay it, hand it to an auditor, or diff it against yesterday's behavior. That is what makes the routing auditable rather than merely logged.
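Replayability follows from determinism: if a decision is a pure function of the policy and a catalog snapshot, storing both in the trace lets anyone recompute it later. The sketch below uses a simplified dict policy as a stand-in for a policy_ir, invented model names and prices, and a hypothetical trace layout. It shows replay (recompute and compare) and a diff against an earlier trace.

```python
import json


def decide(policy, catalog):
    """Deterministic: the same policy and catalog snapshot always produce the same trace."""
    passing = sorted(
        (m for m in catalog
         if m["ctx"] >= policy["min_ctx"] and m["price"] <= policy["max_price"]),
        # Tie-break on name so the order never depends on catalog order.
        key=lambda m: (m["price"], m["name"]),
    )
    if not passing:
        raise LookupError("no model satisfies the policy")
    return {
        "policy": policy,
        "catalog": catalog,  # the input snapshot is what makes the trace replayable
        "rejected": sorted(m["name"] for m in catalog if m not in passing),
        "ranking": [m["name"] for m in passing],
        "winner": passing[0]["name"],
    }


def replay(trace):
    """Recompute the decision from the recorded inputs and check it matches."""
    return decide(trace["policy"], trace["catalog"]) == trace


def diff(old, new):
    """Decision fields that changed between two traces."""
    return {k: (old[k], new[k])
            for k in ("rejected", "ranking", "winner") if old[k] != new[k]}


policy = {"min_ctx": 200_000, "max_price": 1.0}  # simplified stand-in for a policy_ir
yesterday = decide(policy, [
    {"name": "model-a", "ctx": 1_000_000, "price": 0.4},
    {"name": "model-b", "ctx": 200_000, "price": 0.8},
])
today = decide(policy, [
    {"name": "model-a", "ctx": 1_000_000, "price": 1.2},  # repriced above the ceiling
    {"name": "model-b", "ctx": 200_000, "price": 0.8},
])

stored = json.loads(json.dumps(yesterday))  # e.g. read back from a log
print(replay(stored))                       # True
print(diff(yesterday, today)["winner"])     # ('model-a', 'model-b')
```

The name tie-break is what keeps replay exact: without it, two equally priced models could swap places depending on catalog order, and a recomputed trace would no longer match the stored one.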

A concrete example

Suppose you are summarizing support tickets. Most are short and simple; a few are long threads that need a big context window. You want the cheapest model that can do the job, you want a known fallback if your first choice is down, and you never want to send personally identifiable data to a model outside your region.

Without a policy router, that logic is a tangle of conditionals and a hardcoded model name. With a policy router, you describe the rules once and send them with the call. The router evaluates them against the live catalog:

# The catalog at request time (illustrative)
candidates = [
  "gemini-3.5-flash",   # 1M ctx, in-region, cheap   → passes
  "claude-haiku-4-5",   # in-region, cheap           → passes
  "gpt-5.5",            # old hardcoded baseline     → struck (fails a filter)
]

# filter:  ctx >= 200k AND region == "eu" AND price <= ceiling
# rank:    cheapest first
# select:  best passing candidate, then failover in order

# Router decision:
#   selected: gemini-3.5-flash
#   reason:   cheapest model passing all filters
#   fallback: claude-haiku-4-5 (if primary unavailable)
#   trace:    written (candidates, filters, winner, cost, latency)

Three things just happened that a gateway or marketplace would not do for you. First, the router chose the model from your rules rather than from a string you typed. Second, the choice was enforced: any out-of-region or undersized model was excluded, not merely deprioritized. Third, the decision was recorded, so when a finance reviewer asks why this request cost what it did, or an incident review asks which model served a bad answer at 3 a.m., the answer is in the trace — not reconstructed from memory.

When a cheaper, in-policy model launches next quarter, you do not hunt down conditionals across services. The catalog updates, your policy stays the same, and the router starts selecting the better option on its own. That is the whole point of moving model selection out of your code and into a policy. See the unhardcoded overview for how it plugs into an OpenAI-compatible call, or read "Stop hardcoding model decisions" for why the status quo is so costly.
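The next-quarter scenario can be simulated with two catalog snapshots and one unchanged policy. Everything here is hypothetical: the dict policy stands in for a policy_ir with the example's three filters, and the model names, regions, and prices are placeholders rather than real catalog data.

```python
# Hypothetical stand-in for the example policy: ctx >= 200k, region "eu", price ceiling.
POLICY = {"min_ctx": 200_000, "region": "eu", "max_price": 1.0}


def select(catalog, policy):
    """Return the primary model followed by its failover order (cheapest first)."""
    passing = [
        m for m in catalog
        if m["ctx"] >= policy["min_ctx"]
        and m["region"] == policy["region"]
        and m["price"] <= policy["max_price"]
    ]
    if not passing:
        raise LookupError("no model satisfies the policy")
    passing.sort(key=lambda m: (m["price"], m["name"]))
    return [m["name"] for m in passing]


# Invented catalog snapshots; names, regions, and prices are placeholders.
this_quarter = [
    {"name": "flash-1m", "ctx": 1_000_000, "region": "eu", "price": 0.30},
    {"name": "small-200k", "ctx": 200_000, "region": "eu", "price": 0.50},
    {"name": "baseline", "ctx": 400_000, "region": "us", "price": 2.00},
]
next_quarter = this_quarter + [
    {"name": "new-cheap", "ctx": 500_000, "region": "eu", "price": 0.10},    # in policy
    {"name": "cheapest-us", "ctx": 500_000, "region": "us", "price": 0.05},  # out of region
]

print(select(this_quarter, POLICY))  # ['flash-1m', 'small-200k']
print(select(next_quarter, POLICY))  # ['new-cheap', 'flash-1m', 'small-200k']
```

The out-of-region entry is the cheapest model in the second snapshot and is still never selected: the catalog change is picked up automatically, but only within the bounds the policy enforces.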