What is the runtime policy layer?

The runtime policy layer is the place where model decisions are made: at request time, from a policy that travels with the call, rather than from hardcoded model names or static server config. It admits the policy, evaluates it against a live catalog of models and prices, routes to the cheapest candidate that passes your rules, fails over when a candidate is unavailable, and writes an auditable trace of the decision.

How is this different from a model gateway or proxy?

A gateway forwards a request to a model you already chose. The runtime policy layer makes the choice. The decision logic — filter, rank, select, mutate, fallback — is data in the request body, not configuration baked into a dashboard or your code. That makes the same decision portable across providers and replayable after the fact.

If the policy language is open source, why pay for the host?

For the same reason teams pay for managed infrastructure they could technically run themselves: you pay us to keep it running. The open core gives you the policy language and reference interpreter. The host gives you maintained provider modules, live catalog data, key management, failover, traces, history, and uptime, which together form the operational surface that ages fastest as models churn.

The Runtime Policy Layer: Where LLM Infra Goes

Here is the thesis: model decisions should be runtime data, not hardcoded model names or static server config. Today most teams pick a model the way they pick a library: they type its name into the source, ship it, and forget. But a model name is not a dependency. It is a pricing, capability, and availability bet that goes stale in weeks. The runtime policy layer is the part of your stack that turns that bet into a rule you send with every call, a small, portable, auditable artifact that the router evaluates against the live market the moment the request arrives. The model gets chosen at runtime, by your policy, over your keys. That is the shift, and it is coming whether or not any single company builds it well.

Why a model name in your code is a liability

A hardcoded model name encodes three assumptions at once: that this model is good enough for the job, that its price is acceptable, and that it will be available when the call fires. All three move. A cheaper model ships that clears your quality bar. A provider raises prices or deprecates the endpoint. A region degrades for an hour. Each time, someone opens the file, edits a string, and redeploys — and the reasoning behind the original choice lives nowhere except a stale comment and a fading memory.

This is the same mistake the industry already learned not to make with connection strings, feature flags, and infrastructure. We stopped hardcoding the database host. We stopped hardcoding which experiment a user sees. We externalized those decisions into systems built to change at runtime, with an audit trail. Model selection is exactly that kind of decision — high-churn, cost-sensitive, and consequential — and it is still buried in application code. We wrote a whole piece on the hidden cost of hardcoding model decisions; this one is about the layer that replaces it.

Why now: churn, cost, and governance

Three forces are converging, and they are why this layer becomes infrastructure rather than a nice idea.

Model churn. The frontier moves monthly. New models land, prices fall, capabilities reshuffle. A choice that was optimal in Q1 is overpriced by Q2. Code that hardcodes a model is permanently behind the market it depends on.
Cost pressure. LLM spend has moved from a line item to a board-level number. Teams need to route to the cheapest model that still meets the rules — per request, per route, per tenant — without a human re-deciding each time. That is an optimization problem, and optimization problems want data, not branches in a switch statement.
Governance. "Which model handled this request, why, and what did it cost?" is now a question auditors, customers, and incident reviews ask. If the answer is "whatever the deploy was that day," you do not have governance — you have a guess. The decision has to be recorded as it happens.

Any one of these would push selection out of code. Together they make the case overwhelming.

The runtime policy layer, defined

The runtime policy layer sits between your application and the providers, and it makes one thing its job: turning a policy you send into a routed, traced decision. It has two halves that have to exist together.

The first is an open language for the decision. At unhardcoded, that is the policy_ir: a small term, ["policy", evidence, filter, rank, select, mutate, fallback], made up of an evidence slot plus five working verbs. Filter narrows the catalog to candidates that qualify. Rank orders them, usually cheapest-first. Select picks. Mutate adjusts the request for the chosen model. Fallback says what to do when a candidate fails. It reads like a routing decision because that is exactly what it is, written as data. If you want the full anatomy, see what a policy_ir is.
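To make the five verbs concrete, here is a minimal sketch of a toy interpreter for that seven-slot term. Only the slot order comes from the description above. The op names inside each slot (min_context, asc, first, set, next_candidate), the Model record, and the decide function are hypothetical stand-ins for illustration, not the real policy_ir encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int
    usd_per_mtok: float  # illustrative price field, not a real catalog schema


# Slot order follows the article; every op name inside the slots is hypothetical.
POLICY = [
    "policy",
    {"route": "summarize"},        # evidence
    ["min_context", 100_000],      # filter
    ["asc", "usd_per_mtok"],       # rank
    ["first"],                     # select
    ["set", "temperature", 0.2],   # mutate
    ["next_candidate"],            # fallback
]


def decide(policy, catalog, request):
    """Return (routed_request, backup_model_names) for one policy term."""
    tag, _evidence, flt, rank, select, mutate, fallback = policy
    if tag != "policy":
        raise ValueError("not a policy term")

    # Filter: keep only models that meet the context floor.
    _, floor = flt
    candidates = [m for m in catalog if m.context_tokens >= floor]

    # Rank: order the survivors by the named field, ascending.
    _, field = rank
    candidates.sort(key=lambda m: getattr(m, field))

    # Select: this sketch only understands "first".
    if select != ["first"] or not candidates:
        raise LookupError("no model satisfies the policy")
    chosen = candidates[0]

    # Mutate: set one request parameter for the chosen model.
    _, key, value = mutate
    routed = {**request, "model": chosen.name, key: value}

    # Fallback: the remaining ranked candidates, tried in order.
    backups = [m.name for m in candidates[1:]] if fallback == ["next_candidate"] else []
    return routed, backups


CATALOG = [
    Model("big", 200_000, 3.0),
    Model("small", 32_000, 0.5),
    Model("mid", 128_000, 1.0),
]
routed, backups = decide(POLICY, CATALOG, {"messages": []})
```

With this made-up catalog, the cheapest model that clears the 100k floor is "mid", and "big" is left as the fallback. Swapping a catalog entry changes the outcome without touching the policy, which is the point of writing the decision as data.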

The second half is a maintained host that enforces it. The language is inert without something that knows today's model catalog, today's prices, which providers are healthy right now, and how to fail over when one is not. The host admits the policy (rejecting unknown ops and undeclared fields), evaluates it against the live catalog, routes the call over your own provider keys, and writes a replayable, auditable trace of every hop with its latency, cost, and reason. Crucially, the policy is enforced, not advisory — the router will not pick a model your rules excluded. For the mechanics, the LLM policy router explainer walks the request lifecycle end to end.
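The admit-then-route behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the host's implementation: the allow-lists, the ProviderError and AllCandidatesFailed classes, the trace field names, and the call_provider callback are all hypothetical.

```python
import time

# Hypothetical allow-lists; a real host would derive these from its own schema.
KNOWN_OPS = {"min_context", "asc", "first", "set", "next_candidate"}
DECLARED_FIELDS = {"context_tokens", "usd_per_mtok", "temperature"}


class ProviderError(Exception):
    """Raised by the caller-supplied provider function when a call fails."""


class AllCandidatesFailed(Exception):
    """Carries the trace so the failed decision is still on record."""

    def __init__(self, trace):
        super().__init__("every candidate failed")
        self.trace = trace


def admit(policy):
    """Reject a term that uses an unknown op or an undeclared field."""
    if len(policy) != 7 or policy[0] != "policy":
        raise ValueError("expected a seven-slot policy term")
    for op, *args in policy[2:]:
        if op not in KNOWN_OPS:
            raise ValueError(f"unknown op: {op}")
        for arg in args:
            # In this sketch, string arguments are treated as field names.
            if isinstance(arg, str) and arg not in DECLARED_FIELDS:
                raise ValueError(f"undeclared field: {arg}")


def route(candidates, call_provider):
    """Try candidates in order and record one trace hop per attempt."""
    trace = []
    for model in candidates:
        started = time.perf_counter()
        try:
            completion, cost_usd = call_provider(model)
        except ProviderError as exc:
            trace.append({
                "model": model, "ok": False,
                "latency_s": time.perf_counter() - started,
                "cost_usd": 0.0, "reason": f"failed: {exc}",
            })
            continue
        trace.append({
            "model": model, "ok": True,
            "latency_s": time.perf_counter() - started,
            "cost_usd": cost_usd, "reason": "selected",
        })
        return completion, trace
    raise AllCandidatesFailed(trace)
```

The design choice worth noting is that the trace is built hop by hop inside the loop, so a failover is recorded as it happens instead of being reconstructed afterwards.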

A model gateway forwards a request to a model you already chose. The runtime policy layer makes the choice — and records it. That difference is the whole product.

Provider aggregation is the moat

A policy language is only as useful as the market it can route across. The defensible asset under the runtime policy layer is provider aggregation: continuously maintained modules for each provider, normalized capability and pricing data, health signals, and the unglamorous glue that keeps a single OpenAI-compatible endpoint speaking fluently to a dozen back ends that all drift in their own direction.

This is the part that does not want to live in your codebase, because it never stops changing. A provider tweaks a rate limit, renames a parameter, ships a model, deprecates another, adjusts a price. Multiply that by every provider you might want to reach, and "keep the catalog current" becomes a standing job. The runtime policy layer absorbs that churn so your policy can stay simple: you say "cheapest model over 100k context that passes JSON-mode," and the layer knows which models that means this week. The decision language is open; the relentlessly maintained map of the provider market is the moat.
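As a rough illustration of what normalized capability and pricing data buys you, the sketch below folds two invented provider listings into one record shape and then answers the query from the paragraph above. The provider formats, field names, model names, and prices are all made up for the example. Real provider listings differ and change over time, which is exactly the maintenance burden being described.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    provider: str
    model: str
    context_tokens: int
    json_mode: bool
    usd_per_mtok: float


def from_provider_a(raw: dict) -> Entry:
    # Hypothetical provider A: context in thousands of tokens, price per 1K tokens.
    return Entry("a", raw["id"], raw["ctx_k"] * 1000,
                 "json" in raw["features"], raw["price_per_1k"] * 1000)


def from_provider_b(raw: dict) -> Entry:
    # Hypothetical provider B: context in tokens, price per million tokens.
    return Entry("b", raw["name"], raw["max_tokens"],
                 raw["supports_json"], raw["usd_per_million"])


def cheapest_json_over_100k(catalog: list) -> Optional[Entry]:
    """Cheapest model over 100k context that passes JSON mode, or None."""
    eligible = [e for e in catalog if e.context_tokens > 100_000 and e.json_mode]
    return min(eligible, key=lambda e: e.usd_per_mtok, default=None)


catalog = [
    from_provider_a({"id": "a-large", "ctx_k": 200,
                     "features": ["json", "tools"], "price_per_1k": 0.003}),
    from_provider_a({"id": "a-small", "ctx_k": 32,
                     "features": ["json"], "price_per_1k": 0.0005}),
    from_provider_b({"name": "b-pro", "max_tokens": 128_000,
                     "supports_json": True, "usd_per_million": 1.2}),
    from_provider_b({"name": "b-lite", "max_tokens": 1_000_000,
                     "supports_json": False, "usd_per_million": 0.1}),
]
winner = cheapest_json_over_100k(catalog)
```

Here the query function never sees a provider-specific field. When a provider renames a parameter or changes its pricing unit, only its adapter changes, and the policy-side query stays the same.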

It also matters that we do not resell tokens. You bring your own keys and pay providers directly. That keeps incentives honest: we are paid to route well and stay current, not to mark up inference. Our interest and yours point the same way — toward the cheapest model that meets your rules. The economics are spelled out on the pricing page: per run, not per token.

What compounds on top

Once the decision is data sent at runtime, capabilities stack on the same foundation instead of being bolted on one by one.

Workflows. A single decision generalizes to a bounded chain of them. A flow_ir is a small DAG of LLM nodes — one flow is one request, up to five nodes — so a draft-then-critique-then-finalize pipeline routes each step to the right model under one policy and one trace. Multi-step does not mean multi-vendor glue.
Governance and audit. Because every routing decision is written down — candidates considered, why one won, what it cost, where it failed over — compliance stops being a retrofit. The trace is the audit. "Which model, why, what cost" has an answer for every request, replayable later.
Portability across providers. A policy expresses intent — "cheapest model that clears the bar" — not a vendor name. When a new model lands or a price drops, the same policy picks it up with no redeploy. The decision outlives any single provider's roster, which is the opposite of a hardcoded name.
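The workflow shape described above, a small DAG capped at five nodes, can be checked with Python's standard graphlib module. This is a sketch under assumptions: the dictionary layout (node name mapped to the nodes it depends on) and the execution_order helper are hypothetical, not the real flow_ir encoding. Only the five-node bound and the draft, critique, finalize example come from the text.

```python
from graphlib import CycleError, TopologicalSorter

MAX_NODES = 5  # bound stated in the article


def execution_order(flow: dict) -> list:
    """Validate a flow and return its nodes in a runnable order.

    `flow` maps each node name to the list of node names it depends on.
    """
    if len(flow) > MAX_NODES:
        raise ValueError(f"flow has {len(flow)} nodes, limit is {MAX_NODES}")
    for node, deps in flow.items():
        for dep in deps:
            if dep not in flow:
                raise ValueError(f"{node} depends on unknown node {dep}")
    try:
        # static_order() yields dependencies before the nodes that need them.
        return list(TopologicalSorter(flow).static_order())
    except CycleError as exc:
        raise ValueError("flow is not a DAG") from exc


flow = {
    "draft": [],
    "critique": ["draft"],
    "finalize": ["draft", "critique"],
}
order = execution_order(flow)
```

Each node in the returned order could then be routed under the same policy, with every step appended to a single trace.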

Each of these is valuable alone. The point of the runtime policy layer is that they are the same primitive seen from different angles — a decision made from data, at request time, recorded as it happens.

The open-core bet: you pay us to keep it running

Here is how we think about open core, and it borrows directly from the model Vercel made obvious: you could run it yourself; you pay us to keep it running.

The decision language and its reference interpreter are open source. The encoding is open. You can read exactly how a policy is admitted and evaluated, and you can self-host the router against your own catalog. Nothing about the decision is a black box, and nothing locks your policies inside our walls: a policy_ir is just data you could hand to any conformant interpreter. That is the anti-lock-in half, and it is real. You can dig into it in the open source section.

What you pay for is the half that ages fastest: the maintained host. That means provider modules that track every API change, live catalog and pricing data, key management and OAuth, failover and uptime, traces and history, and support when a route misbehaves at 2 a.m. This is operational surface, not magic. It is the same thing teams happily pay for with a managed database or a managed deploy platform. Most engineering teams do not want to staff "keep the LLM provider map current forever" as an internal function. So the deal is plain: the language is yours; the upkeep is ours. That is maintenance, not lock-in.

Where this goes

Picture the steady state. Your backend generates a policy — or a workflow — and sends it with the call. The runtime policy layer evaluates it against the live market, routes to the cheapest model that meets your rules over your keys, fails over cleanly, and hands back both the completion and a trace you can replay. When the frontier moves, your policies pick up the change automatically. When finance asks where the spend went, the trace answers. When an auditor asks why a given request used a given model, the decision is on record. Nobody opens a source file to swap a model name, because the model name was never the decision — the policy was.

This is the natural endpoint of three trends that are not slowing down: models will keep churning, cost will keep mattering, and governance will keep getting stricter. Infrastructure resolves pressures like these by externalizing the volatile decision and recording it. That is what happened to configuration, to feature flags, to deploys. Model selection is next, and the runtime policy layer is the shape it takes. The pillar guide — policy-based LLM routing: the complete guide — maps the whole territory if you want the full picture. We are building the version we want to use, and you can see the product taking shape now. Stop hardcoding model decisions. Start sending them.

Policy-based LLM routing: the complete guide

Stop hardcoding model decisions: the hidden cost (and the fix)

What is an LLM policy router?

What is a policy_ir? The anatomy of a routing decision