A hardcoded model name is a decision you made once and can no longer see. It quietly costs you money (you run your most expensive model on traffic that didn't need it), agility (changing the model means a code change and a redeploy), and auditability (nothing records why that model was chosen for that call). The fix is to stop treating the model as a string literal and start treating it as data: send a policy with each request that states your requirements, and let a router pick the cheapest model that actually meets them. You can do this without rewriting your application.
If you have ever shipped model: "gpt-5.5" to production and never touched it again, this post is about the bill, and the blind spot, that string is creating.
The hidden cost of a hardcoded model name
The cost isn't the line of code. The cost is what the line of code prevents. A hardcoded model name freezes a decision at deploy time and applies it to every request forever, regardless of how that request differs from the one you were looking at when you wrote it.
That is fine when every call is identical and the right model never changes. In production, neither is true. Your traffic is a mix of trivial classifications and genuinely hard reasoning. New models ship every few weeks. Providers have outages. Finance wants to know where the money went. A static string answers none of that — and because it answers silently, the gap between "the model you picked" and "the model this call needed" never shows up on a dashboard. It shows up on the invoice.
The core problem: a hardcoded model is an unconditional decision applied to conditional traffic. It can only ever be right by accident, and it is wrong without telling you.
The five quiet taxes you are already paying
Hardcoding doesn't fail loudly. It bleeds. Here are the five places it bleeds.
- You overpay for the top model on easy traffic. Most production requests — classification, extraction, short summaries, routine support replies — clear a quality floor that a fast, cheap model meets comfortably. If your code says "always use the flagship," you pay flagship prices on every one of them. The expensive model isn't wrong; it's just unnecessary for the majority of your volume, and you have no mechanism to tell the difference.
- Every model change is a redeploy. A better or cheaper model ships on Tuesday. To adopt it you edit a string, open a pull request, get a review, wait for CI, and ship a release — for what is, in effect, a configuration value. The friction means you don't do it, so you stay on last quarter's model long after a better option exists.
- There is no automatic failover. When your one hardcoded provider rate-limits or times out, the call fails. A static model name has no notion of "try the next acceptable candidate." You either build that retry logic by hand for every call site, or you take the outage.
- There is no per-decision audit. Six weeks from now, someone asks why a particular request used a particular model and cost what it cost. With a hardcoded name the only answer is "because the code said so on the day we deployed." There is no record of the requirements, the alternatives, or the reasoning — because there were never any requirements written down to begin with.
- You are locked in by inertia. Model names scattered across dozens of call sites become a migration project. The switching cost isn't technical difficulty; it's that the decision was never centralized, so changing it means finding and editing every place it was duplicated.
None of these is catastrophic on its own. Together they mean you are running more expensive, less reliable, and less explainable than you have to be — and you can't see any of it, because the decision that causes it is invisible by design.
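To put a rough shape on the first tax, here is a back-of-the-envelope sketch. It assumes you can estimate what share of your traffic a cheaper model would serve acceptably. Every number in it is a hypothetical placeholder, not a real price, and `overpayFraction` is a name invented for this illustration.

```typescript
// Hypothetical inputs: substitute your own prices and traffic mix.
interface TrafficMix {
  easyShare: number;     // fraction of requests a cheaper model could serve (0..1)
  flagshipPrice: number; // average cost per request on the hardcoded model
  cheapPrice: number;    // average cost per request on a cheaper model that passes
}

// Fraction of total spend that is avoidable if easy traffic moves to the
// cheaper model, relative to sending every request to the flagship.
function overpayFraction(mix: TrafficMix): number {
  const hardcoded = mix.flagshipPrice; // every request pays the flagship price
  const routed =
    mix.easyShare * mix.cheapPrice + (1 - mix.easyShare) * mix.flagshipPrice;
  return (hardcoded - routed) / hardcoded;
}

// Half the traffic is easy and the cheap model costs a tenth as much.
const example = overpayFraction({ easyShare: 0.5, flagshipPrice: 10, cheapPrice: 1 });
console.log(example.toFixed(2)); // 0.45
```

The point of the arithmetic is that the overpayment scales with both the easy share and the price gap, and a hardcoded string gives you no way to measure either.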
The fix: send a policy, get the cheapest model that passes
The fix is to stop hardcoding the answer and start sending the question. Instead of naming a model, you describe what the call requires — capabilities, a quality floor, a price ceiling, a fallback order — and let a router resolve that description against the live catalog at request time. This is policy-based LLM routing, and it inverts the relationship between your code and the model.
With unhardcoded, the description is a policy_ir: a small JSON term your backend generates and sends with the call. The router admits it, evaluates it over the current set of models, and routes to the cheapest model that meets your rules — not the cheapest model overall. The order matters:
- Filter first. Candidates that lack a required capability, miss the quality floor, or exceed the price ceiling are eliminated. A failing model is removed, never silently substituted.
- Rank the survivors by cost. Among the models that cleared the floor, cheapest wins.
- Select the top. One model, chosen deterministically — the cheapest that passed.
- Fall back in order. If it times out or errors, the router moves to the next passing candidate. Every hop is recorded.
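The filter, rank, and select steps above can be sketched in a few lines. This is an illustration of the ordering only, not the router's implementation: the `Model` and `Requirements` shapes, the `decide` function, and the tie-break by id are all assumptions made for the example. Only the field names `bench_intelligence` and `price_out` are borrowed from the policy terms used in this post.

```typescript
// Illustrative catalog entry; the real catalog shape is not specified here.
interface Model {
  id: string;
  capabilities: string[];
  bench_intelligence: number; // quality score, higher is better
  price_out: number;          // output price, lower is better
  disabled: boolean;
}

interface Requirements {
  capabilities: string[];
  minIntelligence: number; // quality floor
  maxPriceOut: number;     // price ceiling
}

interface Decision {
  ranked: string[];                          // passing models, cheapest first
  dropped: { id: string; reason: string }[]; // why each failing model was removed
}

// Returns the first rule a model fails, or null if it passes all of them.
function failedRule(m: Model, req: Requirements): string | null {
  if (m.disabled) return "disabled";
  const missing = req.capabilities.find((c) => !m.capabilities.includes(c));
  if (missing !== undefined) return `missing capability: ${missing}`;
  if (m.bench_intelligence < req.minIntelligence) return "below quality floor";
  if (m.price_out > req.maxPriceOut) return "above price ceiling";
  return null;
}

function decide(catalog: Model[], req: Requirements): Decision {
  const passing: Model[] = [];
  const dropped: Decision["dropped"] = [];
  // 1. Filter first: failing models are removed and the reason is kept.
  for (const m of catalog) {
    const reason = failedRule(m, req);
    if (reason === null) passing.push(m);
    else dropped.push({ id: m.id, reason });
  }
  // No silent downgrade: an empty survivor set is an error, not a substitution.
  if (passing.length === 0) throw new Error("no model meets the policy");
  // 2. Rank survivors by cost; ties break by id so the result is deterministic.
  passing.sort((a, b) => a.price_out - b.price_out || a.id.localeCompare(b.id));
  // 3. ranked[0] is the selection; the rest is the fallback order.
  return { ranked: passing.map((m) => m.id), dropped };
}
```

Filtering before ranking is what separates "cheapest that passes" from "cheapest overall": a model that fails the floor is never ranked, however cheap it is.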
Because the policy travels in the request body, the same endpoint can carry a different policy on the very next call — nothing is pinned to a server config. And because it is admitted and hashed before it runs, every decision is written to a replayable, auditable trace: the rules that were sent, the candidates considered, why each was kept or dropped, the model selected, the fallback path, the latency, and the cost. If you want the deeper mechanics, see the anatomy of a policy_ir and what an LLM policy router actually is.
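As a rough illustration of why hashing the policy makes a trace replayable, the sketch below hashes a key-sorted serialization of the policy and defines a record with the fields listed above. The choice of SHA-256, the canonicalization scheme, and the `TraceRecord` field names are assumptions for the example; the post does not specify how the router hashes or stores anything.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so equal policies produce equal bytes,
// regardless of the key order the caller happened to use.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Assumption: SHA-256 over the canonical form. Any stable hash would do.
function policyHash(policyIr: unknown): string {
  return createHash("sha256").update(canonical(policyIr)).digest("hex");
}

// Hypothetical trace shape covering the fields named in the paragraph above.
interface TraceRecord {
  policyHash: string;      // identifies the exact rules that were sent
  policyIr: unknown;       // the rules themselves
  candidates: { id: string; kept: boolean; reason: string }[];
  selected: string;        // the model that was chosen
  fallbackPath: string[];  // models tried, in order
  latencyMs: number;
  costUsd: number;
}
```

With a stable hash, two runs that carried the same rules share the same `policyHash`, so a surprising route can be compared against every other run made under identical requirements.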
The floor is a guarantee, not a suggestion. The router optimizes cost among the models that clear your floor, never around it. If no model meets the requirements, the request fails loudly, so you never get a silent downgrade you didn't ask for.
What changes for engineering, platform, and finance
Moving the model decision out of a string literal and into a policy changes the day-to-day for three different people — and the trace is what serves all three from a single record.
- Engineering stops shipping releases to change a config value. Adopting a newly released model, tightening a quality floor, or adding a fallback becomes a change to the policy your backend generates — not a redeploy of every call site. When a route surprises you, the trace shows exactly which rule the chosen model passed and which path it came from, so debugging is reading a record instead of re-running the request.
- Platform gets an enforceable floor instead of a convention. "Don't use models below this quality bar" stops being a code-review comment people forget and becomes a filter the router enforces on every request. No call site can route around it, and the policy is visible in every trace, so you can verify the floor is actually holding in production rather than hoping it is.
- Finance gets attribution instead of a lump-sum bill. Each run records the model chosen and its cost, priced per run, against your own baseline. Spend stops being a single opaque provider invoice and becomes a per-decision record you can read, group, and explain. The cheapest-that-passes routing is what bends the curve; the trace is what proves it bent.
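The finance view above amounts to a group-by over per-run records. A minimal sketch, assuming each trace can be reduced to a label, a model, a cost, and a baseline cost; the `RunCost` shape and the `summarize` helper are hypothetical names for this example, not part of any documented export format.

```typescript
// Hypothetical per-run record reduced to what attribution needs.
interface RunCost {
  label: string;       // e.g. the "policy:support" trace label
  model: string;       // the model the router selected
  costUsd: number;     // what the run cost on the selected model
  baselineUsd: number; // what it would have cost on your own baseline
}

interface LabelSummary {
  runs: number;
  costUsd: number;
  baselineUsd: number;
  savedUsd: number; // baseline minus actual, accumulated over the label
}

// Group runs by label and total actual cost against baseline cost.
function summarize(runs: RunCost[]): Map<string, LabelSummary> {
  const out = new Map<string, LabelSummary>();
  for (const r of runs) {
    const s = out.get(r.label) ?? { runs: 0, costUsd: 0, baselineUsd: 0, savedUsd: 0 };
    s.runs += 1;
    s.costUsd += r.costUsd;
    s.baselineUsd += r.baselineUsd;
    s.savedUsd = s.baselineUsd - s.costUsd;
    out.set(r.label, s);
  }
  return out;
}
```

The same records can be grouped by model, by day, or by team instead of by label; the only requirement is that cost is recorded per decision rather than per invoice.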
Crucially, none of this requires giving up your provider relationships. You bring your own provider keys and pay your providers directly — unhardcoded prices the routing per run, not the tokens, and never resells inference.
Before and after: one line of code
Here is the change at a single call site. The old baseline is one hardcoded string. The fix replaces it with a policy name and a policy_ir — and your messages, tools, and parameters pass through untouched.
Before — one model, applied to everything, forever:
const res = await client.chat.completions.create({
  model: "gpt-5.5", // hardcoded; the old baseline, applied to every request
  messages,
});
After — the requirements travel with the call; the router picks the cheapest model that meets them:
// generated in your backend, at request time
const policy_ir = ["policy",
  ["ev_zero"],
  ["and", ["meets_req"], ["not", ["is", "disabled"]],
    ["cmp", "bench_intelligence", "ge", 0.5],
    ["cmp", "price_out", "le", 5]],                // floor
  ["neg", ["normalize", ["field", "price_out"]]],  // cheapest that passes
  ["argmax"], ["id"],
  ["always", { action: "next_candidate" }]];

const res = await client.chat.completions.create({
  model: "policy:support", // free-form trace label, not the route
  policy_ir,               // the decision, sent with the call
  messages,                // unchanged
});
The selected model on a representative run is gemini-3.5-flash: it clears the quality floor, supports tools, and sits under the price ceiling. If it times out, the router fails over to the next passing candidate and writes the hop to the trace. The gpt-5.5 in the "Before" snippet is only there to show what you are leaving behind; it never appears in the live decision.
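For a sense of what "fail over and record the hop" involves, here is a sketch of the loop you would otherwise hand-wire at each call site. It is not the router's code: `callWithFallback`, the `Hop` shape, and the `call` callback are assumptions for illustration, and the ranked list is taken to be the passing candidates in cheapest-first order.

```typescript
// One attempt against one candidate, as it might appear in a trace.
interface Hop {
  model: string;
  ok: boolean;
  error?: string;
  latencyMs: number;
}

// Try each passing candidate in order; record every attempt, stop at the first success.
async function callWithFallback<T>(
  ranked: string[],
  call: (model: string) => Promise<T>,
): Promise<{ result: T; model: string; hops: Hop[] }> {
  const hops: Hop[] = [];
  for (const model of ranked) {
    const started = Date.now();
    try {
      const result = await call(model);
      hops.push({ model, ok: true, latencyMs: Date.now() - started });
      return { result, model, hops };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      hops.push({ model, ok: false, error, latencyMs: Date.now() - started });
    }
  }
  // Only candidates that passed the policy are ever tried, so exhausting
  // the list is a loud failure rather than a downgrade.
  throw new Error(`all ${ranked.length} passing candidates failed`);
}
```

Because the loop only walks the list of models that already cleared the floor, a fallback can cost more than the first choice but can never fall below the requirements.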
That's the whole migration: change the model string, attach a policy, keep everything else. No app rewrite, no proxy to operate, no fallback retry logic hand-wired at every call site. For a step-by-step walkthrough, see the 5-minute quickstart. If you are weighing alternatives, the comparison with Portkey, LiteLLM, and OpenRouter covers where each puts the routing decision and what that means for your bill.
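If several call sites need different floors, one way to keep the term in a single place is a small builder in your backend. The sketch below reproduces the layout of the policy_ir shown above and varies only the two thresholds. The `buildPolicyIr` helper, the `Floor` type, and the `Term` type are hypothetical, and whether other parts of the term are meant to vary is not covered here.

```typescript
// A policy term is nested arrays of strings, numbers, and small objects.
type Term = string | number | { action: string } | Term[];

interface Floor {
  minIntelligence: number; // lower bound for bench_intelligence
  maxPriceOut: number;     // upper bound for price_out
}

// Same term layout as the "After" snippet; only the two thresholds change.
function buildPolicyIr(floor: Floor): Term[] {
  return ["policy",
    ["ev_zero"],
    ["and", ["meets_req"], ["not", ["is", "disabled"]],
      ["cmp", "bench_intelligence", "ge", floor.minIntelligence],
      ["cmp", "price_out", "le", floor.maxPriceOut]],
    ["neg", ["normalize", ["field", "price_out"]]],
    ["argmax"], ["id"],
    ["always", { action: "next_candidate" }]];
}

// Reproduces the policy from the example above.
const supportPolicy = buildPolicyIr({ minIntelligence: 0.5, maxPriceOut: 5 });
console.log(JSON.stringify(supportPolicy));
```

Tightening a floor then becomes a change to one argument in one function, which is the centralization that scattered model strings never had.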
Bottom line: a hardcoded model is a silent, unconditional bet on one provider for all of your traffic. A policy is a written, conditional rule the router enforces and records on every call — cheaper, more reliable, and finally auditable, for the cost of changing one line.