Different customers, different model rules.
In B2B SaaS, no two tenants want the same thing from your AI. Compile each customer's state into a policy your backend sends with the call — no dashboard, no redeploy. New here? unhardcoded is a runtime LLM policy layer — you send a policy with your OpenAI-compatible call, and it routes to the cheapest model that meets your rules, over your own provider keys.
One model rule can't serve every tenant.
The free tier needs to stay cheap. Enterprise wants approved providers only. An EU customer needs region-allowed models; a regulated tenant can't send data to certain providers and wants nothing logged; a premium account is paying for the best answer. Same product, different model decisions — and they change every time a customer upgrades, expands to a new region, or signs a DPA. Hardcode them and you get if tier == "enterprise" branches spread across product code and a stack of per-tenant gateway configs that drift out of sync — and you still can't easily prove, for a given request, which rule applied.
Compile tenant state into a runtime policy.
You already know who the customer is at request time — their plan, region, and data class live in your database. Read that state and build a policy_ir: a small JSON rule the router validates, fingerprints, and evaluates against the live catalog. It routes to the cheapest model that passes that tenant's rules — over your own provider keys. The decision lives in data your backend generates per request, not in a dashboard you click through or a deploy you wait on.
Read tenant state
Plan, region, data class, and any DPA flags — straight from your own records, at request time.
Build the policy
Compile that state into a policy_ir and send it with one OpenAI-compatible call. No per-tenant config to maintain.
Route & trace
The router runs filter → rank → select → fallback and writes a trace showing which tenant rule applied.
A tenant policy is the same primitive as any other — a rule sent with the call. See how runtime policies work end to end.
One field of tenant state, one rule.
Each customer attribute maps to a clause your backend adds to that request's policy. Model names and figures below are illustrative.
Free tier → cheaper floor
Relax the quality floor so the cheapest passing model can win — usually gemini-3.5-flash at $0.018 a run. Raise the same floor for a VIP account and only stronger survivors qualify, without pinning a model name.
Enterprise → approved providers
The account's contract names which providers are allowed, and the EU tenant adds region-allowed models. The filter drops everything off-list before ranking, so routing can only land on a model that's signed off.
provider ∈ approved · region = euSensitive data → no_log
For regulated workloads, filter out providers the tenant won't allow and require a non-logging model (["is","no_log"]), so request bodies aren't retained at the provider. The trace still records the decision.
Every tenant's rules — the floor, the allowed providers, the region, the no_log flag — are visible in that request's trace, by fingerprint 301140696-1054914287 (sigma-pol/v1). Preview any policy with POST /x/rank before it ships. See why every decision is auditable.
Give every customer their own model rules.
Compile tenant state into a policy your backend sends with the call — cheaper floors for free, approved providers for enterprise, region limits for the EU, and a trace that proves which rule applied. Join the waitlist to build it on your own keys.