use case · per-customer policies

Different customers, different model rules.

In B2B SaaS, no two tenants want the same thing from your AI. Compile each customer's state into a policy your backend sends with the call — no dashboard, no redeploy. New here? unhardcoded is a runtime LLM policy layer — you send a policy with your OpenAI-compatible call, and it routes to the cheapest model that meets your rules, over your own provider keys.

the problem

One model rule can't serve every tenant.

The free tier needs to stay cheap. Enterprise wants approved providers only. An EU customer needs region-allowed models; a regulated tenant can't send data to certain providers and wants nothing logged; a premium account is paying for the best answer. Same product, different model decisions — and they change every time a customer upgrades, expands to a new region, or signs a DPA. Hardcode them and you get if tier == "enterprise" branches spread across product code and a stack of per-tenant gateway configs that drift out of sync — and you still can't easily prove, for a given request, which rule applied.

the unhardcoded way

Compile tenant state into a runtime policy.

You already know who the customer is at request time — their plan, region, and data class live in your database. Read that state and build a policy_ir: a small JSON rule the router validates, fingerprints, and evaluates against the live catalog. It routes to the cheapest model that passes that tenant's rules — over your own provider keys. The decision lives in data your backend generates per request, not in a dashboard you click through or a deploy you wait on.
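As a rough illustration of that compile step, here is a minimal sketch that assumes a hypothetical `Tenant` record and a policy_ir shaped as a list of clauses. Only the `["cmp","bench_intelligence","ge",...]` and `["is","no_log"]` clause forms appear on this page. The `"in"` operator, the `provider` and `region` field names, the `filter` key, and the floor values are assumptions made for the example, not the documented schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical quality floors per plan; the numbers are illustrative only.
PLAN_FLOORS = {"free": 0.5, "pro": 0.65, "enterprise": 0.8}


@dataclass
class Tenant:
    """Tenant state as it might live in your own database (assumed shape)."""
    plan: str
    region: Optional[str] = None
    approved_providers: List[str] = field(default_factory=list)
    sensitive_data: bool = False


def compile_policy(tenant: Tenant) -> dict:
    """Turn tenant state into a policy_ir-style dict, one clause per attribute."""
    clauses = [["cmp", "bench_intelligence", "ge", PLAN_FLOORS[tenant.plan]]]
    if tenant.approved_providers:
        # Assumed clause form: the page only shows "provider ∈ approved".
        clauses.append(["in", "provider", sorted(tenant.approved_providers)])
    if tenant.region:
        # Assumed clause form: the page only shows "region = eu".
        clauses.append(["cmp", "region", "eq", tenant.region])
    if tenant.sensitive_data:
        clauses.append(["is", "no_log"])
    return {"filter": clauses}


free = compile_policy(Tenant(plan="free"))
eu_enterprise = compile_policy(
    Tenant(
        plan="enterprise",
        region="eu",
        approved_providers=["provider-b", "provider-a"],
        sensitive_data=True,
    )
)
```

Because the policy is derived from the tenant record on every request, an upgrade or a newly signed DPA changes the next call's rules as soon as the record changes.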

Read tenant state

Plan, region, data class, and any DPA flags — straight from your own records, at request time.

Build the policy

Compile that state into a policy_ir and send it with one OpenAI-compatible call. No per-tenant config to maintain.

Route & trace

The router runs filter → rank → select → fallback and writes a trace showing which tenant rule applied.

A tenant policy is the same primitive as any other — a rule sent with the call. See how runtime policies work end to end.
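The filter → rank → select → fallback sequence can be pictured with a toy in-memory catalog. This is a sketch under stated assumptions, not the router's implementation: the `Model` fields, the catalog entries, the prices, and the choice to treat the remaining survivors as the fallback order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    bench_intelligence: float
    cost_per_run: float  # illustrative unit


# Hypothetical catalog; names and numbers are made up for the example.
CATALOG = [
    Model("small-fast", "provider-a", 0.55, 0.02),
    Model("mid-tier", "provider-b", 0.70, 0.09),
    Model("frontier", "provider-a", 0.90, 0.40),
]


def route(catalog, rules):
    """Return (selected, fallbacks) for a list of predicate rules."""
    # filter: keep only models that pass every rule
    survivors = [m for m in catalog if all(rule(m) for rule in rules)]
    # rank: cheapest first
    ranked = sorted(survivors, key=lambda m: m.cost_per_run)
    if not ranked:
        return None, []
    # select the cheapest survivor; the rest form the fallback order
    return ranked[0], ranked[1:]


free_rules = [lambda m: m.bench_intelligence >= 0.5]
vip_rules = [lambda m: m.bench_intelligence >= 0.8]

free_pick, free_fallbacks = route(CATALOG, free_rules)
vip_pick, vip_fallbacks = route(CATALOG, vip_rules)
```

Raising the floor changes which models survive the filter, so the selected model changes without any model name appearing in the rules.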

examples

One field of tenant state, one rule.

Each customer attribute maps to a clause your backend adds to that request's policy. Model names and figures below are illustrative.

Free tier → cheaper floor

Relax the quality floor so the cheapest passing model can win — usually gemini-3.5-flash at $0.018 a run. Raise the same floor for a VIP account and only stronger survivors qualify, without pinning a model name.

["cmp","bench_intelligence","ge",0.5]

Enterprise → approved providers

The account's contract names which providers are allowed, and an EU tenant adds a region restriction on top. The filter drops every off-list model before ranking, so routing can only land on a model that has been signed off.

provider ∈ approved · region = eu

Sensitive data → no_log

For regulated workloads, filter out providers the tenant won't allow and require a non-logging model (["is","no_log"]), so request bodies aren't retained at the provider. The trace still records the decision.

provider ∉ blocked · is no_log
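To make the clause notation concrete, here is a minimal evaluator for the two clause forms that appear on this page, `["cmp", field, op, value]` and `["is", flag]`, run against a model record. The record layout (a plain dict with a `flags` set) and the operators other than `ge` are assumptions; the real router's semantics may differ.

```python
import operator

# Operator names beyond "ge" are assumed for illustration.
OPS = {"ge": operator.ge, "le": operator.le, "eq": operator.eq}


def passes(clause, model):
    """Evaluate one clause against a model record (a plain dict)."""
    kind = clause[0]
    if kind == "cmp":
        _, field_name, op, value = clause
        return OPS[op](model[field_name], value)
    if kind == "is":
        _, flag = clause
        return flag in model.get("flags", set())
    raise ValueError(f"unknown clause kind: {kind!r}")


def passes_all(clauses, model):
    """A model survives the filter only if every clause holds."""
    return all(passes(c, model) for c in clauses)


policy = [["cmp", "bench_intelligence", "ge", 0.5], ["is", "no_log"]]
logging_model = {"bench_intelligence": 0.7, "flags": set()}
private_model = {"bench_intelligence": 0.6, "flags": {"no_log"}}
```

Here `private_model` passes both clauses, while `logging_model` clears the quality floor but fails the no_log requirement and would be dropped before ranking.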

Every tenant's rules — the floor, the allowed providers, the region, the no_log flag — are visible in that request's trace, by fingerprint 301140696-1054914287 (sigma-pol/v1). Preview any policy with POST /x/rank before it ships. See why every decision is auditable.
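One way to picture why a fingerprint can identify a rule: hash a canonical serialization of the policy, so the same clauses always yield the same identifier regardless of key order. The sketch below uses SHA-256 over sorted-key JSON. That is a stand-in chosen for illustration, not the sigma-pol/v1 scheme, whose format is not described on this page, and the `version` key is hypothetical.

```python
import hashlib
import json


def fingerprint(policy: dict) -> str:
    """Stable short identifier for a policy (illustrative scheme only)."""
    # Sorted keys and fixed separators give one canonical byte string per policy.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


a = {"filter": [["is", "no_log"]], "version": 1}
b = {"version": 1, "filter": [["is", "no_log"]]}  # same policy, different key order
c = {"filter": [["cmp", "bench_intelligence", "ge", 0.5]], "version": 1}
```

With a scheme like this, `a` and `b` map to the same identifier and `c` to a different one, which is what lets a trace entry point back to the exact rule that was evaluated.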

Read how the trace proves the rule that applied

Give every customer their own model rules.

Compile tenant state into a policy your backend sends with the call: cheaper floors for the free tier, approved providers for enterprise, region limits for the EU, and a trace that proves which rule applied. Join the waitlist to build it on your own keys.

Read the docs

No SDK rewrite · Your provider keys · Every request traced