open source · MIT · runs on your keys

Stop hardcoding which model to call.

Two ideas. A policy decides which model handles one call. A workflow wires those decisions into a pipeline. Open source, MIT, and it runs on your own provider keys.

policies · the unit

A policy picks the model, not you.

Hardcode model="gpt-5.5" and it rots. New models ship weekly, prices move, and some requests need tools, vision, or just something cheaper. A policy declares what qualifies and how to rank it, picks the right model per request, and falls back on failure. You send the rule, not a name.
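As a rough sketch of what "declares what qualifies and how to rank it" could look like in plain Python: filter the catalog, rank what is left, and walk down the ranking on failure. The catalog rows, the model names, and the `route` helper are hypothetical illustrations, not the engine's API.

```python
# Hypothetical catalog rows; field names and values are illustrative only.
CATALOG = [
    {"id": "model-a", "price_out": 10.0, "supports_tools": True, "disabled": False},
    {"id": "model-b", "price_out": 2.0, "supports_tools": True, "disabled": False},
    {"id": "model-c", "price_out": 0.5, "supports_tools": False, "disabled": False},
]

def route(catalog, qualifies, score, call):
    """Filter, rank best-first, then try candidates until one succeeds."""
    candidates = sorted(
        (m for m in catalog if qualifies(m)), key=score, reverse=True
    )
    errors = []
    for model in candidates:
        try:
            return model["id"], call(model)
        except RuntimeError as exc:  # fall back to the next candidate
            errors.append((model["id"], str(exc)))
    raise RuntimeError(f"no candidate succeeded: {errors}")

def flaky_call(model):
    # Stand-in for a provider call: pretend model-b is down.
    if model["id"] == "model-b":
        raise RuntimeError("provider timeout")
    return f"reply from {model['id']}"

chosen, reply = route(
    CATALOG,
    qualifies=lambda m: m["supports_tools"] and not m["disabled"],
    score=lambda m: -m["price_out"],  # cheaper output price ranks higher
    call=flaky_call,
)
```

Here the cheapest tool-capable model is tried first, fails, and the call falls back to the next candidate. The caller never names a model.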

Every decision leaves a replayable trace: the same policy, inputs, and catalog reproduce it exactly, on the host or your own machine. Nothing routes in the dark.
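One way a replay guarantee like this could be made to hold, sketched under assumptions: hash a canonical encoding of the policy, inputs, and catalog, and seed any randomness from that hash. The `fingerprint` and `decide` functions below are hypothetical and are not taken from the host.

```python
import hashlib
import json
import random

def fingerprint(policy, inputs, catalog) -> str:
    """Hash a canonical JSON encoding of everything the decision depends on."""
    blob = json.dumps([policy, inputs, catalog], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def decide(policy, inputs, catalog) -> dict:
    """Toy decision: seed all randomness from the fingerprint so replays match."""
    fp = fingerprint(policy, inputs, catalog)
    rng = random.Random(fp)  # same fingerprint, same random stream
    eligible = sorted(m["id"] for m in catalog if not m.get("disabled"))
    return {"fingerprint": fp, "chosen": rng.choice(eligible)}

catalog = [{"id": "model-a"}, {"id": "model-b", "disabled": True}, {"id": "model-c"}]
first = decide(["policy", "toy"], {"prompt": "hi"}, catalog)
replay = decide(["policy", "toy"], {"prompt": "hi"}, catalog)
```

Because nothing outside the three arguments feeds the decision, `first` and `replay` are identical, and changing the catalog changes the fingerprint.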

compose routed calls

A workflow routes the whole job.

Real features aren’t one call. A support reply is triage → draft → guard: each step wants a different model, a guard can refuse before anything ships, and steps fan out and back in. One policy routes a call; a workflow routes the pipeline, and writes one stitched trace.

Triage on the cheapest JSON-capable model, draft on the cheapest model above a quality floor, then guard with the strongest no-log model, which can refuse before anything ships.

flow.support-ticket.json

["flow", {
  "u": {"kind": "input"},
  "t": {"kind": "llm", "system": "Classify the ticket and extract the account id as JSON.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["has_cap", "supports_json_mode"]],
      ["neg", ["normalize", ["field", "price_out"]]], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u"]},
  "d": {"kind": "llm", "system": "Write a reply using the ticket and the triage.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["cmp", "bench_intelligence", "ge", 0.55]],
      ["neg", ["normalize", ["field", "price_out"]]], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u", "t"], "template": "Ticket:\n$1\n\nTriage:\n$2"},
  "g": {"kind": "llm", "system": "Check brand voice, PII, refund limits. Refuse if any fail.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "no_log"]],
      ["field", "bench_intelligence"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["d"]},
  "out": {"kind": "output", "inputs": ["g"]}
}]
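The filter half of each policy above is a small expression tree. A minimal evaluator might look like the sketch below. The semantics are assumptions read off the operator names (`is` as a boolean flag on the model row, `has_cap` as membership in a `caps` list, `meets_req` as "the model has every capability the request needs"); the engine's reference interpreter is the authority.

```python
import operator

CMP = {"ge": operator.ge, "le": operator.le, "gt": operator.gt,
       "lt": operator.lt, "eq": operator.eq}

def matches(expr, model, request):
    """Evaluate a filter expression against one catalog row (assumed semantics)."""
    op, *args = expr
    if op == "and":
        return all(matches(a, model, request) for a in args)
    if op == "not":
        return not matches(args[0], model, request)
    if op == "is":  # boolean flag on the model row
        return bool(model.get(args[0], False))
    if op == "has_cap":  # capability listed on the model row
        return args[0] in model.get("caps", ())
    if op == "cmp":  # ["cmp", field, comparison, threshold]
        field, name, threshold = args
        return field in model and CMP[name](model[field], threshold)
    if op == "meets_req":  # assumption: request needs are a subset of model caps
        return set(request.get("needs", ())) <= set(model.get("caps", ()))
    raise ValueError(f"unknown filter op: {op}")

# The draft step's filter from the flow above, run on a hypothetical catalog.
DRAFT_FILTER = ["and", ["meets_req"], ["not", ["is", "disabled"]],
                ["cmp", "bench_intelligence", "ge", 0.55]]
CATALOG = [
    {"id": "small", "bench_intelligence": 0.40},
    {"id": "mid", "bench_intelligence": 0.60},
    {"id": "big", "bench_intelligence": 0.80, "disabled": True},
]
eligible = [m["id"] for m in CATALOG if matches(DRAFT_FILTER, m, request={})]
```

Only `mid` survives: `small` is under the quality floor and `big` is disabled.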

A cheap first draft, then a critique and a rewrite on the strongest model: a quality jump on a budget.

flow.draft-critique-revise.json

["flow", {
  "u": {"kind": "input"},
  "d": {"kind": "llm", "system": "Draft an answer.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["neg", ["normalize", ["field", "price_out"]]], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u"]},
  "c": {"kind": "llm", "system": "Critique the draft: list concrete flaws and gaps.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["d"]},
  "r": {"kind": "llm", "system": "Rewrite the answer, fixing every point in the critique.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u", "d", "c"], "template": "Question:\n$1\n\nDraft:\n$2\n\nCritique:\n$3"},
  "out": {"kind": "output", "inputs": ["r"]}
}]
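To make the wiring concrete, here is a small sketch of how a flow like the one above could be executed: visit nodes in dependency order and fill each `template` from the node's `inputs`. It assumes `$1` refers to the first input, `$2` to the second, and so on, and that inputs are joined with blank lines when no template is given. The `run_flow` and `call_llm` names are hypothetical, and policy resolution is skipped entirely.

```python
import re
from graphlib import TopologicalSorter

def run_flow(flow_ir, user_input, call_llm):
    """Run each node after its inputs; return every node's value by name."""
    tag, nodes = flow_ir
    assert tag == "flow"
    deps = {name: spec.get("inputs", []) for name, spec in nodes.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        spec = nodes[name]
        args = [values[i] for i in spec.get("inputs", [])]
        if spec["kind"] == "input":
            values[name] = user_input
        elif spec["kind"] == "llm":
            template = spec.get("template")
            if template is None:
                prompt = "\n\n".join(args)  # assumption: plain concatenation
            else:
                # Assumption: $k is replaced by the k-th input, 1-indexed.
                prompt = re.sub(r"\$(\d+)", lambda m: args[int(m.group(1)) - 1], template)
            values[name] = call_llm(name, spec["system"], prompt)
        elif spec["kind"] == "output":
            values[name] = args[0]
    return values

# Same shape as draft-critique-revise, with policies omitted for brevity.
FLOW = ["flow", {
    "u": {"kind": "input"},
    "d": {"kind": "llm", "system": "Draft", "inputs": ["u"]},
    "c": {"kind": "llm", "system": "Critique", "inputs": ["d"]},
    "r": {"kind": "llm", "system": "Rewrite", "inputs": ["u", "d", "c"],
          "template": "Q:$1|D:$2|C:$3"},
    "out": {"kind": "output", "inputs": ["r"]},
}]

def fake_llm(name, system, prompt):
    # Stand-in for a routed model call; echoes the prompt it received.
    return f"{name}({prompt})"

values = run_flow(FLOW, "hi", fake_llm)
```

The echoing stand-in makes the data flow visible: the rewrite step sees the question, the draft, and the critique in the order its `inputs` list them.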

N seeded draws (three here) from the same strong policy; sample spreads them across the top of the ranking, then a judge picks the best.

flow.best-of-n.json

["flow", {
  "u": {"kind": "input"},
  "n1": {"kind": "llm", "system": "Answer the question.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["sample", 0.5], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u"]},
  "n2": {"kind": "llm", "system": "Answer the question.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["sample", 0.5], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u"]},
  "n3": {"kind": "llm", "system": "Answer the question.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["sample", 0.5], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["u"]},
  "j": {"kind": "llm", "system": "Pick the single best candidate; return it verbatim.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["field", "bench_intelligence"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["n1", "n2", "n3"], "template": "A:\n$1\n\nB:\n$2\n\nC:\n$3"},
  "out": {"kind": "output", "inputs": ["j"]}
}]
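How `sample` spreads draws is defined by the engine, not here. As one plausible reading, the sketch below treats the `0.5` as a softmax temperature over the scores and draws with an explicit seed, so each draw is reproducible. The `sample` function, its parameters, and the scores are all assumptions for illustration.

```python
import math
import random

def sample(scores: dict[str, float], temperature: float, seed: int) -> str:
    """Seeded softmax draw: higher scores are likelier, nothing is guaranteed."""
    rng = random.Random(seed)
    ids = sorted(scores)  # fixed order so the draw does not depend on dict order
    top = max(scores.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((scores[i] - top) / temperature) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical bench_intelligence scores for three strong models.
SCORES = {"model-a": 0.92, "model-b": 0.90, "model-c": 0.85}
draws = [sample(SCORES, 0.5, seed) for seed in (1, 2, 3)]
replay = [sample(SCORES, 0.5, seed) for seed in (1, 2, 3)]
```

Different seeds can land on different models near the top of the ranking, while replaying the same seeds gives the same draws.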

Three models pinned by family draft in parallel; a fourth synthesizes the best single answer.

flow.panel.json

["flow", {
  "u": {"kind": "input"},
  "a": {"kind": "llm", "system": "Draft an answer.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["family_eq", "gemini-3.1-pro-preview"]],
      ["zero"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]], "inputs": ["u"]},
  "b": {"kind": "llm", "system": "Draft an answer.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["family_eq", "claude-opus-4-8"]],
      ["zero"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]], "inputs": ["u"]},
  "c": {"kind": "llm", "system": "Draft an answer.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["family_eq", "deepseek-v4-flash"]],
      ["zero"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]], "inputs": ["u"]},
  "f": {"kind": "llm", "system": "Synthesize the single best answer from the drafts.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["add", ["scale", 0.7, ["normalize", ["field", "bench_intelligence"]]],
             ["scale", 0.3, ["neg", ["normalize", ["field", "price_in"]]]]],
      ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["a", "b", "c"]},
  "out": {"kind": "output", "inputs": ["f"]}
}]
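The synthesizer's ranking above blends quality and price. A minimal evaluator for such scoring expressions could look like the sketch below. It assumes `normalize` is a min-max rescale to [0, 1] across the candidates, which the source does not state. The catalog values are made up.

```python
def evaluate(expr, catalog):
    """Return one score per catalog row for a scoring expression (assumed semantics)."""
    op, *args = expr
    if op == "field":
        return [float(m[args[0]]) for m in catalog]
    if op == "zero":
        return [0.0 for _ in catalog]
    if op == "neg":
        return [-x for x in evaluate(args[0], catalog)]
    if op == "scale":  # ["scale", weight, expr]
        weight, inner = args
        return [weight * x for x in evaluate(inner, catalog)]
    if op == "add":
        parts = [evaluate(a, catalog) for a in args]
        return [sum(column) for column in zip(*parts)]
    if op == "normalize":  # assumption: min-max rescale to [0, 1]
        xs = evaluate(args[0], catalog)
        lo, hi = min(xs), max(xs)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]
    raise ValueError(f"unknown score op: {op}")

def argmax(expr, catalog):
    """Pick the id of the highest-scoring row; ties go to the earliest row."""
    scores = evaluate(expr, catalog)
    best = max(range(len(catalog)), key=scores.__getitem__)
    return catalog[best]["id"]

# The synthesizer's score from the panel flow, on hypothetical numbers.
SCORE = ["add", ["scale", 0.7, ["normalize", ["field", "bench_intelligence"]]],
                ["scale", 0.3, ["neg", ["normalize", ["field", "price_in"]]]]]
CATALOG = [
    {"id": "strong", "bench_intelligence": 0.9, "price_in": 10.0},
    {"id": "balanced", "bench_intelligence": 0.8, "price_in": 2.0},
    {"id": "cheap", "bench_intelligence": 0.5, "price_in": 1.0},
]
winner = argmax(SCORE, CATALOG)
```

With these numbers `balanced` wins: it gives up a little quality but avoids most of the price penalty that drags `strong` down.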

A reasoner and a coder run in parallel under different filters, then a merge step fuses them, for mixed reasoning-and-code tasks.

flow.specialist-split.json

["flow", {
  "u": {"kind": "input"},
  "rz": {"kind": "llm", "system": "Reason through the problem step by step.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["is", "cap_reasoning"]],
      ["field", "bench_intelligence"], ["argmax"], ["id"], ["always", {"action": "next_candidate"}]], "inputs": ["u"]},
  "cd": {"kind": "llm", "system": "Produce any code the problem needs.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]], ["cmp", "bench_coding_rank", "le", 5]],
      ["add", ["scale", 0.7, ["normalize", ["field", "bench_coding"]]],
             ["scale", 0.3, ["neg", ["normalize", ["field", "price_out"]]]]],
      ["argmax"], ["id"], ["always", {"action": "next_candidate"}]], "inputs": ["u"]},
  "m": {"kind": "llm", "system": "Merge the reasoning and the code into one answer.",
    "policy": ["policy", ["and", ["meets_req"], ["not", ["is", "disabled"]]],
      ["add", ["scale", 0.6, ["normalize", ["field", "bench_intelligence"]]],
             ["scale", 0.4, ["neg", ["normalize", ["field", "price_out"]]]]],
      ["argmax"], ["id"], ["always", {"action": "next_candidate"}]],
    "inputs": ["rz", "cd"], "template": "Reasoning:\n$1\n\nCode:\n$2"},
  "out": {"kind": "output", "inputs": ["m"]}
}]
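Which steps can run in parallel falls out of the `inputs` edges alone. The sketch below groups a flow's nodes into waves with Python's standard `graphlib`; every node in a wave has all of its inputs ready. The `waves` helper is hypothetical and only shows the scheduling idea, not how the host actually schedules.

```python
from graphlib import TopologicalSorter

def waves(flow_ir):
    """Group node names into batches whose inputs are already satisfied."""
    _tag, nodes = flow_ir
    sorter = TopologicalSorter(
        {name: spec.get("inputs", []) for name, spec in nodes.items()}
    )
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches

# Same shape as the specialist split above, with policies omitted for brevity.
FLOW = ["flow", {
    "u": {"kind": "input"},
    "rz": {"kind": "llm", "inputs": ["u"]},
    "cd": {"kind": "llm", "inputs": ["u"]},
    "m": {"kind": "llm", "inputs": ["rz", "cd"]},
    "out": {"kind": "output", "inputs": ["m"]},
}]
plan = waves(FLOW)
```

The reasoner and the coder land in the same wave, so they can run side by side before the merge step.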

See every workflow pattern in the docs →

start

Two repos, MIT, ready now.

Clone the host and the engine, point them at your provider accounts, and route on your own keys today. No markup, no hidden routing.

unhardcoded

The reference host that admits and fingerprints the policy, routes the call over your provider keys, and writes the trace.

View repo → · Quickstart →

unhardcoded-engine

The policy and flow IR, the reference interpreter, and the conformance vectors: the spec other implementations check themselves against.

View repo → · policy_ir →

MIT licensed · provider-neutral · portable IR · self-hostable

Prefer not to run the host yourself? A hosted version (managed catalogs, trace storage, team controls) is coming.

No spam. One email when the hosted version is ready.