Rungate vs LiteLLM.
LiteLLM is a widely-adopted provider-normalization layer — 100+ LLMs behind an OpenAI-compatible interface, usable as a Python SDK or a proxy server. Rungate is a run-level governance control plane. They overlap at the proxy surface but model the problem at different layers. Both are MIT/Apache 2.0 open source.
- Breadth: LiteLLM wins by a lot. 100+ providers including Bedrock, SageMaker, Vertex, self-hosted Ollama/VLLM, specialist endpoints (embeddings, TTS, rerank). Rungate ships OpenAI + Anthropic today, Gemini planned.
- Budgets: LiteLLM caps per virtual key (returns HTTP 401 when exceeded). Rungate caps per run (returns HTTP 402 Payment Required with full context). Different layer, different semantics.
- Approval gates: Rungate has HTTP 202 pause/resume on policy-gated tool calls. LiteLLM has MCP
require_approvalconfig but no documented pause-retry flow. - Run-level state: Not present in LiteLLM. Budgets, policies, and audit attach to keys/teams/users. No concept of a multi-call workflow as a governed unit.
- Deployment: LiteLLM offers two modes (SDK + proxy). Rungate is proxy-only. For script-heavy Python workflows, LiteLLM’s SDK mode is a real convenience.
Capability comparison
Built from the LiteLLM GitHub README and docs.litellm.ai as of 2026-04-20. Corrections welcome at [email protected].
| Capability | Rungate | LiteLLM |
|---|---|---|
| Deployment modes | Proxy (HTTP) | SDK (Python) + Proxy (HTTP) |
| Governed primitive | Run (workflow) | Virtual key |
| Cross-call budget | Yes — spans all calls in a run | Per key; not per workflow |
| Budget-exceeded HTTP code | 402 Payment Required (semantic) | 401 Authentication Error |
| Budget-exceeded context in response | Rich context object: spend, limit, rule, run_id, steps | Error message with spend + budget |
| Human approval gate | HTTP 202 pause/resume on policy match | MCP require_approval flag; no HTTP 202 semantics |
| Per-request policy override | x-rungate-policy header | Config on the key |
| Supported providers | OpenAI, Anthropic (Gemini planned) | 100+ across all major families |
| Self-hosted model support | — | VLLM, Ollama, LM Studio, Llamafile |
| Specialist endpoints | — | Embeddings, TTS, transcription, rerank |
| Observability callbacks | Native + OTel export | Lunary, MLflow, Langfuse, many others |
| Full-run audit reconstruction | Native — every call, tool, gate, approver, cost | Per-request logs |
| Latency at scale | Production-deployed; benchmarks pending | 8ms p95 at 1k RPS |
| Teams + virtual-key hierarchy | Org → member → agent tokens | Team → key → user |
| License | Apache 2.0 | MIT |
| Managed cloud | Rungate cloud (Free → Enterprise) | LiteLLM Cloud |
When LiteLLM is the right choice
You need maximum provider coverage.
100+ LLMs across OpenAI, Anthropic, Google, AWS (Bedrock, SageMaker), Azure, Cohere, self-hosted, and specialist endpoints. If any of your workflow touches an unusual provider, LiteLLM probably already has an adapter.
You want the SDK path, not a proxy.
LiteLLM can be a Python library you import — no proxy to run. For batch scripts, notebooks, or one-off pipelines where an extra hop is overhead, this is a real convenience. Rungate is proxy-only.
You need specialist endpoints (embeddings, TTS, rerank).
LiteLLM normalizes /embeddings, /audio/transcriptions, /audio/speech, /rerank. If your pipeline mixes chat with embeddings or transcription across providers, this is the right tool.
You already have an observability stack.
LiteLLM callback integrations for Lunary, MLflow, Langfuse, and others mean you can keep your existing dashboards. Rungate exports OpenTelemetry and emits its own webhooks, but the off-the-shelf list is narrower.
When Rungate is the right choice
Your budget needs to span a workflow, not a key.
“This customer-support run can’t spend more than $2 across every LLM call + tool invocation” is native in Rungate. In LiteLLM you can cap a virtual key, but once the workflow ends and the next one starts, the old spend is still on the key until you reset it.
You need HTTP 402 + rich error context.
When Rungate blocks a call, you get HTTP 402 Payment Required with a context object an agent can parse: cumulative spend, limit, run ID, triggering rule. LiteLLM returns HTTP 401 and a string error message — workable, but semantically it conflates “bad credentials” with “over budget,” which complicates agent error-handling code.
You need human-in-the-loop approval on specific tool calls.
Rungate’s policy engine can mark a tool as “requires approval” and pause the run via HTTP 202 until a human approves in Slack/email/dashboard. The agent retries the same call and it goes through. LiteLLM’s MCP require_approval flag doesn’t model the pause-retry flow — it’s a tool-side config, not a runtime gate with resume semantics.
Compliance asks “what did the agent do?” and wants one coherent answer.
Rungate reconstructs a run as one audit trail: every call, tool, retry, approver, cost — in order, correlated, exportable. LiteLLM’s logs give you rows that you’d join yourself on key and timestamp.
The architectural difference, in code
Same scenario — an agent doing multi-step work against a $1 budget cap — expressed in each.
# LiteLLM — virtual key with per-key budget.
# The key is the unit of spend tracking + isolation.
# The proxy blocks requests synchronously when the key's cap is hit.
import openai
client = openai.OpenAI(
base_url="https://your-litellm-proxy/v1",
api_key="sk-your-virtual-key-with-1usd-cap", # budget on the key
)
# All calls in this workflow use the same virtual key.
# LiteLLM accrues cost-per-token into the key's running total.
# When the key hits max_budget, the NEXT call fails.
r1 = client.chat.completions.create(...) # ok
# ... workflow calls ...
# HTTP 401 "Authentication Error, ExceededTokenBudget: ..."
rN = client.chat.completions.create(...)
# Note: LiteLLM returns 401 for budget-exceeded, which is
# semantically an auth failure. HTTP 402 Payment Required would
# be more precise for spending-cap violations. Why the status code matters: an agent looking at HTTP 401 is supposed to retry with different credentials. An agent looking at HTTP 402 is supposed to stop dispatching and escalate. LiteLLM’s choice of 401 for budget-exceeded conflates these, which forces agent error handling to inspect the error body instead of the status. Small thing, real friction.
Transparency: HTTP 402 is IANA-reserved “for future use.” We picked it because the emerging de facto use across the industry — Stripe, GitHub, and a growing list of SaaS APIs — is billing/quota exhaustion, which is exactly what a budget cap is. 402 isn’t a ratified standard here, but it’s a better semantic fit than 401.
Can you use both?
Yes. A common shape: agent → Rungate (run-level governance) → LiteLLM (provider breadth + routing) → provider. You get Rungate’s run-level enforcement and LiteLLM’s 100+ LLMs. The cost is one extra hop. If you genuinely need both — wide provider fabric and run-level governance — this is a legitimate architecture.
Try Rungate
Apache 2.0 self-hosted today. Point your existing OpenAI or Anthropic SDK at https://api.rungate.dev/v1 with an rg_agt_* token. See the agent-native reference for code in Python, TypeScript, and curl.