╔═══════════════════════════════════════════════════════════════════╗
║  RUNGATE v0.1.0 - Governance Control Plane for AI Agents          ║
║                                                                   ║
║  Base URL: https://api.rungate.dev                                ║
║  Spec:     /openapi.json · full machine-readable surface          ║
║  Machine:  /llms-full.txt · single-file context dump              ║
║  License:  Apache 2.0 · source via early-access invite            ║
║  Status:   preview · production-deployed · self-host supported    ║
╚═══════════════════════════════════════════════════════════════════╝
1. What Rungate Is
Rungate is a governance proxy that sits between AI agents and LLM provider APIs (OpenAI, Anthropic). Agents point their existing provider SDK at Rungate instead of the provider directly. Rungate forwards the request with the original request/response format preserved and enforces governance — budgets, policy rules, approval gates, rate limits — at the run level rather than the individual request level.
A run is the complete unit of agent work: the first
model call, every tool invocation, every retry, every approval
gate, and the final output. Rungate groups related calls into a run via
the x-rungate-run-id header. Budgets, policies, and
audit trails apply to the run as a whole.
Rungate sits in series with the provider: every call passes through it. That is what makes its rules enforceable rather than advisory.
2. Authentication
Agent traffic uses bearer tokens of the form rg_agt_*.
These are issued per-agent by a human operator in the Rungate
dashboard or via the admin-token API. Include the token on every
call to the proxy.
Authorization: Bearer rg_agt_your_agent_token_here
Tokens carry the agent’s default policy binding. A per-run
policy override is possible via the x-rungate-policy
header (see §6). Tokens are hashed in storage; the raw value is
displayed once at issue time and cannot be retrieved afterward.
Token types at a glance:
rg_agt_* Agent token. Use this on the proxy (/v1/*).
rg_adm_* Admin token. Use this on the admin API (/api/*).
3. Endpoint Directory
The API is split into two surfaces: the proxy
(/v1) that your agent calls, and the admin
(/api) that the operator’s tools call. Agents
only need the /v1 subset. The full spec lives at
/openapi.json.
PROXY · what your agent calls
POST /v1/chat/completions        OpenAI-format proxy
POST /v1/messages                Anthropic-format proxy
GET  /v1/me                      Current agent identity + policy
GET  /v1/runs/mine               Your recent runs
GET  /v1/runs/current            In-flight run (if any)
POST /v1/runs/current/complete   Close the current run
GET  /v1/runs/{id}               Get a run by id
GET  /v1/runs/{id}/steps         Step-by-step reconstruction
POST /v1/runs/{id}/complete      Close a specific run
POST /v1/runs/complete-all       Bulk-close all your open runs
ADMIN · what the operator’s tools call
GET  /api/agents                           List agents
POST /api/agents                           Create agent + issue token
GET  /api/policies                         List policies
GET  /api/runs                             Admin-scope run search
GET  /api/approval-requests                Pending approval gates
POST /api/approval-requests/{id}/approve
POST /api/approval-requests/{id}/reject
GET  /api/webhooks                         Webhook subscriptions
GET  /api/analytics/costs                  Cost analytics
... (full list in /openapi.json)
4. Proxy: OpenAI-compatible
Rungate accepts the full OpenAI Chat Completions request format
unchanged. Point the OpenAI client’s
base_url at Rungate and it works.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.rungate.dev/v1",
    api_key="rg_agt_your_agent_token_here",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
    extra_headers={
        "x-rungate-run-id": "run_customer_refund_2025_01_17",
        "x-rungate-policy": "prod-agents/v4",
    },
)
Streaming (stream=True) is supported — the response is
a normal SSE stream. Failover across providers is transparent: if
the primary provider errors or trips a circuit breaker, Rungate
re-emits the request to a configured fallback and the agent sees
the response it expected.
5. Proxy: Anthropic-compatible
Rungate also accepts Anthropic’s native Messages format. Use the standard Anthropic SDK with a custom base URL.
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.rungate.dev/v1',
  apiKey: 'rg_agt_your_agent_token_here',
});

const response = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 1024,
  messages: [{ role: 'user', content: '...' }],
}, {
  headers: {
    'x-rungate-run-id': 'run_customer_refund_2025_01_17',
    'x-rungate-policy': 'prod-agents/v4',
  },
});
6. Run Grouping
Rungate’s governance primitive is the run, not the
individual call. You tell Rungate which run a call belongs to, and
how to treat it, via the headers below (or the equivalent body
field under a rungate key; SDKs that strip unknown headers may
prefer the body form).
| Header | Body field | Purpose |
|---|---|---|
| x-rungate-run-id | rungate.run_id | Stable ID grouping calls into one run. Missing = auto-grouped by idle timeout. |
| x-rungate-new-run | rungate.new_run | Boolean. Force a new run to start on this call, even if a run is in-flight. |
| x-rungate-policy | rungate.policy | Override the agent’s default policy for this run (policy-id or name). |
| x-rungate-user | rungate.user | End-user identifier for attribution + analytics filtering. |
| x-rungate-tags | rungate.tags | Comma-separated tags. Surface in analytics + webhook payloads. |
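The header and body forms carry the same information, so a small helper can produce both from one set of arguments. The sketch below is illustrative only: `run_grouping` is a hypothetical name, and it assumes the body form keeps tags as a comma-separated string and `new_run` as a JSON boolean, which the table implies but does not spell out.

```python
from typing import Optional, Sequence, Tuple


def run_grouping(run_id: Optional[str] = None, new_run: bool = False,
                 policy: Optional[str] = None, user: Optional[str] = None,
                 tags: Sequence[str] = ()) -> Tuple[dict, dict]:
    """Return (headers, body_fragment) for the grouping fields that are set."""
    fields: dict = {}
    if run_id:
        fields["run_id"] = run_id
    if new_run:
        fields["new_run"] = True
    if policy:
        fields["policy"] = policy
    if user:
        fields["user"] = user
    if tags:
        # Assumption: tags stay comma-separated in the body form too.
        fields["tags"] = ",".join(tags)
    # Header name = "x-rungate-" + field name with underscores as dashes.
    headers = {
        "x-rungate-" + name.replace("_", "-"): "true" if value is True else str(value)
        for name, value in fields.items()
    }
    return headers, {"rungate": fields}
```

Pass the first value as extra headers on the SDK call, or merge the second into the request JSON when the SDK strips unknown headers.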
When you send x-rungate-run-id, Rungate does one of the following:
- Creates a new run with that ID if it doesn’t exist yet.
- Appends the call to the existing run if it’s still open.
- Returns 409 Conflict if the run is already closed.
A run auto-closes after the idle timeout (default 15 minutes since
the last call). You can also close it explicitly with
POST /v1/runs/{id}/complete. Closing is idempotent.
Long-horizon workflows (overnight pipelines, multi-day
research runs): the idle timeout is a fallback, not a hard
cap. As long as the agent keeps sending calls with the same
x-rungate-run-id, the run stays open indefinitely.
Budgets and policies span the entire life of the run. The
auto-close only catches runs where the agent crashed or forgot
to close.
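The grouping rules in this section can be modeled in a few lines. This is a toy in-memory model for reasoning about agent behavior, not Rungate's implementation: `RunRegistry` and its methods are hypothetical names, and treating a gap strictly greater than the timeout as idle is an assumption.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60  # default idle timeout stated in the text


@dataclass
class Run:
    last_call_at: float
    calls: int = 1
    closed: bool = False


class RunRegistry:
    """Toy model of create / append / 409-on-closed with idle auto-close."""

    def __init__(self, idle_timeout_s: float = IDLE_TIMEOUT_S) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.runs: dict[str, Run] = {}

    def record_call(self, run_id: str, now: float) -> int:
        """Return the HTTP status the grouping rules imply for this call."""
        run = self.runs.get(run_id)
        if run is None:
            self.runs[run_id] = Run(last_call_at=now)  # unknown ID: new run
            return 200
        if not run.closed and now - run.last_call_at > self.idle_timeout_s:
            run.closed = True  # idle auto-close (fallback, not a hard cap)
        if run.closed:
            return 409  # run already closed
        run.last_call_at = now  # activity resets the idle clock
        run.calls += 1
        return 200

    def complete(self, run_id: str) -> None:
        """Close a run; calling this twice has the same effect as once."""
        self.runs[run_id].closed = True
```

In this model a run that receives a call every 800 seconds stays open past the 15-minute mark, because only the gap since the last call counts.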
7. Response Codes
Standard HTTP semantics plus two governance-specific codes
(202 and 402). Every governance
response includes a machine-parseable context object
so the agent can recover without re-reading docs.
| Code | Meaning | Agent action |
|---|---|---|
| 200 | Success. Response is the provider’s native format. | Proceed. |
| 202 | Approval gate: paused for human review (§9). | Store gate_id; retry the same request after Retry-After. |
| 400 | Malformed request. | Fix input; do not retry. |
| 401 | Token missing or invalid. | Rotate token; do not retry with same one. |
| 402 | Budget exceeded (§8). | Stop this run. Human resets cap or starts a new run. |
| 403 | Policy violation (§10). | Different model/tool, or escalate to operator. |
| 404 | Run/resource not found. | Check ID; surface to operator if persistent. |
| 409 | Run already closed. | Start a new run with x-rungate-new-run: true. |
| 422 | Semantically invalid (bad enum, schema mismatch). | Fix input; do not retry. |
| 429 | Rate limit (§11). | Honor Retry-After; back off. |
| 5xx | Rungate or upstream provider error. | Exponential backoff up to the agent’s retry budget. |
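An agent can encode the table above as a lookup so that each status code maps to exactly one recovery path. The following is a minimal sketch: the `Action` names, `next_action`, and `backoff_delay` are hypothetical, and the backoff base and cap are illustrative defaults rather than values Rungate prescribes.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RETRY_SAME = "retry the same request after Retry-After"
    FIX_INPUT = "fix input; do not retry"
    ROTATE_TOKEN = "rotate token"
    STOP_RUN = "stop this run"
    CHANGE_OR_ESCALATE = "pick a different model/tool or escalate"
    CHECK_ID = "check the ID"
    NEW_RUN = "start a new run"
    BACK_OFF = "back off, then retry"


ACTIONS = {
    200: Action.PROCEED,
    202: Action.RETRY_SAME,
    400: Action.FIX_INPUT,
    401: Action.ROTATE_TOKEN,
    402: Action.STOP_RUN,
    403: Action.CHANGE_OR_ESCALATE,
    404: Action.CHECK_ID,
    409: Action.NEW_RUN,
    422: Action.FIX_INPUT,
    429: Action.BACK_OFF,
}


def next_action(status: int) -> Action:
    """Map an HTTP status from the proxy to the agent action in the table."""
    if 500 <= status <= 599:
        return Action.BACK_OFF
    if status not in ACTIONS:
        raise ValueError(f"status {status} is not covered by the table")
    return ACTIONS[status]


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Exponential backoff: base * 2^attempt, capped (values are assumptions)."""
    return min(cap_s, base_s * (2 ** attempt))
```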
8. Budget Enforcement (402)
When cumulative run spend crosses the configured ceiling, the next
call returns 402 Payment Required. In-flight calls
complete normally; no new calls are dispatched on this run.
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": {
    "code": "budget_exceeded",
    "message": "Run budget ceiling reached.",
    "context": {
      "run_id": "run_customer_refund_2025_01_17",
      "cumulative_spend_usd": "1.03",
      "limit_usd": "1.00",
      "rule": "stop_on_budget",
      "policy_id": "pol_prod_v4",
      "policy_name": "prod-agents/v4",
      "step_that_tripped": "llm.claude-sonnet-4"
    }
  }
}
The agent should stop issuing calls on this run. A human
(or automation) resets the cap via the admin API or starts a new
run with x-rungate-new-run: true.
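Because the spend and limit arrive as decimal strings, an agent that wants to report the overage should parse them with `Decimal` rather than `float`. The sketch below assumes the 402 body has exactly the shape shown above; `parse_budget_error` is a hypothetical helper name.

```python
import json
from decimal import Decimal


def parse_budget_error(raw_body: str) -> dict:
    """Extract run ID, overage and tripping step from a 402 response body."""
    error = json.loads(raw_body)["error"]
    if error["code"] != "budget_exceeded":
        raise ValueError(f"unexpected error code: {error['code']}")
    context = error["context"]
    # Decimal avoids binary floating-point rounding on currency amounts.
    spend = Decimal(context["cumulative_spend_usd"])
    limit = Decimal(context["limit_usd"])
    return {
        "run_id": context["run_id"],
        "overage_usd": spend - limit,
        "tripped_by": context["step_that_tripped"],
    }
```

For the sample response in this section, the overage comes out as 0.03 USD.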
9. Approval Gates (202)
When a tool call matches a policy rule requiring human review,
Rungate returns 202 Accepted and pauses the run.
HTTP/1.1 202 Accepted
Content-Type: application/json
Retry-After: 5

{
  "status": "awaiting_approval",
  "context": {
    "gate_id": "gate_8xk2m",
    "run_id": "run_customer_refund_2025_01_17",
    "rule": "refund:over-$500",
    "proposed_action": {
      "tool": "issue_refund",
      "args": { "order": "ord_2H4p", "amount_usd": 1240.00 }
    },
    "approver_channel": "slack://#customer-ops",
    "expires_at": "2025-01-17T15:48:12Z"
  }
}
Retry the same request after
Retry-After seconds (or on a webhook signal; see §13).
When approved, the retry succeeds and the run continues as if the
gate had never been there. When rejected, the retry returns
403 with rule context.
Approval resolution is idempotent from the agent’s perspective: send the same request, same headers. Do not re-engineer a new request on retry.
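The polling side of an approval gate can be sketched as a loop that resends the identical request until the status is no longer 202. The code below is an illustration, not an official client: `wait_for_approval` is a hypothetical name, `send` stands in for whatever function issues the unchanged request and returns `(status, headers, body)`, and the 5-second default and the attempt cap are assumptions.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, JSON body)


def wait_for_approval(send: Callable[[], Response],
                      sleep: Callable[[float], None] = time.sleep,
                      max_attempts: int = 60) -> Response:
    """Resend the same request until the gate resolves (any non-202 status)."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 202:
            # 200 = approved and executed, 403 = rejected; caller decides next.
            return status, headers, body
        # Assumption: fall back to 5 seconds if Retry-After is absent.
        sleep(float(headers.get("Retry-After", 5)))
    raise TimeoutError("approval gate still pending after max_attempts retries")
```

Injecting `sleep` keeps the loop testable and lets an agent swap polling for a wait on the webhook signal.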
10. Policy Violations (403)
Policy rules are declarative: allowed models, tool blocklists,
rate limits, routing preferences. When a call violates a rule,
Rungate returns 403 with the exact rule and field.
HTTP/1.1 403 Forbidden
Content-Type: application/json

{
  "error": {
    "code": "policy_violation",
    "message": "Model not in policy allowlist.",
    "context": {
      "policy_id": "pol_prod_v4",
      "policy_name": "prod-agents/v4",
      "rule": "allowed_models",
      "field": "model",
      "requested": "gpt-3.5-turbo",
      "allowed": ["openai/gpt-4", "anthropic/claude-sonnet-4", "anthropic/claude-opus-4"]
    }
  }
}
Not retryable without a policy change. The agent should select a different model/tool from the allowed set, or escalate to the operator.
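Since the 403 body lists the allowed values, an agent can choose a replacement model mechanically. The sketch below is hedged on one point the document leaves open: the allowlist uses `provider/model` entries while requests use bare model names, so it assumes the part after the slash is what goes in the request. `pick_allowed_model` is a hypothetical helper.

```python
from typing import Optional


def pick_allowed_model(error_body: dict, preferences: list[str]) -> Optional[str]:
    """Return the first preferred model the policy allows, else None."""
    error = error_body.get("error", {})
    context = error.get("context", {})
    if error.get("code") != "policy_violation" or context.get("rule") != "allowed_models":
        return None  # not an allowlist violation; nothing to pick from
    # Assumption: "openai/gpt-4" in the allowlist corresponds to model="gpt-4".
    allowed = {entry.split("/", 1)[-1] for entry in context.get("allowed", [])}
    for model in preferences:
        if model in allowed:
            return model
    return None  # no overlap: escalate to the operator
```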
11. Rate Limits (429)
Per-agent and per-policy rate limits return
429 with a Retry-After header.
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737126712
Rate limits catch runaway loops: an agent spinning on the same tool call 200 times per minute is almost always a bug. The rate limit is a coarse safety net; budget enforcement (§8) is the hard cost ceiling.
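The wait time after a 429 can be derived from the response headers. This sketch assumes `Retry-After` is expressed in seconds, as in the sample, and that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but the text does not state. `rate_limit_delay` and the 1-second fallback are hypothetical.

```python
import time
from typing import Mapping, Optional


def rate_limit_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before retrying a 429 response."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))  # preferred: explicit delay
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumption: the reset value is Unix epoch seconds.
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # assumption: conservative default when neither header is set
```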
12. Runs API
Once a run exists, your agent can inspect it without admin
privileges through the /v1/runs endpoints.
GET  /v1/runs/mine?limit=20      Your recent runs (token-scoped)
GET  /v1/runs/current            The run in progress, if any
GET  /v1/runs/{id}               Full run object
GET  /v1/runs/{id}/steps         Step-by-step reconstruction
POST /v1/runs/{id}/complete      Close the run (idempotent)
POST /v1/runs/current/complete   Close whatever run is current
POST /v1/runs/complete-all       Bulk-close (recovery utility)
The run object exposes cumulative cost, step count, status
(running, completed, stopped,
blocked), and a timeline of steps. Steps include model
calls, tool calls, approvals, and governance events (alerts,
blocks).
13. Webhook Events
Rungate posts webhook events for state changes the operator (or your agent) wants to act on. Subscriptions are created in the admin API; agents consume payloads at an HTTPS endpoint they own.
| Event | Fires when |
|---|---|
| run.started | First call in a run. |
| run.completed | Run closed normally. |
| run.blocked | Run stopped by budget or policy. |
| approval.pending | Gate requires human review. |
| approval.approved | Gate resolved positively. |
| approval.rejected | Gate resolved negatively. |
| budget.alert | Run crossed the 80% alert threshold. |
| budget.exceeded | Run exceeded the hard cap. |
| policy.drift | Agent attempted an action outside policy allowlist. |
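A receiver usually routes on the event name. The sketch below assumes the envelope carries that name in an `event` field and the details in `data`, as in the payload example in this section; the `on` decorator, `dispatch`, and the handler are hypothetical names, and signature verification (§14) is expected to have happened before `dispatch` is called.

```python
from typing import Callable, Dict

Handler = Callable[[dict], None]
HANDLERS: Dict[str, Handler] = {}


def on(event: str) -> Callable[[Handler], Handler]:
    """Register a handler for one event name from the table above."""
    def register(handler: Handler) -> Handler:
        HANDLERS[event] = handler
        return handler
    return register


def dispatch(payload: dict) -> bool:
    """Route an already-verified payload; False means no handler was registered."""
    handler = HANDLERS.get(payload.get("event", ""))
    if handler is None:
        return False
    handler(payload.get("data", {}))
    return True


pending_gates: list[str] = []


@on("approval.pending")
def remember_gate(data: dict) -> None:
    # Track open gates so the agent knows which requests to retry later.
    pending_gates.append(data["gate_id"])
```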
Payload shape is consistent across events:
{
  "event": "approval.pending",
  "delivered_at": "2025-01-17T14:12:20Z",
  "data": {
    "gate_id": "gate_8xk2m",
    "run_id": "run_customer_refund_2025_01_17",
    "agent_id": "agt_claude_code_prod",
    "rule": "refund:over-$500",
    "proposed_action": { "tool": "issue_refund", "args": { "amount_usd": 1240.00 } },
    "expires_at": "2025-01-17T15:48:12Z"
  }
}
14. Webhook Signature Verification
Every webhook delivery carries an HMAC-SHA256 signature over the raw request body. Verify before trusting the payload.
X-Rungate-Timestamp: 1737126732
X-Rungate-Signature: v1=1e4b0f...   # current secret
X-Rungate-Signature: v1=9c3a87...   # previous secret (grace window)
Rungate sends up to two signatures during a rotation grace window so you can roll secrets without downtime. Accept either.
import hashlib
import hmac

def verify(body: bytes, timestamp: str, signatures: list[str], secret: str) -> bool:
    # The signed payload is "<timestamp>.<raw body>".
    signed = timestamp.encode() + b"." + body
    expected = "v1=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every signature header received.
    return any(hmac.compare_digest(expected, sig) for sig in signatures)
Also reject stale timestamps (more than 5 minutes old) to prevent replay attacks.
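The staleness check can be folded into the same function. The following sketch reuses the signing scheme from the snippet above (`timestamp + "." + body`, HMAC-SHA256, `v1=` prefix) and adds the 5-minute window; `verify_webhook`, `MAX_AGE_S`, and the injectable `now` parameter are illustrative additions, and treating a non-numeric timestamp as a failure is an assumption.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_S = 5 * 60  # reject deliveries older than 5 minutes


def verify_webhook(body: bytes, timestamp: str, signatures: list[str],
                   secret: str, now: Optional[float] = None) -> bool:
    """Check freshness first, then the HMAC over '<timestamp>.<raw body>'."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False  # assumption: a malformed timestamp is a failed check
    current = time.time() if now is None else now
    if current - sent_at > MAX_AGE_S:
        return False  # stale: possible replay
    signed = timestamp.encode() + b"." + body
    expected = "v1=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Any one matching signature is enough during a rotation grace window.
    return any(hmac.compare_digest(expected, sig) for sig in signatures)
```

Pass the raw request bytes, not a re-serialized JSON object, since re-serialization can change the bytes that were signed.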
15. Tracing (OTLP)
Rungate emits OpenTelemetry spans for every run, model call, tool call, and governance event. Point an OTLP collector at the configured endpoint; spans arrive with the run ID as the root.
Span naming convention:
run.init                  Root span
llm.{provider}/{model}    e.g. llm.openai/gpt-4
tool.{name}               e.g. tool.issue_refund
approval.{rule}           Fires on gate creation
policy.{rule}             Fires on policy decision
budget.{event}            Fires on alert/exceeded
Each span carries attributes: cost in USD, token counts (input, output, cache_read, cache_creation), provider, model, latency. Run-level cost attribution is the sum of the run’s span costs.
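Run-level attribution as described here is a plain sum over spans. The sketch below illustrates that arithmetic on exported span records; the dictionary keys `run_id` and `cost_usd` are hypothetical attribute names, since this document does not list the exact attribute keys.

```python
from collections import defaultdict
from decimal import Decimal


def run_costs(spans: list[dict]) -> dict[str, Decimal]:
    """Sum per-span cost into a per-run total."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for span in spans:
        # Spans without a cost (e.g. policy decisions) contribute zero.
        totals[span["run_id"]] += Decimal(str(span.get("cost_usd", "0")))
    return dict(totals)
```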
16. Providers
Rungate currently supports OpenAI and Anthropic as first-class providers. Both have adapter implementations that handle format translation, failover, streaming, and cost calculation (including the four-bucket token model: input, output, cache_read, cache_creation).
Cross-provider failover is transparent: the request format the agent sent is preserved in the response, even if Rungate had to re-emit the call to a fallback provider in a different native format. Load balancing, circuit breakers, and latency-based routing are all configurable at the policy level.
Google Gemini support is planned. The long-context pricing-tier infrastructure it depends on already ships but is currently dormant.
17. Self-Host
The full server is Apache 2.0 and self-hostable. One server process, one SQLite file, optional Redis for rate limiting at scale.
git clone <rungate-repo-url> # repo URL provided after early-access onboarding
cd rungate
npm install
npm run build
npm run migrate # creates/updates local SQLite file
npm run start # listens on :3000 by default
# Create a bootstrap admin token:
node dist/server/cli/index.js tokens create --name bootstrap
# Create your first agent + get an rg_agt_* token:
node dist/server/cli/index.js agents create --name my-first-agent
Point your agent at http://localhost:3000/v1 with
the agent token. Everything in this document works locally.
18. Pricing
Self-host is free (Apache 2.0, no usage limits). The managed cloud is tier-based; a step is one proxied call or tool invocation.
| Tier | Price | Steps/mo | Retention | Notes |
|---|---|---|---|---|
| Free | $0 | 500 | 30 days | Solo, all governance features. |
| Pro | $49 | 10,000 | 90 days | All alert types, Slack integration. |
| Growth | $299 | 100,000 | 1 year | Org membership, RBAC, SSO-ready. |
| Enterprise | Contact | Custom | Custom | SLA, SSO, custom contract, dedicated support. |
19. Security Summary
See /security for the full claims-backed page. Short form:
- Agent tokens hashed in storage (SHA-256). Raw value shown once.
- Provider API keys encrypted at rest with AES-256-GCM.
- Webhook signatures HMAC-SHA256 with a rotation grace window.
- TLS 1.2+ required. HSTS, CSP, X-Frame-Options DENY on all HTML.
- Data residency: US (Railway, us-east-1). EU residency option on Enterprise.
- No paid audit. No SOC 2. Honest about that on the security page.
- Responsible disclosure: [email protected], 72h ack.
20. Support + Links
- Docs (full): rungate.dev/docs
- OpenAPI 3.1: /openapi.json
- Machine-readable context: /llms-full.txt
- Status: /status
- Security contact: [email protected]
- General support: [email protected]