╔═══════════════════════════════════════════════════════════════════╗
║ RUNGATE v0.1.0 — Governance Control Plane for AI Agents           ║
║                                                                   ║
║ Base URL:   https://api.rungate.dev                               ║
║ Spec:       /openapi.json      · full machine-readable surface    ║
║ Machine:    /llms-full.txt     · single-file context dump         ║
║ License:    Apache 2.0         · source via early-access invite   ║
║ Status:     preview · production-deployed · self-host supported   ║
╚═══════════════════════════════════════════════════════════════════╝
IMPORTANT — READ TOP TO BOTTOM. This page is the complete integration surface. If you skip sections, you will miss load-bearing behavior (run grouping, 202 semantics, approval-gate retries). Most Rungate integrations succeed on the first try when this page is read in order; most failures come from assuming a section doesn’t apply.

1. What Rungate Is

Rungate is a governance proxy that sits between AI agents and LLM provider APIs (OpenAI, Anthropic). Agents point their existing provider SDK at Rungate instead of the provider directly. Rungate forwards the request with the original request/response format preserved and enforces governance — budgets, policy rules, approval gates, rate limits — at the run level rather than the individual request level.

A run is the complete unit of agent work: first model call, every tool invocation, every retry, every approval gate, the final output. Rungate groups related calls into a run via the x-rungate-run-id header. Budgets, policies, and audit trails apply to the run as a whole.

Rungate is in-series: every call passes through it. That is what makes governance enforceable rather than merely advisory.

2. Authentication

Agent traffic uses bearer tokens of the form rg_agt_*. These are issued per-agent by a human operator in the Rungate dashboard or via the admin-token API. Include the token on every call to the proxy.

Authorization: Bearer rg_agt_your_agent_token_here

Tokens carry the agent’s default policy binding. A per-run policy override is possible via the x-rungate-policy header (see §6). Tokens are hashed in storage; the raw value is displayed once at issue time and cannot be retrieved afterward.

Token types at a glance:

rg_agt_*    Agent token. Use this on the proxy (/v1/*).
rg_adm_*    Admin token. Use this on the admin API (/api/*).

3. Endpoint Directory

The API is split into two surfaces: the proxy (/v1) that your agent calls, and the admin (/api) that the operator’s tools call. Agents only need the /v1 subset. The full spec lives at /openapi.json.

PROXY · what your agent calls
POST   /v1/chat/completions          OpenAI-format proxy
POST   /v1/messages                  Anthropic-format proxy
GET    /v1/me                        Current agent identity + policy
GET    /v1/runs/mine                 Your recent runs
GET    /v1/runs/current              In-flight run (if any)
POST   /v1/runs/current/complete     Close the current run
GET    /v1/runs/{id}                 Get a run by id
GET    /v1/runs/{id}/steps           Step-by-step reconstruction
POST   /v1/runs/{id}/complete        Close a specific run
POST   /v1/runs/complete-all         Bulk-close all your open runs

ADMIN · what the operator’s tools call
GET    /api/agents                   List agents
POST   /api/agents                   Create agent + issue token
GET    /api/policies                 List policies
GET    /api/runs                     Admin-scope run search
GET    /api/approval-requests        Pending approval gates
POST   /api/approval-requests/{id}/approve
POST   /api/approval-requests/{id}/reject
GET    /api/webhooks                 Webhook subscriptions
GET    /api/analytics/costs          Cost analytics
...                                  (full list in /openapi.json)

4. Proxy: OpenAI-compatible

Rungate accepts the full OpenAI Chat Completions request format unchanged. Point the OpenAI client’s base_url at Rungate and it works.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.rungate.dev/v1",
    api_key="rg_agt_your_agent_token_here",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
    extra_headers={
        "x-rungate-run-id": "run_customer_refund_2025_01_17",
        "x-rungate-policy": "prod-agents/v4",
    },
)

Streaming (stream=True) is supported — the response is a normal SSE stream. Failover across providers is transparent: if the primary provider errors or trips a circuit breaker, Rungate re-emits the request to a configured fallback and the agent sees the response it expected.

5. Proxy: Anthropic-compatible

Rungate also accepts Anthropic’s native Messages format. Use the standard Anthropic SDK with a custom base URL.

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.rungate.dev/v1',
  apiKey: 'rg_agt_your_agent_token_here',
});

const response = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 1024,
  messages: [{ role: 'user', content: '...' }],
}, {
  headers: {
    'x-rungate-run-id': 'run_customer_refund_2025_01_17',
    'x-rungate-policy': 'prod-agents/v4',
  },
});

6. Run Grouping

Rungate’s governance primitive is the run, not the individual call. You tell Rungate which run a call belongs to, and how that run should be governed, via the headers below (or the equivalent body field under a rungate key; SDKs that strip unknown headers may prefer the body form).

Header              Body field        Purpose
x-rungate-run-id    rungate.run_id    Stable ID grouping calls into one run. Missing = auto-grouped by idle timeout.
x-rungate-new-run   rungate.new_run   Boolean. Force a new run to start on this call, even if a run is in-flight.
x-rungate-policy    rungate.policy    Override the agent’s default policy for this run (policy-id or name).
x-rungate-user      rungate.user      End-user identifier for attribution + analytics filtering.
x-rungate-tags      rungate.tags      Comma-separated tags. Surface in analytics + webhook payloads.
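
As a rough illustration of the header form, the helper below assembles the x-rungate-* headers and omits any that are unset. The function name rungate_headers and its parameters are hypothetical; only the header names come from the table above, and serializing the boolean as "true" is an assumption based on the 409 guidance in §7.

```python
from typing import Dict, List, Optional


def rungate_headers(
    run_id: Optional[str] = None,
    new_run: bool = False,
    policy: Optional[str] = None,
    user: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Build run-grouping headers, leaving out anything not set."""
    headers: Dict[str, str] = {}
    if run_id:
        headers["x-rungate-run-id"] = run_id
    if new_run:
        # Assumption: the boolean is sent as the literal string "true".
        headers["x-rungate-new-run"] = "true"
    if policy:
        headers["x-rungate-policy"] = policy
    if user:
        headers["x-rungate-user"] = user
    if tags:
        # The table describes tags as comma-separated.
        headers["x-rungate-tags"] = ",".join(tags)
    return headers
```

The resulting dict can be passed as extra_headers in the OpenAI example from §4.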

When you send x-rungate-run-id, Rungate either:

A run auto-closes after the idle timeout (default 15 minutes since last call). Or close it explicitly with POST /v1/runs/{id}/complete. Closing is idempotent.

Long-horizon workflows (overnight pipelines, multi-day research runs): the idle timeout is a fallback, not a hard cap. As long as the agent keeps sending calls with the same x-rungate-run-id, the run stays open indefinitely. Budgets and policies span the entire life of the run. The auto-close only catches runs where the agent crashed or forgot to close.
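
A minimal sketch of an explicit close, using only the standard library. It builds the POST request without sending it, so it can be inspected first. The helper name build_complete_request is hypothetical, and it assumes the {id} in the path is the same value the agent sent as x-rungate-run-id and that the endpoint needs no request body; check /openapi.json before relying on either.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.rungate.dev"


def build_complete_request(run_id: str, agent_token: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/runs/{id}/complete."""
    # Percent-encode the run ID so it is safe as a single path segment.
    url = f"{BASE_URL}/v1/runs/{quote(run_id, safe='')}/complete"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {agent_token}"},
    )
```

Because closing is idempotent, sending this request with urllib.request.urlopen(...) from a finally block is safe even if the run was already closed.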

7. Response Codes

Standard HTTP semantics plus two governance-specific codes (202 and 402). Every governance response includes a machine-parseable context object so the agent can recover without re-reading docs.

Code   Meaning                                              Agent action
200    Success. Response is the provider’s native format.   Proceed.
202    Approval gate: paused for human review (§9).         Store gate_id, retry the same request.
400    Malformed request.                                   Fix input; do not retry.
401    Token missing or invalid.                            Rotate token; do not retry with same one.
402    Budget exceeded (§8).                                Stop this run. Human resets cap or starts a new run.
403    Policy violation (§10).                              Different model/tool, or escalate to operator.
404    Run/resource not found.                              Check ID; surface to operator if persistent.
409    Run already closed.                                  Start a new run with x-rungate-new-run: true.
422    Semantically invalid (bad enum, schema mismatch).    Fix input; do not retry.
429    Rate limit (§11).                                    Honor Retry-After; back off.
5xx    Rungate or upstream provider error.                  Exponential backoff up to the agent’s retry budget.
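
One way an agent might encode this table is a small lookup from status code to next step. The Action enum and action_for function are illustrative names, not part of Rungate; the mapping itself mirrors the rows above.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RETRY_SAME_REQUEST = "retry_same_request"
    FIX_INPUT = "fix_input"
    ROTATE_TOKEN = "rotate_token"
    STOP_RUN = "stop_run"
    CHANGE_OR_ESCALATE = "change_or_escalate"
    CHECK_ID = "check_id"
    START_NEW_RUN = "start_new_run"
    BACK_OFF = "back_off"


_ACTIONS = {
    200: Action.PROCEED,
    202: Action.RETRY_SAME_REQUEST,  # approval gate
    400: Action.FIX_INPUT,
    401: Action.ROTATE_TOKEN,
    402: Action.STOP_RUN,            # budget exceeded
    403: Action.CHANGE_OR_ESCALATE,  # policy violation
    404: Action.CHECK_ID,
    409: Action.START_NEW_RUN,
    422: Action.FIX_INPUT,
    429: Action.BACK_OFF,            # honor Retry-After
}


def action_for(status: int) -> Action:
    """Map an HTTP status from the proxy to the agent's next step."""
    if 500 <= status <= 599:
        return Action.BACK_OFF
    if status not in _ACTIONS:
        raise ValueError(f"status {status} is not covered by the table")
    return _ACTIONS[status]
```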

8. Budget Enforcement (402)

When cumulative run spend crosses the configured ceiling, the next call returns 402 Payment Required. In-flight calls complete normally; no new calls are dispatched on this run.

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": {
    "code": "budget_exceeded",
    "message": "Run budget ceiling reached.",
    "context": {
      "run_id": "run_customer_refund_2025_01_17",
      "cumulative_spend_usd": "1.03",
      "limit_usd": "1.00",
      "rule": "stop_on_budget",
      "policy_id": "pol_prod_v4",
      "policy_name": "prod-agents/v4",
      "step_that_tripped": "llm.claude-sonnet-4"
    }
  }
}

The agent should stop issuing calls on this run. A human (or automation) resets the cap via the admin API or starts a new run with x-rungate-new-run: true.
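
For logging or alerting, an agent might extract how far the run overshot its cap. The sketch below parses the body shown above; budget_overage is a hypothetical helper. It uses Decimal because the amounts arrive as strings, which avoids float rounding on currency.

```python
import json
from decimal import Decimal
from typing import Optional


def budget_overage(body: str) -> Optional[Decimal]:
    """Return spend minus limit for a budget_exceeded body, else None."""
    error = json.loads(body).get("error", {})
    if error.get("code") != "budget_exceeded":
        return None
    context = error["context"]
    return Decimal(context["cumulative_spend_usd"]) - Decimal(context["limit_usd"])
```

With the example response above, the function returns Decimal("0.03").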

9. Approval Gates (202)

When a tool call matches a policy rule requiring human review, Rungate returns 202 Accepted and pauses the run.

HTTP/1.1 202 Accepted
Content-Type: application/json
Retry-After: 5

{
  "status": "awaiting_approval",
  "context": {
    "gate_id": "gate_8xk2m",
    "run_id": "run_customer_refund_2025_01_17",
    "rule": "refund:over-$500",
    "proposed_action": {
      "tool": "issue_refund",
      "args": { "order": "ord_2H4p", "amount_usd": 1240.00 }
    },
    "approver_channel": "slack://#customer-ops",
    "expires_at": "2025-01-17T15:48:12Z"
  }
}

Retry the same request after Retry-After seconds (or on webhook signal — see §13). When approved, the retry succeeds and the run continues as if the gate had never been there. When rejected, the retry returns 403 with rule context.

Approval resolution is idempotent from the agent’s perspective: send the same request, same headers. Do not re-engineer a new request on retry.
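
The retry behavior described above can be sketched as a polling loop. This is a minimal illustration, not Rungate's client: send_through_gate, the (status, headers, body) tuple shape, and the max_attempts cap are all assumptions. The caller supplies send, a zero-argument function that re-issues the identical request each time.

```python
import time
from typing import Callable, Dict, Tuple

# (HTTP status, response headers, parsed JSON body)
Response = Tuple[int, Dict[str, str], dict]


def send_through_gate(
    send: Callable[[], Response],
    max_attempts: int = 60,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Re-send the same request until it stops returning 202."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 202:
            # 200 means approved; 403 means rejected. Either way, stop polling.
            return status, headers, body
        # Honor Retry-After when present, else fall back to a fixed wait.
        sleep(float(headers.get("Retry-After", default_wait)))
    raise TimeoutError("approval gate still pending after max_attempts")
```

The injectable sleep keeps the loop testable; a webhook-driven agent could replace it with a wait on an approval event.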

10. Policy Violations (403)

Policy rules are declarative: allowed models, tool blocklists, rate limits, routing preferences. When a call violates a rule, Rungate returns 403 with the exact rule and field.

HTTP/1.1 403 Forbidden
Content-Type: application/json

{
  "error": {
    "code": "policy_violation",
    "message": "Model not in policy allowlist.",
    "context": {
      "policy_id": "pol_prod_v4",
      "policy_name": "prod-agents/v4",
      "rule": "allowed_models",
      "field": "model",
      "requested": "gpt-3.5-turbo",
      "allowed": ["openai/gpt-4", "anthropic/claude-sonnet-4", "anthropic/claude-opus-4"]
    }
  }
}

Not retryable without a policy change. The agent should select a different model/tool from the allowed set, or escalate to the operator.
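
When the violated rule is allowed_models, the context carries enough information to choose a fallback automatically. The sketch below picks the first entry from the agent's own preference list that the policy allows; pick_allowed_model and the preferences argument are hypothetical.

```python
from typing import List, Optional


def pick_allowed_model(error_body: dict, preferences: List[str]) -> Optional[str]:
    """Return the first preferred model that the policy allows, if any."""
    error = error_body.get("error", {})
    context = error.get("context", {})
    if error.get("code") != "policy_violation":
        return None
    if context.get("rule") != "allowed_models":
        return None  # other rules need a different remedy
    allowed = set(context.get("allowed", []))
    for model in preferences:
        if model in allowed:
            return model
    return None  # nothing acceptable: escalate to the operator
```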

11. Rate Limits (429)

Per-agent and per-policy rate limits return 429 with a Retry-After header.

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737126712

Rate limits catch runaway loops: an agent spinning on the same tool call 200 times per minute is almost always a bug. The rate limit is a coarse safety net — the budget enforcement (§8) is the hard cost ceiling.
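
A small sketch of computing the back-off from the headers shown above. It assumes Retry-After is given in seconds and that X-RateLimit-Reset is a Unix timestamp in seconds, as the example values suggest; rate_limit_wait is a hypothetical helper, and the one-second default is arbitrary.

```python
import time
from typing import Dict, Optional


def rate_limit_wait(headers: Dict[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before retrying a 429."""
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    if "X-RateLimit-Reset" in headers:
        current = time.time() if now is None else now
        # Assumption: the reset value is epoch seconds.
        return max(0.0, float(headers["X-RateLimit-Reset"]) - current)
    return 1.0  # no hint from the server; arbitrary default
```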

12. Runs API

Once a run exists, your agent can inspect it without admin privileges through the /v1/runs endpoints.

GET    /v1/runs/mine?limit=20        Your recent runs (token-scoped)
GET    /v1/runs/current              The run in progress, if any
GET    /v1/runs/{id}                 Full run object
GET    /v1/runs/{id}/steps           Step-by-step reconstruction
POST   /v1/runs/{id}/complete        Close the run (idempotent)
POST   /v1/runs/current/complete     Close whatever run is current
POST   /v1/runs/complete-all         Bulk-close (recovery utility)

The run object exposes cumulative cost, step count, status (running, completed, stopped, blocked), and a timeline of steps. Steps include model calls, tool calls, approvals, and governance events (alerts, blocks).

13. Webhook Events

Rungate posts webhook events for state changes the operator (or your agent) wants to act on. Subscriptions are created in the admin API; agents consume payloads at an HTTPS endpoint they own.

Event               Fires when
run.started         First call in a run.
run.completed       Run closed normally.
run.blocked         Run stopped by budget or policy.
approval.pending    Gate requires human review.
approval.approved   Gate resolved positively.
approval.rejected   Gate resolved negatively.
budget.alert        Run crossed the 80% alert threshold.
budget.exceeded     Run exceeded the hard cap.
policy.drift        Agent attempted an action outside policy allowlist.

Payload shape is consistent across events:

{
  "event": "approval.pending",
  "delivered_at": "2025-01-17T14:12:20Z",
  "data": {
    "gate_id": "gate_8xk2m",
    "run_id": "run_customer_refund_2025_01_17",
    "agent_id": "agt_claude_code_prod",
    "rule": "refund:over-$500",
    "proposed_action": { "tool": "issue_refund", "args": { "amount_usd": 1240.00 } },
    "expires_at": "2025-01-17T15:48:12Z"
  }
}
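
Because every payload shares the event/data envelope, a receiver can route on the event field. The sketch below is one possible shape; EventRouter is a hypothetical class, and it assumes the body has already passed signature verification (§14).

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class EventRouter:
    """Dispatch webhook payloads to handlers keyed by event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event name."""
        def register(handler: Handler) -> Handler:
            self._handlers[event] = handler
            return handler
        return register

    def dispatch(self, raw_body: bytes) -> bool:
        """Call the matching handler with payload['data']; False if none."""
        payload = json.loads(raw_body)
        handler = self._handlers.get(payload.get("event"))
        if handler is None:
            return False
        handler(payload["data"])
        return True
```

An agent waiting on a gate could register a handler for approval.approved that wakes the paused retry for the matching gate_id.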

14. Webhook Signature Verification

Every webhook delivery carries an HMAC-SHA256 signature over the raw request body. Verify before trusting the payload.

X-Rungate-Timestamp: 1737126732
X-Rungate-Signature: v1=1e4b0f...    # current secret
X-Rungate-Signature: v1=9c3a87...    # previous secret (grace window)

Rungate sends up to two signatures during a rotation grace window so you can roll secrets without downtime. Accept either.

import hmac, hashlib

def verify(body: bytes, timestamp: str, signatures: list[str], secret: str) -> bool:
    signed = timestamp.encode() + b"." + body
    expected = "v1=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, sig) for sig in signatures)

Also reject stale timestamps (> 5 minutes old) to prevent replay attacks.
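
The staleness check can be folded into verification. This sketch keeps the same signed-payload construction as verify and adds a 5-minute window; verify_fresh and MAX_AGE_SECONDS are illustrative names. It also rejects timestamps far in the future, which goes slightly beyond the rule stated above.

```python
import hashlib
import hmac
import time
from typing import List, Optional

MAX_AGE_SECONDS = 300  # 5 minutes


def verify_fresh(
    body: bytes,
    timestamp: str,
    signatures: List[str],
    secret: str,
    now: Optional[float] = None,
) -> bool:
    """Check timestamp freshness, then the HMAC-SHA256 signature."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False  # malformed timestamp header
    if abs(current - sent_at) > MAX_AGE_SECONDS:
        return False  # stale (or implausibly future) delivery
    signed = timestamp.encode() + b"." + body
    expected = "v1=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Accept if any provided signature matches (rotation grace window).
    return any(hmac.compare_digest(expected, sig) for sig in signatures)
```

Pass every X-Rungate-Signature header value in signatures; the timestamp is part of the signed bytes, so an attacker cannot refresh it without invalidating the signature.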

15. Tracing (OTLP)

Rungate emits OpenTelemetry spans for every run, model call, tool call, and governance event. Point an OTLP collector at the configured endpoint; spans arrive with the run ID as the root.

Span naming convention:

run.init                  Root span
llm.{provider}/{model}    e.g. llm.openai/gpt-4
tool.{name}               e.g. tool.issue_refund
approval.{rule}           Fires on gate creation
policy.{rule}             Fires on policy decision
budget.{event}            Fires on alert/exceeded

Each span carries attributes: cost in USD, token counts (input, output, cache_read, cache_creation), provider, model, latency. Run-level cost attribution is the sum of the run’s span costs.
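
As a rough illustration of that attribution rule, the function below sums per-span costs into a run total. The span shape (a dict with an attributes mapping) and the cost_usd attribute key are assumptions made for this example; the actual attribute names are whatever Rungate emits over OTLP.

```python
from decimal import Decimal
from typing import List


def run_cost(spans: List[dict]) -> Decimal:
    """Sum span costs; spans without a cost attribute count as zero."""
    total = Decimal("0")
    for span in spans:
        # "cost_usd" is a hypothetical attribute key.
        cost = span.get("attributes", {}).get("cost_usd", "0")
        total += Decimal(str(cost))
    return total
```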

16. Providers

Rungate currently supports OpenAI and Anthropic as first-class providers. Both have adapter implementations that handle format translation, failover, streaming, and cost calculation (including the four-bucket token model: input, output, cache_read, cache_creation).

Cross-provider failover is transparent: the request format the agent sent is preserved in the response, even if Rungate had to re-emit the call to a fallback provider in a different native format. Load balancing, circuit breakers, and latency-based routing are all configurable at the policy level.

Google Gemini support is planned; the long-context pricing-tier infrastructure it depends on already ships.

17. Self-Host

The full server is Apache 2.0 and self-hostable. One server process, one SQLite file, optional Redis for rate limiting at scale.

git clone <rungate-repo-url>     # repo URL provided after early-access onboarding
cd rungate
npm install
npm run build
npm run migrate            # creates/updates local SQLite file
npm run start              # listens on :3000 by default

# Create a bootstrap admin token:
node dist/server/cli/index.js tokens create --name bootstrap

# Create your first agent + get an rg_agt_* token:
node dist/server/cli/index.js agents create --name my-first-agent

Point your agent at http://localhost:3000/v1 with the agent token. Everything in this document works locally.

18. Pricing

Self-host is free (Apache 2.0, no usage limits). The managed cloud is tier-based; a step is one proxied call or tool invocation.

Tier         Price     Steps/mo   Retention   Notes
Free         $0        500        30 days     Solo, all governance features.
Pro          $49       10,000     90 days     All alert types, Slack integration.
Growth       $299      100,000    1 year      Org membership, RBAC, SSO-ready.
Enterprise   Contact   Custom     Custom      SLA, SSO, custom contract, dedicated support.

19. Security Summary

See /security for the full claims-backed page. Short form:

20. Support + Links