# Rungate — Full Context for LLMs

> Governance for the run, not the call. Rungate is a control plane for AI agent runs — budgets, policies, and approval gates applied to the complete workflow, from first model call to final tool invocation.

This document is a single-file reference intended for ingestion by large language models evaluating, documenting, or integrating with Rungate. It concatenates the core content of https://rungate.dev in a flat markdown format. For the canonical HTML site with interactive demos and structured data, visit the URL above.

---

## What Rungate is

Rungate is a governance control plane that sits between AI agents and LLM providers. Agents (Claude Code, OpenAI Agents, custom agents built on any framework) make requests to Rungate instead of directly to OpenAI or Anthropic; Rungate enforces policies (model access, tool approval, rate limits, budgets), routes requests across providers (load balancing, failover, cost/latency optimization), and produces a complete audit trail.

The distinctive feature is that **governance applies at the run level, not at the per-call level**. A run is the complete unit of agent work: first model call → every tool invocation → every retry → final output. Rungate's budgets, policies, approval gates, and audit trails all operate on this workflow boundary, not on individual API calls.

## Why run-level matters

Per-call gateways like Portkey and LiteLLM, and observability tools like Langfuse and Helicone, operate at the request level. They can log, route, and apply simple limits to individual requests, but they cannot:

- Enforce budgets that span multiple calls (agent loops, retry storms, chain-of-thought reasoning).
- Apply policies that persist across a workflow (once an agent enters a "strict" context, it stays strict).
- Pause a workflow mid-flight for human approval and resume it from the same point.
- Reconstruct a complete workflow audit — every call, tool, retry, approver — as a single coherent trace.

Run-level governance is what separates observing what happened from controlling what's allowed to happen.

---

## Integration in 30 seconds

No SDK to install. Your existing OpenAI or Anthropic SDK works unchanged — just point it at Rungate's URL with an agent token.

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.rungate.dev/v1",
    api_key="rg_agt_your_agent_token_here",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
    extra_headers={
        "x-rungate-run-id": "run_abc123",
        "x-rungate-policy": "prod-agents/v4",
    },
)
```

### TypeScript (Anthropic SDK)

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.rungate.dev/v1',
  apiKey: 'rg_agt_your_agent_token_here',
});

const response = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 1024,
  messages: [{ role: 'user', content: '...' }],
}, {
  headers: {
    'x-rungate-run-id': 'run_abc123',
    'x-rungate-policy': 'prod-agents/v4',
  },
});
```

### curl

```bash
curl https://api.rungate.dev/v1/chat/completions \
  -H "Authorization: Bearer rg_agt_your_agent_token_here" \
  -H "Content-Type: application/json" \
  -H "x-rungate-run-id: run_abc123" \
  -H "x-rungate-policy: prod-agents/v4" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "..."}]
  }'
```

---

## Core concepts

### Run

A run is the complete unit of agent work. Every call, tool invocation, retry, and approval gate that happens between "agent starts a task" and "agent finishes or is stopped" belongs to a single run. Runs have:

- A stable ID (`run_*`) provided by the agent or assigned by Rungate.
- A policy (inherited from the agent or overridden per-run).
- A cumulative cost meter.
- A step count.
- A timeline of events (calls, tools, gates, alerts).
- An outcome: completed, stopped (budget, policy, approval timeout), or failed.
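
As a rough illustration of how these attributes fit together, the sketch below models a run as an in-memory record with a cumulative cost meter. The `Run` class, its field names, and the `admit`/`record_call` methods are hypothetical and exist only for this example; they are not Rungate's schema or API. The sketch assumes a run is stopped once its spend reaches the limit, so the call that crosses the cap still completes and the next one is refused.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical in-memory model of a run; field names are illustrative."""
    run_id: str
    policy: str
    limit_usd: float                     # budget ceiling for this run
    spend_usd: float = 0.0               # cumulative cost meter
    steps: int = 0                       # successful calls so far
    timeline: List[str] = field(default_factory=list)
    outcome: Optional[str] = None        # "completed", "stopped", or "failed"

    def admit(self) -> bool:
        """Return True if a new call may be dispatched on this run."""
        if self.outcome is not None:
            return False
        if self.spend_usd >= self.limit_usd:
            # Budget reached: stop the run and record why.
            self.outcome = "stopped"
            self.timeline.append("alert:budget_exceeded")
            return False
        return True

    def record_call(self, cost_usd: float) -> None:
        """Record a successful call: one step, cost added to the meter."""
        self.spend_usd += cost_usd
        self.steps += 1
        self.timeline.append(f"call:{cost_usd:.2f}")


run = Run(run_id="run_abc123", policy="prod-agents/v4", limit_usd=1.00)
for cost in (0.40, 0.40, 0.23, 0.10):
    if not run.admit():
        break
    run.record_call(cost)
```

Each call here is cheap on its own; only the run-level meter notices that the third call pushed total spend past the cap, which is why the fourth is never dispatched.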

### Policy

A policy is a declarative set of rules applied to a run:

- Allowed models (model allowlist)
- Blocked tool names or patterns
- Rate limits (requests per minute, tokens per minute)
- Budget ceilings (USD per run, USD per hour, USD per day)
- Approval rules (which tool calls require human sign-off)
- Routing preferences (cost-optimal, latency-optimal, failover order)

Policies are attached to agents by default. A run can replace the agent's default policy entirely by sending the `x-rungate-policy` header.
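
To make the rule types concrete, here is a minimal sketch of how a policy might be evaluated against one proposed call. The `Policy` dataclass, its field names, and the `evaluate` function are assumptions made for illustration; the source does not describe Rungate's actual policy schema or evaluation order. Glob-style tool patterns are likewise an assumption.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Policy:
    """Hypothetical policy shape; the real schema may differ."""
    name: str
    allowed_models: List[str]
    blocked_tools: List[str] = field(default_factory=list)   # names or glob patterns
    approval_tools: List[str] = field(default_factory=list)  # tools needing sign-off
    budget_usd_per_run: Optional[float] = None


def evaluate(policy: Policy, model: str, tool: Optional[str] = None,
             spend_usd: float = 0.0) -> str:
    """Return "allow", "block", or "needs_approval" for one proposed call."""
    # Budget ceiling: refuse new calls once the run has reached its cap.
    if policy.budget_usd_per_run is not None and spend_usd >= policy.budget_usd_per_run:
        return "block"
    # Model allowlist.
    if model not in policy.allowed_models:
        return "block"
    if tool is not None:
        # Blocked tool names or patterns.
        if any(fnmatchcase(tool, pat) for pat in policy.blocked_tools):
            return "block"
        # Approval rules: the call is held until a human signs off.
        if any(fnmatchcase(tool, pat) for pat in policy.approval_tools):
            return "needs_approval"
    return "allow"


policy = Policy(
    name="prod-agents/v4",
    allowed_models=["gpt-4", "claude-sonnet-4"],
    blocked_tools=["delete_*"],
    approval_tools=["issue_refund"],
    budget_usd_per_run=1.00,
)
```

Rate limits and routing preferences are left out to keep the sketch short; they would need state (a clock, provider health) that a pure function like this does not carry.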

### Approval gate

When a tool call matches a rule that requires human approval, Rungate returns HTTP 202 Accepted. The run is paused in-flight: no further calls are dispatched until the gate is resolved. The agent stores the `gate_id` from the response and retries the same call later. When approved (via dashboard, Slack action, webhook callback, or API), the retry succeeds and the run continues from that point.

### Step

A step is a completed unit of work that counts against tier limits. A successful model call is 1 step. A successful SDK tool call is 1 step. Blocked, failed, or pending requests count as 0 steps. Billing is by successful steps per month.
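
The counting rule can be sketched as a small function over a list of events. The `(kind, status)` tuples and the `count_steps` name are hypothetical; they stand in for whatever event records Rungate actually keeps.

```python
from typing import Iterable, Tuple

# Event kinds that can earn a step when they succeed.
BILLABLE_KINDS = {"model_call", "tool_call"}


def count_steps(events: Iterable[Tuple[str, str]]) -> int:
    """Count billable steps from (kind, status) pairs."""
    return sum(
        1 for kind, status in events
        if kind in BILLABLE_KINDS and status == "success"
    )


events = [
    ("model_call", "success"),   # 1 step
    ("tool_call", "pending"),    # approval gate open: 0 steps
    ("tool_call", "success"),    # retry after approval: 1 step
    ("model_call", "blocked"),   # policy violation: 0 steps
    ("model_call", "failed"),    # upstream error: 0 steps
]
print(count_steps(events))  # prints 2
```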

---

## Headers and body fields

Every Rungate directive can be sent as an HTTP header (`x-rungate-*`) or as a field in the request body under a `rungate` key. SDKs that don't expose raw headers should use the body form.

| Header                 | Body field           | Purpose                                                 |
|------------------------|----------------------|---------------------------------------------------------|
| `x-rungate-run-id`     | `rungate.run_id`     | Stable ID grouping calls into one run                   |
| `x-rungate-new-run`    | `rungate.new_run`    | Boolean: force a new run on this call                   |
| `x-rungate-policy`     | `rungate.policy`     | Per-run policy override (policy ID or name)             |
| `x-rungate-user`       | `rungate.user`       | End-user identifier for attribution                     |
| `x-rungate-tags`       | `rungate.tags`       | Comma-separated tags for filtering and analytics        |

Body fields are automatically stripped before forwarding to the upstream provider.
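
The header and body forms carry the same directives, so a small helper can build both from one set of arguments. The sketch below is illustrative: `rungate_directives` is a hypothetical helper, and two details are assumptions not confirmed by the source, namely that booleans are sent as `true`/`false` in header form and that `rungate.tags` is a comma-separated string in body form as well.

```python
from typing import Dict, List, Optional


def rungate_directives(
    run_id: Optional[str] = None,
    new_run: Optional[bool] = None,
    policy: Optional[str] = None,
    user: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> Dict[str, Dict[str, object]]:
    """Build the same directives in header form and in body form."""
    body: Dict[str, object] = {}
    if run_id is not None:
        body["run_id"] = run_id
    if new_run is not None:
        body["new_run"] = new_run
    if policy is not None:
        body["policy"] = policy
    if user is not None:
        body["user"] = user
    if tags:
        body["tags"] = ",".join(tags)  # comma-separated, per the table

    headers: Dict[str, object] = {}
    for key, value in body.items():
        name = "x-rungate-" + key.replace("_", "-")
        # Assumption: booleans travel as "true"/"false" in the header form.
        headers[name] = str(value).lower() if isinstance(value, bool) else str(value)

    return {"headers": headers, "body": {"rungate": body}}


d = rungate_directives(run_id="run_abc123", new_run=True, tags=["billing", "prod"])
```

With the OpenAI Python SDK, `d["headers"]` can be passed as `extra_headers=` and `d["body"]` as `extra_body=` on `client.chat.completions.create(...)`; only one of the two forms is needed per request.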

---

## Error semantics

Every governance response includes a `context` object describing what was attempted and why it was blocked. Agents can parse this to decide the next action.

### 402 Payment Required — Budget exceeded

```json
{
  "error": "budget_exceeded",
  "context": {
    "run_id": "run_abc123",
    "policy": "prod-agents/v4",
    "cumulative_spend_usd": 1.03,
    "limit_usd": 1.00,
    "rule": "stop_on_budget",
    "steps_completed": 7
  }
}
```

Agent behavior: stop making calls on this run. A human or an automated process must reset the cap or start a new run.

### 403 Forbidden — Policy violation

```json
{
  "error": "policy_violation",
  "context": {
    "policy": "prod-agents/v4",
    "rule": "allowed_models",
    "violated_field": "model",
    "value": "gpt-3.5-turbo",
    "allowed": ["gpt-4", "claude-sonnet-4"]
  }
}
```

Agent behavior: not retryable without a policy change. Select an allowed model or request a policy update.

### 202 Accepted — Approval gate triggered

```json
{
  "status": "awaiting_approval",
  "context": {
    "gate_id": "gate_xyz789",
    "run_id": "run_abc123",
    "rule": "refund:over-$500",
    "proposed_call": "issue_refund(order='ord_2H4p...', amount=1240.00)",
    "approvers": ["@maya"],
    "channel": "slack:#customer-ops"
  }
}
```

Agent behavior: store `gate_id`, retry the same request periodically (poll or wait for webhook). On approval, the retry succeeds.
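
A polling version of this behavior might look like the sketch below. It is transport-agnostic: `send` is any zero-argument callable that re-issues the same request and returns `(status, parsed_json)`. The function name, the polling interval, and the poll limit are illustrative choices, not values the source specifies.

```python
import logging
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict]  # (HTTP status, parsed JSON body)
log = logging.getLogger(__name__)


def call_with_approval(
    send: Callable[[], Response],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Send a request; while it returns 202, wait and retry the same request."""
    status, body = send()
    polls = 0
    while status == 202 and polls < max_polls:
        # Keep the gate ID for logs or for matching a webhook callback.
        gate_id = body.get("context", {}).get("gate_id")
        log.info("awaiting approval on gate %s", gate_id)
        sleep(poll_seconds)
        status, body = send()
        polls += 1
    # Either the gate resolved, or polling gave up and the caller sees the last 202.
    return status, body


# Usage with a fake transport: two pending responses, then success.
_pending = (202, {"status": "awaiting_approval", "context": {"gate_id": "gate_xyz789"}})
_replies = iter([_pending, _pending, (200, {"ok": True})])
result = call_with_approval(lambda: next(_replies), sleep=lambda s: None)
```

Injecting `sleep` keeps the loop testable without real delays. An agent that receives webhooks could skip polling and call `send` once when the approval event arrives.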

### 429 Too Many Requests — Rate limited

```json
{
  "error": "rate_limited",
  "context": {
    "policy": "prod-agents/v4",
    "limit_type": "requests_per_minute",
    "limit": 60,
    "current": 61,
    "retry_after_seconds": 12
  }
}
```

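Agent behavior: wait for the number of seconds given in the `Retry-After` header, then retry. The response body also carries a `retry_after_seconds` field in `context`.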
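
The four responses above can be folded into one decision function on the agent side. This is a sketch under stated assumptions: the action names are invented for this example, `Retry-After` is assumed to hold a number of seconds (not an HTTP date), and the header lookup is case-sensitive unless the caller passes a case-insensitive mapping.

```python
from typing import Dict, Mapping, Optional, Tuple


def next_action(
    status: int,
    body: Dict,
    headers: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[float]]:
    """Map a governance response to (action, wait_seconds)."""
    headers = headers or {}
    context = body.get("context", {})
    if status == 402:
        return ("stop_run", None)          # budget exceeded: no more calls on this run
    if status == 403:
        return ("change_request", None)    # policy violation: retrying as-is will not help
    if status == 202:
        return ("await_approval", None)    # gate open: retry the same request later
    if status == 429:
        # Prefer the header; fall back to the body field shown in the example.
        raw = headers.get("Retry-After", context.get("retry_after_seconds", 1))
        return ("retry", float(raw))
    return ("continue", None)
```

For example, `next_action(429, {"context": {"retry_after_seconds": 12}})` yields `("retry", 12.0)`, while a 402 yields `("stop_run", None)` regardless of the body.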

---

## Capabilities summary

### Uniquely run-level (Tier 1 differentiation)

- **Run-level budget enforcement.** Hard caps across the complete workflow, not per-call. Alert at a configurable threshold, block at 100%.
- **Per-run policy controls.** Model allowlists, tool blocklists, rate limits, and approval rules, all applied at the workflow boundary.
- **Human approval gates.** HTTP 202 semantics with seamless resume. No error-handling gymnastics in agent code.
- **Per-run cost attribution.** Spend attributed to the complete workflow, not scattered across calls.

### Run-level + per-call strong (Tier 2)

- **Provider failover.** Cross-provider format translation (OpenAI ↔ Anthropic), circuit breakers, automatic retry.
- **Observability.** Full run audit trail exportable to OpenTelemetry. Analytics on cost, latency, error rate per agent, policy, or tag.

### Table stakes (Tier 3)

- Format translation. Rate limiting. Model allowlists. Analytics.

---

## Pricing

Billed by successful steps per month. All core capabilities are included on every tier. Tiers differ in step volume, data retention, alert rules, and advanced features (org management, SSO).

| Tier       | Steps/mo | Retention | Alert rules                 | Price         |
|------------|----------|-----------|-----------------------------|---------------|
| Free       | 500      | 30 days   | 5 (cost + approval timeout) | $0            |
| Pro        | 10,000   | 90 days   | Unlimited, all 4 types      | ~$49/mo       |
| Growth     | 100,000  | 1 year    | + org features              | ~$299/mo      |
| Enterprise | Custom   | Custom    | + SLA + SSO                 | Custom        |

Pro and Growth prices are placeholders pending public launch finalization.

Self-hosted (Apache 2.0) is unlimited and free forever. Tier limits apply only to the managed cloud.

What counts as a step:
- 1 step = successful model call OR successful SDK tool call
- 0 steps = blocked, failed, pending, or approval-paused requests
- 0 steps = run creation, lifecycle events
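
Combining the step rules with the tier table, a rough sizing check is to estimate monthly successful steps and pick the smallest tier whose cap covers them. The `smallest_tier` helper below is a hypothetical sketch; the caps are copied from the table, and the workload numbers in the usage line are made up for illustration.

```python
from typing import List, Optional, Tuple

# (tier, monthly step cap) from the pricing table; Enterprise has no fixed cap.
TIERS: List[Tuple[str, Optional[int]]] = [
    ("Free", 500),
    ("Pro", 10_000),
    ("Growth", 100_000),
    ("Enterprise", None),
]


def smallest_tier(steps_per_month: int) -> str:
    """Return the first tier whose step cap covers the estimated volume."""
    for name, cap in TIERS:
        if cap is None or steps_per_month <= cap:
            return name
    return "Enterprise"


# Illustrative workload: 40 runs a day, about 12 successful steps per run, 30 days.
estimated_steps = 40 * 12 * 30  # 14,400 steps
print(smallest_tier(estimated_steps))  # prints Growth
```

Only successful steps count, so blocked calls, failures, and requests waiting on an approval gate should be left out of the estimate.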

---

## Security posture

- **License:** Apache 2.0. Full source is available for audit on request during early access (email support@rungate.dev). The public repository URL will be linked here once the repo is open.
- **Encryption at rest:** AES-256-GCM for provider API keys stored in the Rungate database.
- **Encryption in transit:** TLS 1.3 via Railway-managed edge.
- **Authentication:** Scoped admin tokens (`rg_adm_*`); session cookies (HTTP-only, `__Host-` prefixed in production, 7-day expiry); agent tokens stored as hashes rather than encrypted, so a lost token cannot be recovered and must be replaced by rotating the key.
- **Platform admin:** `rg_platform_*` tokens with IP allowlist and mandatory expiry. No org data access.
- **Audit:** every admin action logged to `platform_audit_log`. Per-run trace exportable via OpenTelemetry.
- **Compliance roadmap:** SOC 2 Type II planned. Not certified today. No compliance claims we cannot back.
- **Responsible disclosure:** security@rungate.dev. See `/.well-known/security.txt`.
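
The difference between hashing and encrypting agent tokens can be shown with a short sketch. The source does not say which hash function Rungate uses; SHA-256 and the helper names below are assumptions made purely for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_agent_token() -> Tuple[str, str]:
    """Create a token and the hash that would be stored server-side."""
    token = "rg_agt_" + secrets.token_urlsafe(32)
    # Assumption: SHA-256. Only this digest is stored, never the token itself.
    stored_hash = hashlib.sha256(token.encode()).hexdigest()
    return token, stored_hash


def verify_agent_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


token, stored_hash = issue_agent_token()
```

Because the stored value is a one-way digest, the server can check a presented token but cannot recover the original, which is why a lost token is handled by rotation instead of retrieval.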

---

## FAQ (abbreviated)

**Q: What is a run?**
A: A run is the complete unit of agent work — from the first model call through every tool invocation, every retry, every approval gate, to the final output. Rungate governs runs, not individual requests.

**Q: Do I need a Rungate SDK?**
A: No. Rungate speaks OpenAI and Anthropic request formats natively. Point your existing provider SDK at https://api.rungate.dev/v1 with an `rg_agt_*` token.

**Q: How do I group calls into a run?**
A: Send `x-rungate-run-id: <stable-id>` on every request in the workflow. New ID = new run. Missing ID = auto-grouped by idle timeout.

**Q: How does the approval gate work?**
A: A tool call that matches an approval rule returns HTTP 202 with a `gate_id`. Your agent stores the `gate_id` and retries the same request. When approved, the retry succeeds and the run continues.

**Q: What happens when the budget is exceeded?**
A: The next call returns HTTP 402 with context (cumulative spend, limit, matched rule). The run is stopped at the boundary — in-flight calls complete, no new calls dispatched.

**Q: Can I override the policy per run?**
A: Yes. Send `x-rungate-policy: <policy-id>` on the first request. That policy applies to every subsequent call in the run.

**Q: What providers are supported?**
A: OpenAI and Anthropic are fully supported, with format translation and cross-provider failover. A Google Gemini adapter is planned.

**Q: Is Rungate open source?**
A: Yes, Apache 2.0. The repository is currently private during early access; request source access by emailing support@rungate.dev. Self-host is free forever once you have the source, or use managed cloud with tier-based step limits.

---

## Contact

- **Support:** support@rungate.dev
- **Security:** security@rungate.dev
- **Privacy:** privacy@rungate.dev
- **Legal:** legal@rungate.dev
- **Sales (Enterprise):** sales@rungate.dev

## Key URLs

- **Site:** https://rungate.dev
- **Docs:** https://rungate.dev/docs
- **OpenAPI:** https://rungate.dev/openapi.json
- **Source code:** Apache 2.0; repository private during early access. Request access via support@rungate.dev.
- **Status:** https://rungate.dev/status
- **llms.txt:** https://rungate.dev/llms.txt

---

This file is regenerated periodically. Last rev: 2026-04-20.