CONCEPTS

Policies & budgets

Allowed model lists, blocked tools, rate limits, destructive-action gates, and run-level budget caps. Declared once, enforced server-side, applied to every call in the run.

A policy is a named set of declarative rules Rungate enforces on every call in a run. Policies apply at the run boundary — when a run starts, it picks up the policy of the agent that initiated it (or an explicit x-rungate-policy override). Every subsequent call in that run is checked against the same policy.

Policies are managed through the dashboard at app.rungate.dev/policies or via the admin API.

What you can enforce

Run budget caps

Set a maximum cumulative spend per run, in USD. Once cumulative cost crosses the cap, the next call returns 402 Payment Required. In-flight calls complete normally, so recorded spend can end slightly above the limit; no new calls are dispatched on this run.

The 402 response includes a structured context object with the run ID, cumulative spend, the limit, and the rule that fired. Your agent should stop issuing calls on this run. To continue, a human (or automation) either resets the cap or starts a new run by sending x-rungate-new-run: true.

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": {
    "code": "budget_exceeded",
    "message": "Run budget ceiling reached.",
    "context": {
      "run_id": "run_customer_refund_2026_04_26",
      "cumulative_spend_usd": "1.03",
      "limit_usd": "1.00",
      "rule": "stop_on_budget",
      "policy_id": "pol_prod_v4",
      "policy_name": "prod-agents/v4",
      "step_that_tripped": "llm.claude-sonnet-4"
    }
  }
}
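A minimal sketch of how an agent might turn this response into a hard stop, assuming the status code and body have already been read from the HTTP response. BudgetExceeded and check_budget are hypothetical names for illustration, not part of any Rungate SDK; the field names are taken from the example above.

```python
import json
from decimal import Decimal


class BudgetExceeded(Exception):
    """Hypothetical exception: the run's budget cap was reached."""

    def __init__(self, run_id: str, spend: Decimal, limit: Decimal, rule: str):
        super().__init__(f"run {run_id}: spent ${spend} of ${limit} ({rule})")
        self.run_id = run_id
        self.spend = spend
        self.limit = limit
        self.rule = rule


def check_budget(status: int, body: str) -> None:
    """Raise BudgetExceeded for a 402 budget_exceeded response; otherwise return None."""
    if status != 402:
        return
    error = json.loads(body).get("error", {})
    if error.get("code") != "budget_exceeded":
        return
    ctx = error.get("context", {})
    # Amounts arrive as strings; Decimal avoids float rounding on currency.
    raise BudgetExceeded(
        run_id=ctx["run_id"],
        spend=Decimal(ctx["cumulative_spend_usd"]),
        limit=Decimal(ctx["limit_usd"]),
        rule=ctx["rule"],
    )
```

Raising an exception rather than returning a flag makes it harder for an agent loop to keep issuing calls on a run that has already hit its cap.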

Allowed models

Restrict which models a policy permits. This is a cheap, deterministic check that runs first on every request. A run that requests a model outside the allowlist gets 403 Forbidden, with the rule that fired and the allowed list in the response body:

{
  "error": {
    "code": "policy_violation",
    "message": "Model not in policy allowlist.",
    "context": {
      "policy_id": "pol_prod_v4",
      "rule": "allowed_models",
      "field": "model",
      "requested": "gpt-3.5-turbo",
      "allowed": [
        "openai/gpt-4",
        "anthropic/claude-sonnet-4",
        "anthropic/claude-opus-4"
      ]
    }
  }
}

Use this to lock production agents to vetted models, prevent accidental fallback to deprecated models, or enforce cost ceilings by model class.

Tool blocklists / approval gates

Specify tool calls that Rungate should either block outright (returning 403) or hold for human review, pausing the run (returning 202). See Approval gates for the gate flow.

Rate limits

Rate limits apply per agent and per policy. When a limit is exceeded, Rungate returns 429 with a Retry-After header. Rate limits catch runaway loops: an agent calling the same tool 200 times per minute is almost always a bug. The rate limit is a coarse safety net; the budget cap is the hard cost ceiling.

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737126712
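A minimal sketch of how a client might decide how long to back off after a 429. retry_delay_seconds is a hypothetical helper. It assumes Retry-After carries a whole number of seconds, as in the example above, and that X-RateLimit-Reset is a Unix timestamp in seconds; the example value looks like one, but that interpretation is an assumption.

```python
import time
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before retrying a rate-limited call."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)

    # Fallback (assumption): treat X-RateLimit-Reset as epoch seconds.
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None and reset.isdigit():
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)

    # Neither header is usable: wait a conservative default.
    return 1.0
```

With the headers shown above, retry_delay_seconds returns 12.0 because Retry-After takes precedence over the reset timestamp.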

Routing strategy

Configure provider routing per policy: primary, fallback chain, circuit-breaker thresholds, latency-based selection. Cross-provider failover is transparent — if your primary provider errors or trips a circuit breaker, Rungate re-emits the request to the configured fallback (translating format if vendors differ) and the agent sees the response in the format it sent.
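To make the failover behaviour concrete, here is an illustrative sketch of a fallback chain with a simple failure-count circuit breaker. It is not Rungate's implementation, which runs server-side; route, ProviderError, and the threshold parameter are hypothetical, and the sketch omits format translation, latency-based selection, and breaker recovery.

```python
from typing import Callable, Dict, List


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def route(
    providers: List[str],
    call: Callable[[str], str],
    failures: Dict[str, int],
    threshold: int = 3,
) -> str:
    """Try providers in order, skipping any whose breaker is open."""
    for name in providers:
        if failures.get(name, 0) >= threshold:
            continue  # breaker open: do not send traffic to this provider
        try:
            result = call(name)
        except ProviderError:
            # Count consecutive failures; the next provider in the chain is tried.
            failures[name] = failures.get(name, 0) + 1
            continue
        failures[name] = 0  # a success resets the consecutive-failure count
        return result
    raise ProviderError("all providers failed or are circuit-broken")
```

In this simplified version an open breaker never closes again; a real circuit breaker would typically allow a trial request after a cool-down period.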

Per-run policy overrides

Each agent has a default policy. Override it for a single run with the x-rungate-policy header, which accepts a policy ID or name. The override applies for the entire run.

x-rungate-policy: prod-agents/v4-strict

Useful for: temporarily tightening rules during an incident, running an evaluation under a different policy, A/B testing two policies on the same agent.
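A small sketch of building the Rungate-specific headers for a request. The header names x-rungate-policy and x-rungate-new-run come from this page; the rungate_headers helper is hypothetical, and you would merge its result into whatever headers your HTTP client already sends.

```python
from typing import Dict, Optional


def rungate_headers(policy: Optional[str] = None, new_run: bool = False) -> Dict[str, str]:
    """Build optional Rungate headers for one request."""
    headers: Dict[str, str] = {}
    if policy:
        # Policy ID or name; applies to the entire run.
        headers["x-rungate-policy"] = policy
    if new_run:
        # Start a fresh run, for example after a budget cap was reached.
        headers["x-rungate-new-run"] = "true"
    return headers
```

For example, rungate_headers(policy="prod-agents/v4-strict", new_run=True) produces both headers, while calling it with no arguments produces an empty dict and leaves the agent's default policy in effect.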

Policy versioning

Policies are versioned, and each run records which policy version was active when it ran. Changing a policy does not alter the record of past runs. If you want a hard pin, name a new version and reference it explicitly.

Dry-run mode

Set dry_run: true on a policy (or create a dedicated dry-run policy) to validate the integration end-to-end without spending real money on provider calls. Rungate accepts the request, returns a mock response, and records the run. Use this to test budget enforcement, approval gates, and webhook delivery before flipping to a real policy.

Common patterns

Tight production budget, generous dev budget

Define two policies: prod-agents with a hard cap of $1.00 per run, and dev-agents with a cap of $10.00 per run. Production agents bind to the first; the same agent can switch to the second for an evaluation by sending x-rungate-policy.

Model deprecation

Drop a model from the allowlist. The next call that requests it fails with 403, the allowed_models rule, and the updated allowed list. Your agent's error handler can read allowed from the error context and switch to a permitted model on the retry.
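A minimal sketch of such an error handler, assuming the 403 body has the shape shown under Allowed models. pick_permitted_model and its preferences argument are hypothetical names for illustration.

```python
import json
from typing import List, Optional


def pick_permitted_model(status: int, body: str, preferences: List[str]) -> Optional[str]:
    """Return a model to retry with after an allowed_models violation, else None."""
    if status != 403:
        return None
    error = json.loads(body).get("error", {})
    ctx = error.get("context", {})
    if error.get("code") != "policy_violation" or ctx.get("rule") != "allowed_models":
        return None  # some other 403, e.g. a blocked tool; do not retry blindly
    allowed = ctx.get("allowed", [])
    # Prefer the caller's ranked choices, then fall back to the first allowed model.
    for model in preferences:
        if model in allowed:
            return model
    return allowed[0] if allowed else None
```

Checking the rule before retrying keeps the handler from swapping models in response to an unrelated 403.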

Tool gating without breaking the agent

Gate a tool with the approval-gate variant (returns 202) instead of blocking it outright with 403. The agent retries the same request after the operator approves, and the workflow continues without changing the agent's code path. See Approval gates.
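One way the retry could look on the agent side is sketched below. It assumes the agent simply resends the same request at a fixed interval until the response is no longer 202; the Approval gates page may describe a different notification mechanism. send_until_approved, its parameters, and the (status, body) tuple are hypothetical.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)


def send_until_approved(
    send: Callable[[], Response],
    poll_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Resend the same request while the answer is 202 (pending approval)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 202:
            # Approved (or rejected, or failed): hand the result to the caller.
            return status, body
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    raise TimeoutError("approval still pending after max_attempts")
```

The send callable wraps whatever HTTP client the agent already uses, so the gate adds a wait but no new branch in the agent's request logic.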