Configure per-agent rate limits (RPM, RPD) and monthly budget caps to prevent runaway costs and abuse.

Rate Limits

Keystore enforces rate limits and budget caps at the proxy layer, before requests ever reach the downstream provider. This protects you from runaway agent loops, accidental infinite retries, and unexpected cost spikes.

Limit Types

Each agent can be configured with three types of limits:

Limit	Description	Default
RPM (Requests Per Minute)	Sliding window, per agent + provider combination	60
RPD (Requests Per Day)	Fixed window (24h), per agent + provider combination	10,000
Monthly Budget	Total estimated spend cap in cents, per agent	No limit

How Limits Are Enforced

On every proxy request, the following checks run in order:

text

1. Validate agent token (SHA-256 hash lookup)
2. Check agent status (active / paused / revoked)
3. Check RPM limit  →  if exceeded, return 429
4. Check RPD limit  →  if exceeded, return 429
5. Check budget     →  if exceeded, return 429
6. Forward request to provider
7. After response: increment spend counter

If any check fails, the request is rejected immediately and no downstream API call is made.
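
The order of checks can be sketched in Python as follows. This is a minimal illustration, not Keystore's implementation: `AgentState` and `check_request` are hypothetical names, and the 401/403 status codes for token and status failures are assumptions. Only the 429 responses come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AgentState:
    # Illustrative fields only; the real agent record is not shown here.
    token_valid: bool
    status: str                   # "active", "paused", or "revoked"
    rpm_used: int
    rpm_limit: int
    rpd_used: int
    rpd_limit: int
    spent_cents: int
    budget_cents: Optional[int]   # None means no budget cap


def check_request(agent: AgentState) -> Tuple[int, Optional[str]]:
    """Return (http_status, limit_type). (200, None) means: forward to provider."""
    if not agent.token_valid:
        return 401, None          # assumed status code
    if agent.status != "active":
        return 403, None          # assumed status code
    if agent.rpm_used >= agent.rpm_limit:
        return 429, "rpm"
    if agent.rpd_used >= agent.rpd_limit:
        return 429, "rpd"
    if agent.budget_cents is not None and agent.spent_cents >= agent.budget_cents:
        return 429, "budget"
    return 200, None
```

Because the checks short-circuit, an agent that is over both its RPM limit and its budget is reported as `rpm`, the first failing check.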

Configuring Limits

Via Dashboard

Navigate to Agents > [Agent Name] > Settings and set the RPM, RPD, and monthly budget fields. Changes can take up to 5 minutes to apply, because the agent context is cached with a 5-minute TTL.

Via API

bash

curl -X PATCH https://api.keystore.io/v1/agents/<agent-id> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rpmLimit": 120,
    "rpdLimit": 50000,
    "budgetCentsMonthly": 5000
  }'

Via CLI

bash

keystore agents update <agent-id> \
  --rpm-limit 120 \
  --rpd-limit 50000 \
  --budget 5000

Setting a field to null removes the custom limit and restores the default: 60 for RPM, 10,000 for RPD, and no limit for the monthly budget.
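
The fallback rule can be illustrated with a small Python sketch. The function name `effective_limits` and the shape of the returned dictionary are hypothetical; the default values are taken from the Limit Types table.

```python
from typing import Dict, Optional

DEFAULT_RPM = 60       # default from the Limit Types table
DEFAULT_RPD = 10_000   # default from the Limit Types table


def effective_limits(
    rpm_limit: Optional[int],
    rpd_limit: Optional[int],
    budget_cents_monthly: Optional[int],
) -> Dict[str, Optional[int]]:
    """Resolve stored settings to the limits actually enforced.

    None (JSON null) means "no custom value": RPM and RPD fall back to
    their defaults, and a None budget means no budget cap at all.
    """
    return {
        "rpm": DEFAULT_RPM if rpm_limit is None else rpm_limit,
        "rpd": DEFAULT_RPD if rpd_limit is None else rpd_limit,
        "budgetCents": budget_cents_monthly,
    }
```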

Rate Limit Response

When a request is rate-limited, the proxy returns HTTP 429 Too Many Requests with headers indicating the limit state:

http

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709712000
Content-Type: application/json

{
  "error": "Rate limit exceeded",
  "limitType": "rpm",
  "limit": 60,
  "remaining": 0,
  "resetAt": 1709712000
}

The limitType field indicates which limit was hit (rpm, rpd, or budget).
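
A client can use the response body to decide how long to back off. The sketch below is one way to do that in Python. It assumes `resetAt` is a Unix timestamp in seconds (the unit is not stated here), and `retry_delay` is a hypothetical helper, not part of any Keystore SDK.

```python
import json
from typing import Optional


def retry_delay(body: str, now: float) -> Optional[float]:
    """Seconds to wait before retrying a 429, or None if waiting will not help.

    `now` is the current Unix time in seconds. A response without `resetAt`
    (such as a budget rejection) returns None, since no reset time is given.
    """
    payload = json.loads(body)
    reset_at = payload.get("resetAt")
    if reset_at is None:
        return None
    # Never return a negative delay if the reset time has already passed.
    return max(0.0, float(reset_at) - now)
```

Passing the clock in as `now` (for example `time.time()`) keeps the helper easy to test.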

RPM: Sliding Window

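RPM uses a sliding window algorithm backed by Upstash Redis. This gives smoother rate limiting than a fixed window because it avoids the burst-at-window-boundary problem, where a client sends a full quota at the end of one window and another full quota at the start of the next.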

The window slides continuously, so the limit represents the maximum number of requests in any 60-second period.

text

Example: RPM = 60

  Time 0:00  →  Request 1   (remaining: 59)
  Time 0:30  →  Request 30  (remaining: 30)
  Time 0:59  →  Request 60  (remaining: 0)
  Time 1:00  →  Request 1 from 0:00 expires (remaining: 1)
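
The timeline above can be reproduced with an in-memory sliding-window log. This Python sketch is only an illustration of the idea: it keeps exact timestamps in a deque, whereas the proxy uses Redis, and the Redis-backed algorithm may differ in detail. `SlidingWindowLimiter` is a hypothetical name.

```python
from collections import deque


class SlidingWindowLimiter:
    """Exact sliding-window log: at most `limit` requests in any window."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # request times, oldest first

    def _evict(self, now: float) -> None:
        # Drop requests that are a full window old or older.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

    def remaining(self, now: float) -> int:
        self._evict(now)
        return self.limit - len(self.timestamps)

    def allow(self, now: float) -> bool:
        self._evict(now)
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

With `limit=60` and one request per second from 0:00 to 0:59, `remaining(60.0)` returns 1, matching the last line of the example.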

RPD: Fixed Window

RPD uses a fixed 24-hour window. Each agent + provider combination has its own window, which is aligned to that combination's first request of the day, and the counter resets to 0 when the window ends.

text

Example: RPD = 10,000

  Day starts → counter at 0
  Throughout the day → counter increments
  Counter hits 10,000 → all further requests rejected until reset
  Day ends → counter resets to 0
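
A fixed-window counter is simple to sketch in Python. The version below assumes the window opens at the first request and lasts exactly 24 hours, which is one reading of the alignment rule described above. `FixedWindowCounter` is a hypothetical name, and the state is kept in memory rather than in Redis.

```python
from typing import Optional


class FixedWindowCounter:
    """Fixed-window limiter whose window starts at the first request."""

    def __init__(self, limit: int, window_seconds: int = 86_400):
        self.limit = limit
        self.window = window_seconds
        self.window_start: Optional[float] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        # Open a new window on the first request or once the old one has ended.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

Unlike the sliding window, nothing expires gradually here: once the counter reaches the limit, every request is rejected until the whole window ends.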

Budget Enforcement

The monthly budget is tracked in Redis as a running total of estimated costs in cents. The key follows the pattern:

text

agent:{agentId}:spend:{YYYY-MM}

Before each request, the proxy compares the current spend to budgetCentsMonthly. If the spend equals or exceeds the budget, the request is rejected.

After each successful request, the proxy increments the spend counter by the estimated cost. Spend keys expire automatically after 32 days, so data from previous months is cleaned up.
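
The key pattern and the check-then-increment flow can be sketched in Python with a dictionary standing in for Redis. The helper names are hypothetical, and deriving the month from a UTC timestamp is an assumption, since the time zone used for the month boundary is not stated here.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def spend_key(agent_id: str, when: datetime) -> str:
    """Build the monthly spend key: agent:{agentId}:spend:{YYYY-MM}."""
    return f"agent:{agent_id}:spend:{when:%Y-%m}"


def budget_allows(spent_cents: int, budget_cents: Optional[int]) -> bool:
    """Reject once spend equals or exceeds the budget; None means no cap."""
    return budget_cents is None or spent_cents < budget_cents


def record_spend(store: Dict[str, int], agent_id: str,
                 cost_cents: int, when: datetime) -> int:
    """Add the estimated cost to this month's running total and return it."""
    key = spend_key(agent_id, when)
    store[key] = store.get(key, 0) + cost_cents
    return store[key]
```

Because the month is part of the key, a new month starts from a fresh counter without any explicit reset.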

Budget Response

json

{
  "error": "Budget exceeded",
  "limitType": "budget",
  "spentCents": 5000,
  "budgetCents": 5000
}

Per-Provider Scoping

RPM and RPD limits are tracked per agent and per provider. An agent with RPM=60 can make up to 60 requests/minute to OpenAI and 60 requests/minute to Anthropic independently.

text

Rate limit key format: {agentId}:{providerSlug}
Example: 550e8400-e29b-41d4-a716-446655440000:openai

Budget limits, on the other hand, are tracked per agent across all providers. A $50/month budget applies to the agent's combined spend across OpenAI, Anthropic, Resend, and any other assigned providers.
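
The difference in scope can be shown with two counters in Python: one keyed by agent and provider, one keyed by agent only. This is a sketch under the key format given above; `UsageTracker` and its methods are hypothetical names.

```python
from collections import Counter


def rate_limit_key(agent_id: str, provider_slug: str) -> str:
    """Build the rate limit key: {agentId}:{providerSlug}."""
    return f"{agent_id}:{provider_slug}"


class UsageTracker:
    def __init__(self) -> None:
        self.requests = Counter()     # scoped per agent + provider (RPM/RPD)
        self.spend_cents = Counter()  # scoped per agent only (budget)

    def record(self, agent_id: str, provider_slug: str, cost_cents: int) -> None:
        self.requests[rate_limit_key(agent_id, provider_slug)] += 1
        self.spend_cents[agent_id] += cost_cents
```

Requests to OpenAI and Anthropic land in separate request counters, while their costs accumulate in the same spend counter.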

Fail-Open Behavior

If the Redis instance used for rate limiting becomes unavailable, the proxy fails open: requests are allowed through without rate limiting. This design choice prioritizes availability over strict enforcement during infrastructure incidents.

When Redis recovers, rate limiting resumes automatically. A warning is logged for each request that skipped rate limiting:

text

[Redis] Rate limiting skipped -- Redis unavailable (fail open)

The same fail-open behavior applies to budget checks.
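
Fail-open behavior amounts to treating a backend failure as "allowed". The Python sketch below shows the pattern with the built-in `ConnectionError` standing in for whatever error the Redis client raises. `check_with_fail_open` and the logger name are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("keystore.proxy")  # hypothetical logger name


def check_with_fail_open(check: Callable[[], bool]) -> bool:
    """Run a limit check; if the backing store is unreachable, allow the request."""
    try:
        return check()
    except ConnectionError:
        # Assumed error type; a real Redis client may raise its own exception.
        logger.warning("[Redis] Rate limiting skipped -- Redis unavailable (fail open)")
        return True
```

The trade-off is that limits and budget caps are not enforced for the duration of an outage, so an agent can exceed them until Redis recovers.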

Monitoring Usage

Use the usage API or CLI to monitor how agents are tracking against their limits:

bash

# Get a summary with budget insights
keystore usage summary <agent-id> --period 24h

# View per-provider breakdown
keystore usage providers <agent-id> --period 7d

The summary endpoint returns burn rate, projected monthly cost, and estimated days until budget exhaustion:

json

{
  "totalRequests": 4250,
  "costCents": 1200,
  "budgetCents": 5000,
  "burnRateCentsPerHour": 1.67,
  "projectedMonthlyCents": 1202,
  "daysUntilBudget": 76.2
}
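
The sample values are consistent with a simple linear projection: 1.67 cents/hour × 24 hours × 30 days ≈ 1202 cents. The Python sketch below encodes that reading. The formula and the 30-day month are inferences from the sample response, not documented behavior, and `projected_monthly_cents` is a hypothetical helper.

```python
def projected_monthly_cents(burn_rate_cents_per_hour: float,
                            days_in_month: int = 30) -> int:
    """Extrapolate an hourly burn rate to a full month (assumed formula)."""
    return round(burn_rate_cents_per_hour * 24 * days_in_month)
```

Comparing the result with `budgetCents` gives a quick signal of whether an agent is on track to hit its cap before the month ends.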