Rate Limits

Configure per-agent rate limits (RPM, RPD) and monthly budget caps to prevent runaway costs and abuse.


Keystore enforces rate limits and budget caps at the proxy layer, before requests ever reach the downstream provider. This protects you from runaway agent loops, accidental infinite retries, and unexpected cost spikes.

Limit Types

Each agent can be configured with three types of limits:

| Limit | Description | Default |
| --- | --- | --- |
| RPM (Requests Per Minute) | Sliding window, per agent + provider combination | 60 |
| RPD (Requests Per Day) | Fixed window (24h), per agent + provider combination | 10,000 |
| Monthly Budget | Total estimated spend cap in cents, per agent | No limit |

How Limits Are Enforced

On every proxy request, the following checks run in order:

text
1. Validate agent token (SHA-256 hash lookup)
2. Check agent status (active / paused / revoked)
3. Check RPM limit -> if exceeded, return 429
4. Check RPD limit -> if exceeded, return 429
5. Check budget    -> if exceeded, return 429
6. Forward request to provider
7. After response: increment spend counter

If any check fails, the request is rejected immediately and no downstream API call is made.
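
The check order above can be modeled in a few lines of Python. This is an illustrative sketch only, not Keystore's implementation: the `Agent` and `Usage` types, their field names, and `first_failed_check` are hypothetical, and token validation (step 1) is assumed to have already resolved the `Agent`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    # Hypothetical shape; field names are illustrative, not Keystore's schema.
    status: str                          # "active", "paused", or "revoked"
    rpm_limit: int = 60                  # default RPM from the table above
    rpd_limit: int = 10_000              # default RPD from the table above
    budget_cents: Optional[int] = None   # None means no budget cap


@dataclass
class Usage:
    # Usage as the limiter would see it before the current request.
    requests_last_minute: int
    requests_today: int
    spent_cents: int


def first_failed_check(agent: Agent, usage: Usage) -> Optional[str]:
    """Return the name of the first failing check, or None if the request may be forwarded."""
    if agent.status != "active":
        return "status"
    if usage.requests_last_minute >= agent.rpm_limit:
        return "rpm"
    if usage.requests_today >= agent.rpd_limit:
        return "rpd"
    if agent.budget_cents is not None and usage.spent_cents >= agent.budget_cents:
        return "budget"
    return None


# RPM is checked before RPD, so an agent over both limits reports "rpm".
print(first_failed_check(Agent("active"), Usage(60, 10_000, 0)))  # rpm
```

Because the checks short-circuit, a request that fails an early check never reaches the later ones, and nothing is forwarded to the provider.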

Configuring Limits

Via Dashboard

Navigate to Agents > [Agent Name] > Settings. Set the RPM, RPD, and monthly budget fields. Changes take effect within 5 minutes (the agent context cache TTL).

Via API

bash
curl -X PATCH https://api.keystore.io/v1/agents/<agent-id> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rpmLimit": 120,
    "rpdLimit": 50000,
    "budgetCentsMonthly": 5000
  }'

Via CLI

bash
keystore agents update <agent-id> \
  --rpm-limit 120 \
  --rpd-limit 50000 \
  --budget 5000

Setting a field to null removes the custom limit and falls back to the default.
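
The same update can be prepared from Python with the standard library. The sketch below only builds the request and does not send it. The `build_limit_update` helper and the token value are hypothetical; the URL, headers, and field names are taken from the curl example above. Python's `None` serializes to JSON `null`, which is how a field would be reset to its default.

```python
import json
import urllib.request


def build_limit_update(agent_id: str, token: str, **limits) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates an agent's limits."""
    body = json.dumps(limits).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.keystore.io/v1/agents/{agent_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Raise the RPM limit and clear the custom RPD limit (null falls back to the default).
request = build_limit_update(
    "550e8400-e29b-41d4-a716-446655440000",
    "example-token",
    rpmLimit=120,
    rpdLimit=None,
)
print(request.get_method())    # PATCH
print(request.data.decode())   # {"rpmLimit": 120, "rpdLimit": null}
```

Passing the resulting object to `urllib.request.urlopen` would send it.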

Rate Limit Response

When a request is rate-limited, the proxy returns HTTP 429 Too Many Requests with headers indicating the limit state:

http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709712000
Content-Type: application/json

{
  "error": "Rate limit exceeded",
  "limitType": "rpm",
  "limit": 60,
  "remaining": 0,
  "resetAt": 1709712000
}

The limitType field indicates which limit was hit (rpm, rpd, or budget).
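
A client can use the response body to decide whether and when to retry. The following is a minimal sketch under two assumptions that the text above does not state explicitly: that `resetAt` is a Unix timestamp in seconds, and that retrying a `budget` rejection is pointless until the budget is raised or a new month begins. The `retry_delay_seconds` helper is hypothetical.

```python
import json
from typing import Optional


def retry_delay_seconds(status: int, body: str, now: float) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry will not help.

    Assumes resetAt is a Unix timestamp in seconds.
    """
    if status != 429:
        return 0.0  # not rate-limited, no wait needed
    payload = json.loads(body)
    if payload.get("limitType") == "budget":
        return None  # waiting a few seconds will not restore budget
    reset_at = payload.get("resetAt")
    if reset_at is None:
        return None
    # Never return a negative delay if the reset time has already passed.
    return max(0.0, reset_at - now)


rpm_body = '{"error": "Rate limit exceeded", "limitType": "rpm", "limit": 60, "remaining": 0, "resetAt": 1709712000}'
print(retry_delay_seconds(429, rpm_body, now=1709711970))  # 30.0
```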

RPM: Sliding Window

RPM uses a sliding window algorithm via Upstash Redis. This provides smoother rate limiting compared to fixed windows -- there is no "burst at window boundary" problem.

The window slides continuously, so the limit represents the maximum number of requests in any 60-second period.

text
Example: RPM = 60

  Time 0:00 -> Request 1   (remaining: 59)
  Time 0:30 -> Request 30  (remaining: 30)
  Time 0:59 -> Request 60  (remaining: 0)
  Time 1:00 -> Request 1 from 0:00 expires (remaining: 1)
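
To make the sliding-window behavior concrete, here is a small in-memory model that keeps a log of request timestamps. It is a sketch for illustration, not the proxy's implementation: the `SlidingWindowLimiter` class is hypothetical, and a Redis-backed limiter may store or approximate the window differently.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding-window log: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        # Drop requests that have aged out of the window ending at `now`.
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # window is full; rejected requests are not recorded
        self._timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=60)
accepted = [limiter.allow(float(t)) for t in range(60)]  # one request per second, 0:00 to 0:59
print(all(accepted))          # True: 60 requests fit in the window
print(limiter.allow(59.5))    # False: the window is full
print(limiter.allow(60.0))    # True: the request from 0:00 has expired
```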

RPD: Fixed Window

RPD uses a fixed window that resets every 24 hours. The window is aligned to the first request of the day for each agent+provider combination.

text
Example: RPD = 10,000

  Day starts -> counter at 0
  Throughout the day -> counter increments
  Counter hits 10,000 -> all further requests rejected until reset
  Day ends -> counter resets to 0
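
A fixed window is simpler to model than a sliding one: a single counter and a window start time. The sketch below assumes, based on the description above, that the window opens at the first request and resets 24 hours later. The `FixedWindowCounter` class is hypothetical and keeps its state in memory rather than in Redis.

```python
from typing import Optional


class FixedWindowCounter:
    """In-memory fixed window: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 10_000, window: float = 86_400.0) -> None:
        self.limit = limit
        self.window = window
        self._window_start: Optional[float] = None
        self._count = 0

    def allow(self, now: float) -> bool:
        # Open a new window on the first request, or once the current one has elapsed.
        if self._window_start is None or now - self._window_start >= self.window:
            self._window_start = now
            self._count = 0
        if self._count >= self.limit:
            return False  # rejected until the window resets
        self._count += 1
        return True


counter = FixedWindowCounter(limit=3)
print([counter.allow(t) for t in (0.0, 10.0, 20.0, 30.0)])  # [True, True, True, False]
print(counter.allow(86_400.0))                              # True: a new window has started
```

Unlike the sliding window, every accepted request in a window expires at the same moment, so a full quota becomes available again as soon as the window resets.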

Budget Enforcement

The monthly budget is tracked in Redis as a running total of estimated costs in cents. The key follows the pattern:

text
agent:{agentId}:spend:{YYYY-MM}

Before each request, the proxy compares the current spend to budgetCentsMonthly. If the spend equals or exceeds the budget, the request is rejected.

After each successful request, the proxy increments the spend counter by the estimated cost. Spend keys expire automatically after 32 days, so data from previous months is cleaned up.
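
The key pattern and the comparison can be expressed as two small helpers. This is a sketch: the function names are hypothetical, and whether the month in the key is computed in UTC is an assumption, since the text above does not say.

```python
from datetime import datetime, timezone
from typing import Optional

# 32 days in seconds, matching the key expiry described above.
SPEND_KEY_TTL_SECONDS = 32 * 24 * 60 * 60


def spend_key(agent_id: str, when: datetime) -> str:
    """Build the monthly spend key, e.g. agent:{agentId}:spend:2024-03."""
    return f"agent:{agent_id}:spend:{when:%Y-%m}"


def budget_exceeded(spent_cents: int, budget_cents: Optional[int]) -> bool:
    """A request is rejected once spend equals or exceeds the budget."""
    if budget_cents is None:
        return False  # no budget configured
    return spent_cents >= budget_cents


when = datetime(2024, 3, 6, tzinfo=timezone.utc)  # UTC is an assumption
print(spend_key("550e8400-e29b-41d4-a716-446655440000", when))
print(budget_exceeded(5000, 5000))  # True: equal counts as exceeded
```

Since the check runs before the request and the increment runs after it, the request that crosses the cap still completes; only later requests are rejected.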

Budget Response

json
{
  "error": "Budget exceeded",
  "limitType": "budget",
  "spentCents": 5000,
  "budgetCents": 5000
}

Per-Provider Scoping

RPM and RPD limits are tracked per agent and per provider. An agent with RPM=60 can make up to 60 requests/minute to OpenAI and 60 requests/minute to Anthropic independently.

text
Rate limit key format: {agentId}:{providerSlug}
Example: 550e8400-e29b-41d4-a716-446655440000:openai

Budget limits, on the other hand, are tracked per agent across all providers. A $50/month budget applies to the agent's combined spend across OpenAI, Anthropic, Resend, and any other assigned providers.
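
The difference in scoping comes down to which key each counter uses. The toy ledger below illustrates this; the `UsageLedger` class and its method names are hypothetical, and only the `{agentId}:{providerSlug}` key format comes from the text above.

```python
from collections import Counter


def rate_limit_key(agent_id: str, provider_slug: str) -> str:
    """Key for RPM/RPD counters: one per agent + provider combination."""
    return f"{agent_id}:{provider_slug}"


class UsageLedger:
    """Toy in-memory ledger showing per-provider request counts and per-agent spend."""

    def __init__(self) -> None:
        self.request_counts = Counter()  # keyed by agent + provider
        self.spend_cents = Counter()     # keyed by agent only

    def record(self, agent_id: str, provider_slug: str, cost_cents: int) -> None:
        self.request_counts[rate_limit_key(agent_id, provider_slug)] += 1
        self.spend_cents[agent_id] += cost_cents


agent = "550e8400-e29b-41d4-a716-446655440000"
ledger = UsageLedger()
ledger.record(agent, "openai", 3)
ledger.record(agent, "openai", 2)
ledger.record(agent, "anthropic", 4)

print(ledger.request_counts[rate_limit_key(agent, "openai")])     # 2
print(ledger.request_counts[rate_limit_key(agent, "anthropic")])  # 1
print(ledger.spend_cents[agent])                                  # 9
```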

Fail-Open Behavior

If the Redis instance used for rate limiting becomes unavailable, the proxy fails open -- requests are allowed through without rate limiting. This design choice prioritizes availability over strict enforcement during infrastructure incidents.

When Redis recovers, rate limiting resumes automatically. A warning is logged for each request that skips rate limiting:

text
[Redis] Rate limiting skipped -- Redis unavailable (fail open)

The same fail-open behavior applies to budget checks.
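
Fail-open is a small wrapper around the limit check: if the check itself cannot run, the request is allowed and a warning is logged. The sketch below assumes a Redis outage surfaces as Python's built-in `ConnectionError`; real Redis clients raise their own exception types, and the `allow_with_fail_open` helper is hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("proxy")


def allow_with_fail_open(check: Callable[[], bool]) -> bool:
    """Run a limit check; if the backing store is unreachable, allow the request."""
    try:
        return check()
    except ConnectionError:
        # Assumption: an unavailable Redis surfaces as ConnectionError.
        logger.warning("[Redis] Rate limiting skipped -- Redis unavailable (fail open)")
        return True


def redis_is_down() -> bool:
    raise ConnectionError("connection refused")


print(allow_with_fail_open(lambda: False))  # False: the limiter ran and rejected
print(allow_with_fail_open(redis_is_down))  # True: the limiter could not run
```

Only the connectivity failure is caught. A limiter that runs and returns a rejection is still honored, so fail-open affects enforcement only while the store is unreachable.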

Monitoring Usage

Use the usage API or CLI to monitor how agents are tracking against their limits:

bash
# Get a summary with budget insights
keystore usage summary <agent-id> --period 24h

# View per-provider breakdown
keystore usage providers <agent-id> --period 7d

The summary endpoint returns burn rate, projected monthly cost, and estimated days until budget exhaustion:

json
{
  "totalRequests": 4250,
  "costCents": 1200,
  "budgetCents": 5000,
  "burnRateCentsPerHour": 1.67,
  "projectedMonthlyCents": 1202,
  "daysUntilBudget": 76.2
}
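
A script that consumes this summary might reduce it to a few alerting signals. The sketch below uses only fields shown in the sample response; the `budget_status` helper and its output keys are hypothetical.

```python
import json


def budget_status(summary: dict) -> dict:
    """Derive simple budget signals from a usage summary."""
    budget = summary["budgetCents"]
    spent = summary["costCents"]
    return {
        "used_fraction": spent / budget,
        "remaining_cents": budget - spent,
        "projected_over_budget": summary["projectedMonthlyCents"] > budget,
    }


sample = json.loads("""
{
  "totalRequests": 4250,
  "costCents": 1200,
  "budgetCents": 5000,
  "burnRateCentsPerHour": 1.67,
  "projectedMonthlyCents": 1202,
  "daysUntilBudget": 76.2
}
""")

status = budget_status(sample)
print(f"{status['used_fraction']:.0%} of budget used")  # 24% of budget used
print(status["remaining_cents"])                        # 3800
print(status["projected_over_budget"])                  # False
```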