Production Agent Setup: Rate Limits, Budgets, and Webhooks

advanced · interceptAll · openai, anthropic · 5 min read

Running agents in production means controlling costs and handling failures. This guide covers the full production hardening setup: dashboard-configured budgets, rate limits, webhook alerts, and graceful error handling.

What you'll build

A production-ready agent with:

  • Per-agent monthly budget caps
  • Request rate limits
  • A webhook endpoint that receives budget alerts
  • Graceful handling of 429 (rate limit) and 402 (budget exceeded) responses

Prerequisites

  • A Keystore account with an agent token
  • OpenAI and Anthropic keys in your vault
  • A server or serverless function that can receive webhooks
  • Node.js 18+

Dashboard Configuration

Step 1: Set a monthly budget

In the Keystore dashboard:

  1. Go to Agents → select your agent
  2. Under Budget, set a monthly limit (e.g., $50.00)
  3. Set a warning threshold (e.g., 80%) to get alerted before hitting the limit

When the budget is exhausted, the vault returns 402 Payment Required for all subsequent requests.

Step 2: Configure rate limits

On the same agent page:

  1. Under Rate Limits, set requests per minute (e.g., 60 RPM)
  2. Optionally set requests per day (e.g., 5,000 RPD)

Rate-limited requests receive 429 Too Many Requests with a Retry-After header.
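
In HTTP, a `Retry-After` value can be either a number of seconds or an HTTP date. The sketch below handles both forms. The helper name `retryAfterMs` and the 5-second fallback are assumptions made for illustration, and this guide does not state which form the vault sends.

```typescript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// The header may be delta-seconds ("30") or an HTTP date
// ("Wed, 21 Oct 2015 07:28:00 GMT").
function retryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
  fallbackMs = 5_000 // assumed default when the header is missing or unparseable
): number {
  if (!value) return fallbackMs;

  const trimmed = value.trim();

  // Delta-seconds form: a non-negative integer
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // HTTP-date form: wait until that time, never a negative duration
  const dateMs = Date.parse(trimmed);
  if (!Number.isNaN(dateMs)) {
    return Math.max(0, dateMs - nowMs);
  }

  return fallbackMs;
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests. In production code the default of `Date.now()` is usually enough.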

Step 3: Register a webhook endpoint

Under Webhooks:

  1. Add your endpoint URL (e.g., https://yourapp.com/api/keystore-webhooks)
  2. Select events: budget.warning, budget.exceeded, agent.rate_limited
  3. Copy the webhook signing secret for verification
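
To try your endpoint before real events arrive, you can sign a sample payload yourself. This is a minimal sketch that assumes the scheme used by the receiver later in this guide: a hex-encoded HMAC-SHA256 of the request body, sent in the `x-keystore-signature` header. The function name `signPayload`, the secret, and the sample event values are made up for illustration.

```typescript
import { createHmac } from "node:crypto";

// Assumption: the signature is a hex-encoded HMAC-SHA256 of the raw request
// body, keyed with the webhook signing secret.
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Hypothetical sample event for local testing; the values are illustrative.
const sampleBody = JSON.stringify({
  type: "budget.warning",
  agentId: "agent_123",
  percentUsed: 80,
});

const sampleSignature = signPayload("test_secret", sampleBody);
console.log(`x-keystore-signature: ${sampleSignature}`);
```

Send `sampleBody` as the request body, byte for byte, with the printed header. Any change to the body after signing (even whitespace) produces a different signature.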

Agent Code with Error Handling

Step 4: Set up the agent with retry logic

typescript
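import Keystore from "@keystore/sdk";
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const ks = new Keystore({ agentToken: process.env.KS_TOKEN! });
ks.interceptAll();

const openai = new OpenAI();
const claude = new Anthropic();

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Reads Retry-After (in seconds) from the error; falls back to 5 s if the
// header is missing or not a number. Depending on the SDK version,
// err.headers may be a plain object or a Headers instance.
function retryAfterSeconds(err: any): number {
  const raw =
    typeof err.headers?.get === "function"
      ? err.headers.get("retry-after")
      : err.headers?.["retry-after"];
  const seconds = parseInt(raw ?? "", 10);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds : 5;
}

async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Budget exceeded: stop immediately, no point retrying
      if (err.status === 402) {
        console.error("Budget exceeded. Stopping agent.");
        throw new Error("BUDGET_EXCEEDED");
      }

      // Out of retries: surface the last error
      if (attempt >= maxRetries) throw err;

      // Rate limited: wait for the requested interval, then retry
      if (err.status === 429) {
        const retryAfter = retryAfterSeconds(err);
        console.warn(`Rate limited. Retrying in ${retryAfter}s...`);
        await sleep(retryAfter * 1000);
        continue;
      }

      // Other client errors (400, 401, 403, ...) generally fail again on retry
      if (err.status >= 400 && err.status < 500) throw err;

      // Server and network errors: retry with exponential backoff
      const delay = Math.pow(2, attempt) * 1000;
      console.warn(`Error (attempt ${attempt + 1}). Retrying in ${delay}ms...`);
      await sleep(delay);
    }
  }
  // Unreachable at runtime; satisfies the compiler's return-type check
  throw new Error("Max retries exceeded");
}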
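
The backoff above doubles a fixed delay (1 s, 2 s, 4 s). When several agents fail at the same moment, fixed delays make them retry in lockstep. A common mitigation is to add random jitter and cap the delay. The sketch below shows one way to do this; `backoffDelayMs` and its default values are hypothetical and not part of the Keystore SDK.

```typescript
// Hypothetical helper: exponential backoff with "full jitter" and an upper cap.
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 1_000,
  capMs = 30_000,
  random: () => number = Math.random
): number {
  // Exponential growth, limited to capMs
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a uniformly random delay in [0, ceiling)
  return Math.floor(random() * ceiling);
}

// Example: log one randomly chosen delay per attempt
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: wait ${backoffDelayMs(attempt)} ms`);
}
```

To use it in the wrapper, replace the `Math.pow(2, attempt) * 1000` delay with `backoffDelayMs(attempt)`. Keep the `Retry-After` path unchanged, since that delay is dictated by the server.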

Step 5: Use the retry wrapper

typescript
async function runAgent() {
  try {
    const result = await callWithRetry(() =>
      openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: "Analyze today's metrics." }],
      })
    );

    console.log(result.choices[0].message.content);
  } catch (err: any) {
    if (err.message === "BUDGET_EXCEEDED") {
      // Notify ops team, pause agent, etc.
      console.error("Agent paused: budget exceeded");
      return;
    }
    throw err;
  }
}
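
Matching on `err.message === "BUDGET_EXCEEDED"` works, but it breaks silently if the string ever changes. One alternative is a dedicated error class checked with `instanceof`. The sketch below illustrates that design choice; `BudgetExceededError`, `toTypedError`, and `runOnce` are hypothetical names and not part of the Keystore SDK.

```typescript
// Hypothetical error type for the "budget exhausted" condition.
class BudgetExceededError extends Error {
  constructor(message = "Agent budget exceeded") {
    super(message);
    this.name = "BudgetExceededError";
    // Keeps instanceof working when compiling to older JavaScript targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Maps an error carrying HTTP status 402 to the typed error; returns
// anything else unchanged.
function toTypedError(err: unknown): unknown {
  const status = (err as { status?: number } | null)?.status;
  return status === 402 ? new BudgetExceededError() : err;
}

// Runs a task once; returns null when the agent should pause.
async function runOnce(task: () => Promise<string>): Promise<string | null> {
  try {
    return await task();
  } catch (err) {
    const typed = toTypedError(err);
    if (typed instanceof BudgetExceededError) {
      console.error("Agent paused: budget exceeded");
      return null;
    }
    throw typed;
  }
}
```

With this approach, `callWithRetry` would throw `new BudgetExceededError()` in place of `new Error("BUDGET_EXCEEDED")`, and callers would test the error type instead of comparing strings.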

Webhook Receiver

Step 6: Build the webhook endpoint

typescript
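import { createHmac, timingSafeEqual } from "crypto";
import express from "express";

// Shown with Express; the same steps apply to Hono or any HTTP framework
const app = express();

app.post(
  "/api/keystore-webhooks",
  // Keep the body as raw bytes. Re-serializing with JSON.stringify(req.body)
  // can change key order or spacing and break the signature check.
  express.raw({ type: "application/json" }),
  (req, res) => {
    // Verify the webhook signature
    // (assumes the signature covers the raw request body)
    const signature = req.header("x-keystore-signature") ?? "";
    const expected = createHmac("sha256", process.env.WEBHOOK_SECRET!)
      .update(req.body) // Buffer, because of express.raw()
      .digest("hex");

    // Constant-time comparison; timingSafeEqual throws on a length
    // mismatch, so compare lengths first
    const sigBuf = Buffer.from(signature);
    const expBuf = Buffer.from(expected);
    if (sigBuf.length !== expBuf.length || !timingSafeEqual(sigBuf, expBuf)) {
      return res.status(401).json({ error: "Invalid signature" });
    }

    // Parse only after the signature has been verified
    const event = JSON.parse(req.body.toString("utf8"));

    switch (event.type) {
      case "budget.warning":
        console.warn(
          `Agent ${event.agentId} at ${event.percentUsed}% of budget`
        );
        // Send Slack notification, etc.
        break;

      case "budget.exceeded":
        console.error(`Agent ${event.agentId} exceeded budget`);
        // Trigger incident, pause agent, notify team
        break;

      case "agent.rate_limited":
        console.warn(`Agent ${event.agentId} rate limited`);
        break;
    }

    res.json({ received: true });
  }
);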
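
Typing the event payload lets the compiler check that every event type is handled. The shapes below are inferred only from the fields the handler above reads (`type`, `agentId`, `percentUsed`); real payloads may carry more fields. The `KeystoreEvent` type, the `classify` helper, and the severity levels are illustrative assumptions.

```typescript
// Assumed payload shapes, inferred from the fields used in this guide.
type KeystoreEvent =
  | { type: "budget.warning"; agentId: string; percentUsed: number }
  | { type: "budget.exceeded"; agentId: string }
  | { type: "agent.rate_limited"; agentId: string };

type Severity = "info" | "warning" | "critical";

// Hypothetical helper: maps an event to a severity and a log message.
function classify(event: KeystoreEvent): { severity: Severity; message: string } {
  switch (event.type) {
    case "budget.warning":
      return {
        severity: "warning",
        message: `Agent ${event.agentId} at ${event.percentUsed}% of budget`,
      };
    case "budget.exceeded":
      return {
        severity: "critical",
        message: `Agent ${event.agentId} exceeded budget`,
      };
    case "agent.rate_limited":
      return {
        severity: "info",
        message: `Agent ${event.agentId} rate limited`,
      };
  }
}
```

If a new event type is added to the union without a matching `case`, TypeScript reports that the function can fall through without returning, which points to the missing branch.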

Production Checklist

Before deploying agents to production:

  • Monthly budget set: prevents runaway costs
  • Rate limits configured: prevents accidentally flooding providers with requests
  • Webhook endpoint live: alerts you before problems escalate
  • Error handling in agent: 429 and 402 responses are handled gracefully
  • Agent token scoped: enable only the providers the agent needs
  • Monitoring dashboard: watch request volume and spend in real time
  • Kill switch tested: verify that revoking the token stops all requests

Note: Always test your budget and rate-limit handling in a staging environment before deploying to production. Use a low budget ($1) to verify that the 402 flow works correctly.
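
Before spending a real staging budget, you can exercise the 402 and 429 paths locally with stubs. The sketch below uses a condensed copy of the retry wrapper with an injectable `sleep`, so the test runs instantly. `stubProvider` and `selfTest` are hypothetical helpers, and the thrown objects only imitate the `status` field that the wrapper reads.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Condensed retry wrapper; `sleep` is injectable so tests do not wait.
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status === 402) throw new Error("BUDGET_EXCEEDED");
      if (attempt >= maxRetries) throw err;
      // Fixed 5 s for rate limits (no header in the stub), else exponential
      await sleep(err?.status === 429 ? 5_000 : 2 ** attempt * 1000);
    }
  }
}

// Stub that fails with the given statuses in order, then succeeds.
function stubProvider(statuses: number[]) {
  let calls = 0;
  const fn = async (): Promise<string> => {
    const status = statuses[calls++];
    if (status !== undefined) throw { status };
    return "ok";
  };
  return { fn, callCount: () => calls };
}

async function selfTest(): Promise<string[]> {
  const noSleep: Sleep = async () => {};
  const results: string[] = [];

  // A 402 must stop on the first attempt
  const budget = stubProvider([402]);
  try {
    await callWithRetry(budget.fn, 3, noSleep);
  } catch (err: any) {
    results.push(`${err.message} after ${budget.callCount()} call(s)`);
  }

  // Two 429s followed by success must return the result
  const limited = stubProvider([429, 429]);
  const value = await callWithRetry(limited.fn, 3, noSleep);
  results.push(`${value} after ${limited.callCount()} call(s)`);

  return results;
}

selfTest().then((lines) => console.log(lines.join("\n")));
// prints:
// BUDGET_EXCEEDED after 1 call(s)
// ok after 3 call(s)
```

A stub test like this checks only your own handling logic. The low-budget staging run is still needed to confirm that the vault actually returns 402 and 429 as configured.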

Full agent example

typescript
import Keystore from "@keystore/sdk";
import OpenAI from "openai";

const ks = new Keystore({ agentToken: process.env.KS_TOKEN! });
ks.interceptAll();

const openai = new OpenAI();

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // 402: budget exhausted, retrying cannot help
      if (err.status === 402) throw new Error("BUDGET_EXCEEDED");
      if (attempt >= maxRetries) throw err;

      if (err.status === 429) {
        // Honor Retry-After (seconds); fall back to 5 s if missing or not numeric.
        // err.headers may be a plain object or a Headers instance, depending on SDK version.
        const h = err.headers;
        const raw = typeof h?.get === "function" ? h.get("retry-after") : h?.["retry-after"];
        const wait = parseInt(raw ?? "", 10);
        await sleep((Number.isFinite(wait) && wait >= 0 ? wait : 5) * 1000);
        continue;
      }

      // Other 4xx errors generally fail again on retry
      if (err.status >= 400 && err.status < 500) throw err;

      // Server and network errors: exponential backoff
      await sleep(Math.pow(2, attempt) * 1000);
    }
  }
  throw new Error("Max retries exceeded");
}

async function main() {
  try {
    const result = await callWithRetry(() =>
      openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: "Generate a daily ops summary." }],
      })
    );
    console.log(result.choices[0].message.content);
  } catch (err: any) {
    if (err.message === "BUDGET_EXCEEDED") {
      console.error("Agent paused: budget exceeded. Check dashboard.");
    } else {
      console.error("Agent error:", err.message);
    }
  } finally {
    // Remove the interception installed by interceptAll()
    ks.restore();
  }
}

main();

Next steps