Production Agent Setup: Rate Limits, Budgets, and Webhooks
advanced · interceptAll · openai, anthropic · 5 min read
Running agents in production means controlling costs and handling failures. This guide covers the full production hardening setup: dashboard-configured budgets, rate limits, webhook alerts, and graceful error handling.
What you'll build
A production-ready agent with:
- Per-agent monthly budget caps
- Request rate limits
- A webhook endpoint that receives budget alerts
- Graceful handling of 429 (rate limit) and 402 (budget exceeded) responses
Prerequisites
- A Keystore account with an agent token
- OpenAI and Anthropic keys in your vault
- A server or serverless function to receive webhooks
- Node.js 18+
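The code in this guide reads `KS_TOKEN` (and, in the webhook receiver, `WEBHOOK_SECRET`) from the environment with TypeScript's non-null assertion `!`. That only silences the compiler; a missing variable still arrives as `undefined` at runtime. A minimal fail-fast check might look like the sketch below; `requireEnv` is a hypothetical helper, not part of the Keystore SDK.

```typescript
// Hypothetical helper: return an environment variable or fail at startup.
// The env parameter defaults to process.env and exists so the check is testable.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Call it once at startup, for example `new Keystore({ agentToken: requireEnv("KS_TOKEN") })`, so a misconfigured deployment fails immediately with a clear message rather than later in a less obvious way.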
Dashboard Configuration
1
Set a monthly budget
In the Keystore dashboard:
- Go to Agents → select your agent
- Under Budget, set a monthly limit (e.g., $50.00)
- Set a warning threshold (e.g., 80%) to get alerted before hitting the limit
When the budget is exhausted, the vault returns 402 Payment Required for all subsequent requests.
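Since the vault answers every request with 402 once the budget is exhausted, one pattern is to latch that state in-process so concurrent tasks stop calling out after the first failure. The sketch below is a hypothetical `BudgetGate` class, not part of the Keystore SDK. When latched, it throws an error carrying `status: 402`, so error handling written for the vault's response (such as the retry wrapper later in this guide) treats it the same way.

```typescript
// Hypothetical in-process latch: after the first 402, refuse further calls
// locally instead of sending requests the vault will reject anyway.
class BudgetGate {
  private exhausted = false;

  get isExhausted(): boolean {
    return this.exhausted;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.exhausted) {
      // Mimic the vault's response so existing 402 handling applies unchanged
      throw Object.assign(new Error("BUDGET_EXCEEDED"), { status: 402 });
    }
    try {
      return await fn();
    } catch (err: any) {
      // Latch on the first 402, then rethrow so callers still see it
      if (err?.status === 402) this.exhausted = true;
      throw err;
    }
  }

  // Call after raising the limit in the dashboard
  reset(): void {
    this.exhausted = false;
  }
}
```

Put the gate inside the retry wrapper, e.g. `callWithRetry(() => gate.run(() => openai.chat.completions.create(...)))`, so the gate sees the raw 402 before the wrapper converts it.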
2
Configure rate limits
On the same agent page:
- Under Rate Limits, set requests per minute (e.g., 60 RPM)
- Optionally set requests per day (e.g., 5,000 RPD)
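These limits are enforced by the vault, so an agent that fires requests in a tight loop will simply collect 429s. If you would rather pace requests on the client so they stay under the configured rate, a sliding-window log is one straightforward option. The sketch below is a hypothetical `SlidingWindowLimiter`; the clock is injectable so the logic can be tested without real waiting. It only sees traffic from its own process, so if several processes share one agent token, the vault's limit still applies to their combined volume.

```typescript
// Hypothetical client-side limiter: at most `limit` requests in any rolling
// window of `windowMs` milliseconds (sliding-window log).
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private timestamps: number[] = []; // start times of requests still in the window

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Claim a slot and return 0, or return how many ms until the oldest
  // request leaves the window (no slot is claimed in that case).
  tryAcquire(): number {
    const t = this.now();
    // Drop requests that have aged out of the window
    while (this.timestamps.length > 0 && this.timestamps[0] <= t - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(t);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - t;
  }

  // Wait until a slot is free, then claim it
  async acquire(): Promise<void> {
    for (;;) {
      const waitMs = this.tryAcquire();
      if (waitMs === 0) return;
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

For the 60 RPM example, create `new SlidingWindowLimiter(60, 60_000)` and `await limiter.acquire()` before each call; a second instance with `(5000, 86_400_000)` would cover the daily example. The log holds at most `limit` timestamps; a token bucket is a common alternative that uses constant memory.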
Rate-limited requests receive 429 Too Many Requests with a Retry-After header.
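RFC 9110 allows the `Retry-After` value to be either a whole number of seconds or an HTTP date. The retry code in this guide parses it as an integer and falls back to 5 seconds, which covers the seconds form; whether Keystore ever sends the date form is not stated here, so handling it is purely defensive. A sketch of a hypothetical `parseRetryAfterMs` helper that accepts both:

```typescript
// Hypothetical helper: convert a Retry-After header value to a delay in ms.
// Accepts delay-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
function parseRetryAfterMs(
  value: string | null | undefined,
  fallbackMs = 5000,
  nowMs: number = Date.now()
): number {
  if (value == null) return fallbackMs;
  const trimmed = value.trim();
  // delay-seconds form: digits only
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date form: wait until that instant, never a negative delay
  const dateMs = Date.parse(trimmed);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  // Unparseable: fall back to a fixed default, as the retry wrapper does
  return fallbackMs;
}
```

Its result could stand in for the `parseInt(...) * 1000` computation in the retry wrapper below.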
3
Register a webhook endpoint
Under Webhooks:
- Add your endpoint URL (e.g., https://yourapp.com/api/keystore-webhooks)
- Select events: budget.warning, budget.exceeded, agent.rate_limited
- Copy the webhook signing secret for verification
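The three events selected above arrive at the endpoint as JSON. Their exact payloads are not documented in this guide, so the sketch below infers a shape from the fields the receiver reads (`type`, `agentId`, `percentUsed`) and models it as a TypeScript discriminated union, so each event type carries only its own fields. `KeystoreEvent` and `parseKeystoreEvent` are hypothetical names, and real payloads may carry more fields.

```typescript
// Hypothetical event shapes, inferred from the fields the receiver reads.
type KeystoreEvent =
  | { type: "budget.warning"; agentId: string; percentUsed: number }
  | { type: "budget.exceeded"; agentId: string }
  | { type: "agent.rate_limited"; agentId: string };

// Narrow parsed JSON to a known event, or return null for anything else
function parseKeystoreEvent(body: unknown): KeystoreEvent | null {
  if (typeof body !== "object" || body === null) return null;
  const { type, agentId, percentUsed } = body as Record<string, unknown>;
  if (typeof agentId !== "string") return null;
  if (type === "budget.warning") {
    return typeof percentUsed === "number"
      ? { type: "budget.warning", agentId, percentUsed }
      : null;
  }
  if (type === "budget.exceeded") return { type: "budget.exceeded", agentId };
  if (type === "agent.rate_limited") return { type: "agent.rate_limited", agentId };
  return null;
}
```

Returning `null` for unrecognized types lets the handler acknowledge and ignore events it does not handle yet.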
Agent Code with Error Handling
4
Set up the agent with retry logic
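```typescript
import Keystore from "@keystore/sdk";
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

// Route provider SDK calls through the Keystore vault
const ks = new Keystore({ agentToken: process.env.KS_TOKEN! });
ks.interceptAll();

// Both SDKs already retry connection errors, 429s, and 5xx responses on their
// own (2 retries by default). maxRetries: 0 leaves retries to callWithRetry so
// the two layers don't stack.
const openai = new OpenAI({ maxRetries: 0 });
const claude = new Anthropic({ maxRetries: 0 });

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Read Retry-After in seconds. Depending on SDK version, err.headers may be a
// plain object or a Fetch Headers instance, so handle both.
function retryAfterSeconds(err: any, fallback = 5): number {
  const h = err.headers;
  const raw = typeof h?.get === "function" ? h.get("retry-after") : h?.["retry-after"];
  const secs = parseInt(raw ?? "", 10);
  return Number.isNaN(secs) ? fallback : secs;
}

async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Budget exceeded: every later request will also get 402, so stop now
      if (err.status === 402) {
        console.error("Budget exceeded. Stopping agent.");
        throw new Error("BUDGET_EXCEEDED");
      }
      // Retry only rate limits, server errors, and network failures (no status);
      // other 4xx errors such as 400 or 401 will fail the same way again
      const status: number | undefined = err.status;
      const retryable = status === undefined || status === 429 || status >= 500;
      if (!retryable || attempt >= maxRetries) throw err;

      // Rate limited: wait as long as asked; otherwise back off exponentially
      const delayMs =
        status === 429 ? retryAfterSeconds(err) * 1000 : Math.pow(2, attempt) * 1000;
      console.warn(
        `Request failed (${status ?? "network error"}, attempt ${attempt + 1}). Retrying in ${delayMs}ms...`
      );
      await sleep(delayMs);
    }
  }
  // Unreachable in practice: the last attempt either returns or throws above
  throw new Error("Max retries exceeded");
}
```
5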
Use the retry wrapper
```typescript
async function runAgent() {
  try {
    const result = await callWithRetry(() =>
      openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: "Analyze today's metrics." }],
      })
    );
    console.log(result.choices[0].message.content);
  } catch (err: any) {
    if (err.message === "BUDGET_EXCEEDED") {
      // Notify ops team, pause agent, etc.
      console.error("Agent paused: budget exceeded");
      return;
    }
    throw err;
  }
}
```
Webhook Receiver
6
Build the webhook endpoint
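```typescript
import express from "express";
import { createHmac, timingSafeEqual } from "crypto";

// Shown with Express; other frameworks (Hono, etc.) work the same way as long
// as the handler can read the raw request body.
const app = express();

// express.raw keeps the body as a Buffer. The signature is assumed to cover the
// raw bytes (the usual convention); JSON.stringify(req.body) can differ from
// what was signed in whitespace or key order.
app.post(
  "/api/keystore-webhooks",
  express.raw({ type: "application/json" }),
  (req, res) => {
    if (!Buffer.isBuffer(req.body)) {
      res.status(400).json({ error: "Expected a JSON body" });
      return;
    }

    // Verify the HMAC-SHA256 signature with a constant-time comparison
    const signature = Buffer.from(req.header("x-keystore-signature") ?? "", "utf8");
    const expected = Buffer.from(
      createHmac("sha256", process.env.WEBHOOK_SECRET!).update(req.body).digest("hex"),
      "utf8"
    );
    // timingSafeEqual throws if the lengths differ, so check that first
    if (signature.length !== expected.length || !timingSafeEqual(signature, expected)) {
      res.status(401).json({ error: "Invalid signature" });
      return;
    }

    const event = JSON.parse(req.body.toString("utf8"));
    switch (event.type) {
      case "budget.warning":
        console.warn(`Agent ${event.agentId} at ${event.percentUsed}% of budget`);
        // Send Slack notification, etc.
        break;
      case "budget.exceeded":
        console.error(`Agent ${event.agentId} exceeded budget`);
        // Trigger incident, pause agent, notify team
        break;
      case "agent.rate_limited":
        console.warn(`Agent ${event.agentId} rate limited`);
        break;
    }

    res.json({ received: true });
  }
);

app.listen(3000); // example port
```
Production Checklist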
Before deploying agents to production:
- Monthly budget set: prevents runaway costs
- Rate limits configured: prevents accidentally flooding providers with requests
- Webhook endpoint live: get alerted before problems escalate
- Error handling in agent: graceful 429 and 402 handling
- Agent token scoped: only the providers the agent needs are enabled
- Monitoring dashboard: watch request volume and spend in real time
- Kill switch tested: verify you can revoke the token and all requests stop
Always test your budget and rate limit handling in a staging environment before deploying to production. Use a low budget ($1) to verify the 402 flow works correctly.
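Before spending real budget in staging, you can also exercise the 402 and 429 branches locally with fake errors. The sketch below is a variant of the `callWithRetry` wrapper from step 4 with the sleep function injected, so tests run instantly and can record the delays the wrapper chose. It retries only 429s, 5xx responses, and network errors (no status). `fakeError` is a hypothetical test helper that sets only the `status` and `headers` fields the wrapper reads, with headers as a plain object.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Variant of callWithRetry with an injectable sleep so tests run instantly
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // 402: the budget is gone, so stop without retrying
      if (err.status === 402) throw new Error("BUDGET_EXCEEDED");
      // Retry only 429s, 5xx responses, and network errors (no status)
      const status: number | undefined = err.status;
      const retryable = status === undefined || status === 429 || status >= 500;
      if (!retryable || attempt >= maxRetries) throw err;
      // 429: honor Retry-After (seconds, default 5); otherwise back off exponentially
      const retryAfter = parseInt(err.headers?.["retry-after"] ?? "", 10);
      const delayMs =
        status === 429
          ? (Number.isNaN(retryAfter) ? 5 : retryAfter) * 1000
          : Math.pow(2, attempt) * 1000;
      await sleep(delayMs);
    }
  }
  throw new Error("Max retries exceeded");
}

// Hypothetical test helper: an Error carrying only the fields the wrapper reads
function fakeError(status: number, headers: Record<string, string> = {}) {
  return Object.assign(new Error(`HTTP ${status}`), { status, headers });
}
```

Passing a recording `sleep` lets a test assert both how many calls were made and how long the wrapper would have waited, without any real delay.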
Full agent example
```typescript
import Keystore from "@keystore/sdk";
import OpenAI from "openai";

const ks = new Keystore({ agentToken: process.env.KS_TOKEN! });
ks.interceptAll();

// Disable the SDK's built-in retries so they don't stack with callWithRetry
const openai = new OpenAI({ maxRetries: 0 });

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.status === 402) throw new Error("BUDGET_EXCEEDED");
      // Retry only 429s, 5xx responses, and network failures (no status)
      const status: number | undefined = err.status;
      const retryable = status === undefined || status === 429 || status >= 500;
      if (!retryable || attempt >= maxRetries) throw err;
      // err.headers may be a plain object or a Fetch Headers instance, depending on SDK version
      const h = err.headers;
      const raw = typeof h?.get === "function" ? h.get("retry-after") : h?.["retry-after"];
      const wait = parseInt(raw ?? "", 10);
      const delayMs =
        status === 429 ? (Number.isNaN(wait) ? 5 : wait) * 1000 : Math.pow(2, attempt) * 1000;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("Max retries exceeded");
}

async function main() {
  try {
    const result = await callWithRetry(() =>
      openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: "Generate a daily ops summary." }],
      })
    );
    console.log(result.choices[0].message.content);
  } catch (err: any) {
    if (err.message === "BUDGET_EXCEEDED") {
      console.error("Agent paused: budget exceeded. Check dashboard.");
    } else {
      console.error("Agent error:", err.message);
    }
  } finally {
    ks.restore();
  }
}

main();
```
Next steps
- Start with interceptAll() if you haven't set up vault routing yet
- Add database access for data-driven agents
- Try OpenClaw for prompt-only integration