
Groq

Ultra-fast LLM inference

Access Groq's LPU-powered inference for some of the fastest token generation available, well suited to latency-sensitive agent workloads.

Features

Sub-second inference
OpenAI-compatible API
Llama & Mixtral models
Streaming support
JSON mode
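JSON mode is requested through the same `response_format` parameter as in the OpenAI API. A minimal request-body sketch (the model name and prompts are illustrative):

```typescript
// Sketch of a JSON-mode request body for Groq's OpenAI-compatible API.
// Passing response_format: { type: "json_object" } asks the model to
// return a single valid JSON object instead of free-form text.
const jsonModeRequest = {
  model: "llama-3.1-70b-versatile",
  response_format: { type: "json_object" as const },
  messages: [
    // JSON mode generally expects the prompt to mention JSON explicitly.
    { role: "system" as const, content: "Reply with a JSON object." },
    { role: "user" as const, content: "List three colors." },
  ],
};
```

This object is what you would pass to `chat.completions.create(...)` on an OpenAI-compatible client.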

Integration Example

Use Groq through Keystore with zero code changes. Keys are resolved from the vault and injected at request time.

groq-example.ts
import Keystore from "@keystore/sdk";
import OpenAI from "openai";

const ks = new Keystore({ agentToken: process.env.KS_TOKEN! });
ks.interceptAll();

// Groq exposes an OpenAI-compatible API; no apiKey is passed here
// because Keystore resolves it from the vault at request time.
const groq = new OpenAI({
  baseURL: "https://api.groq.com/openai/v1",
});

const completion = await groq.chat.completions.create({
  model: "llama-3.1-70b-versatile",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
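The request-time injection described above can be sketched as a fetch wrapper. This is not the real Keystore implementation; `KeyResolver` and `withKeyInjection` are illustrative stand-ins for whatever `interceptAll()` installs internally:

```typescript
// Hypothetical sketch: wrap fetch so that requests to known provider
// hosts get a vault-resolved key injected as a bearer token.
type KeyResolver = (host: string) => string | undefined;

function withKeyInjection(
  resolveKey: KeyResolver,
  baseFetch: typeof fetch
): typeof fetch {
  return async (input, init) => {
    const url =
      typeof input === "string"
        ? input
        : input instanceof URL
          ? input.href
          : input.url;
    const key = resolveKey(new URL(url).host);
    if (key) {
      // Inject the resolved key without the caller ever seeing it.
      const headers = new Headers(init?.headers);
      headers.set("Authorization", `Bearer ${key}`);
      init = { ...init, headers };
    }
    return baseFetch(input, init);
  };
}
```

Because the key is attached at the transport layer, application code (like the example above) never handles credentials directly.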

Use Cases

Real-time conversational agents
Low-latency code completion
High-throughput batch processing
Interactive AI assistants

Ready to use Groq?

Request access and our concierge team will provision credentials for you — usually within 24 hours. No setup on your end.

Request Access