Ship LLM features with routing, budgets and traces by default.
Latenza sits between your app and model providers. Keep your OpenAI SDK, swap the base URL, and manage production traffic through virtual keys, routing profiles, guardrails, webhooks and observability.
Gateway flow
Quickstart
Integrate in minutes
Use the official OpenAI SDKs. Latenza speaks the same Chat Completions shape and adds routing, failover, budget controls and traces behind the scenes.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.latenza.ai/v1",
    api_key="lz_live_...",  # virtual key from the console
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain AI routing in one sentence."},
    ],
)
print(response.choices[0].message.content)

Core concepts
The control plane around your LLM traffic
Access
Virtual keys
Attach rate limits, budgets, routing strategy and model allowlists to each key.
Runtime
Routing & failover
Pick by cost, speed or quality, and fail over automatically when a provider is unavailable.
Debug
Observability
Inspect requests, providers, latency, cost, tokens and metadata from the console.
Safety
Guardrails
Block or flag unsafe input/output before it spreads through your product.
Virtual keys
Keys are the unit of control
A Latenza key can represent an app, environment, customer or internal team. Configure it once, then enforce limits at the gateway before requests hit providers.
- Rate limit: cap requests per minute.
- Monthly budget: block the key when spend reaches the ceiling.
- Model allowlist: restrict which models can be routed.
- Routing profile: inherit org defaults or override per key.
{
  "name": "Production backend",
  "rate_limit": 120,
  "routing_strategy": "balanced",
  "model_allowlist": ["gpt-4o", "claude-3-5-sonnet"],
  "budget_monthly_eur": 250,
  "budget_alert_pct": 80
}

Budget enforcement is done before provider calls
Monthly key budgets are checked during key validation. If the key has already spent its budget for the month, the gateway rejects the request before any provider call.
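The order of checks described above can be pictured with a small sketch. This is not Latenza's implementation: the `VirtualKey` class, the `validate_request` function, the runtime counters and the rejection strings are all hypothetical, and only the config field names are taken from the key example. The point is that every check runs locally, before any provider is contacted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualKey:
    # Config fields mirror the key example; the two counters below
    # are hypothetical runtime state a gateway would have to track.
    name: str
    rate_limit: int                  # requests per minute
    model_allowlist: list[str]
    budget_monthly_eur: float
    spent_eur: float = 0.0           # hypothetical: spend so far this month
    requests_this_minute: int = 0    # hypothetical: current-minute counter


def validate_request(key: VirtualKey, model: str) -> Optional[str]:
    """Return a rejection reason, or None if the request may go upstream."""
    if key.requests_this_minute >= key.rate_limit:
        return "rate_limited"
    if key.spent_eur >= key.budget_monthly_eur:
        return "budget_exceeded"
    if model not in key.model_allowlist:
        return "model_not_allowed"
    return None


key = VirtualKey(
    name="Production backend",
    rate_limit=120,
    model_allowlist=["gpt-4o", "claude-3-5-sonnet"],
    budget_monthly_eur=250,
    spent_eur=250,  # the monthly ceiling has already been reached
)
print(validate_request(key, "gpt-4o"))  # budget_exceeded
```

Because the function returns before any network call, a key that has exhausted its budget costs nothing further at the provider.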
Routing
Route by cost, speed or quality
Choose a profile at the organization, assistant or key level. Latenza filters candidates by context window, model allowlist and active providers, then picks the best one for the selected strategy. If an upstream provider fails, failover sends the request to another provider instead of returning the error.
cheapest
Minimize token cost.
fastest
Prefer lower latency.
balanced
Blend cost, latency and score.
quality
Prioritize model quality.
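A rough sketch of the filter-then-pick flow, with failover as "try the next candidate". Everything here is an assumption made for illustration: the `Candidate` fields, the placeholder model names, the invented numbers and especially the `balanced` formula, since this page does not say how Latenza weighs cost, latency and score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    provider_active: bool
    context_window: int   # tokens
    cost_per_1k: float    # invented price unit
    latency_ms: float     # invented typical latency
    quality: float        # invented score between 0 and 1


def rank(candidates, strategy, prompt_tokens, allowlist):
    """Filter out ineligible candidates, then sort best-first."""
    eligible = [
        c for c in candidates
        if c.provider_active
        and c.context_window >= prompt_tokens
        and c.model in allowlist
    ]
    # Normalisers for the illustrative "balanced" blend; avoid dividing by 0.
    max_cost = max((c.cost_per_1k for c in eligible), default=0.0) or 1.0
    max_latency = max((c.latency_ms for c in eligible), default=0.0) or 1.0
    scorers = {  # lower score sorts first
        "cheapest": lambda c: c.cost_per_1k,
        "fastest": lambda c: c.latency_ms,
        "quality": lambda c: -c.quality,
        "balanced": lambda c: (
            c.cost_per_1k / max_cost + c.latency_ms / max_latency - c.quality
        ),
    }
    return sorted(eligible, key=scorers[strategy])


def call_with_failover(ranked, send):
    """Try candidates in order; move on when one raises."""
    last_error = None
    for candidate in ranked:
        try:
            return send(candidate)
        except Exception as exc:  # a real gateway would catch narrower errors
            last_error = exc
    raise RuntimeError("all candidates failed") from last_error


CANDIDATES = [
    Candidate("model-a", True, 128_000, 5.0, 600, 0.90),
    Candidate("model-b", True, 200_000, 3.0, 800, 0.92),
    Candidate("model-c", True, 8_000, 0.2, 200, 0.60),
]
ALLOWLIST = {"model-a", "model-b", "model-c"}

print([c.model for c in rank(CANDIDATES, "quality", 2_000, ALLOWLIST)])
```

Returning the full ordered list rather than a single winner is what makes failover cheap in this sketch: the fallback order is already known when the first provider fails.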
Observability
Every request becomes debuggable
Traces capture messages, provider, model, cost, latency, status, fallbacks and metadata. Use metadata to map traffic back to users, environments or product surfaces.
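One way to attach such metadata from the client side is sketched below. The OpenAI Python SDK accepts an `extra_body` argument for additional JSON fields, but the `metadata` field name and the `user_id`, `environment` and `surface` keys are assumptions for illustration, and `build_chat_request` is a hypothetical helper. Check the API reference for the field Latenza actually reads.

```python
def build_chat_request(prompt, *, user_id, environment, surface):
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the gateway reads trace metadata from a "metadata"
        # object in the request body. The key names are illustrative.
        "extra_body": {
            "metadata": {
                "user_id": user_id,
                "environment": environment,
                "surface": surface,
            }
        },
    }


request = build_chat_request(
    "Summarize this ticket.",
    user_id="user_123",
    environment="production",
    surface="support-widget",
)
# With a client configured as in the quickstart:
# response = client.chat.completions.create(**request)
print(request["extra_body"]["metadata"]["surface"])  # support-widget
```

Building the arguments in one helper keeps metadata keys consistent across call sites, which makes filtering traces by user or surface more reliable.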
Guardrails & webhooks
Control and react to production traffic
Guardrails run before and after model calls. Webhooks deliver operational events such as budget thresholds, quota alerts and delivery signals to your systems.
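The "before and after" placement of guardrails can be sketched as a wrapper around the model call. This is a conceptual illustration, not Latenza's guardrail engine: `guarded_call`, the `contains_email` rule and the result shape are hypothetical, and real guardrails are configured in the gateway rather than written in application code.

```python
import re


def contains_email(text):
    """Toy rule: True when the text contains something like an email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) is not None


def guarded_call(prompt, model_fn, input_checks, output_checks):
    """Block on input violations, flag on output violations."""
    # Input guardrails run first, so a blocked prompt never reaches the model.
    for check in input_checks:
        if check(prompt):
            return {"status": "blocked_input", "flags": [check.__name__], "content": None}

    output = model_fn(prompt)

    # Output guardrails run on the model's answer before it is returned.
    flags = [check.__name__ for check in output_checks if check(output)]
    return {"status": "flagged" if flags else "ok", "flags": flags, "content": output}


def fake_model(prompt):
    # Stand-in for a provider call so the sketch runs offline.
    return "Contact us at help@example.com"


result = guarded_call(
    "How do I reach support?",
    fake_model,
    input_checks=[contains_email],
    output_checks=[contains_email],
)
print(result["status"], result["flags"])  # flagged ['contains_email']
```

In this sketch an input violation blocks the request outright, while an output violation only flags the response; which action applies to which rule is a policy choice.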
FAQ
Common questions
Will Latenza add latency?
The gateway adds one small network hop. Routing to faster providers, automatic failover and semantic caching can offset that overhead.
Do I need to change SDKs?
No. Use your OpenAI-compatible SDK and point base_url to the Latenza gateway.
Can I restrict models by app?
Yes. Configure a virtual key with a model allowlist and use that key in the app.
Where is the full API spec?
The interactive reference is generated from the deployed gateway OpenAPI schema.