OpenAI-compatible AI Gateway

Ship LLM features with routing, budgets and traces by default.

Latenza sits between your app and model providers. Keep your OpenAI SDK, swap the base URL, and manage production traffic through virtual keys, routing profiles, guardrails, webhooks and observability.

Gateway flow

1. Virtual key: auth, budget, allowlist

2. Policy engine: routing profile + guardrails

3. Provider call: OpenAI, Anthropic, Mistral, Google

4. Trace: cost, latency, tokens, metadata

Quickstart

Integrate in minutes

Use the official OpenAI SDKs. Latenza speaks the same Chat Completions shape and adds routing, failover, budget controls and traces behind the scenes.

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.latenza.ai/v1",
    api_key="lz_live_...",  # virtual key from the console
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain AI routing in one sentence."},
    ],
)

print(response.choices[0].message.content)

Core concepts

The control plane around your LLM traffic

Access

Virtual keys

Attach rate limits, budgets, routing strategy and model allowlists to each key.

Runtime

Routing & failover

Route by cost, speed or quality, and fail over automatically when a provider call fails.

Debug

Observability

Inspect requests, providers, latency, cost, tokens and metadata from the console.

Safety

Guardrails

Block or flag unsafe input/output before it spreads through your product.

Virtual keys

Keys are the unit of control

A Latenza key can represent an app, environment, customer or internal team. Configure it once, then enforce limits at the gateway before requests hit providers.

Rate limit: cap requests per minute.
Monthly budget: block the key when spend reaches the ceiling.
Model allowlist: restrict which models can be routed.
Routing profile: inherit org defaults or override per key.

Virtual key configuration

{
  "name": "Production backend",
  "rate_limit": 120,
  "routing_strategy": "balanced",
  "model_allowlist": ["gpt-4o", "claude-3-5-sonnet"],
  "budget_monthly_eur": 250,
  "budget_alert_pct": 80
}
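As a rough illustration of how a gateway could apply the rate limit and model allowlist from the configuration above, here is a minimal sketch. The `KeyLimits` class, the fixed 60-second window and the reason strings are assumptions made for this example, not Latenza's actual implementation.

```python
import json
import time
from dataclasses import dataclass

# The virtual key configuration shown above, parsed as-is.
CONFIG = json.loads("""
{
  "name": "Production backend",
  "rate_limit": 120,
  "routing_strategy": "balanced",
  "model_allowlist": ["gpt-4o", "claude-3-5-sonnet"],
  "budget_monthly_eur": 250,
  "budget_alert_pct": 80
}
""")


@dataclass
class KeyLimits:
    """Hypothetical in-memory view of one virtual key's limits."""
    rate_limit: int             # requests allowed per minute
    model_allowlist: list       # models this key may be routed to
    window_start: float = 0.0   # start of the current 60-second window
    window_count: int = 0       # requests counted in the current window

    def check(self, model, now=None):
        """Return (allowed, reason) for one request to `model`."""
        now = time.time() if now is None else now
        if model not in self.model_allowlist:
            return False, "model_not_allowed"
        # Fixed-window counter (an assumption): reset after 60 seconds.
        if now - self.window_start >= 60:
            self.window_start, self.window_count = now, 0
        if self.window_count >= self.rate_limit:
            return False, "rate_limited"
        self.window_count += 1
        return True, "ok"


limits = KeyLimits(CONFIG["rate_limit"], CONFIG["model_allowlist"])
```

Calling `limits.check("gpt-4o")` before each request returns whether the request may proceed and why not when it is refused. A production gateway would more likely keep these counters in shared storage than in process memory.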

Budgets are enforced before provider calls

Monthly key budgets are checked during key validation. If the key has already spent its budget for the month, the gateway rejects the request before any provider call.
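The budget check described above can be sketched as follows. The `KeyBudget` type, the `spent_this_month_eur` field and the three outcome strings are hypothetical, and reading `budget_alert_pct` as "alert once this percentage of the monthly budget is spent" is an assumption based on the field name.

```python
from dataclasses import dataclass


@dataclass
class KeyBudget:
    budget_monthly_eur: float    # spending ceiling for the month
    budget_alert_pct: int        # assumed: alert at this % of the ceiling
    spent_this_month_eur: float  # hypothetical running total for the key


def validate_budget(key):
    """Return 'reject', 'alert' or 'ok' before any provider is called."""
    # Budget already spent: refuse the request up front.
    if key.spent_this_month_eur >= key.budget_monthly_eur:
        return "reject"
    # Alert threshold, e.g. 80% of 250 EUR = 200 EUR.
    alert_threshold = key.budget_monthly_eur * key.budget_alert_pct / 100
    if key.spent_this_month_eur >= alert_threshold:
        return "alert"
    return "ok"
```

With the sample configuration (250 EUR, 80%), a key that has spent 200 EUR would still be served but would cross the alert threshold, while a key at 250 EUR would be rejected during validation.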

Routing

Route by cost, speed or quality

Choose a routing profile at the organization, assistant or key level. Latenza first filters candidate models by context window, model allowlist and active providers, then picks the best remaining candidate for the selected strategy. If the upstream provider fails, the gateway fails over to another candidate so the request can still complete.

cheapest

Minimize token cost.

fastest

Prefer lower latency.

balanced

Blend cost, latency and quality score.

quality

Prioritize model quality.
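To make the filter-then-pick flow and the four strategies concrete, here is a minimal sketch. The `Candidate` fields, the sample prices, latencies and quality scores, and the `balanced` formula are all illustrative assumptions; the real scoring used by Latenza is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    provider: str
    context_window: int  # max tokens the model accepts (illustrative)
    cost_per_1k: float   # illustrative price, not real pricing
    latency_ms: float    # illustrative typical latency
    quality: float       # illustrative score in [0, 1]


def eligible(candidates, prompt_tokens, allowlist, active_providers):
    """Filter by context window, model allowlist and active providers."""
    return [c for c in candidates
            if c.context_window >= prompt_tokens
            and c.model in allowlist
            and c.provider in active_providers]


def rank(candidates, strategy):
    """Order candidates best-first for the given strategy."""
    if not candidates:
        return []
    max_cost = max(c.cost_per_1k for c in candidates) or 1.0
    max_latency = max(c.latency_ms for c in candidates) or 1.0
    keys = {
        "cheapest": lambda c: c.cost_per_1k,
        "fastest": lambda c: c.latency_ms,
        "quality": lambda c: -c.quality,
        # Assumed blend: normalized cost + normalized latency - quality.
        "balanced": lambda c: (c.cost_per_1k / max_cost
                               + c.latency_ms / max_latency
                               - c.quality),
    }
    if strategy not in keys:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(candidates, key=keys[strategy])


def call_with_failover(ranked, call):
    """Try candidates in order; move to the next when a call raises."""
    last_error = None
    for candidate in ranked:
        try:
            return candidate.model, call(candidate)
        except Exception as exc:  # a real gateway would be more selective
            last_error = exc
    raise RuntimeError("all candidates failed") from last_error


CANDIDATES = [
    Candidate("gpt-4o", "openai", 128000, 5.0, 800, 0.90),
    Candidate("claude-3-5-sonnet", "anthropic", 200000, 6.0, 900, 0.92),
    Candidate("mistral-small", "mistral", 32000, 0.5, 400, 0.70),
]
```

A caller would build the pool with `eligible(...)`, order it with `rank(pool, "balanced")`, and pass the result to `call_with_failover` together with a function that performs the provider request.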

Observability

Every request becomes debuggable

Traces capture messages, provider, model, cost, latency, status, fallbacks and metadata. Use metadata to map traffic back to users, environments or product surfaces.
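As an illustration of how metadata can map traffic back to an environment or user, the sketch below models a trace record and sums cost per metadata value. The `Trace` fields mirror the list above, but the class, the field names and the `cost_by` helper are hypothetical and do not describe Latenza's trace schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Trace:
    provider: str
    model: str
    cost_eur: float
    latency_ms: float
    status: str
    fallbacks: int = 0                            # failovers before success
    metadata: dict = field(default_factory=dict)  # caller-supplied tags


def cost_by(traces, metadata_key):
    """Sum trace cost per value of one metadata key."""
    totals = defaultdict(float)
    for trace in traces:
        # Traces without the key are grouped under "unknown".
        totals[trace.metadata.get(metadata_key, "unknown")] += trace.cost_eur
    return dict(totals)


traces = [
    Trace("openai", "gpt-4o", 0.25, 820, "ok", metadata={"env": "prod"}),
    Trace("anthropic", "claude-3-5-sonnet", 0.5, 910, "ok", 1, {"env": "prod"}),
    Trace("openai", "gpt-4o", 0.125, 790, "ok", metadata={"env": "staging"}),
    Trace("openai", "gpt-4o", 0.0, 120, "error"),
]
```

Grouping by `"env"` here separates production spend from staging spend; the same helper works for any tag the application chooses to attach, such as a user or feature identifier.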

Guardrails & webhooks

Control and react to production traffic

Guardrails run before and after model calls. Webhooks deliver operational events such as budget thresholds, quota alerts and delivery signals to your systems.
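The "before and after" placement of guardrails can be sketched as a wrapper around the model call. The two regular-expression rules, the `guarded_call` function and the result dictionary are assumptions chosen for illustration; actual guardrail rules are configured in Latenza and are not shown in this document.

```python
import re

# Illustrative rules only: one input check and one output check.
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def guarded_call(prompt, call_model):
    """Run an input guardrail, call the model, then run an output guardrail."""
    # Before the model call: block the request without contacting a provider.
    if BLOCKED_INPUT.search(prompt):
        return {"status": "blocked", "stage": "input", "content": None}

    output = call_model(prompt)

    # After the model call: flag the response and redact the match.
    if EMAIL.search(output):
        redacted = EMAIL.sub("[redacted]", output)
        return {"status": "flagged", "stage": "output", "content": redacted}

    return {"status": "ok", "stage": None, "content": output}
```

Here `call_model` stands in for whatever function performs the provider request, so the same wrapper applies regardless of which provider the router selected.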

FAQ

Common questions

Will Latenza add latency?

Slightly. The gateway adds one network hop; latency-aware routing, automatic failover and semantic caching can offset that overhead.

Do I need to change SDKs?

No. Use your OpenAI-compatible SDK and point base_url to the Latenza gateway.

Can I restrict models by app?

Yes. Configure a virtual key with a model allowlist and use that key in the app.

Where is the full API spec?

The interactive reference is generated from the deployed gateway OpenAPI schema.