Auriko vs OpenRouter: which LLM gateway should you use?

Both Auriko and OpenRouter are OpenAI-compatible LLM gateways that route requests across multiple providers through a single API. OpenRouter's strength is broader provider and model coverage plus a larger integration surface. Auriko is built for teams that care most about production inference cost economics: cache-aware cost modeling, deterministic expected-cost routing, pricing transparency, zero markup, and free BYOK at scale.

TL;DR

Choose OpenRouter if you need broader provider and model coverage, a larger third-party integration ecosystem, or access to niche models that are not yet available on Auriko.

Choose Auriko if production inference cost is the main constraint. Auriko's cost advantage comes from three mechanisms:

Cache-aware cost modeling. Auriko accounts for cache thresholds, block sizes, write premiums, read discounts, expiration windows, and multi-turn reuse.
Deterministic expected-cost routing. OpenRouter's default provider routing uses price-weighted load balancing across eligible providers with recent-outage filtering. Auriko computes expected cost for the actual request and selects the lowest-cost eligible route.
0% markup. OpenRouter charges a 5.5% platform fee on credit purchases. Auriko charges 0% markup on provider token prices.

In our cost comparison research, these differences produce 15-30% lower inference cost versus OpenRouter on representative workloads.

Both are OpenAI SDK compatible, support BYOK, and handle fallback chains.

What is OpenRouter

OpenRouter offers 400+ models across 60+ providers through a single OpenAI-compatible API, with a larger third-party integration surface across developer tools and frameworks. You buy credits with a 5.5% platform fee. Per-token rates match what providers charge directly.

What is Auriko

Auriko is an OpenAI-compatible gateway built for production inference cost economics. It supports 200+ models across 12 providers and is actively expanding provider and model coverage. Its core differentiation is a routing engine built by ex-quant traders that models provider pricing, cache mechanics, and user workload patterns. It uses those inputs to predict expected cost and choose the lowest-cost eligible provider. Auriko charges 0 markup on provider token prices.

What to compare

The dimensions that matter most depend on what you're building. For production LLM applications, the criteria that usually drive the decision are:

Cost structure. How each gateway charges and how that interacts with your usage patterns.
Model and provider coverage. Whether the models you need are available.
Routing approach. How each gateway selects a provider for a given request.
Ecosystem. SDK maturity, tool integrations, observability.
Operational visibility. What each gateway exposes about provider health, routing decisions, cost, and data handling.

The table below covers these and more. The deep-dive sections explain the dimensions where the gateways differ most.

Head-to-head

Dimension	OpenRouter	Auriko	Notes
Pricing model	5.5% platform fee on credit purchases	0% markup	OpenRouter's per-token rates match providers; the fee is on credit purchase
BYOK cost	Free under 1M req/month, 5% after	Free on all tiers
Model and provider coverage	400+ models across 60+ providers	200+ models across 12 providers
Routing approach	Distributes traffic across providers using price-weighted load balancing	Deterministically routes each request to the provider with the lowest expected cost	OpenRouter spreads traffic across providers; Auriko chooses the expected lowest-cost route
Cost modeling	Uses token prices for routing	Cache-aware cost modeling: models provider cache mechanics and user usage patterns	Core driver of Auriko's 15-30% cost reduction
Provider signals	Uses recent-outage filtering plus latency and throughput signals in routing controls	Tracks latency distributions, provider health, throughput, pricing changes, and capacity signals
Free tier limit	50 req/day, 20 RPM	1,000 RPM
Free tier BYOK	Not available	500 RPM, 10K req/month
Paid tier limit	Unlimited for paid models	Unlimited
Paid tier BYOK	1M req/month free, then 5%	Free at any volume
SDKs	Client SDKs + Vercel AI SDK provider	Client SDKs + Vercel AI SDK provider

Where OpenRouter Is the Better Fit

Broader model access. OpenRouter is the better fit when your priority is broader provider and model coverage, especially for niche models, newly launched models, or many model variants behind one API.

Existing integration surface. OpenRouter is the better fit when broader third-party integration coverage is the deciding factor for an existing workflow.

Feature breadth outside provider routing. OpenRouter includes features beyond core provider routing, including Auto Router for prompt-based model selection, web search tooling, file/PDF input support, embeddings, quantization filtering, and model variant suffixes.

Where Auriko Is the Better Fit

Better inference cost economics. Auriko is the better fit when cost efficiency is a production priority. In our cost comparison research, it reduces inference cost by 15-30% versus OpenRouter on representative workloads. The savings come from cache-aware cost modeling, deterministic expected-cost routing, and 0% markup.

Routing calibrated to your usage pattern. Auriko helps lower inference cost by routing based on how your production traffic actually behaves, not only provider list prices.

Provider-aware routing. Auriko helps reduce the risk of degraded routes by accounting for provider health, performance, throughput, and capacity signals before sending the request.

Zero markup, free BYOK, and durable credits. Auriko charges 0 markup on provider token prices, keeps BYOK free across tiers, and does not expire purchased credits.

Why Auriko Can Lower Cost vs OpenRouter

Cost modeling

Two providers quoting identical per-token rates can produce different bills on the same workload. The difference comes from caching mechanics that per-token rates don't capture:

Minimum token thresholds. Some providers won't cache content below a certain token count. A system prompt under the threshold pays full price on every request, regardless of the published cache discount.
Block granularity. Providers cache in fixed-size blocks. Tokens that don't fill a complete block pay full price.
Expiration windows. Cache durations vary by provider. A cached entry only saves money if matching requests arrive before it expires.
Write and storage costs. Some providers charge a premium for the initial cache write and for ongoing storage. These costs only pay off through subsequent reuse.
Tiered pricing. Rates shift based on context length, so the rate on the pricing page may not apply to a specific request.

OpenRouter's provider-routing docs describe default routing as price-weighted load balancing across eligible providers with recent-outage filtering. Auriko maintains a cost model built by ex-quant traders that tracks provider pricing, per-provider cache mechanics, and usage patterns from request metadata (token counts and context length, not prompts or content). It computes expected cost per provider per request and selects the lowest-cost eligible provider for the actual workload after accounting for pricing, cache behavior, and usage patterns.

For multi-turn conversations, the engine models cache economics across the full conversation. A provider that charges a write premium but offers deep read discounts can be cheaper over 10 turns than one with lower single-request rates.

Requests can set cost optimization strategies: "cost" balances affordability with performance, "cost-focus" minimizes cost aggressively. Users can also set hard constraints: max_ttft_ms for latency ceilings, max_cost_per_1m to exclude expensive providers, and only_byok to restrict routing to bring-your-own-key providers.

For the full breakdown of how provider caching mechanics affect cost, see The variables that affect your inference cost.

Provider-aware routing

After Auriko estimates expected cost, the routing engine still has to choose a provider that can serve the request under current operating conditions. It filters qualified provider candidates, then scores them across four dimensions: expected cost, time to first token, throughput, and capacity headroom.

For cost-focused strategies, Auriko keeps cost as the primary optimization goal while using performance and capacity signals to reduce the risk of choosing routes that look inexpensive from published price alone but are weaker under current performance or capacity conditions. Requests can also set hard constraints such as max_ttft_ms, max_cost_per_1m, and only_byok, or provide custom weights when a workload needs a different balance.

The engine runs on a provider data pipeline that tracks latency distributions, provider health, throughput, pricing changes, and capacity signals in real time. Routing strategies are backtested under stress scenarios including provider degradation, price changes, capacity shocks, and partial outages.

The direct provider-routing comparison is OpenRouter's default price-based load balancing. For a specified model, OpenRouter prioritizes providers without recent outages, weights low-cost candidates by inverse square of price, and uses remaining providers as fallbacks. Users can override that behavior with provider sorting by price, throughput, or latency, plus preferred latency and throughput thresholds.

Separately, OpenRouter offers Auto Router, powered by NotDiamond, for prompt-based model selection when the request uses openrouter/auto. That is a model-selection feature, not the default provider-routing path for a specified model.

Pricing and BYOK

OpenRouter charges a 5.5% platform fee on credit purchases. Per-token rates match provider rates, so the fee is on purchase, not per-token. OpenRouter reserves the right to expire unused credits after 365 days.

Auriko charges 0% markup and does not expire purchased credits. Unused balance stays available until you spend it.

OpenRouter's BYOK is free for the first 1 million requests per month. After that, requests are charged at 5%. Auriko's BYOK is free on all tiers with no volume threshold.

Who should use what

If you need...	Use
Broader provider and model coverage	OpenRouter
Existing third-party integrations	OpenRouter
Auto Router / prompt-based model selection	OpenRouter
Better production inference cost economics	Auriko
Routing calibrated to your usage pattern	Auriko
Provider-aware routing with health, performance, and capacity signals	Auriko
Zero markup pricing, free BYOK at scale, or non-expiring credits	Auriko

Switching

Both gateways are OpenAI SDK compatible. Switching is a base URL and API key change.

Python:

from openai import OpenAI

# OpenRouter
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-..."
)

# Auriko
client = OpenAI(
    base_url="https://api.auriko.ai/v1",
    api_key="ak_..."
)

TypeScript:

import OpenAI from "openai";

// OpenRouter
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: "sk-or-...",
});

// Auriko
const client = new OpenAI({
  baseURL: "https://api.auriko.ai/v1",
  apiKey: "ak_...",
});

For standard chat completions with common parameters, the request and response handling is compatible. Response metadata fields, streaming metadata delivery, and provider-specific extension parameters differ between gateways. Test those paths when switching.

Get started

To try Auriko, see the quickstart guide. To compare pricing on your specific models, see the model directory.