Use the Response API with DeepSeek, Gemini, Claude, and every model on Auriko

The Response API format works across all 250+ models on Auriko, with streaming, tool calling, and multi-model routing.


The Responses API is a clean interface for agentic apps: typed inputs, typed outputs, streaming events, tool calls, structured output, and reasoning controls all live in one request format.

The problem is portability.

Most providers do not expose the Responses API natively. If you want to use Claude, Gemini, DeepSeek, Qwen, Llama, Grok, MiniMax, or other models, you need provider-specific APIs or a Chat Completions compatibility layer.

Auriko gives you one /v1/responses endpoint across the full model catalog. See the Response API reference for the full endpoint spec.

Write against the Responses API once, then run the same request against DeepSeek, Gemini, Grok, Claude, Qwen, Llama, MiniMax, and 250+ other models on Auriko.


Basic usage

from auriko import Client

client = Client()

response = client.responses.create(
    model="deepseek-v4-pro",
    input="Summarize this quarter's results.",
)

print(response.output_text)

To use another model, change the model field. The response format stays the same.

response = client.responses.create(
    model="gemini-2.5-flash",
    input="Summarize this quarter's results.",
)

print(response.output_text)

The point: your app does not need different response parsers for different providers.


Streaming

Streaming uses the Responses API event format.

with client.responses.create(
    model="gemini-2.5-flash",
    input="Explain prompt caching in two sentences.",
    stream=True,
) as stream:
    for event in stream:
        print(event)

Events include:

  • response.created
  • response.output_text.delta
  • response.function_call_arguments.delta
  • response.completed

Auriko returns the Responses API event stream even when the upstream provider only supports Chat Completions. Auriko translates the request into the provider's format and translates the response back into the Responses API event stream.


Route across multiple models in one request

Responses API requests on Auriko support gateway.models.

Pass a list of models instead of choosing one:

response = client.responses.create(
    input="Summarize this quarter's results.",
    gateway={
        "models": [
            "gpt-4o",
            "claude-sonnet-4-6",
            "deepseek-v4-pro",
        ]
    },
)

print(response.output_text)

Auriko picks from the list based on your routing preferences. The response format stays the same regardless of which model handles the request. Configure routing priorities in the routing options guide.


Agent frameworks

Agent frameworks increasingly expect a Responses API interface: typed messages, function calls, function call outputs, and streaming events.

Auriko serves /v1/responses, so frameworks that target the Responses API can work across the full model catalog instead of being locked to one provider.

See the OpenAI Agents SDK integration guide for setup.


Supported features

Auriko's Responses API support includes:

  • streaming
  • function tools
  • tool_choice
  • parallel_tool_calls
  • structured output with json_schema strict mode
  • reasoning effort control
  • vision with image URLs
  • typed input items: message, function_call, function_call_output

Prompt caching, cost routing, failover, and provider extensions work the same way they do on Chat Completions. See how cost routing works for details on per-request provider selection.


What Auriko does with the request

Auriko routes each request to an available provider based on your preferences and request metadata: token counts, context length, and capability requirements like streaming or tool use.

Auriko does not read, log, or store your prompts, responses, or content.


Get started