Infer

Models

9 models

Claude Haiku 4.5

Default Claude Haiku route for fast background work, summaries, and lightweight agent loops.

Anthropic | 1M context | per 1M tokens: 30 Credits input / 150 Credits output / 3 Credits cache read / 37.5 Credits cache write

Claude Opus 4.6

Default Claude Opus route for complex planning and high-quality reasoning.

Anthropic | 1M context | per 1M tokens: 150 Credits input / 750 Credits output / 15 Credits cache read / 187.5 Credits cache write

Claude Sonnet 4.6

Default Claude Sonnet route for balanced coding, analysis, and agent traffic.

Anthropic | 1M context | per 1M tokens: 90 Credits input / 450 Credits output / 9 Credits cache read / 112.5 Credits cache write

DeepSeek V4 Pro

Infer DeepSeek Pro route for coding, math, and complex analysis workloads.

DeepSeek | 1M context | per 1M tokens: 174 Credits input / 348 Credits output

Gemini 2.5 Flash

Discounted Gemini Flash route for fast multimodal-capable assistant traffic.

Google | 1M context | per 1M tokens: 7.5 Credits input / 62.5 Credits output / 1.875 Credits cache read

Gemini 2.5 Flash Image

Gemini image-capable route for visual prompts and image-token billing.

Google | 1M context | per 1M tokens: 7.5 Credits input / 750 Credits output

GPT-5.3 Codex

Default Codex-oriented GPT route for developer agents and code tasks.

OpenAI | 400K context | per 1M tokens: 35 Credits input / 280 Credits output / 3.5 Credits cache read

GPT-5.4

Default GPT-5.4 route for cost-sensitive production agents.

OpenAI | 1M context | per 1M tokens: 50 Credits input / 300 Credits output / 5 Credits cache read / 62.5 Credits cache write

Grok 4.1 Fast

Default Grok route for low-latency chat and production traffic.

xAI | 2M context | per 1M tokens: 13 Credits input / 32.5 Credits output / 3.25 Credits cache read
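To show how the per-1M-token rates combine into the cost of a single request, here is a minimal Python sketch. It assumes, hypothetically, that cost scales linearly with token count and that every token is billed in exactly one bucket (uncached input, output, cache read, or cache write); the actual metering rules may differ. The `Rates` class, `RATES` table, and `estimate_credits` function are illustrative names, not part of any Infer API, and only a subset of the routes listed above is included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rates:
    """Credits charged per 1M tokens for one model route (copied from the list above)."""
    input: float
    output: float
    cache_read: Optional[float] = None   # None = no cache-read price listed
    cache_write: Optional[float] = None  # None = no cache-write price listed


# Hypothetical lookup table covering a subset of the routes above.
RATES = {
    "Claude Haiku 4.5": Rates(30, 150, 3, 37.5),
    "Claude Sonnet 4.6": Rates(90, 450, 9, 112.5),
    "Claude Opus 4.6": Rates(150, 750, 15, 187.5),
    "DeepSeek V4 Pro": Rates(174, 348),
    "GPT-5.3 Codex": Rates(35, 280, 3.5),
    "Grok 4.1 Fast": Rates(13, 32.5, 3.25),
}

PER_TOKENS = 1_000_000  # all listed prices are per 1M tokens


def estimate_credits(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> float:
    """Estimate the credits for one request.

    Assumption: input_tokens counts only uncached input, so no token is
    billed in more than one bucket.
    """
    r = RATES[model]
    # Refuse to price cache traffic for routes with no cache price listed.
    if cache_read_tokens and r.cache_read is None:
        raise ValueError(f"{model} has no cache-read price listed")
    if cache_write_tokens and r.cache_write is None:
        raise ValueError(f"{model} has no cache-write price listed")
    total = input_tokens * r.input + output_tokens * r.output
    total += cache_read_tokens * (r.cache_read or 0)
    total += cache_write_tokens * (r.cache_write or 0)
    return total / PER_TOKENS
```

For example, under these assumptions a Claude Sonnet 4.6 request with 20,000 uncached input tokens, 80,000 cache-read tokens, and 2,000 output tokens costs 3.42 Credits, versus 9.9 Credits if all 100,000 input tokens were billed at the full input rate.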