Frontier reasoning
The most capable model for hard multi-step problems — graduate-level science, long-horizon code agents, complex math.
premium tier · highest AA Intelligence Index score
Tops the Artificial Analysis Intelligence Index v4.1 at 60 with adaptive reasoning and an Opus 4.8 fallback. Wins on agentic benchmarks (Terminal-Bench, GDPval) and scientific reasoning (HLE, CritPt).
2 alternatives:

Claude Opus 4.8 (max effort)
Anthropic
AA Index 56; the workhorse if you don't need Fable 5's full reasoning stack

GPT-5.5 (xhigh)
OpenAI
AA Index 55; matches Opus 4.8 on most tasks at a similar tier
Frontier general-purpose
Best balance of capability + speed + tool use for everyday production workloads at the frontier tier.
$5 in / $30 out per 1M
Frontier scoring (AA Index 55 at xhigh effort), 1M-context multimodal with strong agentic computer-use. The default if you want one model that does everything well.
2 alternatives:

Claude Opus 4.8
Anthropic
Slightly better at coding + reasoning; same tier on price

Gemini 3.1 Pro
Google
2M context window; cheaper per-token but slightly behind on the index
Agentic coding
Best at autonomous code generation across a real repo — multi-file edits, terminal use, test-fixing.
$1.75 in / $14 out per 1M
SOTA on SWE-bench Pro at launch (Feb 2026). 400K context with strong Terminal-Bench 2.0 + OSWorld-Verified scores. The model Cursor / Devin / Codex CLI default to for hard code tasks.
3 alternatives:

Claude Opus 4.8
Anthropic
Best non-Codex generalist — 69.2% SWE-bench Pro

Qwen3-Coder-480B-A35B
Alibaba
Top open-weight agentic coder; ~$0.45/$1.80 hosted

Kimi K2.6 Code
Moonshot AI
Ties top closed models on coding at ~$0.60/$2.50 hosted
Cost-efficient reasoning
Best reasoning capability per dollar — the model to use when budgets matter but you still need real thinking.
$2.10 in / $4.40 out per 1M
AA Index 44 at max effort: frontier-adjacent capability at roughly a quarter of GPT-5.5's blended price. Open-weight 1T-param MoE, so you can self-host if you have the GPUs.
2 alternatives:

o3
OpenAI
Lower index (30 today) but $2/$8 — solid reasoning value after the June 2025 price cut

Gemini 3.5 Flash (high)
Google
AA Index 50 at $1.50/$9 — current Cost-of-Intelligence frontier champion
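The "capability per dollar" comparison in this card can be checked with a little arithmetic. The sketch below rests on two assumptions: a 3:1 input:output blend (the weighting that reproduces the $0.46 and $0.06 blended figures quoted elsewhere on this page), and that the $5 in / $30 out listed under Frontier general-purpose is GPT-5.5's price. "Index points per blended dollar" is an illustrative ratio, not a metric this page defines, and "card pick" is a placeholder because the pick's name is not shown above.

```python
# Sketch: compare this card's pick with its alternatives on blended price
# and on AA Index points per blended dollar. The 3:1 blend is an assumption.

def blended_price(price_in: float, price_out: float) -> float:
    """Blended $/1M tokens, weighting input 3:1 over output (assumed)."""
    return (3 * price_in + price_out) / 4

# (AA Index, $ in, $ out) per 1M tokens, copied from the cards on this page.
MODELS = {
    "card pick": (44, 2.10, 4.40),            # placeholder name
    "o3": (30, 2.00, 8.00),
    "Gemini 3.5 Flash (high)": (50, 1.50, 9.00),
    "GPT-5.5 (xhigh)": (55, 5.00, 30.00),     # price is an assumption, see lead-in
}

def index_per_dollar(name: str) -> float:
    """AA Index divided by the blended price (illustrative ratio)."""
    index, price_in, price_out = MODELS[name]
    return index / blended_price(price_in, price_out)

pick_price = blended_price(*MODELS["card pick"][1:])
gpt_price = blended_price(*MODELS["GPT-5.5 (xhigh)"][1:])
price_ratio = pick_price / gpt_price

# Highest index-per-dollar first.
ranking = sorted(MODELS, key=index_per_dollar, reverse=True)

print(f"pick vs GPT-5.5 blended price: {price_ratio:.2f}x")
for name in ranking:
    print(f"{name}: {index_per_dollar(name):.1f} index points per blended $")
```

Under these assumptions the pick comes out at about 0.24x GPT-5.5's blended price and ranks first on index points per blended dollar, ahead of Gemini 3.5 Flash (high).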
Cheapest model meeting the Budget bar (CoI)
The model with the lowest blended price per 1M tokens whose AA Intelligence Index is ≥ 25, i.e. one that still does real work.
$0.20 in / $1.25 out per 1M ($0.46 blended)
Current CoI budget-capable champion. AA Index 38 at xhigh effort, $0.46 blended — cheaper than any other model that clears the 25 capability gate on the AA scale. Updated by the engine daily.
2 alternatives:

Claude Haiku 4.5
Anthropic
AA Index 24 (just under the bar) at $1.00/$5.00

Mistral Small 4 (reasoning)
Mistral
AA Index 21 at $0.15/$0.30 — cheaper but doesn't clear the bar
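The selection rule for this card (lowest blended price among models with AA Index ≥ 25) can be sketched as a filter followed by a minimum. This is a minimal illustration, not the engine's actual code: the `Model` type, the `cheapest_meeting_bar` helper, and the 3:1 input:output blend are assumptions, the last chosen because it reproduces the $0.46 blended figure quoted above. "card pick" is a placeholder because the pick's name is not shown in this card.

```python
from typing import NamedTuple, Optional, Sequence

class Model(NamedTuple):
    name: str
    aa_index: int
    price_in: float   # $ per 1M input tokens
    price_out: float  # $ per 1M output tokens

    @property
    def blended(self) -> float:
        """Blended $/1M tokens at an assumed 3:1 input:output weighting."""
        return (3 * self.price_in + self.price_out) / 4

def cheapest_meeting_bar(models: Sequence[Model], min_index: int = 25) -> Optional[Model]:
    """Return the cheapest (by blended price) model at or above the bar, if any."""
    eligible = [m for m in models if m.aa_index >= min_index]
    return min(eligible, key=lambda m: m.blended, default=None)

# Figures copied from this card.
CANDIDATES = [
    Model("card pick", 38, 0.20, 1.25),  # placeholder name
    Model("Claude Haiku 4.5", 24, 1.00, 5.00),
    Model("Mistral Small 4 (reasoning)", 21, 0.15, 0.30),
]

winner = cheapest_meeting_bar(CANDIDATES)
print(winner.name, f"${winner.blended:.2f} blended")
```

Mistral Small 4 has the lower blended price, but the index filter removes it before the minimum is taken, which is why the gate matters more than the raw price ordering.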
Best open-weight LLM
Highest-capability model whose weights are downloadable — for self-hosting, fine-tuning, on-prem deployment.

Kimi K2 Thinking
Moonshot AI
Modified MIT license · 1T total / 32B active
Topped the AA Intelligence Index for open weights at launch (Nov 2025) with 33 on v4.1. The Modified MIT license permits commercial use. 256K context, with first-class agentic tool use.
3 alternatives:

DeepSeek V4 Pro
DeepSeek
Higher AA Index (44) but heavier infrastructure ask (1T params)

GLM-4.6 (reasoning)
Z.ai
AA Index 25, MIT license, 357B params — easier self-host

Qwen3.7 Plus
Alibaba
AA Index 39 at the time of writing; leads the Qwen lineage
Long-context champion
Best model for genuinely long inputs — multi-document analysis, large codebases, hours of transcripts.
2M context · $2 in / $12 out per 1M
2M-token native context, twice GPT-5.5's window. Strong recall on long-context benchmarks (AA-LCR). At-tier pricing makes it the obvious default when context exceeds 1M tokens.
3 alternatives:

Grok 4.20
xAI
2M context too; cheaper at $1.25/$2.50 if you tolerate slightly lower scores

GPT-5.5
OpenAI
1M context; better tool-use but half the window

Llama 4 Scout
Meta
10M context, the largest of any open-weight model; cheaper hosted pricing
Cheapest model that's still serviceable
For high-volume / latency-sensitive uses where capability matters less — classification, routing, simple Q&A.
$0.035 in / $0.14 out per 1M ($0.06 blended)
Cheapest non-trivial model in the basket. 305 tokens/sec — among the fastest. AA Index 5 means "don't ask it to reason", but it nails high-volume routing / classification work for ~6¢ per million blended tokens.
3 alternatives:

Phi-4
Microsoft
$0.07/$0.14 — open-weight, slightly better at small tasks

GPT-4.1 Nano
OpenAI
$0.10/$0.40 — OpenAI's cheapest

Gemini 3 Flash
Google
$0.50/$3.00 — pricier but actually capable
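For high-volume work, the per-million prices above translate into a bill once a traffic shape is fixed. The sketch below prices a hypothetical classification workload (1,000,000 requests at 400 input and 20 output tokens each; these numbers are invented for illustration) against the prices listed in this card. "card pick" is a placeholder because the pick's name is not shown above.

```python
# Sketch: total cost of a fixed workload at each model's listed prices.

def workload_cost(requests: int, in_tokens: int, out_tokens: int,
                  price_in: float, price_out: float) -> float:
    """Dollar cost given per-request token counts and $/1M-token prices."""
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    return (total_in * price_in + total_out * price_out) / 1_000_000

# ($ in, $ out) per 1M tokens, copied from this card.
PRICES = {
    "card pick": (0.035, 0.14),  # placeholder name
    "Phi-4": (0.07, 0.14),
    "GPT-4.1 Nano": (0.10, 0.40),
    "Gemini 3 Flash": (0.50, 3.00),
}

# Hypothetical workload: short prompts, one-label answers.
REQUESTS, IN_TOKENS, OUT_TOKENS = 1_000_000, 400, 20

costs = {
    name: workload_cost(REQUESTS, IN_TOKENS, OUT_TOKENS, p_in, p_out)
    for name, (p_in, p_out) in PRICES.items()
}

for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:,.2f}")
```

For this input-heavy shape the pick costs about $16.80 per million requests against about $260 for Gemini 3 Flash; a workload with longer outputs would shift the gap, since output prices differ more than input prices.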