Champions

The current best model for each task

A curated answer to “which model should I use for X?” across language, image, video, audio, music, and specialized tasks. 34 categories, each with a current champion, why it's on top, and the closest alternatives.

Last reviewed 2026-06-19 · updated when a major release moves a category

Language models

8 categories

Champions across reasoning, coding, long-context, cost-efficient, and open-weight LLM categories. The IFX basket and Cost-of-Intelligence index sit in this group.

Frontier reasoning

The most capable model for hard multi-step problems — graduate-level science, long-horizon code agents, complex math.

Claude Fable 5

Anthropic

premium tier · highest AA Intelligence Index score

Tops Artificial Analysis Intelligence Index v4.1 at 60 with adaptive reasoning + Opus 4.8 fallback. Wins on agentic benchmarks (Terminal-Bench, GDPval) and scientific reasoning (HLE, CritPt).

2 alternatives
  • Claude Opus 4.8 (max effort)

    Anthropic

    AA Index 56; the workhorse if you don't need Fable 5's full reasoning stack

  • GPT-5.5 (xhigh)

    OpenAI

    AA Index 55; matches Opus 4.8 on most tasks at a similar tier

2026-06-19

Frontier general-purpose

Best balance of capability + speed + tool use for everyday production workloads at the frontier tier.

GPT-5.5

OpenAI

$5 in / $30 out per 1M

Frontier scoring (AA Index 55 at xhigh effort), 1M-context multimodal with strong agentic computer-use. The default if you want one model that does everything well.

2 alternatives
  • Claude Opus 4.8

    Anthropic

    Slightly better at coding + reasoning; same tier on price

  • Gemini 3.1 Pro

    Google

    2M context window; cheaper per-token but slightly behind on the index

2026-06-19

Agentic coding

Best at autonomous code generation across a real repo — multi-file edits, terminal use, test-fixing.

GPT-5.3-Codex

OpenAI

$1.75 in / $14 out per 1M

SOTA on SWE-bench Pro at launch (Feb 2026). 400K context with strong Terminal-Bench 2.0 + OSWorld-Verified scores. The model Cursor / Devin / Codex CLI default to for hard code tasks.

3 alternatives
  • Claude Opus 4.8

    Anthropic

    Best non-Codex generalist — 69.2% SWE-bench Pro

  • Qwen3-Coder-480B-A35B

    Alibaba

    Top open-weight agentic coder; ~$0.45/$1.80 hosted

  • Kimi K2.6 Code

    Moonshot

    K2.6 ties top closed models on coding at ~$0.60/$2.50 hosted

2026-06-19

Cost-efficient reasoning

Best reasoning capability per dollar — the model to use when budgets matter but you still need real thinking.

DeepSeek V4 Pro

DeepSeek

$2.10 in / $4.40 out per 1M

AA Index 44 at max effort — frontier-adjacent capability at roughly a quarter of GPT-5.5's price. Open-weight 1T-param MoE, so you can self-host if you have the GPUs.

2 alternatives
  • o3

    OpenAI

    Lower index (30 today) but $2/$8 — solid reasoning value after the June 2025 price cut

  • Gemini 3.5 Flash (high)

    Google

    AA Index 50 at $1.50/$9 — current Cost-of-Intelligence frontier champion

2026-06-19
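
Whether DeepSeek V4 Pro really costs "roughly a quarter of GPT-5.5's price" depends on how input and output prices are blended. Here is a minimal sketch. It assumes a 3:1 input:output token mix. That ratio is an assumption, but it reproduces the $0.46 and $0.06 blended figures quoted further down this page. The table restates this page's list prices and AA Index scores. `index_points_per_dollar` is a crude, hypothetical value metric, not an Artificial Analysis statistic.

```python
# Blended $/1M tokens. Assumption: a 3:1 input:output token mix, which
# reproduces the blended prices quoted elsewhere on this page.
def blended_price(input_per_m: float, output_per_m: float, ratio: float = 3.0) -> float:
    return (ratio * input_per_m + output_per_m) / (ratio + 1.0)

# (input $/1M, output $/1M, AA Index) as listed on this page.
MODELS = {
    "DeepSeek V4 Pro": (2.10, 4.40, 44),
    "GPT-5.5": (5.00, 30.00, 55),
    "o3": (2.00, 8.00, 30),
    "Gemini 3.5 Flash (high)": (1.50, 9.00, 50),
}

def price_ratio(a: str, b: str, ratio: float = 3.0) -> float:
    """Blended price of model a as a fraction of model b's."""
    return blended_price(*MODELS[a][:2], ratio) / blended_price(*MODELS[b][:2], ratio)

def index_points_per_dollar(name: str, ratio: float = 3.0) -> float:
    """AA Index points per blended $/1M tokens (crude, hypothetical value metric)."""
    inp, out, index = MODELS[name]
    return index / blended_price(inp, out, ratio)

print(f"DeepSeek V4 Pro / GPT-5.5 blended price: {price_ratio('DeepSeek V4 Pro', 'GPT-5.5'):.2f}")
for name in sorted(MODELS, key=index_points_per_dollar, reverse=True):
    print(f"{name}: {index_points_per_dollar(name):.1f} index points per $")
```

With these inputs the ratio comes out near 0.24, consistent with "roughly a quarter". DeepSeek V4 Pro also leads on index points per blended dollar. An output-heavy workload (long agent traces, for example) shifts the ratio, so pass your own `ratio`.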

Cheapest model meeting Budget bar (CoI)

The lowest blended $/1M-token model whose AA Intelligence Index ≥ 25 — i.e. that still does real work.

GPT-5.4 Nano

OpenAI

$0.20 in / $1.25 out per 1M ($0.46 blended)

Current CoI budget-capable champion. AA Index 38 at xhigh effort, $0.46 blended — cheaper than any other model that clears the 25 capability gate on the AA scale. Updated by the engine daily.

2 alternatives
  • Claude Haiku 4.5

    Anthropic

    AA Index 24 (just under bar) at $1.00/$5.00

  • Mistral Small 4 (reasoning)

    Mistral

    AA Index 21 at $0.15/$0.30 — cheaper but doesn't clear the bar

2026-06-19
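
The selection rule for this card ("lowest blended price whose AA Index is at least 25") is simple enough to sketch. This sketch uses the same assumed 3:1 input:output blend, which reproduces GPT-5.4 Nano's quoted $0.46. The candidates are this card's models plus Amazon Nova Micro from the "cheapest serviceable" card. `budget_champion` is an illustrative name, not the engine's actual code.

```python
from typing import NamedTuple, Optional

class Candidate(NamedTuple):
    name: str
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens
    aa_index: int

def blended(c: Candidate, ratio: float = 3.0) -> float:
    # Assumed 3:1 input:output blend; reproduces GPT-5.4 Nano's $0.46.
    return (ratio * c.input_per_m + c.output_per_m) / (ratio + 1.0)

def budget_champion(cands: list[Candidate], gate: int = 25) -> Optional[Candidate]:
    """Cheapest candidate (by blended price) whose AA Index is at least `gate`."""
    eligible = [c for c in cands if c.aa_index >= gate]
    return min(eligible, key=blended, default=None)

CANDIDATES = [
    Candidate("GPT-5.4 Nano", 0.20, 1.25, 38),
    Candidate("Claude Haiku 4.5", 1.00, 5.00, 24),
    Candidate("Mistral Small 4 (reasoning)", 0.15, 0.30, 21),
    Candidate("Amazon Nova Micro", 0.035, 0.14, 5),
]

champ = budget_champion(CANDIDATES)
if champ is not None:
    print(f"{champ.name}: ${blended(champ):.2f} blended, AA Index {champ.aa_index}")
```

Lowering the gate to 21 makes Mistral Small 4 the pick. That is why the card describes it as cheaper but short of the bar.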

Best open-weight LLM

Highest-capability model whose weights are downloadable — for self-hosting, fine-tuning, on-prem deployment.

Kimi K2 Thinking

Moonshot AI

Modified MIT license · 1T total / 32B active

Topped the AA Intelligence Index among open-weight models at launch (Nov 2025), with a score of 33 on v4.1. Modified MIT license, so commercial use is OK. 256K context and first-class agentic tool use.

3 alternatives
  • DeepSeek V4 Pro

    DeepSeek

    Higher AA Index (44) but heavier infrastructure ask (1T params)

  • GLM-4.6 (reasoning)

    Z.ai

    AA Index 25, MIT license, 357B params — easier self-host

  • Qwen3.7 Plus

    Alibaba

    AA Index 39 at the time of writing — leading Qwen lineage

2026-06-19
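
The self-hosting trade-offs in this card come down mostly to total parameter count. This is a rough, weights-only estimate. It is an assumption-laden lower bound: KV cache, activations, and serving overhead are ignored, and they add substantially at long context. In typical serving every MoE expert has to be resident, so total parameters set the memory floor, not active ones. Parameter counts are the ones listed on this page.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in GB (1e9 bytes); excludes KV cache, activations, overhead."""
    return total_params * bits_per_param / 8 / 1e9

# Total parameter counts as listed on this page.
MODELS = {
    "~1T-param MoE (Kimi K2 Thinking, DeepSeek V4 Pro)": 1.0e12,
    "GLM-4.6 (357B)": 357e9,
}

for name, params in MODELS.items():
    row = ", ".join(f"{bits}-bit {weight_memory_gb(params, bits):,.0f} GB" for bits in (16, 8, 4))
    print(f"{name}: {row}")
```

At 4-bit, a ~1T model needs about 500 GB for weights alone. A 357B model needs under 200 GB, which is the gap behind "easier self-host".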

Long-context champion

Best model for genuinely long inputs — multi-document analysis, large codebases, hours of transcripts.

Gemini 3.1 Pro

Google

2M context · $2 in / $12 out per 1M

2M-token native context — twice GPT-5.5's window. Strong recall on long-context benchmarks (AA-LCR). At-tier pricing means it's the obvious default when context > 1M tokens.

3 alternatives
  • Grok 4.20

    xAI

    2M context too; cheaper at $1.25/$2.50 if you tolerate slightly lower scores

  • GPT-5.5

    OpenAI

    1M context; better tool-use but half the window

  • Llama 4 Scout

    Meta

    10M context, the largest of any open-weight model; cheaper hosted

2026-06-19
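
For long inputs, the first question is simply whether the input fits. Below is a minimal routing sketch over the context windows quoted on this page. `reserve_for_output` (32K tokens by default) is a hypothetical headroom value. Prompt token counts should come from the provider's own tokenizer, not a character-count guess.

```python
# Context windows (tokens) as listed on this page.
CONTEXT_WINDOWS = {
    "GPT-5.3-Codex": 400_000,
    "GPT-5.5": 1_000_000,
    "Gemini 3.1 Pro": 2_000_000,
    "Grok 4.20": 2_000_000,
    "Llama 4 Scout": 10_000_000,
}

def models_that_fit(prompt_tokens: int, reserve_for_output: int = 32_000) -> list[str]:
    """Models whose window holds the prompt plus an output reserve, smallest window first."""
    needed = prompt_tokens + reserve_for_output
    fits = [m for m, w in CONTEXT_WINDOWS.items() if w >= needed]
    # sorted() is stable, so models with equal windows keep their listed order.
    return sorted(fits, key=CONTEXT_WINDOWS.__getitem__)

print(models_that_fit(1_500_000))  # only the 2M and 10M windows qualify
```

Put your preferred model first in the dict (for example, the cheaper one) so it wins ties on window size.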

Cheapest model that's still serviceable

For high-volume / latency-sensitive uses where capability matters less — classification, routing, simple Q&A.

Amazon Nova Micro

Amazon

$0.035 in / $0.14 out per 1M ($0.06 blended)

Cheapest non-trivial model in the basket. 305 tokens/sec — among the fastest. AA Index 5 means "don't ask it to reason", but it nails high-volume routing / classification work for ~6¢ per million blended tokens.

3 alternatives
  • Phi-4

    Microsoft

    $0.07/$0.14 — open-weight, slightly better at small tasks

  • GPT-4.1 Nano

    OpenAI

    $0.10/$0.40 — OpenAI's cheapest

  • Gemini 3 Flash

    Google

    $0.50/$3.00 — pricier but actually capable

2026-06-19

Image generation

6 categories

Champions across photorealism, in-image typography, in-context editing, speed-cheap, aesthetic style, and open-weight.

Photorealism

Best at indistinguishable-from-photograph realism — faces, lighting, materials, fine detail.

Nano Banana Pro (Gemini 3 Pro Image)

Google DeepMind

premium per-image · 4K output

Widely cited as the strongest image model of late 2025 into 2026. Up to 4K output, industry-leading text rendering, up to 14 reference images, identity locking for up to 5 subjects, and Search grounding for factual realism. Won the late-2025 arena leaderboards.

3 alternatives
  • Imagen 4 Ultra

    Google DeepMind

    Photoreal flagship — sharper detail, slightly less editing flexibility

  • Seedream 5.0

    ByteDance

    Won AA's image arena at launch — reasoning + search grounding

  • FLUX.2 [pro]

    Black Forest Labs

    Strong photorealism, best in-context editing of any flagship

2026-06-19

Text-in-image / typography

Best at rendering legible, correctly-spelled, well-laid-out text inside generated images — posters, logos, slides.

Ideogram 3.0

Ideogram

from $15/mo · API ~$0.06/image

Best-in-class embedded text rendering since 2024 and still the default for typography / poster / logo work. Multiple style modes; speed tiers; clean API.

3 alternatives
  • GPT Image 2

    OpenAI

    ~99% character-level text accuracy across multiple scripts

  • Nano Banana Pro

    Google

    Strong on text but expensive; better when text + photoreal both matter

  • Qwen-Image 2.0

    Alibaba

    Leading open-weight for non-Latin scripts

2026-06-19

In-context image editing

Best at editing an existing image with text + image prompts — add/remove objects, change style, swap subjects.

FLUX.1 Kontext (Pro/Max)

Black Forest Labs

API per-image · Dev variant open weights

Dedicated in-context editing suite. Text + image prompted editing with strong consistency across edits — the standard for AI-assisted Photoshop-style workflows.

3 alternatives
  • Nano Banana (Gemini 2.5 Flash Image)

    Google

    Conversational multi-turn editing — viral late 2025

  • GPT Image 2

    OpenAI

    Multi-turn edits with strong text consistency

  • Firefly Generative Fill

    Adobe

    Dominant for production inpainting inside Photoshop

2026-06-19

Cheap & fast generation

Best $/image for high-volume gen — thumbnails, product variants, mockups, programmatic art.

Luma Photon Flash

Luma Labs

~$0.002 per 1080p image

Universal Transformer (non-diffusion) at radical cost-efficiency. ~$0.002 per 1080p image — orders of magnitude cheaper than Imagen Ultra or Nano Banana Pro, with quality that's competitive for non-portrait work.

3 alternatives
  • Nano Banana (Gemini 2.5 Flash Image)

    Google

    $0.039/image — solid quality, still cheap

  • FLUX.2 [klein]

    Black Forest Labs

    Open Apache-2.0 weights; free self-host; <1s multi-image

  • GPT Image 1 Mini

    OpenAI

    $0.005-$0.052/image — OpenAI's cheap tier

2026-06-19
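
At these price points the differences only show up at volume. The sketch below is a back-of-envelope calculation using the per-image list prices on this page. Ideogram's API price comes from the typography card. GPT Image 1 Mini is quoted as a range, so it returns a low/high pair. Prices drift, so treat the table as an assumption to update.

```python
# Per-image list prices (USD) from this page as (low, high).
PRICES = {
    "Luma Photon Flash": (0.002, 0.002),
    "GPT Image 1 Mini": (0.005, 0.052),
    "Nano Banana (Gemini 2.5 Flash Image)": (0.039, 0.039),
    "Ideogram 3.0 (API)": (0.06, 0.06),
}

def batch_cost(name: str, n_images: int) -> tuple[float, float]:
    """(low, high) cost in USD for n_images at the listed per-image price."""
    low, high = PRICES[name]
    return low * n_images, high * n_images

for name in PRICES:
    low, high = batch_cost(name, 10_000)
    span = f"${low:,.0f}" if low == high else f"${low:,.0f}-${high:,.0f}"
    print(f"{name}: {span} per 10k images")
```

At list price, Photon Flash is about 20× cheaper per image than Nano Banana (0.039 / 0.002 = 19.5).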

Best open-weight image

Best image model you can download and run on your own GPU.

HiDream-O1-Image

HiDream AI

8B unified transformer · free self-host

May 2026 release; claimed to beat GPT Image 2 and Seedream 4.0 on quality benchmarks. With open weights, that makes it the SOTA choice for self-hosting.

3 alternatives
  • FLUX.1 [dev]

    Black Forest Labs

    The workhorse that displaced SDXL — 12B DiT

  • Qwen-Image 2.0

    Alibaba

    Best open for non-Latin text; 20B MMDiT Apache 2.0

  • Stable Diffusion 3.5 Large

    Stability AI

    8B MMDiT — proven SD lineage, huge LoRA ecosystem

2026-06-19

Aesthetic / artistic style

Best at painterly, illustrative, stylized output — for art, concept work, distinctive look.

Midjourney V8.1

Midjourney

subscription only · Basic $10/mo

Aesthetic leader for 3+ years. V8.1 incrementally improves realism + consistency over V7 while keeping the distinctive Midjourney look. App-gated by design.

3 alternatives
  • Leonardo Phoenix

    Leonardo.ai

    Canva-owned; precise prompt adherence + iterative editing

  • Playground v3

    Playground AI

    Strong graphic-design lean

  • Recraft V4

    Recraft

    Design-focused with real vector output too

2026-06-19

Video generation

5 categories

Champions across audio-visual co-generation, character/world consistency, cinematic HDR, low-cost per-second, and open-weight.

Audio-visual co-generation

Best at generating video + synchronized audio (dialogue, SFX, music) in a single pass — table stakes since Veo 3.

Veo 3.1

Google DeepMind

$0.40/s w/ audio · $0.03/s Lite

Native synchronized 48kHz audio with tight lip-sync; 4/6/8s clips at 24fps, 720p/1080p (4K premium). Made A/V co-generation table stakes; still the highest-quality result when audio matters.

3 alternatives
  • LTX-2

    Lightricks

    First production open-weight A/V — 4K@50fps, up to 20s

  • Kling 2.6

    Kuaishou

    Simultaneous audio-visual at 1080p@48fps, 10s

  • Wan 2.6

    Alibaba

    Open-weight; single-pass sync audio up to ~15s

2026-06-19

Character / world consistency

Best at keeping the same character / setting consistent across cuts, shots, and durations — the hard problem for narrative video.

Runway Gen-4

Runway

$0.01/credit · plans $12–$188/mo

Breakthrough single-image character/world consistency at Gen-4's launch (Mar 2025); Motion Brush and Director Mode camera control. Still the go-to when you need a recognizable face or setting across multiple shots.

3 alternatives
  • Seedance 2.0 Pro

    ByteDance

    Native multi-shot narrative with consistency across cuts

  • Higgsfield Cinema Studio

    Higgsfield

    Character locking + camera move stacking

  • Vidu Q2

    Shengshu

    Up to 7 reference images for multi-subject consistency

2026-06-19

Cinematic HDR

Best for cinematic camera work, depth, dynamic range — film-quality output, not social-clip output.

Luma Ray3.14

Luma Labs

Plus $30 / Pro $90 / Ultra $300 per mo

First model with native 16-bit HDR (EXR export). Reasoning-driven refinement, cinematic camera motion, durations up to 20s. HDR leader — though no native audio generation yet.

3 alternatives
  • Veo 3.1 Premium 4K

    Google

    4K with native audio; less HDR latitude

  • Moonvalley Marey

    Moonvalley

    First video model trained entirely on licensed data; built for studio production

  • Runway Gen-4

    Runway

    Up to 4K; consistent character + cinematic motion

2026-06-19

Best $/sec for non-premium video

Best video model at the budget tier — for social clips, drafts, prototyping at scale.

Hailuo 02

MiniMax

~$0.045/s at 768p · ~$0.28/generation

Native 1080p, strong physics, consistently top-3 on AA's video arena. Cheapest mainstream video model at this quality level. 6s or 10s clip durations.

3 alternatives
  • Veo 3.1 Lite

    Google

    $0.03/s without audio; the lowest per-second price among these budget options

  • Seedance 1.5 Lite

    ByteDance

    ~$0.18 per 720p 5s — cheap for multi-shot needs

  • Vidu Q1

    Shengshu

    $0.0375/s — cheapest with cinematic transitions

2026-06-19
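
The budget options on this card are quoted in different units: per second, per generation, and per clip. This minimal sketch normalizes them to dollars per second and per minute of footage, using the list prices on this page. Seedance 1.5 Lite's per-clip price is divided by its 5s length. The full-quality Veo 3.1 rate with audio, from the co-generation card, is included for scale.

```python
# List prices from this page, normalized to USD per second of output.
PER_SECOND = {
    "Veo 3.1 Lite (no audio)": 0.03,
    "Seedance 1.5 Lite (720p)": 0.18 / 5,  # ~$0.18 per 5s clip
    "Vidu Q1": 0.0375,
    "Hailuo 02 (768p)": 0.045,
    "Veo 3.1 (with audio)": 0.40,
}

def cost_for(seconds: float, name: str) -> float:
    """Cost in USD for `seconds` of generated footage at the listed rate."""
    return PER_SECOND[name] * seconds

for name, rate in sorted(PER_SECOND.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${rate:.4f}/s, ${cost_for(60, name):.2f} per minute of footage")
```

A 6s Hailuo 02 clip at $0.045/s comes to about $0.27, in line with the ~$0.28 per-generation figure. None of these budget rates is below one cent per second.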

Best open-weight video

Best video model you can run on your own hardware.

LTX-2

Lightricks

open weights · API also available

First production-ready truly open-weight model with single-pass synchronized audio. Native 4K@50fps, up to 20s. Asymmetric dual-stream transformer. The open answer to Veo-style A/V generation.

3 alternatives
  • Wan 2.6

    Alibaba

    Open weights, sync audio, 1.3B variant runs on 8GB VRAM

  • HunyuanVideo 1.5

    Tencent

    Lightweight 8.3B; 6s 720p in ~75s on a single RTX 4090

  • Mochi 1

    Genmo

    10B, Apache-2.0; the early high-fidelity reference for open video

2026-06-19

Text-to-speech

4 categories

Champions across expressive narration, real-time low-latency, multilingual coverage, and open-weight.

Most expressive narration

Best at conveying emotion, nuance, character voice — for audiobooks, narration, dialogue.

Eleven v3

ElevenLabs

character credits · API + app

Inline Audio Tags for fine-grained emotion control. 70+ languages. Best in class for expressive narration / dialogue work, especially since v3 reached general availability in 2026.

3 alternatives
  • Hume Octave

    Hume AI

    First speech-LLM that understands what a script means; directable with "Acting Instructions"

  • Rime Arcana

    Rime

    Autoregressive codec-based; highly expressive on customer-service speech

  • Gemini-TTS

    Google

    Natural-language style/emotion control; top AA Elo (~1,211)

2026-06-19

Real-time / lowest-latency

Best for live voice agents, conversational AI, call centers — where first-chunk latency matters more than expressiveness.

Cartesia Sonic 3

Cartesia

~$35/1M chars

Sub-100ms first-chunk latency. State-space model architecture. 42 languages, natural laughter, voice cloning. The latency leader for voice agents.

3 alternatives
  • Cartesia Sonic Turbo

    Cartesia

    ~40ms latency; cheaper at $0.0225/min

  • Eleven Flash v2.5

    ElevenLabs

    ~75ms latency; 32 languages; half the per-character rate of ElevenLabs' flagship models

  • Play 3.0 Mini

    PlayAI

    ~143ms TTFB; cheapest in Play's lineup

2026-06-19
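
Vendor latency figures (first-chunk, TTFB) depend on region, voice, and network, so it is worth measuring them on your own path. This minimal sketch assumes your TTS client exposes the response as an iterable of audio byte chunks. `open_stream` is a callable so that the clock starts before the request is sent. `fake_tts_stream` is a hypothetical stand-in, not any vendor's SDK.

```python
import time
from typing import Callable, Iterable, Iterator

def time_to_first_chunk(open_stream: Callable[[], Iterable[bytes]]) -> tuple[float, bytes]:
    """Seconds from opening the stream to the first non-empty audio chunk, plus that chunk."""
    start = time.perf_counter()
    for chunk in open_stream():  # the request is issued inside the timed region
        if chunk:
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended without any audio")

def fake_tts_stream(first_chunk_delay_s: float, n_chunks: int = 5) -> Iterator[bytes]:
    # Hypothetical stand-in for a streaming TTS response.
    time.sleep(first_chunk_delay_s)  # simulated synthesis + network delay
    for _ in range(n_chunks):
        yield b"\x00" * 320  # 10 ms of 16-bit mono PCM at 16 kHz (silence)

ttfc, first = time_to_first_chunk(lambda: fake_tts_stream(0.08))
print(f"first chunk after {ttfc * 1000:.0f} ms ({len(first)} bytes)")
```

Run it many times and compare medians and p95s rather than single samples, since one-off latency numbers are noisy.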

Broadest multilingual coverage

Best when you need lots of languages with consistent voice quality — global apps, accessibility tools.

Azure AI Speech (Neural / HD)

Microsoft

~$15/1M chars

400+ neural voices across 140+ languages, the broadest enterprise coverage. Custom Neural Voice for brand voices. Emotion-aware HD voices in the latest tier.

3 alternatives
  • Eleven Multilingual v2

    ElevenLabs

    29 languages with high consistency for audiobooks/dubbing

  • Gemini-TTS

    Google

    70+ locales with natural-language style control

  • Chirp 3 HD

    Google

    30+ languages; voice cloning; production HD tier

2026-06-19

Best open-weight TTS

Best TTS you can self-host — for privacy-sensitive deployments, offline use, edge.

Kokoro-82M

hexgrad (open)

Apache-2.0 · <$1/1M chars hosted

82M params, 8 languages, 54 voices. Punches well above its weight for a model this small. Apache-2.0 license, so commercial self-hosting is straightforward.

1 alternative
  • Voxtral TTS

    Mistral AI

    4B streaming speech model; low-latency multilingual; Apache 2.0

2026-06-19

Speech-to-text

3 categories

Champions across batch accuracy, real-time streaming, and multilingual / open-weight transcription.

Highest batch accuracy

Best for offline batch transcription where you need the lowest word error rate — legal, medical, archival.

ElevenLabs Scribe v2 (Batch)

ElevenLabs

premium per-min

Lowest WER on benchmarks; 90–99 languages; ~97% accuracy on English; built-in diarization. The accuracy leader when latency doesn't matter.

2 alternatives
  • AssemblyAI Universal-2

    AssemblyAI

    High-accuracy English + 99 languages; rich audio-intelligence add-ons

  • Deepgram Nova-3 batch

    Deepgram

    ~5.3% English WER; cheap at ~$0.0043/min

2026-06-19
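
The WER figures in these cards (~5.3%, ~5.6%) are word-level edit distances. Vendor "accuracy" percentages are usually quoted as roughly 1 − WER, so ~97% accuracy is about 3% WER. Here is a minimal sketch of the standard computation. Real leaderboards also normalize text (punctuation, numbers, casing) before scoring. This sketch reduces normalization to lowercasing and whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference.

    Computed as word-level Levenshtein distance with unit costs.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word missing from hyp
                curr[j - 1] + 1,         # insertion: extra hyp word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)

ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"
print(f"WER = {word_error_rate(ref, hyp):.1%}")  # two substitutions out of nine words
```

WER can exceed 100% when the hypothesis contains many insertions, which is why it is a rate and not an accuracy.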

Real-time streaming

Best for live captioning, voice agents, meeting bots — where low latency matters more than perfect accuracy.

Deepgram Nova-3

Deepgram

~$0.0077/min streaming

~5.3% English WER with sub-second streaming. Speed leader at a competitive price. Real-time customization for domain vocabularies.

2 alternatives
  • ElevenLabs Scribe v2 Realtime

    ElevenLabs

    Sub-150ms latency, ~93.5% accuracy across 30 languages

  • AssemblyAI Universal-3 Streaming

    AssemblyAI

    Ultra-low-latency; ~$0.15/hr

2026-06-19
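
STT prices on this page mix per-minute and per-hour units. This small sketch normalizes the quoted Deepgram and AssemblyAI list prices to dollars per audio hour. These are the page's figures, not live pricing.

```python
# STT list prices from this page as (price, unit).
PRICES = {
    "Deepgram Nova-3 (batch)": (0.0043, "min"),
    "Deepgram Nova-3 (streaming)": (0.0077, "min"),
    "AssemblyAI Universal-3 Streaming": (0.15, "hr"),
}

MINUTES_PER_UNIT = {"min": 1, "hr": 60}

def per_hour(price: float, unit: str) -> float:
    """Convert a per-minute or per-hour price to USD per audio hour."""
    return price * 60 / MINUTES_PER_UNIT[unit]

for name, (price, unit) in sorted(PRICES.items(), key=lambda kv: per_hour(*kv[1])):
    print(f"{name}: ${per_hour(price, unit):.3f}/hr")
```

On these list prices, AssemblyAI's streaming tier works out cheaper per hour than Deepgram's streaming rate. Compare at your own volume and discount tier.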

Multilingual / open-weight

Best open or multilingual STT — when you need wide language coverage or offline operation.

NVIDIA Parakeet TDT 0.6B v3

NVIDIA

open weights (NeMo / HF)

#1 on the Hugging Face Open ASR Leaderboard, with the highest throughput. FastConformer architecture. The cheap, high-throughput batch option when you can self-host.

3 alternatives
  • Whisper large-v3

    OpenAI (open)

    ~99 languages; the open baseline most ASR systems are benchmarked against; MIT license

  • NVIDIA Canary (Qwen 2.5B)

    NVIDIA

    Multilingual ASR + EN↔24-lang translation; ~5.6% WER

  • Voxtral Transcribe 2

    Mistral AI

    Beats Whisper large-v3; Apache 2.0; ~$0.003/min

2026-06-19

Music & sound

3 categories

Champions across song generation, instrumental composition, and text-to-SFX.

Song generation (vocals + instrumentation)

Best at producing a full song with coherent structure, vocals, and instrumentation.

Suno v5.5

Suno

Pro / Premier subscription · credit-based

Most expressive Suno yet. Personal voice cloning, custom-style fine-tuning, My Taste personalization. Leading consumer AI song generator with coherent multi-minute output.

2 alternatives
  • Udio (Allegro v1.5)

    Udio

    ~130-sec coherent songs; favored for audio quality

  • Eleven Music v2

    ElevenLabs

    Section-by-section composition with mid-track genre transitions

2026-06-19

Instrumental composition

Best for production-grade instrumental tracks — film scoring, background, ambient, royalty-free use.

Google Lyria 2

Google DeepMind

Vertex AI API

High-fidelity 48 kHz stereo instrumental. Broad genre control + SynthID watermarking. Enterprise-grade for commercial production.

3 alternatives
  • Stable Audio 2.5

    Stability AI

    Up to ~3-min tracks in <2s on an H100; the first ultra-fast audio model aimed at enterprise use

  • Google Lyria RealTime

    Google

    Interactive streaming generation for live music steering

  • MusicGen

    Meta

    Open weights (CC-BY-NC 4.0) and MIT-licensed code; the foundational open music model

2026-06-19

Text-to-SFX / sound design

Best for generating sound effects, ambient soundscapes, foley from text prompts.

ElevenLabs Sound Effects

ElevenLabs

per-generation credits

Leading text-to-sound-effects generator. Strong for game audio, video sound design, ambient soundscapes.

1 alternative
  • AudioGen

    Meta (AudioCraft)

    Open-weight companion to MusicGen for environmental sound

2026-06-19

Multimodal & specialized

5 categories

Champions across OCR, embeddings, rerankers, 3D generation, vision/segmentation. The plumbing of modern AI stacks.

OCR / document understanding

Best for converting documents (PDFs, scans, screenshots) into structured text + tables + math.

Mistral OCR 3

Mistral AI

$2/1k pages ($1 batch)

Markdown output with HTML table reconstruction; handles forms, handwriting, math, and multilingual documents. 74% win rate over OCR 2 at launch. Best price point for high-volume document work.

3 alternatives
  • Google Document AI

    Google

    Math OCR to LaTeX, checkbox/font detection — enterprise structured extraction

  • DeepSeek-OCR 2

    DeepSeek (open)

    91% on OmniDocBench; leading open page→markdown

  • olmOCR 2

    Ai2 (open)

    Strong on messy layouts + handwriting; Apache 2.0

2026-06-19

General-purpose embeddings

Best vector embeddings for semantic search, RAG, recommendation, similarity tasks.

voyage-3-large

Voyage AI

first 200M tokens free

SOTA for general and multilingual retrieval. Matryoshka (truncatable) dimensions, 32K context, int8/binary quantization for cheap serving. Default for serious RAG pipelines in 2026.

3 alternatives
  • Gemini Embedding

    Google

    Default 3,072 dims (MRL to 256), 100+ languages; top-ranked multilingual general

  • Cohere Embed v4

    Cohere

    Multimodal text+image; MTEB 65.2; strong on visual documents

  • Nomic Embed Text v2 (MoE)

    Nomic

    Best quality-to-size open; CPU-runnable; Apache 2.0

2026-06-19
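
Two of the serving tricks named in this card, Matryoshka-style dimension truncation and binary quantization, are easy to see on toy vectors. Below is a minimal pure-Python sketch. The 8-dimensional vectors are hypothetical stand-ins. Truncation only preserves quality for models trained for it (Matryoshka Representation Learning). This sketch does not call the Voyage API, which can return shortened or quantized embeddings itself.

```python
import math
from typing import Sequence

def l2_normalize(v: Sequence[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def truncate(v: Sequence[float], dims: int) -> list[float]:
    """Matryoshka-style shortening: keep the leading dims, then renormalize."""
    return l2_normalize(v[:dims])

def binarize(v: Sequence[float]) -> int:
    """Pack one sign bit per dimension into an int (bit i set if v[i] > 0)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Toy vectors standing in for real embeddings (hypothetical values).
query = [0.9, 0.1, -0.3, 0.2, 0.05, -0.02, 0.01, 0.03]
doc_a = [0.8, 0.2, -0.2, 0.1, -0.04, 0.03, -0.01, 0.02]
doc_b = [-0.7, 0.3, 0.6, -0.1, 0.02, 0.05, 0.04, -0.03]

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name,
          f"full={cosine(query, doc):.3f}",
          f"4-dim={cosine(truncate(query, 4), truncate(doc, 4)):.3f}",
          f"hamming={hamming(binarize(query), binarize(doc))}")
```

Binary codes cost 1 bit per dimension instead of 32. Hamming distance over them works as a cheap first-pass filter; rescore the survivors with full-precision cosine similarity.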

Reranker

Best for the rerank stage of a retrieval pipeline — second-pass scoring of query/document pairs.

Cohere Rerank 3.5

Cohere

$2.00 per 1,000 searches

Cross-encoder; 100+ languages, 4K context. The leading multilingual enterprise reranker with predictable pricing.

2 alternatives
  • Voyage rerank-2.5

    Voyage AI

    Strong accuracy/cost trade-off; jointly scores query+doc

  • Jina Reranker m0

    Jina AI

    Multimodal — reranks text AND image candidates

2026-06-19
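
The rerank stage sits behind a cheaper first-stage retriever. This minimal sketch shows that two-stage shape with pluggable scorers. `word_overlap` and `phrase_bonus` are hypothetical toy stand-ins for embedding similarity and a cross-encoder. Nothing here calls the Cohere, Voyage, or Jina APIs.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score

def retrieve_then_rerank(query: str, docs: Sequence[str], first_stage: Scorer,
                         reranker: Scorer, k: int = 20, top_n: int = 3) -> list[str]:
    """Score every doc cheaply, keep the top k, then rescore only those with the reranker."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:top_n]

def word_overlap(q: str, d: str) -> float:
    # Toy first stage: fraction of query words present in the doc.
    qs, ds = set(q.lower().split()), set(d.lower().split())
    return len(qs & ds) / (len(qs) or 1)

def phrase_bonus(q: str, d: str) -> float:
    # Toy "reranker": overlap plus a bonus when the exact query phrase appears.
    return word_overlap(q, d) + (1.0 if q.lower() in d.lower() else 0.0)

docs = [
    "reset your router password and your wifi name",
    "to reset your password open settings",
    "password reset emails can take a few minutes",
    "billing and invoices",
]
print(retrieve_then_rerank("reset your password", docs, word_overlap, phrase_bonus, k=3, top_n=2))
```

The first stage scores the router doc and the settings doc equally and keeps the router doc first; the second pass promotes the exact-phrase match. Keeping `k` small bounds how many pairs the slower, per-search-billed reranker has to score.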

3D generation

Best for generating textured 3D meshes from text or image — for games, AR, product viz.

Rodin Gen-2.5 (Hyper3D)

Deemos Tech

web + API (fal) · credit/subscription

Diffusion Transformer (CLAY), quad-mesh + PBR, sculpt-level detail above 10M polygons. Top quality-and-speed in 2025-2026 comparisons.

3 alternatives
  • Hunyuan3D 3.0

    Tencent

    Most-downloaded open 3D family; PBR textures; ~8–20s on A100/4090

  • Meshy 6

    Meshy AI

    Best for 3D printing — auto-rig, animation, STL/3MF export

  • TRELLIS.2-4B

    Microsoft Research

    Leading open production-grade 3D generator

2026-06-19

Image / video segmentation

Best for separating objects from backgrounds — masks, video object tracking, scene parsing.

SAM 3.1 (Segment Anything 3.1)

Meta AI

open weights · free

Unified image + video segmentation with text / exemplar / visual prompts. Real-time multiplexed multi-object tracking. The default for any vision pipeline needing masks.

1 alternative
  • SAM 3

    Meta AI

    Prior gen — still excellent for non-real-time tasks

2026-06-19
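
A common way to check a model's mask against a hand-labeled one is intersection-over-union (IoU). Here is a minimal pure-Python sketch on tiny 0/1 grids; real pipelines would use NumPy arrays. It scores masks from any source and does not use SAM's own API.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # row-major grid; nonzero = object pixel

def mask_iou(pred: Mask, ref: Mask) -> float:
    """Intersection-over-union of two same-shaped binary masks."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if p and r:
                inter += 1
            if p or r:
                union += 1
    return inter / union if union else 1.0  # two empty masks: treat as a perfect match

pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
ref = [[0, 1, 1],
       [0, 1, 0],
       [0, 1, 0]]
print(f"IoU = {mask_iou(pred, ref):.2f}")  # 3 shared pixels / 5 in the union
```

For video, one simple approach is to average per-frame IoU over the frames where the object is present.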