Terminal-Bench Hard — AI model leaderboard

AI models ranked by Terminal-Bench Hard, an aggregated third-party benchmark from Artificial Analysis. Higher is better. Cross-referenced against our first-party meo scores and Effective Value (𝕍).

Ranking 79 models across the full field · as of 2026-06-07.

| # | Model | Lab | Terminal-Bench Hard |
|---|---|---|---|
| 1 | OpenAI: GPT-5.5 | openai | 60.6% |
| 2 | Anthropic: Claude Opus 4.8 | anthropic | 58.3% |
| 3 | OpenAI: GPT-5.4 | openai | 57.6% |
| 4 | Google: Gemini 3.1 Pro Preview | google | 53.8% |
| 5 | OpenAI: GPT-5.3-Codex | openai | 53.0% |
| 6 | OpenAI: GPT-5.4 Mini | openai | 52.3% |
| 7 | Anthropic: Claude Opus 4.7 | anthropic | 51.5% |
| 8 | Qwen: Qwen3.7 Max | qwen | 50.8% |
| 9 | Kwaipilot: KAT-Coder-Pro V2 | kwaipilot | 49.2% |
| 10 | Qwen: Qwen3.7 Plus | qwen | 47.0% |
| 11 | OpenAI: GPT-5.2 Chat | openai | 47.0% |
| 12 | Anthropic: Claude Sonnet 4.6 | anthropic | 46.2% |
| 13 | DeepSeek: DeepSeek V4 Pro | deepseek | 46.2% |
| 14 | OpenAI: GPT-5.1 | openai | 45.5% |
| 15 | Qwen: Qwen3.6 Plus | qwen | 43.9% |
| 16 | Xiaomi: MiMo-V2.5-Pro | xiaomi | 43.2% |
| 17 | Z.ai: GLM 5.1 | z-ai | 43.2% |
| 18 | OpenAI: GPT-5.4 Nano | openai | 42.4% |
| 19 | MiniMax: MiniMax M3 | minimax | 42.4% |
| 20 | Google: Gemini 3.5 Flash | google | 40.9% |
| 21 | Qwen: Qwen3.5 397B A17B | qwen | 40.9% |
| 22 | MiniMax: MiniMax M2.7 | minimax | 39.4% |
| 23 | OpenAI: GPT-5 Codex | openai | 37.9% |
| 24 | xAI: Grok 4.3 | x-ai | 37.9% |
| 25 | OpenAI: o3 | openai | 37.1% |
| 26 | OpenAI: GPT-5.2-Codex | openai | 37.1% |
| 27 | Google: Gemma 4 31B | google | 36.4% |
| 28 | DeepSeek: DeepSeek V4 Flash | deepseek | 35.6% |
| 29 | StepFun: Step 3.7 Flash | stepfun | 35.6% |
| 30 | Qwen: Qwen3.6 27B | qwen | 34.8% |
| 31 | Qwen: Qwen3.6 35B A3B | qwen | 34.8% |
| 32 | OpenAI: GPT-5.1-Codex | openai | 34.8% |
| 33 | Tencent: Hy3 preview | tencent | 34.1% |
| 34 | Z.ai: GLM 5 Turbo | z-ai | 33.3% |
| 35 | OpenAI: GPT-5 Mini | openai | 33.3% |
| 36 | Mistral: Mistral Medium 3.5 | mistralai | 33.3% |
| 37 | OpenAI: GPT-5.1-Codex-Mini | openai | 33.3% |
| 38 | OpenAI: GPT-5 | openai | 32.6% |
| 39 | Z.ai: GLM 5V Turbo | z-ai | 32.6% |
| 40 | StepFun: Step 3.5 Flash | stepfun | 32.6% |
| 41 | Google: Gemini 3 Flash Preview | google | 31.8% |
| 42 | Qwen: Qwen3.5-122B-A10B | qwen | 31.1% |
| 43 | inclusionAI: Ling-2.6-1T | inclusionai | 31.1% |
| 44 | inclusionAI: Ring-2.6-1T | inclusionai | 28.8% |
| 45 | Google: Gemini 2.5 Pro | google | 26.5% |
| 46 | Inception: Mercury 2 | inception | 26.5% |
| 47 | Xiaomi: MiMo-V2-Flash | xiaomi | 25.8% |
| 48 | Google: Gemini 3.1 Flash Lite | google | 24.2% |
| 49 | Qwen: Qwen3.5-9B | qwen | 24.2% |
| 50 | OpenAI: gpt-oss-120b | openai | 23.5% |
| 51 | Arcee AI: Trinity Large Thinking | arcee-ai | 22.7% |
| 52 | inclusionAI: Ling-2.6-flash | inclusionai | 21.2% |
| 53 | Qwen: Qwen3 Coder Next | qwen | 18.2% |
| 54 | OpenAI: o4 Mini | openai | 15.2% |
| 55 | Google: Gemma 4 26B A4B (free) | google | 13.6% |
| 56 | OpenAI: GPT-4.1 | openai | 13.6% |
| 57 | OpenAI: o1 | openai | 12.9% |
| 58 | OpenAI: GPT-5 Nano | openai | 12.1% |
| 59 | Google: Gemini 2.5 Flash | google | 12.1% |
| 60 | OpenAI: gpt-oss-20b | openai | 10.6% |
| 61 | Prime Intellect: INTELLECT-3 | prime-intellect | 9.1% |
| 62 | OpenAI: GPT-4o (2024-08-06) | openai | 8.3% |
| 63 | OpenAI: GPT-4o | openai | 8.3% |
| 64 | Upstage: Solar Pro 3 | upstage | 7.6% |
| 65 | OpenAI: GPT-4.1 Mini | openai | 7.6% |
| 66 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google | 7.6% |
| 67 | OpenAI: o3 Mini | openai | 6.8% |
| 68 | Meta: Llama 4 Maverick | meta-llama | 6.8% |
| 69 | OpenAI: o3 Mini High | openai | 6.1% |
| 70 | OpenAI: GPT-4.1 Nano | openai | 3.8% |
| 71 | Microsoft: Phi 4 | microsoft | 3.8% |
| 72 | Google: Gemma 3 27B | google | 3.8% |
| 73 | Meta: Llama 4 Scout | meta-llama | 1.5% |
| 74 | Cohere: Command A | cohere | 0.8% |
| 75 | Google: Gemma 3 12B | google | 0.8% |
| 76 | Google: Gemma 3 4B | google | 0.8% |
| 77 | IBM: Granite 4.1 8B | ibm-granite | 0.0% |
| 78 | Reka Flash 3 | rekaai | 0.0% |
| 79 | Microsoft: Phi 4 Mini Instruct | microsoft | 0.0% |

Source: Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.