GPQA Diamond — AI model leaderboard

AI models ranked by GPQA Diamond, a third-party benchmark with scores aggregated from Artificial Analysis. Higher is better. Rankings are cross-referenced against our first-party meo scores and Effective Value (𝕍).

Ranking 82 models across the full field · as of 2026-06-07.

# | Model | Lab | GPQA Diamond
1 | Google: Gemini 3.1 Pro Preview | google | 94.1%
2 | OpenAI: GPT-5.5 | openai | 93.5%
3 | MiniMax: MiniMax M3 | minimax | 92.9%
4 | Qwen: Qwen3.7 Max | qwen | 92.3%
5 | Google: Gemini 3.5 Flash | google | 92.2%
6 | Anthropic: Claude Opus 4.8 | anthropic | 92.0%
7 | OpenAI: GPT-5.4 | openai | 92.0%
8 | OpenAI: GPT-5.3-Codex | openai | 91.5%
9 | Anthropic: Claude Opus 4.7 | anthropic | 91.4%
10 | OpenAI: GPT-5.2 Chat | openai | 90.3%
11 | xAI: Grok 4.3 | x-ai | 90.1%
12 | Qwen: Qwen3.7 Plus | qwen | 90.0%
13 | OpenAI: GPT-5.2-Codex | openai | 89.9%
14 | DeepSeek: DeepSeek V4 Flash | deepseek | 89.4%
15 | Qwen: Qwen3.5 397B A17B | qwen | 89.3%
16 | DeepSeek: DeepSeek V4 Pro | deepseek | 88.8%
17 | Qwen: Qwen3.6 Plus | qwen | 88.2%
18 | OpenAI: GPT-5.4 Mini | openai | 87.5%
19 | MiniMax: MiniMax M2.7 | minimax | 87.4%
20 | OpenAI: GPT-5.1 | openai | 87.3%
21 | Z.ai: GLM 5.1 | z-ai | 86.8%
22 | Tencent: Hy3 preview | tencent | 86.7%
23 | Xiaomi: MiMo-V2.5-Pro | xiaomi | 86.6%
24 | OpenAI: GPT-5.1-Codex | openai | 86.0%
25 | inclusionAI: Ring-2.6-1T | inclusionai | 85.7%
26 | Google: Gemma 4 31B | google | 85.7%
27 | Qwen: Qwen3.5-122B-A10B | qwen | 85.7%
28 | Kwaipilot: KAT-Coder-Pro V2 | kwaipilot | 85.5%
29 | OpenAI: GPT-5 | openai | 85.4%
30 | Z.ai: GLM 5 Turbo | z-ai | 84.7%
31 | OpenAI: o3 Pro | openai | 84.5%
32 | Google: Gemini 2.5 Pro | google | 84.4%
33 | Qwen: Qwen3.6 27B | qwen | 84.2%
34 | Qwen: Qwen3.6 35B A3B | qwen | 84.1%
35 | OpenAI: GPT-5 Codex | openai | 83.7%
36 | OpenAI: GPT-5 Mini | openai | 82.8%
37 | OpenAI: o3 | openai | 82.7%
38 | StepFun: Step 3.5 Flash | stepfun | 82.6%
39 | Google: Gemini 3.1 Flash Lite | google | 82.2%
40 | OpenAI: GPT-5.4 Nano | openai | 81.7%
41 | OpenAI: GPT-5.1-Codex-Mini | openai | 81.3%
42 | Google: Gemini 3 Flash Preview | google | 81.2%
43 | StepFun: Step 3.7 Flash | stepfun | 80.9%
44 | Z.ai: GLM 5V Turbo | z-ai | 80.9%
45 | Qwen: Qwen3.5-9B | qwen | 80.6%
46 | Anthropic: Claude Sonnet 4.6 | anthropic | 79.9%
47 | Google: Gemma 4 26B A4B (free) | google | 79.2%
48 | OpenAI: o4 Mini | openai | 78.4%
49 | OpenAI: gpt-oss-120b | openai | 78.2%
50 | OpenAI: o3 Mini High | openai | 77.3%
51 | Inception: Mercury 2 | inception | 77.0%
52 | Prime Intellect: INTELLECT-3 | prime-intellect | 76.1%
53 | Arcee AI: Trinity Large Thinking | arcee-ai | 75.2%
54 | inclusionAI: Ling-2.6-1T | inclusionai | 75.2%
55 | Mistral: Mistral Medium 3.5 | mistralai | 74.8%
56 | OpenAI: o3 Mini | openai | 74.8%
57 | OpenAI: o1 | openai | 74.7%
58 | Qwen: Qwen3 Coder Next | qwen | 73.7%
59 | Upstage: Solar Pro 3 | upstage | 72.4%
60 | OpenAI: gpt-oss-20b | openai | 68.8%
61 | Google: Gemini 2.5 Flash | google | 68.3%
62 | OpenAI: GPT-5 Nano | openai | 67.6%
63 | Meta: Llama 4 Maverick | meta-llama | 67.1%
64 | OpenAI: GPT-4.1 | openai | 66.6%
65 | OpenAI: GPT-4.1 Mini | openai | 66.4%
66 | Xiaomi: MiMo-V2-Flash | xiaomi | 65.6%
67 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google | 65.1%
68 | inclusionAI: Ling-2.6-flash | inclusionai | 59.3%
69 | Meta: Llama 4 Scout | meta-llama | 58.7%
70 | Microsoft: Phi 4 | microsoft | 57.5%
71 | OpenAI: GPT-4o | openai | 54.3%
72 | Reka Flash 3 | rekaai | 52.9%
73 | Cohere: Command A | cohere | 52.7%
74 | OpenAI: GPT-4o (2024-05-13) | openai | 52.6%
75 | OpenAI: GPT-4o (2024-08-06) | openai | 52.1%
76 | OpenAI: GPT-4.1 Nano | openai | 51.2%
77 | IBM: Granite 4.1 8B | ibm-granite | 43.3%
78 | Google: Gemma 3 27B | google | 42.8%
79 | OpenAI: GPT-4o-mini | openai | 42.6%
80 | Google: Gemma 3 12B | google | 34.9%
81 | Microsoft: Phi 4 Mini Instruct | microsoft | 33.1%
82 | Google: Gemma 3 4B | google | 29.1%

Source: Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.
