AIME — AI model leaderboard
AI models ranked by AIME, an aggregated third-party benchmark from Artificial Analysis. Higher is better. Cross-referenced against our first-party meo scores and Effective Value (𝕍).
Ranking 25 models across the full field · as of 2026-06-07.
| # | Model | Lab | AIME |
|---|---|---|---|
| 1 | OpenAI: GPT-5 | openai | 95.7% |
| 2 | OpenAI: o4 Mini | openai | 94.0% |
| 3 | OpenAI: o3 | openai | 90.3% |
| 4 | Google: Gemini 2.5 Pro | google | 88.7% |
| 5 | OpenAI: o3 Mini High | openai | 86.0% |
| 6 | OpenAI: o3 Mini | openai | 77.0% |
| 7 | OpenAI: o1 | openai | 72.3% |
| 8 | Reka Flash 3 | rekaai | 51.0% |
| 9 | Google: Gemini 2.5 Flash | google | 50.0% |
| 10 | OpenAI: GPT-4.1 | openai | 43.7% |
| 11 | OpenAI: GPT-4.1 Mini | openai | 43.0% |
| 12 | Meta: Llama 4 Maverick | meta-llama | 39.0% |
| 13 | Meta: Llama 4 Scout | meta-llama | 28.3% |
| 14 | Google: Gemma 3 27B | google | 25.3% |
| 15 | OpenAI: GPT-4.1 Nano | openai | 23.7% |
| 16 | Google: Gemma 3 12B | google | 22.0% |
| 17 | OpenAI: GPT-4o | openai | 15.0% |
| 18 | OpenAI: GPT-4 Turbo | openai | 15.0% |
| 19 | Microsoft: Phi 4 | microsoft | 14.3% |
| 20 | OpenAI: GPT-4o (2024-08-06) | openai | 11.7% |
| 21 | OpenAI: GPT-4o-mini | openai | 11.7% |
| 22 | OpenAI: GPT-4o (2024-05-13) | openai | 11.0% |
| 23 | Cohere: Command A | cohere | 9.7% |
| 24 | Google: Gemma 3 4B | google | 6.3% |
| 25 | Microsoft: Phi 4 Mini Instruct | microsoft | 3.0% |
Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.