MATH-500 — AI model leaderboard
AI models ranked by MATH-500, a third-party benchmark aggregated by Artificial Analysis. Higher is better. Scores are cross-referenced against our first-party meo scores and Effective Value (𝕍).
Ranking 25 models across the full field · as of 2026-06-07.
| # | Model | Lab | MATH-500 |
|---|---|---|---|
| 1 | OpenAI: GPT-5 | openai | 99.4% |
| 2 | OpenAI: o3 | openai | 99.2% |
| 3 | OpenAI: o4 Mini | openai | 98.9% |
| 4 | OpenAI: o3 Mini High | openai | 98.5% |
| 5 | OpenAI: o3 Mini | openai | 97.3% |
| 6 | Google: Gemini 2.5 Pro | google | 96.7% |
| 7 | Google: Gemini 2.5 Flash | google | 93.2% |
| 8 | OpenAI: GPT-4.1 Mini | openai | 92.5% |
| 9 | OpenAI: o1 | openai | 92.4% |
| 10 | OpenAI: GPT-4.1 | openai | 91.3% |
| 11 | Reka Flash 3 | rekaai | 89.3% |
| 12 | Meta: Llama 4 Maverick | meta-llama | 88.9% |
| 13 | Google: Gemma 3 27B | google | 88.3% |
| 14 | Google: Gemma 3 12B | google | 85.3% |
| 15 | OpenAI: GPT-4.1 Nano | openai | 84.8% |
| 16 | Meta: Llama 4 Scout | meta-llama | 84.4% |
| 17 | Cohere: Command A | cohere | 81.9% |
| 18 | Microsoft: Phi 4 | microsoft | 81.0% |
| 19 | OpenAI: GPT-4o (2024-08-06) | openai | 79.5% |
| 20 | OpenAI: GPT-4o (2024-05-13) | openai | 79.1% |
| 21 | OpenAI: GPT-4o-mini | openai | 78.9% |
| 22 | Google: Gemma 3 4B | google | 76.6% |
| 23 | OpenAI: GPT-4o | openai | 75.9% |
| 24 | OpenAI: GPT-4 Turbo | openai | 73.7% |
| 25 | Microsoft: Phi 4 Mini Instruct | microsoft | 69.6% |
Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.