AI Model Leaderboard
As of 2026-06-08, the highest-accuracy AI model on the un-leaked meo benchmark is OpenAI: GPT-5.5 (73.2%), while Owl Alpha offers the best intelligence-per-dollar at $0.0000 per correct answer. The board fuses accuracy, speed, cost, and error-cascade into Effective Value, and the ranking inverts with task depth: fast models win one-shot tasks, while accurate models dominate deep agentic chains.
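To make the task-depth inversion concrete, here is a minimal Python sketch under a deliberately simple toy model. It is not the meo-benchmark Effective Value formula, which this page does not spell out. The assumptions are: each step of a chain succeeds independently with probability equal to the model's benchmark accuracy, failures are only noticed at the end of the chain, and the per-attempt cost can be recovered as cost per correct answer times accuracy. The function names `chain_success` and `cost_per_successful_chain` are hypothetical; the two sample rows are copied from the table.

```python
def chain_success(step_accuracy: float, depth: int) -> float:
    """Chance that all `depth` steps are correct (independent steps assumed)."""
    return step_accuracy ** depth


def cost_per_successful_chain(cost_per_correct: float,
                              step_accuracy: float,
                              depth: int) -> float:
    """Expected spend per fully correct chain under the toy model.

    cost_per_correct * step_accuracy recovers the average cost of one attempt.
    A chain pays for `depth` attempts (errors are assumed to go unnoticed until
    the end) and succeeds with probability step_accuracy ** depth.
    """
    cost_per_attempt = cost_per_correct * step_accuracy
    return depth * cost_per_attempt / chain_success(step_accuracy, depth)


# (accuracy, cost per correct answer in USD), copied from the leaderboard rows.
MODELS = {
    "GPT-5.5": (0.732, 0.0435),
    "DeepSeek V4 Flash": (0.562, 0.0037),
}

if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        for name, (accuracy, cost) in MODELS.items():
            p = chain_success(accuracy, depth)
            c = cost_per_successful_chain(cost, accuracy, depth)
            print(f"depth={depth:2d}  {name:18s}  "
                  f"P(success)={p:.4f}  cost/success=${c:.4f}")
```

In this toy model the cheaper, less accurate model costs less per success on a one-shot task, but its compounding error rate eventually makes the more accurate model cheaper per completed chain as depth grows. Where that crossover falls depends entirely on the independence and cost assumptions above.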
| # | Model | Lab | Accuracy | | | Cost per correct answer | | | Context | Input price | Output price | Weights | Arena Elo | AA Intel | GPQA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI: GPT-5.5 | openai | 73.2% | 73.6 | 0.010 | $0.0435 | 29 | 1,380 | 1.1M | $5.00 | $30.00 | - | - | 60.2 | 94% |
| 2 | Anthropic: Claude Opus 4.8 | anthropic | 70.8% | 100.0 | -0.031 | $0.0584 | 27 | 2,210 | 1M | $5.00 | $25.00 | - | - | 61.4 | 92% |
| 3 | Google: Gemini 3.1 Pro Preview | google | 60.4% | 7.8 | 0.021 | $0.0968 | 69 | 7,867 | 1.0M | $2.00 | $12.00 | - | - | 57.2 | 94% |
| 4 | Google: Gemini 3.5 Flash | google | 60.0% | 16.3 | - | $0.0705 | 45 | 7,643 | 1.0M | $1.50 | $9.00 | - | - | 55.3 | 92% |
| 5 | Qwen: Qwen3.7 Max | qwen | 56.6% | 0.58 | - | $0.0320 | 201 | 8,309 | 1M | $1.25 | $3.75 | - | - | 56.6 | 92% |
| 6 | xAI: Grok 4.3 | x-ai | 56.6% | 21.2 | -0.031 | $0.0182 | 27 | 5,679 | 1M | $1.25 | $2.50 | - | - | 53.2 | 90% |
| 7 | DeepSeek: DeepSeek V4 Flash | deepseek | 56.2% | 0.30 | - | $0.0037 | 370 | 15,115 | 1.0M | $0.10 | $0.20 | Open | - | 46.5 | 89% |
| 8 | inclusionAI: Ring-2.6-1T | inclusionai | 56.0% | 5.9 | - | $0.0052 | 59 | 8,190 | 262K | $0.07 | $0.63 | - | - | 38.5 | 86% |
| 9 | MoonshotAI: Kimi K2.6 | moonshotai | 54.6% | 0.32 | - | $0.0440 | 251 | 11,698 | 262K | $0.68 | $3.42 | Open | - | - | - |
| 10 | DeepSeek: DeepSeek V4 Pro | deepseek | 52.2% | 0.17 | - | $0.0473 | 327 | 15,478 | 1.0M | $0.43 | $0.87 | Open | - | 51.5 | 89% |
| 11 | Xiaomi: MiMo-V2.5 | xiaomi | 50.4% | 0.68 | - | $0.0026 | 94 | 8,813 | 1.0M | $0.14 | $0.28 | Open | - | - | - |
| 12 | Qwen: Qwen3.7 Plus | qwen | 50.4% | 0.10 | - | $0.0119 | 213 | 7,269 | 1M | $0.40 | $1.60 | - | - | 53.3 | 90% |
| 13 | NVIDIA: Nemotron 3 Ultra | nvidia | 50.3% | 0.73 | - | $0.0496 | 147 | 19,710 | 1M | $0.50 | $2.50 | Open | - | - | - |
| 14 | MiniMax: MiniMax M3 | minimax | 50.2% | 0.22 | - | $0.0074 | 129 | 5,956 | 1.0M | $0.30 | $1.20 | - | - | 54.7 | 93% |
| 15 | Xiaomi: MiMo-V2.5-Pro | xiaomi | 47.8% | 0.095 | - | $0.0563 | 288 | 18,519 | 1.0M | $0.43 | $0.87 | Open | - | 53.8 | 87% |
| 16 | Owl Alpha | openrouter | 45.8% | 0.071 | - | $0.0000 | 146 | 6,134 | 1.0M | $0.00 | $0.00 | - | - | - | - |
| 17 | Perceptron: Perceptron Mk1 | perceptron | 45.3% | 0.26 | - | $0.0013 | 24 | 772 | 33K | $0.15 | $1.50 | - | - | - | - |
| 18 | Z.ai: GLM 5.1 | z-ai | 36.0% | 0.003 | - | $0.1058 | 434 | 26,241 | 203K | $0.98 | $3.08 | Open | - | 51.4 | 87% |
| 19 | Google: Gemini 3.1 Flash Lite | google | 35.5% | 0.052 | 0.021 | $0.0326 | 72 | 21,444 | 1.0M | $0.25 | $1.50 | - | - | 33.5 | 82% |
| 20 | StepFun: Step 3.7 Flash | stepfun | 26.6% | <0.001 | - | $0.0256 | 191 | 22,065 | 256K | $0.20 | $1.15 | - | - | 42.6 | 81% |
| 21 | Tencent: Hy3 preview | tencent | 25.0% | <0.001 | - | $0.0140 | 579 | 57,712 | 262K | $0.06 | $0.21 | Open | - | 41.9 | 87% |
| 22 | Arcee AI: Trinity Large Thinking | arcee-ai | 21.1% | <0.001 | - | $0.0352 | 289 | 36,511 | 262K | $0.22 | $0.85 | Open | - | 31.9 | 75% |
Showing 22 of 341 models · as of 2026-06-08.
Third-party benchmark values: Artificial Analysis (artificialanalysis.ai), shown under their attribution terms; not redistributed in downloads.
Crowd preference (Arena Elo): LMArena / Chatbot Arena via mathewhe/chatbot-arena-elo (Apache-2.0).
Model metadata and pricing via OpenRouter. First-party scores and Effective Value by the meo-benchmark project.