
AI Model Leaderboard

As of 2026-06-08, the highest-accuracy AI model on the un-leaked meo benchmark is OpenAI: GPT-5.5 (73.2%), while Owl Alpha offers the best intelligence-per-dollar at $0.0000 per correct answer. The board fuses accuracy, speed, cost, and error cascade into a single Effective Value (𝕍), and the ranking inverts with task depth: fast models win one-shot tasks, while accurate models dominate deep agentic chains.
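To see why a ranking can invert with depth, here is a toy error-cascade model. It is a minimal sketch under loudly stated assumptions, not the meo-benchmark's 𝕍 formula: each step of a task succeeds independently with probability p, a single failed step sinks the whole chain, and every step costs the same amount. The two models (`FAST`, `ACCURATE`) and their per-step costs are hypothetical; only their accuracy levels loosely echo the leaderboard below.

```python
import math

# Toy error-cascade model. This is NOT the meo-benchmark's 𝕍 formula.
# Hypothetical assumptions: steps succeed independently with probability p,
# one failed step fails the whole chain, and each step costs a fixed amount.


def chain_success(p: float, depth: int) -> float:
    """Probability that all `depth` independent steps succeed."""
    return p ** depth


def value_per_dollar(p: float, cost_per_step: float, depth: int) -> float:
    """Expected fully successful chains per dollar spent on `depth` steps."""
    return chain_success(p, depth) / (cost_per_step * depth)


# Hypothetical models: (per-step accuracy p, per-step cost in dollars).
FAST = (0.56, 0.004)
ACCURATE = (0.73, 0.044)


def winner(depth: int) -> str:
    """Name of the model with more successful chains per dollar at this depth."""
    fast = value_per_dollar(*FAST, depth)
    accurate = value_per_dollar(*ACCURATE, depth)
    return "fast" if fast > accurate else "accurate"


def crossover_depth(fast=FAST, accurate=ACCURATE) -> float:
    """Depth at which both models give equal value (the shared depth factor cancels)."""
    (p_f, c_f), (p_a, c_a) = fast, accurate
    return math.log(c_a / c_f) / math.log(p_a / p_f)


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(f"depth {depth:>2}: {winner(depth)} model wins")
    print(f"crossover at about {crossover_depth():.1f} steps")
```

Run as a script, this reports the fast model winning at depths 1 and 5 and the accurate model winning at 10 and 20. Because both models are charged for all d steps, the depth factor cancels in the comparison, so the crossover depends only on the two ratios: d* = ln(c_acc / c_fast) / ln(p_acc / p_fast), roughly 9 steps for these made-up numbers.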

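Rank | Model | Lab | Accuracy | 𝕍 |  | Cost per correct answer |  |  | Context | Input $/1M | Output $/1M | Weights | Arena Elo | AA Intel | GPQA
1 | OpenAI: GPT-5.5 | openai | 73.2% | 73.6 | 0.010 | $0.0435 | 29 | 1,380 | 1.1M | $5.00 | $30.00 | n/a | n/a | 60.2 | 94%
2 | Anthropic: Claude Opus 4.8 | anthropic | 70.8% | 100.0 | -0.031 | $0.0584 | 27 | 2,210 | 1M | $5.00 | $25.00 | n/a | n/a | 61.4 | 92%
3 | Google: Gemini 3.1 Pro Preview | google | 60.4% | 7.8 | 0.021 | $0.0968 | 69 | 7,867 | 1.0M | $2.00 | $12.00 | n/a | n/a | 57.2 | 94%
4 | Google: Gemini 3.5 Flash | google | 60.0% | 16.3 | n/a | $0.0705 | 45 | 7,643 | 1.0M | $1.50 | $9.00 | n/a | n/a | 55.3 | 92%
5 | Qwen: Qwen3.7 Max | qwen | 56.6% | 0.58 | n/a | $0.0320 | 201 | 8,309 | 1M | $1.25 | $3.75 | n/a | n/a | 56.6 | 92%
6 | xAI: Grok 4.3 | x-ai | 56.6% | 21.2 | -0.031 | $0.0182 | 27 | 5,679 | 1M | $1.25 | $2.50 | n/a | n/a | 53.2 | 90%
7 | DeepSeek: DeepSeek V4 Flash | deepseek | 56.2% | 0.30 | n/a | $0.0037 | 370 | 15,115 | 1.0M | $0.10 | $0.20 | Open | n/a | 46.5 | 89%
8 | inclusionAI: Ring-2.6-1T | inclusionai | 56.0% | 5.9 | n/a | $0.0052 | 59 | 8,190 | 262K | $0.07 | $0.63 | n/a | n/a | 38.5 | 86%
9 | MoonshotAI: Kimi K2.6 | moonshotai | 54.6% | 0.32 | n/a | $0.0440 | 251 | 11,698 | 262K | $0.68 | $3.42 | Open | n/a | n/a | n/a
10 | DeepSeek: DeepSeek V4 Pro | deepseek | 52.2% | 0.17 | n/a | $0.0473 | 327 | 15,478 | 1.0M | $0.43 | $0.87 | Open | n/a | 51.5 | 89%
11 | Xiaomi: MiMo-V2.5 | xiaomi | 50.4% | 0.68 | n/a | $0.0026 | 94 | 8,813 | 1.0M | $0.14 | $0.28 | Open | n/a | n/a | n/a
12 | Qwen: Qwen3.7 Plus | qwen | 50.4% | 0.10 | n/a | $0.0119 | 213 | 7,269 | 1M | $0.40 | $1.60 | n/a | n/a | 53.3 | 90%
13 | NVIDIA: Nemotron 3 Ultra | nvidia | 50.3% | 0.73 | n/a | $0.0496 | 147 | 19,710 | 1M | $0.50 | $2.50 | Open | n/a | n/a | n/a
14 | MiniMax: MiniMax M3 | minimax | 50.2% | 0.22 | n/a | $0.0074 | 129 | 5,956 | 1.0M | $0.30 | $1.20 | n/a | n/a | 54.7 | 93%
15 | Xiaomi: MiMo-V2.5-Pro | xiaomi | 47.8% | 0.095 | n/a | $0.0563 | 288 | 18,519 | 1.0M | $0.43 | $0.87 | Open | n/a | 53.8 | 87%
16 | Owl Alpha | openrouter | 45.8% | 0.071 | n/a | $0.0000 | 146 | 6,134 | 1.0M | $0.00 | $0.00 | n/a | n/a | n/a | n/a
17 | Perceptron: Perceptron Mk1 | perceptron | 45.3% | 0.26 | n/a | $0.0013 | 24 | 772 | 33K | $0.15 | $1.50 | n/a | n/a | n/a | n/a
18 | Z.ai: GLM 5.1 | z-ai | 36.0% | 0.003 | n/a | $0.1058 | 434 | 26,241 | 203K | $0.98 | $3.08 | Open | n/a | 51.4 | 87%
19 | Google: Gemini 3.1 Flash Lite | google | 35.5% | 0.052 | 0.021 | $0.0326 | 72 | 21,444 | 1.0M | $0.25 | $1.50 | n/a | n/a | 33.5 | 82%
20 | StepFun: Step 3.7 Flash | stepfun | 26.6% | <0.001 | n/a | $0.0256 | 191 | 22,065 | 256K | $0.20 | $1.15 | n/a | n/a | 42.6 | 81%
21 | Tencent: Hy3 preview | tencent | 25.0% | <0.001 | n/a | $0.0140 | 579 | 57,712 | 262K | $0.06 | $0.21 | Open | n/a | 41.9 | 87%
22 | Arcee AI: Trinity Large Thinking | arcee-ai | 21.1% | <0.001 | n/a | $0.0352 | 289 | 36,511 | 262K | $0.22 | $0.85 | Open | n/a | 31.9 | 75%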

Showing 22 of 341 models · as of 2026-06-08 · Methodology & 𝕍
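The two headline picks in the summary come from sorting the same rows on different columns: accuracy for the top model, and cost per correct answer for intelligence-per-dollar. Below is a minimal sketch using five rows transcribed from the table; the tuple layout and the accuracy tie-break are illustrative choices, not the board's own code. The table prints costs to four decimal places, so $0.0000 means the cost rounds to zero at that precision.

```python
# Five rows transcribed from the leaderboard table:
# (model, accuracy as a fraction, cost per correct answer in USD).
ROWS = [
    ("OpenAI: GPT-5.5", 0.732, 0.0435),
    ("Anthropic: Claude Opus 4.8", 0.708, 0.0584),
    ("DeepSeek: DeepSeek V4 Flash", 0.562, 0.0037),
    ("Owl Alpha", 0.458, 0.0000),
    ("Z.ai: GLM 5.1", 0.360, 0.1058),
]

# Highest accuracy, regardless of price.
most_accurate = max(ROWS, key=lambda row: row[1])

# Cheapest per correct answer; ties broken in favor of higher accuracy.
by_cost = sorted(ROWS, key=lambda row: (row[2], -row[1]))
best_value = by_cost[0]

if __name__ == "__main__":
    print("most accurate:", most_accurate[0])
    print("best intelligence-per-dollar:", best_value[0])
    for model, accuracy, cost in by_cost:
        print(f"{model:<30} {accuracy:6.1%}  ${cost:.4f} per correct answer")
```

The two orderings disagree: the most accurate model lands in the middle of the cost ranking, behind both Owl Alpha and DeepSeek V4 Flash.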

Third-party benchmark values: Artificial Analysis (artificialanalysis.ai), shown under their attribution terms and not redistributed in downloads.

Crowd preference (Arena Elo): LMArena / Chatbot Arena via mathewhe/chatbot-arena-elo (Apache-2.0).

Model metadata + pricing via OpenRouter. First-party scores + Effective Value (𝕍) by the meo-benchmark project.