AI Model Leaderboard

As of 2026-06-08, the highest-accuracy AI model on the un-leaked meo benchmark is OpenAI: GPT-5.5 (73.2%), while Owl Alpha offers the best intelligence-per-dollar at $0.0000 per correct answer. The board fuses accuracy, speed, cost, and error-cascade into Effective Value (𝕍) — and the ranking inverts with task depth: fast models win one-shot tasks, accurate models dominate deep agentic chains.

Effective Value (𝕍) — re-rank by task depth

N (task depth): 10. Chain length: N=1 rewards speed, N≥10 rewards accuracy.
ω (time weight): 1. Cost of time vs money (the ω≫C_f thesis).
δ (debugging friction): 1.5. How fast errors compound debugging cost.
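One way to see why depth can invert a ranking is a deliberately simplified model. This is an illustrative assumption, not the board's 𝕍 formula, which is not stated on this page: if every step of an N-step chain succeeds independently with probability equal to the model's one-shot accuracy, the whole chain succeeds with probability accuracy^N. The helper name `chain_success` is hypothetical; the accuracies are taken from the table.

```python
def chain_success(accuracy: float, depth: int) -> float:
    """Probability that all `depth` steps succeed.

    Assumes steps are independent and each succeeds with probability
    `accuracy` (a simplification, not the leaderboard's own metric).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return accuracy ** depth


# One-shot accuracies copied from the leaderboard rows.
models = {
    "OpenAI: GPT-5.5": 0.732,
    "Google: Gemini 3.1 Pro Preview": 0.604,
    "DeepSeek: DeepSeek V4 Flash": 0.562,
}

for depth in (1, 10):
    print(f"N={depth}")
    for name, acc in models.items():
        print(f"  {name}: {chain_success(acc, depth):.4f}")
```

Under this assumption, a gap of about 1.3x in one-shot accuracy between GPT-5.5 and DeepSeek V4 Flash grows to more than 10x at N=10, which is the sense in which accuracy dominates deep chains.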

#	Model	Lab	Accuracy			$/correct						Weights	Arena Elo	AA Intel	GPQA
1	OpenAI: GPT-5.5	openai	73.2%	73.6	0.010	$0.0435	29	1,380	1.1M	$5.00	$30.00	—	—	60.2	94%
2	Anthropic: Claude Opus 4.8	anthropic	70.8%	100.0	-0.031	$0.0584	27	2,210	1M	$5.00	$25.00	—	—	61.4	92%
3	Google: Gemini 3.1 Pro Preview	google	60.4%	7.8	0.021	$0.0968	69	7,867	1.0M	$2.00	$12.00	—	—	57.2	94%
4	Google: Gemini 3.5 Flash	google	60.0%	16.3	—	$0.0705	45	7,643	1.0M	$1.50	$9.00	—	—	55.3	92%
5	Qwen: Qwen3.7 Max	qwen	56.6%	0.58	—	$0.0320	201	8,309	1M	$1.25	$3.75	—	—	56.6	92%
6	xAI: Grok 4.3	x-ai	56.6%	21.2	-0.031	$0.0182	27	5,679	1M	$1.25	$2.50	—	—	53.2	90%
7	DeepSeek: DeepSeek V4 Flash	deepseek	56.2%	0.30	—	$0.0037	370	15,115	1.0M	$0.10	$0.20	Open	—	46.5	89%
8	inclusionAI: Ring-2.6-1T	inclusionai	56.0%	5.9	—	$0.0052	59	8,190	262K	$0.07	$0.63	—	—	38.5	86%
9	MoonshotAI: Kimi K2.6	moonshotai	54.6%	0.32	—	$0.0440	251	11,698	262K	$0.68	$3.42	Open	—	—	—
10	DeepSeek: DeepSeek V4 Pro	deepseek	52.2%	0.17	—	$0.0473	327	15,478	1.0M	$0.43	$0.87	Open	—	51.5	89%
11	Xiaomi: MiMo-V2.5	xiaomi	50.4%	0.68	—	$0.0026	94	8,813	1.0M	$0.14	$0.28	Open	—	—	—
12	Qwen: Qwen3.7 Plus	qwen	50.4%	0.10	—	$0.0119	213	7,269	1M	$0.40	$1.60	—	—	53.3	90%
13	NVIDIA: Nemotron 3 Ultra	nvidia	50.3%	0.73	—	$0.0496	147	19,710	1M	$0.50	$2.50	Open	—	—	—
14	MiniMax: MiniMax M3	minimax	50.2%	0.22	—	$0.0074	129	5,956	1.0M	$0.30	$1.20	—	—	54.7	93%
15	Xiaomi: MiMo-V2.5-Pro	xiaomi	47.8%	0.095	—	$0.0563	288	18,519	1.0M	$0.43	$0.87	Open	—	53.8	87%
16	Owl Alpha	openrouter	45.8%	0.071	—	$0.0000	146	6,134	1.0M	$0.00	$0.00	—	—	—	—
17	Perceptron: Perceptron Mk1	perceptron	45.3%	0.26	—	$0.0013	24	772	33K	$0.15	$1.50	—	—	—	—
18	Z.ai: GLM 5.1	z-ai	36.0%	0.003	—	$0.1058	434	26,241	203K	$0.98	$3.08	Open	—	51.4	87%
19	Google: Gemini 3.1 Flash Lite	google	35.5%	0.052	0.021	$0.0326	72	21,444	1.0M	$0.25	$1.50	—	—	33.5	82%
20	StepFun: Step 3.7 Flash	stepfun	26.6%	<0.001	—	$0.0256	191	22,065	256K	$0.20	$1.15	—	—	42.6	81%
21	Tencent: Hy3 preview	tencent	25.0%	<0.001	—	$0.0140	579	57,712	262K	$0.06	$0.21	Open	—	41.9	87%
22	Arcee AI: Trinity Large Thinking	arcee-ai	21.1%	<0.001	—	$0.0352	289	36,511	262K	$0.22	$0.85	Open	—	31.9	75%
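The two headline claims can be checked against the rows above. The sketch below parses three rows copied from the table. It assumes tab-separated fields, with accuracy in the fourth field and cost per correct answer in the seventh. Those positions are inferred from the figures quoted in the introduction, not from the page's own column headers. `Row` and `parse_row` are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    accuracy: float          # fraction, e.g. 0.732 for "73.2%"
    cost_per_correct: float  # US dollars


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row (field positions are assumed)."""
    fields = line.split("\t")
    return Row(
        model=fields[1],
        accuracy=float(fields[3].rstrip("%")) / 100,
        cost_per_correct=float(fields[6].lstrip("$")),
    )


# Rows 1, 7 and 16, copied verbatim from the table.
RAW = [
    "1\tOpenAI: GPT-5.5\topenai\t73.2%\t73.6\t0.010\t$0.0435\t29\t1,380\t1.1M\t$5.00\t$30.00\t—\t—\t60.2\t94%",
    "7\tDeepSeek: DeepSeek V4 Flash\tdeepseek\t56.2%\t0.30\t—\t$0.0037\t370\t15,115\t1.0M\t$0.10\t$0.20\tOpen\t—\t46.5\t89%",
    "16\tOwl Alpha\topenrouter\t45.8%\t0.071\t—\t$0.0000\t146\t6,134\t1.0M\t$0.00\t$0.00\t—\t—\t—\t—",
]

rows = [parse_row(line) for line in RAW]
most_accurate = max(rows, key=lambda r: r.accuracy)
cheapest = min(rows, key=lambda r: r.cost_per_correct)

print("Highest accuracy:", most_accurate.model)
print("Lowest cost per correct answer:", cheapest.model)
```

Extending `RAW` to all 22 rows would apply the same check to the full visible board.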

Showing 22 of 341 models · as of 2026-06-08.

Third-party benchmark values: Artificial Analysis (artificialanalysis.ai) — shown under their attribution terms; not redistributed in downloads.

Crowd preference (Arena Elo): LMArena / Chatbot Arena via mathewhe/chatbot-arena-elo (Apache-2.0).

Model metadata + pricing via OpenRouter. First-party scores + Effective Value (𝕍) by the meo-benchmark project.