Best AI models for deep agentic chains

Ranked by Effective Value (𝕍) at task depth N=40: long agentic chains where one error compounds across the remaining steps. As depth grows, the ranking shifts decisively toward accuracy: fast-but-flawed models collapse while accurate models pull ahead. This is the leaderboard for autonomous, multi-step work. Scores are indexed so that the top model at N=40 equals 100.
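To illustrate why the gaps in the table below become so extreme at N=40, here is a minimal sketch of error compounding. It is not the leaderboard's actual 𝕍 formula, which this page does not define and which presumably also weighs speed or cost. The sketch assumes each step succeeds independently with a fixed per-step accuracy, so a chain of depth N succeeds with probability accuracy^N. The model names and accuracy figures are hypothetical, chosen only for illustration.

```python
# Illustrative only: a simplified compounding model, NOT the leaderboard's 𝕍 formula.
# Assumption: every step succeeds independently with the same per-step accuracy.

def chain_success(per_step_accuracy: float, depth: int) -> float:
    """Probability that all `depth` steps of a chain succeed."""
    return per_step_accuracy ** depth


def indexed_scores(accuracies: dict[str, float], depth: int = 40) -> dict[str, float]:
    """Scale chain-success probabilities so the best model scores 100."""
    raw = {name: chain_success(p, depth) for name, p in accuracies.items()}
    top = max(raw.values())
    return {name: 100.0 * (value / top) for name, value in raw.items()}


# Hypothetical per-step accuracies (not measured values for any real model).
example = {"model_a": 0.99, "model_b": 0.95, "model_c": 0.80}
scores = indexed_scores(example, depth=40)

# Print models from highest to lowest indexed score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Under these assumptions, a per-step gap of only a few percentage points becomes a gap of several orders of magnitude after 40 steps, which matches the shape of the ranking: a small lead group, then a long tail near zero.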

Ranking 22 models across the meo-tested roster · as of 2026-06-08.

#	Model	Lab	𝕍 index (N=40, top = 100)
1	OpenAI: GPT-5.5	openai	100.0
2	Anthropic: Claude Opus 4.8	anthropic	37.2
3	Google: Gemini 3.5 Flash	google	0.011
4	Google: Gemini 3.1 Pro Preview	google	0.007
5	xAI: Grok 4.3	x-ai	0.002
6	inclusionAI: Ring-2.6-1T	inclusionai	<0.001
7	Qwen: Qwen3.7 Max	qwen	<0.001
8	DeepSeek: DeepSeek V4 Flash	deepseek	<0.001
9	MoonshotAI: Kimi K2.6	moonshotai	<0.001
10	NVIDIA: Nemotron 3 Ultra	nvidia	<0.001
11	Xiaomi: MiMo-V2.5	xiaomi	<0.001
12	DeepSeek: DeepSeek V4 Pro	deepseek	<0.001
13	MiniMax: MiniMax M3	minimax	<0.001
14	Qwen: Qwen3.7 Plus	qwen	<0.001
15	Xiaomi: MiMo-V2.5-Pro	xiaomi	<0.001
16	Perceptron: Perceptron Mk1	perceptron	<0.001
17	Owl Alpha	openrouter	<0.001
18	Google: Gemini 3.1 Flash Lite	google	<0.001
19	Z.ai: GLM 5.1	z-ai	<0.001
20	StepFun: Step 3.7 Flash	stepfun	<0.001
21	Tencent: Hy3 preview	tencent	<0.001
22	Arcee AI: Trinity Large Thinking	arcee-ai	<0.001
