Best AI models for deep agentic chains

Ranked by Effective Value at task depth N=40, that is, long agentic chains in which a single error compounds through every later step. As depth grows, the ranking shifts decisively toward accuracy: fast-but-flawed models collapse while accurate models pull ahead. This is the leaderboard for autonomous, multi-step work. Values are indexed so that the top model at N=40 scores 100.
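To illustrate why small accuracy gaps turn into large ranking gaps at depth, here is a minimal sketch. It assumes each step succeeds independently with the same probability, which is a simplification for illustration and not the Effective Value formula used for this leaderboard. The function name `chain_success` and the sample accuracies are hypothetical.

```python
def chain_success(p_step: float, depth: int) -> float:
    """Probability that every one of `depth` steps succeeds.

    Assumption (not from the leaderboard methodology): steps are
    independent and share the same per-step success probability.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return p_step ** depth


if __name__ == "__main__":
    # Hypothetical per-step accuracies, chosen only to show the trend.
    for p in (0.99, 0.95, 0.80):
        print(f"per-step accuracy {p:.2f} -> chain success at N=40: "
              f"{chain_success(p, 40):.6f}")
```

Under this assumption, a model that is right 99% of the time per step completes a 40-step chain roughly two times in three, while one that is right 80% of the time almost never does.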

Ranking 22 models across the meo-tested roster · as of 2026-06-08.

| # | Model | Lab | 𝕍 idx (N=40) |
|---|-------|-----|--------------|
| 1 | OpenAI: GPT-5.5 | openai | 100.0 |
| 2 | Anthropic: Claude Opus 4.8 | anthropic | 37.2 |
| 3 | Google: Gemini 3.5 Flash | google | 0.011 |
| 4 | Google: Gemini 3.1 Pro Preview | google | 0.007 |
| 5 | xAI: Grok 4.3 | x-ai | 0.002 |
| 6 | inclusionAI: Ring-2.6-1T | inclusionai | <0.001 |
| 7 | Qwen: Qwen3.7 Max | qwen | <0.001 |
| 8 | DeepSeek: DeepSeek V4 Flash | deepseek | <0.001 |
| 9 | MoonshotAI: Kimi K2.6 | moonshotai | <0.001 |
| 10 | NVIDIA: Nemotron 3 Ultra | nvidia | <0.001 |
| 11 | Xiaomi: MiMo-V2.5 | xiaomi | <0.001 |
| 12 | DeepSeek: DeepSeek V4 Pro | deepseek | <0.001 |
| 13 | MiniMax: MiniMax M3 | minimax | <0.001 |
| 14 | Qwen: Qwen3.7 Plus | qwen | <0.001 |
| 15 | Xiaomi: MiMo-V2.5-Pro | xiaomi | <0.001 |
| 16 | Perceptron: Perceptron Mk1 | perceptron | <0.001 |
| 17 | Owl Alpha | openrouter | <0.001 |
| 18 | Google: Gemini 3.1 Flash Lite | google | <0.001 |
| 19 | Z.ai: GLM 5.1 | z-ai | <0.001 |
| 20 | StepFun: Step 3.7 Flash | stepfun | <0.001 |
| 21 | Tencent: Hy3 preview | tencent | <0.001 |
| 22 | Arcee AI: Trinity Large Thinking | arcee-ai | <0.001 |
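The index values in the ranking can be read as each model's score relative to the leader. A rough sketch of that normalization and of the display rule the rankings appear to follow (one decimal for values of 1 or more, three decimals below that, and a `<0.001` floor) is given below. The function names, the rounding rule, and the raw scores are assumptions made for illustration; they are not taken from the leaderboard's methodology.

```python
def index_to_top(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores so that the highest-scoring model equals 100."""
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("top score must be positive to index against it")
    return {name: 100.0 * score / top for name, score in raw_scores.items()}


def format_index(value: float) -> str:
    """Format an index value (assumed display rule, inferred from the table)."""
    if value < 0.001:
        return "<0.001"          # floor used for vanishingly small values
    if value >= 1.0:
        return f"{value:.1f}"    # e.g. 100.0, 37.2
    return f"{value:.3f}"        # e.g. 0.011, 0.007


if __name__ == "__main__":
    # Hypothetical raw scores; these are NOT the leaderboard's data.
    raw = {"model_a": 2.0, "model_b": 0.5, "model_c": 0.00001}
    for name, idx in index_to_top(raw).items():
        print(name, format_index(idx))
```

Indexing to the leader keeps the table readable when raw scores span many orders of magnitude, at the cost of hiding the absolute values.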