Best AI models for framework-application bias

Ranked by accuracy on the meo “framework-application bias” domain, one of eleven unleaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.

Ranking 22 models across the meo-tested roster · as of 2026-06-08.

#	Model	Lab	Accuracy
1	MiniMax: MiniMax M3	minimax	100%
2	OpenAI: GPT-5.5	openai	67%
3	Anthropic: Claude Opus 4.8	anthropic	67%
4	Qwen: Qwen3.7 Max	qwen	67%
5	DeepSeek: DeepSeek V4 Flash	deepseek	67%
6	inclusionAI: Ring-2.6-1T	inclusionai	67%
7	MoonshotAI: Kimi K2.6	moonshotai	67%
8	DeepSeek: DeepSeek V4 Pro	deepseek	67%
9	Xiaomi: MiMo-V2.5	xiaomi	67%
10	NVIDIA: Nemotron 3 Ultra	nvidia	67%
11	Xiaomi: MiMo-V2.5-Pro	xiaomi	67%
12	Owl Alpha	openrouter	67%
13	Perceptron: Perceptron Mk1	perceptron	67%
14	Google: Gemini 3.1 Flash Lite	google	67%
15	Arcee AI: Trinity Large Thinking	arcee-ai	67%
16	Google: Gemini 3.1 Pro Preview	google	33%
17	Google: Gemini 3.5 Flash	google	33%
18	Qwen: Qwen3.7 Plus	qwen	33%
19	Z.ai: GLM 5.1	z-ai	33%
20	StepFun: Step 3.7 Flash	stepfun	33%
21	Tencent: Hy3 preview	tencent	33%
22	xAI: Grok 4.3	x-ai	0%
