AI Benchmarks Catalog

This catalog lists the third-party AI benchmarks we aggregate and cross-reference against our first-party meo scores: Artificial Analysis indices, GPQA, MMLU-Pro, AIME, SWE-style coding, LMArena Elo, and more. Each entry shows what the benchmark measures, its license, and a per-benchmark model leaderboard.

We treat these as attributed secondary signals and cross-check them against our un-leaked first-party meo benchmark. Public benchmarks are prone to contamination, and crowd arenas can be gamed; see the methodology for why.
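
As an illustration of what a cross-check between a third-party benchmark and a first-party score might look like, the sketch below computes a Spearman rank correlation over the models both sources cover. It is not necessarily the method used here. The function names, model names, and scores are all hypothetical and chosen only for the example.

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank models from best (1) to worst, averaging ranks across ties."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    result: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every model tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k][0]] = avg_rank
        i = j + 1
    return result


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman rank correlation over the models present in both score sets."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    ra = ranks({m: a[m] for m in shared})
    rb = ranks({m: b[m] for m in shared})
    xs = [ra[m] for m in shared]
    ys = [rb[m] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation undefined when all scores tie")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for four hypothetical models (not real benchmark data).
third_party = {"model-a": 71.0, "model-b": 64.5, "model-c": 58.2, "model-d": 80.1}
first_party = {"model-a": 55.0, "model-b": 57.0, "model-c": 40.0, "model-d": 69.0}

rho = spearman(third_party, first_party)
print(round(rho, 3))
```

A correlation near 1 means the two sources order the models similarly. A low or negative value would suggest the public benchmark and the first-party score disagree, which could be one hint of contamination or gaming worth investigating further.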

Artificial Analysis

Benchmark	Type	License	Leaderboard
AA Coding Index	index	proprietary-attribution	View ranking →
AA Intelligence Index	index	proprietary-attribution	View ranking →
AA Math Index	index	proprietary-attribution	View ranking →
AIME	accuracy	proprietary-attribution	View ranking →
AIME 2025	accuracy	proprietary-attribution	View ranking →
GPQA Diamond	accuracy	proprietary-attribution	View ranking →
Humanity's Last Exam	accuracy	proprietary-attribution	View ranking →
IFBench	accuracy	proprietary-attribution	View ranking →
LCR (long-context reasoning)	accuracy	proprietary-attribution	View ranking →
LiveCodeBench	accuracy	proprietary-attribution	View ranking →
MATH-500	accuracy	proprietary-attribution	View ranking →
MMLU-Pro	accuracy	proprietary-attribution	View ranking →
SciCode	accuracy	proprietary-attribution	View ranking →
τ²-bench	accuracy	proprietary-attribution	View ranking →
Terminal-Bench Hard	accuracy	proprietary-attribution	View ranking →
Median output throughput (tokens/s)	tok_s	proprietary-attribution	View ranking →
Median time to first token (s)	seconds	proprietary-attribution	View ranking →

LMArena (Chatbot Arena)

Benchmark	Type	License	Leaderboard
LMArena Elo (Chatbot Arena)	elo	apache-2.0	View ranking →

Artificial Analysis benchmark data is attributed to Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.

Benchmark concepts

Plain-language explainers of key benchmarking terms.

What are AI benchmarks?
What are AI rankings?
What is Artificial Analysis?
What is GPQA Diamond?
What is the HLE benchmark?
What is large language model news today?
What is an LLM benchmark?
What is an LLM benchmark leaderboard?
What is LLM benchmark news?
What is an LLM leaderboard?
What is SWE-bench?