
AI Benchmarks Catalog

The third-party AI benchmarks we aggregate and cross-reference against our first-party meo scores: Artificial Analysis indices, GPQA, MMLU-Pro, AIME, SWE-style coding, LMArena Elo, and more — each with what it measures, its license, and a per-benchmark model leaderboard.

We treat these as attributed secondary signals and cross-check them against our un-leaked first-party meo benchmark. Public benchmarks are contamination-prone; crowd arenas are gameable — see the methodology for why.
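One way such a cross-check could be sketched is a rank correlation between a third-party benchmark and the first-party scores, computed over the models that appear in both. This is a minimal illustration and not the catalog's actual methodology: the function names (`average_ranks`, `spearman`, `cross_check`), the model names, and every score below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are tied")
    return cov / (var_x * var_y) ** 0.5


def cross_check(third_party, first_party):
    """Compare two {model: score} dicts on the models they share."""
    shared = sorted(set(third_party) & set(first_party))
    rho = spearman([third_party[m] for m in shared],
                   [first_party[m] for m in shared])
    return shared, rho


# Hypothetical scores, invented purely for illustration.
public_scores = {"model-a": 71.0, "model-b": 64.5, "model-c": 80.2}
first_party_scores = {"model-a": 55.0, "model-b": 58.0, "model-c": 69.0, "model-d": 40.0}

models, rho = cross_check(public_scores, first_party_scores)
print(models, round(rho, 3))
```

Under these assumptions, a low or negative correlation on the shared models would flag a public benchmark whose ranking diverges from the first-party ranking and may deserve a closer look. It would not by itself establish contamination or gaming.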

Artificial Analysis

| Benchmark | Type | License | Leaderboard |
| --- | --- | --- | --- |
| AA Coding Index | index | proprietary-attribution | View ranking → |
| AA Intelligence Index | index | proprietary-attribution | View ranking → |
| AA Math Index | index | proprietary-attribution | View ranking → |
| AIME | accuracy | proprietary-attribution | View ranking → |
| AIME 2025 | accuracy | proprietary-attribution | View ranking → |
| GPQA Diamond | accuracy | proprietary-attribution | View ranking → |
| Humanity's Last Exam | accuracy | proprietary-attribution | View ranking → |
| IFBench | accuracy | proprietary-attribution | View ranking → |
| LCR (long-context reasoning) | accuracy | proprietary-attribution | View ranking → |
| LiveCodeBench | accuracy | proprietary-attribution | View ranking → |
| MATH-500 | accuracy | proprietary-attribution | View ranking → |
| MMLU-Pro | accuracy | proprietary-attribution | View ranking → |
| SciCode | accuracy | proprietary-attribution | View ranking → |
| τ²-bench | accuracy | proprietary-attribution | View ranking → |
| Terminal-Bench Hard | accuracy | proprietary-attribution | View ranking → |
| Median output throughput (tokens/s) | tok_s | proprietary-attribution | View ranking → |
| Median time to first token (s) | seconds | proprietary-attribution | View ranking → |

LMArena (Chatbot Arena)

| Benchmark | Type | License | Leaderboard |
| --- | --- | --- | --- |
| LMArena Elo (Chatbot Arena) | elo | apache-2.0 | View ranking → |

Data attribution: Artificial Analysis (artificialanalysis.ai). Redistribution requires an AA commercial license.

Benchmark concepts

Plain-language explainers of key benchmarking terms.