AI Model Rankings

Every way to rank AI models: best Effective Value, cheapest per correct answer, fastest, most robust, best for agentic work, open-weights, per-domain, and per-benchmark leaderboards — each with an explainer of what the metric means in production.

By value

By efficiency

By access

By reasoning domain

Best AI models for base-rate bias
Ranked by accuracy on the meo “base-rate bias” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for unsatisfiable constraints
Ranked by accuracy on the meo “unsatisfiable constraints” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for perceptual illusions
Ranked by accuracy on the meo “perceptual illusions” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for logic, math & CS
Ranked by accuracy on the meo “logic, math & CS” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for long arithmetic
Ranked by accuracy on the meo “long arithmetic” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for regex automata
Ranked by accuracy on the meo “regex automata” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for tape-machine simulation
Ranked by accuracy on the meo “tape-machine simulation” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for critical thinking (Watson-Glaser)
Ranked by accuracy on the meo “critical thinking (Watson-Glaser)” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for theory of mind
Ranked by accuracy on the meo “theory of mind” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for framework-application bias
Ranked by accuracy on the meo “framework-application bias” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.
Best AI models for multi-step state tracking
Ranked by accuracy on the meo “multi-step state tracking” domain — one of eleven un-leaked, contamination-resistant domains. Per-domain strength reveals where a model genuinely reasons versus where it pattern-matches; the leaders differ sharply by domain.

By third-party benchmark

Per-benchmark leaderboards over the full field. See the benchmark catalog.