Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

Model
04/16/2026

Claude Opus 4.7 is the new SOTA


Benchmarks

Accuracy         Rank
71.47% ± 1.80    1 / 40
70.51% ± 1.41    1 / 28
68.38% ± 0.43    4 / 47
66.08% ± 0.93    6 / 97
64.37% ± 2.79    1 / 45
54.86% ± 2.21    3 / 51
82.95% ± 1.98    13 / 51
70.27% ± 0.90    1 / 69
96.25% ± 0.52    7 / 96
89.90% ± 1.56    5 / 99
Contact us, or send us an email at contact@vals.ai.
Proprietary Benchmarks (contact us to get access)
Academic Benchmarks

Read about our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.


Model Performance Over Time

Tracking how foundation models improve with each release

[Chart: model accuracy (10%–80%), tracked from Feb '25 through May '26]