Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

model
05/22/2026

Alibaba's Qwen 3.7 Max evaluated on the Vals Index

Benchmarks

Accuracy           Ranking
57.29% ± 1.58      5/19
63.71% ± 0.95      23/107
48.35% ± 0.48      5/20
38.75% ± 2.20      32/59
79.40% ± 1.91      21/59
26.00% ± 4.41      8/33
75.31% ± 0.84      7/113
90.15% ± 1.50      8/107
46.75% ± 10.26     4/54

Contact us by email at contact@vals.ai.
Proprietary Benchmarks (contact us to get access)
Academic Benchmarks

Read about our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Industry
Benchmark