Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

model
05/22/2026

Alibaba's Qwen 3.7 Max evaluated across our benchmark suite

Benchmarks

Accuracy          Rankings
51.64% ± 1.80     9/19
63.71% ± 0.95     23/107
42.16% ± 0.03     11/19
26.00% ± 4.41     8/32
41.51% ± 8.09     12/45
42.70% ± 0.65     26/64

Contact us, or send us an email at contact@vals.ai
Proprietary Benchmarks (contact us to get access)
Academic Benchmarks

Read about our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Industry
Benchmark