Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

model
06/04/2026

NVIDIA's Nemotron 3 Ultra evaluated across our benchmark suite

Benchmarks

Accuracy          Rankings
43.99% ± 1.26     18/24
65.46% ± 0.94     16/110
37.53% ± 0.27     18/24
38.62% ± 2.00     35/62
73.10% ± 0.87     34/116
86.11% ± 1.74     23/110
85.98% ± 0.99     12/115

Contact us by email at contact@vals.ai.

Proprietary Benchmarks (contact us to get access)
Academic Benchmarks

Read about our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Industry
Benchmark