Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

Model update, 06/30/2026

Anthropic's Claude Sonnet 5 evaluated on the Vals Index

Accuracy: 68.61% ± 1.00
Rank: 3 / 31

Contact us by email at contact@vals.ai.

License types: Proprietary (contact us to get access), Industry Partner, and Academic.

Read our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Model Performance Over Time

Tracking how foundation models improve with each release