Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

model
05/19/2026

Google's Gemini 3.5 Flash evaluated across our benchmark suite

Benchmarks

Accuracy          Ranking
62.05% ± 1.61     3/17
62.29% ± 1.47     3/14
64.69% ± 0.94     19/105
57.86% ± 0.23     1/18
55.83% ± 2.11     3/57
76.57% ± 1.92     32/57
68.12% ± 0.91     12/74
29.00% ± 4.56     7/31
92.68% ± 1.46     3/105
87.60% ± 0.95     3/110

Contact us by email at contact@vals.ai
Proprietary Benchmarks (contact us to get access)
Academic Benchmarks

Read about our methodology.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Industry
Benchmark