Vals AI in Media

The Winners (and Losers) of This New Vibe-Coding Benchmark Will Surprise You

OpenAI's Less-Flashy Rival Might Have a Better Business Model

Vals AI Report Shows Gen AI Tools Outperforming Lawyers on Legal Research Tasks

We tested which AI gave the best answers without making stuff up. One beat ChatGPT.

Industry Leaderboard

Select industry: xAI

Updates

model

03/17/2026

MiniMax M2.7 Evaluated!

Benchmarks            Accuracy        Rankings
Vals Index            0.0% ± 1.99     12 / 36
CaseLaw (v2)          0.0% ± 0.77     18 / 41
CorpFin               0.0% ± 0.96     24 / 92
Finance Agent (v1.1)  0.0% ± 2.79     19 / 40
MedCode               0.0% ± 1.98     30 / 47
MedScribe             0.0% ± 1.86     16 / 47
ProofBench            0.0% ± 1.71     22 / 22
TaxEval (v2)          0.0% ± 0.92     70 / 100
AIME                  0.0% ± 0.73     22 / 92
GPQA                  0.0% ± 1.95     11 / 95
LiveCodeBench         0.0% ± 1.10     34 / 101
LegalBench            0.0% ± 0.41     14 / 113
MMLU Pro              0.0% ± 0.39     48 / 93
SWE-bench             0.0% ± 2.00     10 / 62
Terminal-Bench 2.0    0.0% ± 5.32     12 / 46

Academic Benchmarks

Proprietary Benchmarks (contact us to get access)

Public Enterprise LLM Benchmarks

Best Performing Models

Best Open Weight Models

Pareto Efficient Models

Join our mailing list to receive benchmark updates