Public Enterprise LLM Benchmarks

Update (03/17/2026): MiniMax M2.7 Evaluated!

Best Performing Models

Top-performing models from the Vals Index, covering a range of tasks across finance, coding, and law.

Vals Index (3/17/2026)

1. Claude Sonnet 4.6 (Anthropic): 66.82%
2. Gemini 3.1 Pro Preview (02/26) (Google): 64.86%
3. GPT 5.4 (OpenAI): 64.59%

Best Open Weight Models

Top-performing open-weight models from the Vals Index, covering a range of tasks across finance, coding, and law.

Vals Index (3/17/2026)

1. GLM 5 (zAI): 60.67%
2. MiniMax-M2.7 (MiniMax): 60.14%
3. Kimi K2.5 (Moonshot AI): 59.42%

Pareto Efficient Models

The top-performing models from the Vals Index that are also cost-efficient (accuracy vs. cost per test).

Vals Index (3/17/2026)

1. Claude Sonnet 4.6 (Anthropic): 66.82% accuracy | $0.78 per test
2. Gemini 3.1 Pro Preview (02/26) (Google): 64.86% accuracy | $0.57 per test
3. MiniMax-M2.7 (MiniMax): 60.14% accuracy | $0.15 per test
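Pareto efficiency here means no other model is simultaneously more accurate and cheaper per test. The exact selection criterion Vals uses is not stated on the page; the sketch below shows one standard formulation of that filter, applied to the scores and per-test costs listed above:

```python
def pareto_efficient(models):
    """Return the names of models not dominated by any other model.

    A model is dominated if some other model has accuracy >= its accuracy
    and cost <= its cost, with at least one of the two strictly better.
    """
    efficient = []
    for name, acc, cost in models:
        dominated = any(
            (a >= acc and c <= cost) and (a > acc or c < cost)
            for n, a, c in models
            if n != name
        )
        if not dominated:
            efficient.append(name)
    return efficient


# Accuracy (%) and cost per test ($) from the leaderboard above.
models = [
    ("Claude Sonnet 4.6", 66.82, 0.78),
    ("Gemini 3.1 Pro Preview (02/26)", 64.86, 0.57),
    ("MiniMax-M2.7", 60.14, 0.15),
]
print(pareto_efficient(models))
```

All three listed models survive the filter: Claude Sonnet 4.6 has the highest accuracy, MiniMax-M2.7 the lowest cost, and Gemini 3.1 Pro Preview is neither out-scored by a cheaper model nor undercut by a more accurate one.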


Benchmarks

Accuracy, confidence interval, and ranking per benchmark:

Vals Index: 0.0% ± 1.99 (rank 12/36)
CaseLaw (v2): 0.0% ± 0.77 (rank 18/41)
CorpFin: 0.0% ± 0.96 (rank 24/92)
Finance Agent (v1.1): 0.0% ± 2.79 (rank 19/40)
MedCode: 0.0% ± 1.98 (rank 30/47)
MedScribe: 0.0% ± 1.86 (rank 16/47)
ProofBench: 0.0% ± 1.71 (rank 22/22)
TaxEval (v2): 0.0% ± 0.92 (rank 70/100)
AIME: 0.0% ± 0.73 (rank 22/92)
GPQA: 0.0% ± 1.95 (rank 11/95)
LiveCodeBench: 0.0% ± 1.10 (rank 34/101)
LegalBench: 0.0% ± 0.41 (rank 14/113)
MMLU Pro: 0.0% ± 0.39 (rank 48/93)
SWE-bench: 0.0% ± 2.00 (rank 10/62)
Terminal-Bench 2.0: 0.0% ± 5.32 (rank 12/46)

The benchmarks above include both academic benchmarks and proprietary benchmarks (contact us to get access to the latter).

Join our mailing list to receive benchmark updates

Model benchmarks are seriously lacking. With Vals AI, we report how language models perform on the industry-specific tasks where they will be used.
