Claude 3.7 Sonnet

Performance by Benchmark

Benchmarks

Accuracy

Rankings

CorpFin

65.5%

( 6 / 31 )

65.5%

6 / 31

CaseLaw

80.7%

( 16 / 50 )

80.7%

16 / 50

ContractLaw

68.1%

( 19 / 57 )

68.1%

19 / 57

TaxEval

75.9%

( 8 / 37 )

75.9%

8 / 37

MortgageTax

80.6%

( 1 / 18 )

80.6%

1 / 18

Math500

76.8%

( 17 / 33 )

76.8%

17 / 33

AIME

22.5%

( 13 / 29 )

22.5%

13 / 29

MGSM

92.4%

( 6 / 31 )

92.4%

6 / 31

LegalBench

78.1%

( 18 / 55 )

78.1%

18 / 55

GPQA

67.4%

( 9 / 30 )

67.4%

9 / 30

MMLU Pro

80.7%

( 5 / 30 )

80.7%

5 / 30

MMMU

71.6%

( 6 / 17 )

71.6%

6 / 17

Academic Benchmarks

Proprietary Benchmarks (contact us to get access)

Overview

Important: This evaluation was performed with Thinking Mode disabled. For results with Thinking Mode enabled, see Claude 3.7 Sonnet (Thinking).

This ensures the model was evaluated under the same conditions as other non-reasoning models.

Claude 3.7 Sonnet is Anthropic’s latest model, succeeding Claude 3.5 Sonnet Latest which was released in October 2024.

What sets Claude 3.7 apart from its predecessors and competitors is its hybrid architecture, which makes thinking capabilities optional and fully configurable. Users can specify the number of thinking tokens independently from output tokens. These thinking tokens are preserved after generation, enabling users to examine and analyze the model’s reasoning process.

Key Specifications

Context Window: 200,000 tokens
Max Output Tokens: 8,192 tokens
Extended Thinking: 64,000 tokens
Training Cutoff: October 2024
Pricing:
- Input: $3.00 / 1M tokens
- Output: $15.00 / 1M tokens

Performance by Benchmark

Cost Analysis

Overview

Key Specifications

Join our mailing list to receive benchmark updates on