o1 Preview (gpt-o1-preview)

Latest o1 model snapshot.

Release Date: September 12, 2024

Avg. Accuracy: 79.6%

Latency: 12.09s

Performance by Benchmark

Benchmark      Accuracy   Ranking
LegalBench     81.7%      1 / 24
CorpFin        76.4%      1 / 26
CaseLaw        87.3%      1 / 18
ContractLaw    69.0%      9 / 25
TaxEval        83.5%      1 / 25

Cost Analysis

Input Cost:         $15.00 / M tokens
Output Cost:        $60.00 / M tokens
Cost per 100 Tests: $5.28
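To make the per-test figure concrete, here is a minimal sketch of the underlying arithmetic. The per-test token split is a hypothetical assumption (the page reports only the aggregate $5.28 per 100 tests), but a prompt of roughly 1,500 input tokens with a 500-token completion lands near the reported number:

```python
# Back-of-the-envelope check on the per-test cost, using the listed
# o1 Preview rates. The 1,500 / 500 token split is a hypothetical
# assumption, not a number reported by the benchmark.

INPUT_PRICE_PER_M = 15.00   # USD per million input tokens
OUTPUT_PRICE_PER_M = 60.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

per_test = request_cost(1_500, 500)
print(f"${per_test:.4f} per test")             # $0.0525
print(f"${per_test * 100:.2f} per 100 tests")  # $5.25, close to the $5.28 above
```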

Overview

o1 Preview is OpenAI's latest reasoning-focused model and its strongest performer on our benchmarks, ranking first on four of the five. It particularly excels at complex reasoning tasks and mathematical computation. It comes at a premium price point, but it sets a new standard for what is possible in language model performance.

Key Specifications

  • Context Window: 128,000 tokens
  • Output Limit: 32,768 tokens
  • Training Cutoff: October 2023
  • Pricing:
    • Input: $15.00 per million tokens
    • Cached Input: $7.50 per million tokens
    • Output: $60.00 per million tokens
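A practical implication of the first two figures: the 32,768-token output limit must share the 128,000-token window with the prompt. A minimal sketch of the budget check, assuming (as with other OpenAI models) that the context window covers prompt and completion combined; the token counts are illustrative:

```python
# Request-budget check against the published o1 Preview limits.
# In practice you would measure the prompt with a tokenizer rather
# than hard-coding a count.

CONTEXT_WINDOW = 128_000    # total tokens: prompt + completion
MAX_OUTPUT_TOKENS = 32_768  # cap on completion length

def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if a request stays within both published limits."""
    return (requested_output <= MAX_OUTPUT_TOKENS
            and prompt_tokens + requested_output <= CONTEXT_WINDOW)

print(fits(100_000, 28_000))  # True:  exactly fills the 128k window
print(fits(100_000, 32_768))  # False: output is under its own cap, but
                              #        the total (132,768) exceeds the window
```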

Performance Highlights

  • Mathematical Reasoning: Exceptional performance in numerical tasks
  • Legal Analysis: Top performer on LegalBench, CaseLaw, and CorpFin
  • Complex Logic: Superior handling of multi-step reasoning
  • Consistency: Most reliable outputs among all tested models

Benchmark Results

Leads four of our five benchmarks:

  • TaxEval: Highest accuracy in tax computation and reasoning
  • LegalBench: Top performance in legal analysis
  • ContractLaw: Solid contract interpretation, though it trails the leaders here (9th of 25)
  • CaseLaw: Best-in-class understanding of legal precedents

Use Case Recommendations

Best suited for:

  • High-stakes analysis
  • Complex legal reasoning
  • Tax computation and analysis
  • Tasks requiring the highest possible accuracy
  • Research and development applications

Limitations

  • Highest cost of any model we test; may be cost-prohibitive for many applications
  • Sometimes produces overly verbose outputs
  • Harder to control: does not support system prompts, temperature, or other sampling parameters (see the request sketch below)
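To illustrate that last point, here is a minimal sketch of what a request looked like at release, assuming the OpenAI Python SDK (v1.x); the prompt text and token budget are placeholders:

```python
# At release, o1-preview rejected system messages and sampling
# parameters such as temperature / top_p, so any instructions must
# be folded into the user turn.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        # No {"role": "system", ...} entry: put instructions here instead.
        {
            "role": "user",
            "content": "You are a contracts analyst. Summarize the "
                       "indemnification clause in the agreement below...",
        },
    ],
    # No temperature or top_p: the model rejects sampling parameters.
    max_completion_tokens=2_000,  # o1 models use this, not max_tokens
)

print(response.choices[0].message.content)
```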

Comparison with Other Models

  • Significantly more expensive than GPT-4o
  • Higher performance ceiling than all other models
  • Better reasoning capabilities than Claude 3 Opus
  • Superior mathematical abilities compared to all competitors