Llama 3.1 Instruct Turbo, 405B parameters with FP8 quantization and reduced context.

Release Date: 7/23/2024

Avg. Accuracy: 76.0%

Latency: 16.39s

Performance by Benchmark

Benchmark     Accuracy   Rank
ContractLaw   75.2%      1 / 51
TaxEval       66.3%      19 / 31
Math500       71.4%      21 / 27
LegalBench    79.0%      9 / 49
MedQA         88.2%      9 / 29
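The headline 76.0% figure appears to be the unweighted mean of the five reported benchmark scores. A minimal sketch, assuming a simple average with no per-benchmark weighting:

```python
# Unweighted mean of the five reported benchmark scores (sketch;
# assumes the headline 76.0% average is a simple mean of these runs).
scores = {
    "ContractLaw": 75.2,
    "TaxEval": 66.3,
    "Math500": 71.4,
    "LegalBench": 79.0,
    "MedQA": 88.2,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.1f}%")  # 76.0%
```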


Cost Analysis

Input Cost:              $3.50 / M tokens
Output Cost:             $3.50 / M tokens
Input Cost (per char):   $0.84 / M chars
Output Cost (per char):  $0.94 / M chars
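The per-token rates above translate directly into a per-request cost. A minimal estimator, assuming the listed $3.50-per-million-token rates (an illustration of the arithmetic, not an official billing API):

```python
# Rough request-cost estimator from the listed Turbo pricing (sketch;
# rates below are the per-token prices quoted above).
INPUT_USD_PER_M_TOKENS = 3.50
OUTPUT_USD_PER_M_TOKENS = 3.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_USD_PER_M_TOKENS
            + output_tokens * OUTPUT_USD_PER_M_TOKENS) / 1_000_000

# e.g. a 10k-token prompt with a 1k-token completion:
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0385
```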

Overview

Llama 3.1 405B is Meta's most powerful open-source model and a significant leap in open-source AI capability. It delivers performance competitive with proprietary models while retaining the deployment flexibility and lower cost of open-source hosting.

Key Specifications

  • Context Window: 131,072 tokens
  • Output Limit: 4,096 tokens
  • Training Cutoff: December 2023
  • Pricing:
    • Input: $3.50 per million tokens
    • Output: $3.50 per million tokens
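A practical consequence of these limits is the prompt budget. A small sketch, assuming input and output share the 131,072-token context window (a common arrangement, though providers differ):

```python
# Input-budget check for the specs listed above (sketch; assumes input
# and output share the 131,072-token context window, which providers
# commonly but not universally enforce).
CONTEXT_WINDOW = 131_072
MAX_OUTPUT = 4_096

def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the reserved completion."""
    if not 0 <= reserved_output <= CONTEXT_WINDOW:
        raise ValueError("reserved_output out of range")
    return CONTEXT_WINDOW - reserved_output

print(max_input_tokens())  # 126976
```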

Performance Highlights

  • Scale Benefits: Largest Llama model shows significant improvements
  • Legal Understanding: Strong performance in legal reasoning tasks
  • Cost Efficiency: Excellent performance/cost ratio
  • Deployment Flexibility: Can be run on-premise or through providers

Benchmark Results

Strong performance across benchmarks:

  • TaxEval: Competitive with closed-source models
  • LegalBench: Strong performance in legal reasoning
  • ContractLaw: Effective contract analysis capabilities
  • CaseLaw: Good understanding of legal precedents

Use Case Recommendations

Best suited for:

  • Enterprise deployments requiring model control
  • High-volume applications
  • Cost-sensitive production environments
  • Legal and financial analysis at scale
  • Organizations preferring open-source solutions

Limitations

  • Slightly behind top closed-source models
  • Requires significant compute resources for self-hosting
  • Less consistent than some proprietary alternatives
  • May require more prompt engineering

Comparison with Other Models

  • More powerful than smaller Llama 3.1 variants
  • More cost-effective than GPT-4 series
  • Competitive with Claude 3.5 Sonnet
  • Better performance than previous Llama generations