Llama 3.1 Instruct Turbo, 405B parameters with FP8 quantization and reduced context.

Release Date: 7/23/2024

Avg. Accuracy: 68.5%
Latency: 1.47s

Performance by Benchmark

Benchmark      Accuracy    Ranking
LegalBench     79.0%       4 / 24
CorpFin        61.8%       13 / 26
ContractLaw    75.2%       1 / 25
TaxEval        57.8%       9 / 25
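
The headline average can be checked against the per-benchmark accuracies listed above. The sketch below assumes the average is an unweighted mean over the four benchmarks; the page does not state how it is computed, so treat this as a plausibility check rather than the site's actual method.

```python
# Per-benchmark accuracies (percent), copied from the listing above.
accuracy = {
    "LegalBench": 79.0,
    "CorpFin": 61.8,
    "ContractLaw": 75.2,
    "TaxEval": 57.8,
}

# Assumption: every benchmark carries equal weight in the headline figure.
mean_accuracy = sum(accuracy.values()) / len(accuracy)

print(f"Unweighted mean accuracy: {mean_accuracy:.2f}%")
```

Under that assumption the mean comes to about 68.45%, which is consistent with the reported 68.5%.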

Cost Analysis

Input Cost: $3.50 / M Tokens
Output Cost: $3.50 / M Tokens
Cost Per Test: $0.19 / 100 tests
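
The cost figures above imply a rough token count per test. The sketch below assumes the cost per test was computed at the listed token prices with no other charges, which the page does not confirm. Because input and output are priced the same, only the combined token count can be recovered, not the split between them.

```python
# Figures from the Cost Analysis listing above (USD).
PRICE_PER_M_TOKENS = 3.50   # same price for input and output tokens
COST_PER_100_TESTS = 0.19

cost_per_test = COST_PER_100_TESTS / 100

# Assumption: cost per test = total tokens * price per token.
tokens_per_test = cost_per_test / PRICE_PER_M_TOKENS * 1_000_000

print(f"Cost per test: ${cost_per_test:.4f}")
print(f"Implied tokens per test (input + output): {tokens_per_test:.0f}")
```

This works out to roughly 540 tokens per test on average.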

Overview

Llama 3.1 405B is the largest model in Meta’s Llama 3.1 family and its most powerful open-source model. Its performance is competitive with proprietary models, while it keeps the deployment flexibility and lower costs of an open-source release.

Key Specifications

  • Context Window: 131,072 tokens
  • Output Limit: 4,096 tokens
  • Training Cutoff: December 2023
  • Pricing:
    • Input: $3.50 per million tokens
    • Output: $3.50 per million tokens
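
The context window and output limit together bound how long a prompt can be. The sketch below assumes that prompt and completion share a single context window, which is the usual convention but is not stated on this page. The function names are hypothetical, and because the hosted Turbo variant is described as having reduced context, the effective limit may be lower.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the specifications above
OUTPUT_LIMIT = 4_096      # tokens, from the specifications above


def max_prompt_tokens(reserved_output: int = OUTPUT_LIMIT) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumes prompt and completion tokens count against the same window.
    """
    if not 0 <= reserved_output <= OUTPUT_LIMIT:
        raise ValueError("reserved_output must be between 0 and OUTPUT_LIMIT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = OUTPUT_LIMIT) -> bool:
    """Return True if a prompt of this length leaves room for the reply."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens())  # budget when reserving the full output limit
print(fits(120_000))        # a long contract that still fits
print(fits(130_000))        # too long once the reply is reserved
```

Token counts depend on the model's tokenizer, so they need to be measured with it rather than estimated from character counts.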

Performance Highlights

  • Scale Benefits: As the largest Llama 3.1 model, it improves on the smaller variants
  • Legal Understanding: 79.0% on LegalBench (4th of 24) and 75.2% on ContractLaw (1st of 25)
  • Cost Efficiency: About $0.19 per 100 tests in this evaluation
  • Deployment Flexibility: Can be run on-premise or through hosted providers

Benchmark Results

Strong performance across benchmarks:

  • TaxEval: 57.8% (9th of 25), competitive with closed-source models
  • LegalBench: 79.0% (4th of 24), strong performance in legal reasoning
  • ContractLaw: 75.2% (1st of 25), effective contract analysis capabilities
  • CaseLaw: Good understanding of legal precedents

Use Case Recommendations

Best suited for:

  • Enterprise deployments requiring model control
  • High-volume applications
  • Cost-sensitive production environments
  • Legal and financial analysis at scale
  • Organizations preferring open-source solutions

Limitations

  • Slightly behind top closed-source models
  • Requires significant compute resources for self-hosting
  • Less consistent than some proprietary alternatives
  • May require more prompt engineering
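
To give a sense of the compute needed for self-hosting, the sketch below estimates the memory required just to hold the weights. It assumes 1 byte per parameter for FP8 and 2 bytes for FP16/BF16, and it ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
PARAMS = 405e9  # parameter count of the 405B model

# Assumed storage cost per parameter for each weight precision.
BYTES_PER_PARAM = {"FP8": 1, "FP16/BF16": 2}


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: ~{weight_memory_gb(PARAMS, nbytes):.0f} GB for weights alone")
```

Even at FP8 the weights come to about 405 GB, more than a single accelerator holds, so self-hosting means sharding the model across several devices.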

Comparison with Other Models

  • More powerful than smaller Llama 3.1 variants
  • More cost-effective than GPT-4 series
  • Competitive with Claude 3.5 Sonnet
  • Better performance than previous Llama generations