Updates

12/11/2024

News

Refresh to Vals AI

We’ve just launched a redesign of this benchmarking website!

Apart from being easier on the eyes, this new version of the site is much more useful.

  1. Model cards are displayed on their own dedicated pages, showing results across all benchmarks.
  2. Every Benchmark page is time-stamped and updated with changelogs.
  3. Our Methodology page now shares more details about our approach and plans.

11/10/2024

Model

Results for the new 3.5 Sonnet (Upgraded) model

  • On Legalbench, it’s now exactly tied with GPT 4o, and it beats 4o on CorpFin and CaseLaw.
  • It usually, but not always, performs slightly better than the previous version: for example, on Legalbench (+1.3%), ContractLaw Overall (+0.5%), and CorpFin (+0.8%).
  • In some instances it regressed, including TaxEval Free Response (-3.2%) and CaseLaw Overall (-0.1%).
  • Although it’s competitive with 4o, it’s not yet at the level of GPT o1, which still claims the top spots on almost all of our leaderboards.

10/31/2024

News

Vals AI Legal Report Announced

Vals AI and Legaltech Hub are partnering with leading law firms and top legal AI vendors to conduct a first-of-its-kind benchmark.

The study will evaluate the platforms across eight legal tasks, including Document Q&A, Legal Research, and EDGAR Research. All data will be collected from the law firms to ensure it’s representative of real legal work.

The report will be published in early 2025.
