12/11/2024
News
Refresh to Vals AI
We’ve just launched a redesign of this benchmarking website!
Apart from being easier on the eyes, this new version of the site is much more useful.
- Model cards are displayed on their own dedicated pages, showing results across all benchmarks.
- Every Benchmark page is time-stamped and updated with changelogs.
- Our Methodology page now shares more details about our approach and plans.
11/10/2024
Model
Results for the new 3.5 Sonnet (Upgraded) model
- On Legalbench, it’s now exactly tied with GPT 4o, and it beats 4o on CorpFin and CaseLaw.
- It usually, but not always, performs slightly better than the previous version, for example on Legalbench (+1.3%), ContractLaw Overall (+0.5%), and CorpFin (+0.8%).
- It regressed on some benchmarks, including TaxEval Free Response (-3.2%) and CaseLaw Overall (-0.1%).
- Although it’s competitive with 4o, it’s not yet at the level of GPT o1, which still claims the top spots on almost all of our leaderboards.
10/31/2024
News
Vals AI Legal Report Announced
Vals AI and Legaltech Hub are partnering with leading law firms and top legal AI vendors to conduct a first-of-its-kind benchmark.
The study will evaluate the platforms across eight legal tasks, including Document Q&A, Legal Research, and EDGAR Research. All data will be collected from the law firms to ensure it’s representative of real legal work.
The report will be published in early 2025.