We found the newest release from z.AI to be a step up for open-weight models, especially on Terminal-Bench 2, where it places seventh overall and overtakes Kimi K2.5 as the top open-weight model by a wide margin (9%).
It also displaces Kimi K2.5 at the top of the open-weight category on several other benchmarks, most notably Finance Agent, where it breaks 50% and places 11th overall.
The biggest story here is price: the model is several times cheaper than the leading closed-source models from OpenAI and Anthropic. However, Kimi K2.5 is 2.5x cheaper still on our Vals Index, where it also places first thanks to stronger performance on CorpFin.
We evaluated the model with the following parameters:
- temperature 0.7 and lower max output token limits on agentic coding benchmarks (16K on SWE-bench and 8K on Terminal-Bench 2)
- 65K max output tokens on CaseLaw and CorpFin
- temperature 1 and 130K max output tokens on all other benchmarks
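For anyone reproducing a similar setup, the settings above amount to a per-benchmark override table with a default. The following is a minimal sketch, not our actual harness: `SamplingConfig` and `config_for` are hypothetical names, the benchmark keys are plain strings chosen for illustration, "16K"/"65K"/"130K" are transcribed as round decimal values (the exact token counts are an assumption), and the temperature for CaseLaw and CorpFin is left unset because the list does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingConfig:
    # None means the temperature was not stated for this benchmark.
    temperature: Optional[float]
    # Token limits transcribed as round numbers ("16K" -> 16_000); exact values are assumed.
    max_output_tokens: int


# Hypothetical per-benchmark overrides mirroring the list above.
OVERRIDES = {
    "SWE-bench": SamplingConfig(temperature=0.7, max_output_tokens=16_000),
    "Terminal-Bench 2": SamplingConfig(temperature=0.7, max_output_tokens=8_000),
    "CaseLaw": SamplingConfig(temperature=None, max_output_tokens=65_000),
    "CorpFin": SamplingConfig(temperature=None, max_output_tokens=65_000),
}

# Settings used for every benchmark not listed above.
DEFAULT = SamplingConfig(temperature=1.0, max_output_tokens=130_000)


def config_for(benchmark: str) -> SamplingConfig:
    """Return the override for a benchmark, falling back to the default."""
    return OVERRIDES.get(benchmark, DEFAULT)


if __name__ == "__main__":
    for name in ("Terminal-Bench 2", "CorpFin", "Finance Agent"):
        print(name, config_for(name))
```

Keeping the default separate from the overrides makes the "otherwise" rule explicit: any benchmark without an entry, such as Finance Agent, falls through to temperature 1 and the larger output budget.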