🤖 AI Summary
Z.ai released GLM‑4.6, an incremental upgrade over GLM‑4.5 with measurable gains across eight public benchmarks for agents, reasoning, and coding. The model clearly improves on GLM‑4.5 and is competitive with domestic and international peers such as DeepSeek‑V3.2‑Exp and Claude Sonnet 4, though it still trails Claude Sonnet 4.5 on coding‑specific benchmarks. Importantly, Z.ai extended CC‑Bench with more realistic, multi‑turn tasks executed by human evaluators inside isolated Docker environments (front‑end development, tool building, data analysis, testing, algorithms). In that setting GLM‑4.6 approaches parity with Claude Sonnet 4 (a 48.6% win rate) and outperforms other open‑source baselines while completing tasks with roughly 15% fewer tokens, indicating gains in both capability and token efficiency. All evaluation traces and trajectories are publicly available for inspection.
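To make the reported head‑to‑head metrics concrete, here is a minimal Python sketch of how a win rate and relative token usage could be computed from trial records. The `Trial` record shape is a hypothetical stand‑in, not CC‑Bench's actual trace schema, and the tie‑handling choice is an assumption.

```python
from dataclasses import dataclass

# Hypothetical shape of one head-to-head CC-Bench trial; the real
# trace schema published by Z.ai may differ.
@dataclass
class Trial:
    winner: str        # "glm-4.6", "claude-sonnet-4", or "tie"
    glm_tokens: int    # tokens GLM-4.6 spent completing the task
    ref_tokens: int    # tokens the reference model spent

def summarize(trials: list[Trial]) -> None:
    # Assumption: ties are excluded from the win-rate denominator.
    decided = [t for t in trials if t.winner != "tie"]
    wins = sum(t.winner == "glm-4.6" for t in decided)
    win_rate = wins / len(decided)

    # Aggregate token usage across all trials; a positive savings
    # value means GLM-4.6 used fewer tokens than the reference.
    glm_total = sum(t.glm_tokens for t in trials)
    ref_total = sum(t.ref_tokens for t in trials)
    token_savings = 1 - glm_total / ref_total

    # A win rate near 50% means rough parity; the announcement
    # reports 48.6% against Claude Sonnet 4 and ~15% fewer tokens.
    print(f"win rate: {win_rate:.1%}, token savings: {token_savings:.1%}")
```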
GLM‑4.6 and GLM‑4.6‑Air are available through the Z.ai API (docs and integration guides provided) and via OpenRouter, and the model is already integrated into several coding agents (Claude Code, Kilo Code, Roo Code, Cline). GLM Coding Plan subscribers will be upgraded automatically, and new subscribers are offered a lower‑cost, higher‑quota tier. The announcement also notes that GLM‑4.5 and GLM‑4.5‑Air weights are publicly available on HuggingFace/ModelScope with deployment recipes (vLLM, SGLang), but it does not say whether GLM‑4.6 weights will be published. The open CC‑Bench trajectory dataset enables reproducibility and further community evaluation.
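As a sketch of what API integration might look like, the snippet below uses the OpenAI‑compatible client pattern that many hosted model APIs (including OpenRouter) expose. The base URL and model id here are assumptions for illustration; consult the Z.ai API docs or OpenRouter for the exact values.

```python
from openai import OpenAI

# Illustrative only: base_url and model id are assumed, not taken
# from the announcement; check Z.ai's docs for the real endpoint.
client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",  # assumed Z.ai endpoint
    api_key="YOUR_ZAI_API_KEY",
)

resp = client.chat.completions.create(
    model="glm-4.6",  # assumed model id
    messages=[
        {"role": "user",
         "content": "Write a Python function that reverses a linked list."},
    ],
)
print(resp.choices[0].message.content)
```

The same client works against OpenRouter by swapping in its base URL and the model id it lists for GLM‑4.6.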