Z.ai’s GLM-5 arrives with a focus on long-horizon, agentic tasks
Z.ai unveiled GLM-5, positioning the model as a step away from “vibe coding” and toward agentic engineering for multi-step systems work. The release highlights expanded scale and training data, targeted benchmark wins, and a staggered rollout for paid subscribers.
What changed under the hood
- Parameter scale grows from 355B (32B active) to 744B (40B active) across variants.
- Pre-training data increases from 23T to 28.5T tokens.
Those figures frame GLM-5 as a materially larger iteration compared with GLM-4.5, with architecture and training adjustments aimed at sustained, tool-driven workflows rather than single-turn code completions.
Benchmarks and claims
Z.ai reports that GLM-5 beats GLM-4.7 on CC-Bench-V2, an internal suite, with wins across frontend, backend, and long-horizon tasks (benchmark thread). On the simulated economic task Vending Bench 2, GLM-5 finished first among open-source models with a final balance of $4,432, approaching closed-source models on long-term planning metrics (Vending Bench 2 tweet).
Early community tests and comparisons are varied: several users praise the model's coding and multi-agent behavior, while others report latency problems, notably slower tokens-per-second throughput in some deployments.
Availability, rollout, and weights
Access is being phased in for GLM Coding Plan subscribers due to compute limits:
- Max plan users can enable GLM-5 immediately by changing the model name to "GLM-5" (for example in ~/.claude/settings.json for Claude Code; see the sketch after this list).
- Other tiers will gain access progressively; requests to GLM-5 consume more quota than GLM-4.7. (Rollout note: https://t.co/Nk8Y98Il7s)
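For Claude Code users on the Max plan, the switch amounts to a one-field settings change. Below is a minimal sketch of that edit in Python; it assumes ~/.claude/settings.json is plain JSON and that a top-level "model" field is honored, so treat the key name and exact model string as assumptions to verify against Z.ai's rollout note.

```python
import json
from pathlib import Path

# Claude Code user settings file referenced in the rollout note.
settings_path = Path.home() / ".claude" / "settings.json"

# Load existing settings if the file exists; otherwise start from an empty object.
settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}

# Assumption: a top-level "model" key selects the backend model by name.
settings["model"] = "GLM-5"

settings_path.parent.mkdir(parents=True, exist_ok=True)
settings_path.write_text(json.dumps(settings, indent=2) + "\n")
print(f"Set model to {settings['model']} in {settings_path}")
```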
Weights and model files are publicly linked:
- Weights posted by Z.ai: https://t.co/DteNDHjSEh
- ModelScope mirror: https://t.co/SSPREfjt9f
Additional resources linked from the release include a tech blog post, an OpenRouter listing (previously available as Pony Alpha, https://t.co/7Khf64Lxg6), and a “try it now” endpoint in Z.ai’s ecosystem (https://t.co/WCqWT0raFJ).
Community reaction and ecosystem activity
Response to GLM-5 is mixed. Positive notes include rapid third-party support (day-zero Baseten integration: https://t.co/pw8FuUjx4V), community uploads of GGUFs for local runs (https://t.co/2h7UIRq9wJ), and free access pathways promoted by smaller providers. Criticisms center on pricing and plan changes, including reported plan downgrades, steep price increases, and quota concerns, as well as slower inference speeds. There are also requests for deeper reliability evaluations of multi-step agent workflows and tool-call robustness.
Developer implications
GLM-5’s stated emphasis on agentic engineering and extended tool-call sessions suggests a shift toward models designed for sustained workflows: planning, tool orchestration, and iterative debug-fix-verify loops. Important developer-facing points:
- Scale and training corpus increases that may improve multi-step coherence.
- Higher quota cost for GLM-5 calls, relevant for integration planning.
- Availability of weights and community GGUFs enables local experimentation and quantization workstreams (a local-run sketch follows this list).
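For the local-experimentation path, here is a minimal sketch using llama-cpp-python against a community GGUF. The file name, context size, and prompt are illustrative assumptions rather than values from the release; substitute a quantization from the community uploads linked above and size the parameters to your hardware.

```python
# Local experimentation sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="glm-5-q4_k_m.gguf",  # hypothetical quantized file name
    n_ctx=8192,                      # context window for multi-step sessions
    n_gpu_layers=-1,                 # offload all layers to GPU if available
)

# Ask for a small planning task to probe multi-step behavior locally.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Plan the steps to add a retry wrapper around a flaky API call."},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```

Whether a quantized local build preserves the long-horizon, tool-call behavior the release emphasizes is exactly the kind of question independent evaluations would need to answer.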
Further validation will depend on independent benchmarks covering throughput, degradation under heavy tool-call loads, and behavior across real-world agent scenarios.
Original source: https://x.com/Zai_org/status/2021638634739527773