Z.ai's GLM-5 Targets Agentic, Long-Horizon Code and Tool Workflows

Z.ai's GLM-5 scales up parameters and pre-training data to prioritize agentic, long-horizon code and tool workflows. Its phased rollout for paid plans, public weights, and mixed community feedback highlight trade-offs among performance, quota costs, and agentic reliability.

TL;DR

  • Positioning: GLM-5 targets agentic engineering for long-horizon, tool-driven, multi-step systems work
  • Scale and data: parameters 355B (32B active) → 744B (40B active); pre-training data 23T → 28.5T tokens
  • Benchmarks: wins over GLM-4.7 on the internal CC-Bench-V2 suite; top open-source finish on Vending Bench 2 with a $4,432 final balance
  • Performance feedback: Mixed community reports; some praise coding/multi-agent behavior, others cite latency/throughput issues
  • Rollout and quotas: Phased access for GLM Coding Plan; Max plan can switch model name to “GLM-5”; higher quota cost than GLM-4.7
  • Weights and ecosystem: Weights posted (https://t.co/DteNDHjSEh), ModelScope mirror (https://t.co/SSPREfjt9f), day-zero Baseten (https://t.co/pw8FuUjx4V), community GGUFs (https://t.co/2h7UIRq9wJ)

Z.ai’s GLM-5 arrives with a focus on long-horizon, agentic tasks

Z.ai unveiled GLM-5, positioning the model as a step away from “vibe coding” and toward agentic engineering for multi-step systems work. The release highlights expanded scale and training data, targeted benchmark wins, and a staggered rollout for paid subscribers.

What changed under the hood

  • Parameter count grows from 355B (32B active) in GLM-4.5 to 744B (40B active) in GLM-5.
  • Pre-training data increased from 23T to 28.5T tokens.

Those figures frame GLM-5 as a materially larger iteration compared with GLM-4.5, with architecture and training adjustments aimed at sustained, tool-driven workflows rather than single-turn code completions.

Benchmarks and claims

Z.ai reports GLM-5 beating GLM-4.7 on an internal suite, CC-Bench-V2, with wins across frontend, backend, and long-horizon tasks (benchmark thread). On the simulated economic task Vending Bench 2, GLM-5 finished top among open-source models with a final balance of $4,432, approaching closed-source models on long-term planning metrics (Vending Bench 2 tweet).

Early community tests and comparisons are varied: several users praise its coding and multi-agent behavior, while others report latency and throughput problems, notably slower tokens-per-second rates on some deployments.
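Tokens-per-second claims are easy to sanity-check. Below is a minimal sketch that times a streamed completion against an OpenAI-compatible endpoint; the base URL and model name are illustrative assumptions, and counting one token per streamed chunk is only a rough approximation.

```python
import os
import time

from openai import OpenAI  # pip install openai

# Assumption: an OpenAI-compatible endpoint and model id; both are
# illustrative placeholders, not confirmed details from the release.
client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",
    api_key=os.environ["ZAI_API_KEY"],
)

start = time.monotonic()
received = 0
stream = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)
for chunk in stream:
    # Rough proxy: most streamed chunks carry about one token of text.
    if chunk.choices and chunk.choices[0].delta.content:
        received += 1

elapsed = time.monotonic() - start
print(f"~{received / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```

Single-shot numbers are noisy; running the same prompt across providers and times of day gives a fairer picture.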

Availability, rollout, and weights

Access is being phased in for GLM Coding Plan subscribers due to compute limits:

  • Max plan users can enable GLM-5 immediately by changing the model name to "GLM-5" (for example, in ~/.claude/settings.json for Claude Code; see the sketch after this list).
  • Other tiers will gain access progressively; requests to GLM-5 consume more quota than GLM-4.7. (Rollout note: https://t.co/Nk8Y98Il7s)
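As a rough illustration, pointing Claude Code at GLM-5 might look like the settings.json sketch below. The env keys follow Claude Code's documented settings mechanism, but the base URL is an assumption carried over from Z.ai's earlier Coding Plan setup; defer to the rollout note above if it differs for GLM-5.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-zai-api-key",
    "ANTHROPIC_MODEL": "GLM-5"
  }
}
```

The auth token value is a placeholder; keep real keys out of version-controlled settings files.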

Weights and model files are publicly linked:

  • Model weights: https://t.co/DteNDHjSEh
  • ModelScope mirror: https://t.co/SSPREfjt9f

Additional resources linked from the release include a tech blog (tech blog), an OpenRouter listing (previously listed as Pony Alpha, https://t.co/7Khf64Lxg6), and a "try it now" endpoint in Z.ai's ecosystem (https://t.co/WCqWT0raFJ).

Community reaction and ecosystem activity

Response to GLM-5 is mixed. Positives include rapid third-party support (day-zero Baseten integration: https://t.co/pw8FuUjx4V), community uploads of GGUFs for local runs (https://t.co/2h7UIRq9wJ), and free access pathways promoted by smaller providers. Criticisms center on steep price increases and quota concerns, plan changes and reported downgrades, and slower inference speeds. Several users have also requested deeper reliability evaluations of multi-step agent workflows and tool-call robustness.
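For local experimentation with those GGUFs, a minimal llama-cpp-python sketch could look like the following. The file name is a placeholder for whichever community quantization you download, and this assumes llama.cpp has merged support for GLM-5's architecture.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Assumptions: placeholder GGUF file name; the chat template is read
# from the GGUF metadata; llama.cpp supports GLM-5's architecture.
llm = Llama(
    model_path="GLM-5-Q4_K_M.gguf",
    n_ctx=16384,      # long-horizon agent work benefits from a large context
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Outline a refactor plan for a flaky test suite."}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```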

Developer implications

GLM-5's stated emphasis on agentic engineering and extended tool-call sessions suggests a shift toward models designed for sustained workflows: planning, tool orchestration, and iterative debug-fix-verify loops (a minimal loop sketch follows the list below). Important developer-facing points:

  • Scale and training corpus increases that may improve multi-step coherence.
  • Higher quota cost for GLM-5 calls, relevant for integration planning.
  • Availability of weights and community GGUFs enables local experimentation and quantization workstreams.
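To make the debug-fix-verify framing concrete, here is a minimal loop sketch against an OpenAI-compatible chat API. The endpoint, model id, and pytest-based verify step are illustrative assumptions; real agent harnesses add tool schemas, patch application, sandboxing, and cost budgets.

```python
import os
import subprocess

from openai import OpenAI  # pip install openai

# Assumption: illustrative endpoint and model id, not confirmed details.
client = OpenAI(base_url="https://api.z.ai/api/paas/v4",
                api_key=os.environ["ZAI_API_KEY"])

def run_tests() -> tuple[bool, str]:
    """Verify step: run the project's test suite and capture its output."""
    proc = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

messages = [{
    "role": "system",
    "content": "You fix failing Python tests. Reply with a unified diff only.",
}]

for attempt in range(5):  # bound the loop: every call consumes quota
    passing, report = run_tests()
    if passing:
        print(f"Tests green after {attempt} fix attempt(s).")
        break
    messages.append({"role": "user",
                     "content": f"Tests failed:\n{report}\nPropose a patch."})
    reply = client.chat.completions.create(model="glm-5", messages=messages)
    patch = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": patch})
    # Applying the patch is elided; a real agent would apply it
    # (e.g. via `git apply`) so the next iteration re-verifies.
```

The explicit iteration cap matters here given the higher quota cost of GLM-5 calls noted above.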

Further validation will depend on independent benchmarks covering throughput, degradation under heavy tool-call loads, and behavior across real-world agent scenarios.

Original source: https://x.com/Zai_org/status/2021638634739527773
