Z.ai's GLM-5 Targets Agentic, Long-Horizon Code and Tool Workflows

Z.ai's GLM-5 scales up parameters and pre-training data to prioritize agentic, long-horizon code and tool workflows. Its phased rollout for paid plans, public weights, and mixed community feedback highlight trade-offs among performance, quota costs, and agentic reliability.

TL;DR

  • Positioning: GLM-5 targets agentic engineering for long-horizon, tool-driven, multi-step systems work
  • Scale and data: parameters 355B (32B active) → 744B (40B active); pre-training data 23T → 28.5T tokens
  • Benchmarks: wins over GLM-4.7 on the internal CC-Bench-V2 suite; top open-source finish on Vending Bench 2 with a $4,432 final balance
  • Performance feedback: Mixed community reports; some praise coding/multi-agent behavior, others cite latency/throughput issues
  • Rollout and quotas: Phased access for GLM Coding Plan; Max plan can switch model name to “GLM-5”; higher quota cost than GLM-4.7
  • Weights and ecosystem: Weights posted (https://t.co/DteNDHjSEh), ModelScope mirror (https://t.co/SSPREfjt9f), day-zero Baseten (https://t.co/pw8FuUjx4V), community GGUFs (https://t.co/2h7UIRq9wJ)

Z.ai’s GLM-5 arrives with a focus on long-horizon, agentic tasks

Z.ai unveiled GLM-5, positioning the model as a step away from “vibe coding” and toward agentic engineering for multi-step systems work. The release highlights expanded scale and training data, targeted benchmark wins, and a staggered rollout for paid subscribers.

What changed under the hood

  • Parameter count grows from 355B (32B active) in GLM-4.5 to 744B (40B active) in GLM-5.
  • Pre-training data increased from 23T to 28.5T tokens.

Those figures frame GLM-5 as a materially larger iteration compared with GLM-4.5, with architecture and training adjustments aimed at sustained, tool-driven workflows rather than single-turn code completions.

Benchmarks and claims

Z.ai reports GLM-5 beating GLM-4.7 on an internal suite, CC-Bench-V2, with wins across frontend, backend, and long-horizon tasks (benchmark thread). On the simulated economic task Vending Bench 2, GLM-5 finished top among open-source models with a final balance of $4,432, approaching closed-source models on long-term planning metrics (Vending Bench 2 tweet).

Early community tests and comparisons are varied: several users praise its coding and multi-agent behavior, while others report latency and throughput problems, notably slower tokens-per-second rates on some deployments.
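Tokens-per-second claims are easy to sanity-check. Below is a minimal sketch that times a streamed completion against an OpenAI-compatible endpoint; the base URL and model name are illustrative assumptions, and counting one token per streamed chunk is only a rough approximation.

```python
import os
import time

from openai import OpenAI  # pip install openai

# Assumption: an OpenAI-compatible endpoint and model id; both are
# illustrative placeholders, not confirmed details from the release.
client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",
    api_key=os.environ["ZAI_API_KEY"],
)

start = time.monotonic()
received = 0
stream = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)
for chunk in stream:
    # Rough proxy: most streamed chunks carry about one token of text.
    if chunk.choices and chunk.choices[0].delta.content:
        received += 1

elapsed = time.monotonic() - start
print(f"~{received / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```

Single-shot numbers are noisy; running the same prompt across providers and times of day gives a fairer picture.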

Availability, rollout, and weights

Access is being phased in for GLM Coding Plan subscribers due to compute limits:

  • Max plan users can enable GLM-5 immediately by changing the model name to "GLM-5" (for example, in ~/.claude/settings.json for Claude Code; see the sketch after this list).
  • Other tiers will gain access progressively; requests to GLM-5 consume more quota than GLM-4.7. (Rollout note: https://t.co/Nk8Y98Il7s)
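As a rough illustration, pointing Claude Code at GLM-5 might look like the settings.json sketch below. The env keys follow Claude Code's documented settings mechanism, but the base URL is an assumption carried over from Z.ai's earlier Coding Plan setup; defer to the rollout note above if it differs for GLM-5.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-zai-api-key",
    "ANTHROPIC_MODEL": "GLM-5"
  }
}
```

The auth token value is a placeholder; keep real keys out of version-controlled settings files.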

Weights and model files are publicly linked:

  • Model weights: https://t.co/DteNDHjSEh
  • ModelScope mirror: https://t.co/SSPREfjt9f

Additional resources linked from the release include a tech blog (tech blog), an OpenRouter listing (previously listed as Pony Alpha, https://t.co/7Khf64Lxg6), and a "try it now" endpoint in Z.ai's ecosystem (https://t.co/WCqWT0raFJ).

Community reaction and ecosystem activity

Response to GLM-5 is mixed. Positives include rapid third-party support (day-zero Baseten integration: https://t.co/pw8FuUjx4V), community uploads of GGUFs for local runs (https://t.co/2h7UIRq9wJ), and free access pathways promoted by smaller providers. Criticisms center on steep price increases and quota concerns, plan changes and reported downgrades, and slower inference speeds. Several users have also requested deeper reliability evaluations of multi-step agent workflows and tool-call robustness.
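For local experimentation with those GGUFs, a minimal llama-cpp-python sketch could look like the following. The file name is a placeholder for whichever community quantization you download, and this assumes llama.cpp has merged support for GLM-5's architecture.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Assumptions: placeholder GGUF file name; the chat template is read
# from the GGUF metadata; llama.cpp supports GLM-5's architecture.
llm = Llama(
    model_path="GLM-5-Q4_K_M.gguf",
    n_ctx=16384,      # long-horizon agent work benefits from a large context
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Outline a refactor plan for a flaky test suite."}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```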

Developer implications

GLM-5's stated emphasis on agentic engineering and extended tool-call sessions suggests a shift toward models designed for sustained workflows: planning, tool orchestration, and iterative debug-fix-verify loops (a minimal loop sketch follows the list below). Important developer-facing points:

  • Scale and training corpus increases that may improve multi-step coherence.
  • Higher quota cost for GLM-5 calls, relevant for integration planning.
  • Availability of weights and community GGUFs enables local experimentation and quantization workstreams.
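To make the debug-fix-verify framing concrete, here is a minimal loop sketch against an OpenAI-compatible chat API. The endpoint, model id, and pytest-based verify step are illustrative assumptions; real agent harnesses add tool schemas, patch application, sandboxing, and cost budgets.

```python
import os
import subprocess

from openai import OpenAI  # pip install openai

# Assumption: illustrative endpoint and model id, not confirmed details.
client = OpenAI(base_url="https://api.z.ai/api/paas/v4",
                api_key=os.environ["ZAI_API_KEY"])

def run_tests() -> tuple[bool, str]:
    """Verify step: run the project's test suite and capture its output."""
    proc = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

messages = [{
    "role": "system",
    "content": "You fix failing Python tests. Reply with a unified diff only.",
}]

for attempt in range(5):  # bound the loop: every call consumes quota
    passing, report = run_tests()
    if passing:
        print(f"Tests green after {attempt} fix attempt(s).")
        break
    messages.append({"role": "user",
                     "content": f"Tests failed:\n{report}\nPropose a patch."})
    reply = client.chat.completions.create(model="glm-5", messages=messages)
    patch = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": patch})
    # Applying the patch is elided; a real agent would apply it
    # (e.g. via `git apply`) so the next iteration re-verifies.
```

The explicit iteration cap matters here given the higher quota cost of GLM-5 calls noted above.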

Further validation will depend on independent benchmarks covering throughput, degradation under heavy tool-call loads, and behavior across real-world agent scenarios.

Original source: https://x.com/Zai_org/status/2021638634739527773
