GLM-4.7: Z.AI’s $3/month Coding AI with 200K Context & Thinking Modes

GLM-4.7 (GLM Coding Plan, $3/mo) delivers 200K context, 128K output tokens, thinking modes, streaming, function calling, and context caching. It is optimized for end-to-end, multi-step coding, agentic workflows, and automated UI generation.


TL;DR

  • Text-only I/O; context length 200K; max output 128K; capabilities: thinking modes, real-time streaming, function calling, context caching, structured output.
  • Shift from single-turn snippets to end-to-end task completion with a “think before acting” pattern and three reasoning strategies: interleaved, retention-based, round-level.
  • Targeted scenarios: agentic full‑stack prototypes, multimodal real-time apps, automated web UI generation, improved multi-turn dialogue/collaboration, and creative/office asset production; frontend aesthetics improved and PPT 16:9 compatibility rose from 52% to 91%.
  • Reported benchmark results: programming parity with Claude Sonnet 4.5; tool invocation scores of 67 on BrowseComp and 84.7 on τ²‑Bench; reasoning HLE 42.8% (a reported 41% improvement over GLM‑4.6); SWE‑bench Verified 73.8% (+5.8%), SWE‑bench Multilingual 66.7% (+12.9%), LiveCodeBench V6 84.9 (open-source SOTA), Terminal Bench 2.0 41% (+16.5%); ranked first among open-source/domestic models on Code Arena.
  • Documentation includes cURL, Python, and Java quick-starts; enable thinking via the thinking parameter set to "enabled" and use stream: true for incremental responses; API reference: https://docs.z.ai/api-reference/llm/chat-completion — thinking mode guide: https://docs.z.ai/guides/capabilities/thinking-mode
  • GLM Coding Plan listed starting at $3/month (subscription link: https://z.ai/subscribe?utm_source=zai&utm_medium=link&utm_term=guide&utm_content=glm-coding-plan&utm_campaign=Platform_Ops&_channel_track_key=Xz9zVAvo)

Overview

Z.AI’s GLM-4.7 has been introduced as the core model in the GLM Coding Plan subscription, positioned for AI-assisted coding workflows and integrated into tools such as Claude Code, Cline, OpenCode, and Roo Code. The plan is listed as starting at $3/month. GLM-4.7 emphasizes end-to-end task delivery across multi-step development scenarios while offering expanded context and output capacity.

Key technical details

  • Input/Output: text-only
  • Context length: 200K
  • Maximum output tokens: 128K
  • Notable capabilities: thinking modes, real-time streaming, function calling, context caching, structured output

These capabilities aim to support more complex interactions, longer conversations, and integration with external toolchains.
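As a concrete illustration of the function-calling capability, here is a minimal sketch of an OpenAI-style chat-completions request body of the kind the docs' quick-starts demonstrate. The tool name `get_weather` and its schema are illustrative assumptions, not taken from the GLM-4.7 documentation:

```python
import json

# Hypothetical function-calling request body in the OpenAI-style
# chat-completions format. The tool definition below is illustrative.
payload = {
    "model": "glm-4.7",
    "messages": [
        {"role": "user", "content": "What is the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not from the docs
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a structured `tool_calls` entry rather than plain text, which the client executes and feeds back as a follow-up message.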

What GLM-4.7 focuses on

GLM-4.7 shifts emphasis from single-turn code snippets toward end-to-end task completion: understanding requirements, decomposing solutions, and producing executable, multi-file or multi-stack code when needed. In programming contexts the model applies a “think before acting” pattern to stabilize multi-step reasoning and agent-based execution. Three thinking/reasoning strategies are highlighted:

  • Interleaved reasoning — performs reasoning before each response or tool invocation to improve instruction compliance.
  • Retention-based reasoning — preserves reasoning blocks across multi-turn sessions to improve cache hit rates and reduce computation for long tasks.
  • Round-level reasoning — allows control over reasoning overhead per session round (e.g., toggling for latency vs. accuracy trade-offs).
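The round-level strategy above can be sketched as a per-request toggle. This assumes the documented `thinking` parameter accepts `"enabled"`/`"disabled"` values in the request body; the exact payload shape is an assumption based on the quick-start description:

```python
def build_request(messages: list[dict], think: bool) -> dict:
    """Build a chat-completion body, toggling reasoning per session round.

    The "thinking" field mirrors the documented parameter; treating
    "disabled" as the off value is an assumption.
    """
    return {
        "model": "glm-4.7",
        "messages": messages,
        # Spend reasoning tokens on hard rounds; skip them on
        # latency-sensitive ones (the accuracy vs. latency trade-off).
        "thinking": {"type": "enabled" if think else "disabled"},
    }


# A hard refactoring round gets full reasoning...
hard = build_request([{"role": "user", "content": "Refactor this module"}], think=True)
# ...while a trivial rename skips it to cut latency.
fast = build_request([{"role": "user", "content": "Rename variable x to y"}], think=False)
```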

Usage scenarios

GLM-4.7 is presented for a range of developer-focused and creative tasks:

  • Agentic coding: full-stack prototypes and demos where frontend, backend, and peripheral interactions must interoperate.
  • Multimodal real-time apps: camera input, gesture control, and interactive controls combined with application logic.
  • Web UI generation: automated layout and styling with more consistent default aesthetics.
  • High-quality dialogue and collaboration: improved context retention for multi-turn exchanges.
  • Creative content and office assets: enhanced prose for narratives and better PPT/poster layout generation.

The documentation highlights improved default frontend aesthetics—layout structures, color harmony, and component styling—to reduce manual fine-tuning for prototypes and low-code tools. For office outputs, PPT 16:9 compatibility reportedly rose from 52% to 91%.

Benchmarks and improvements

GLM-4.7 reports measurable gains against prior GLM releases and other models across multiple benchmarks and internal evaluations:

  • Programming & agent skills: claims of parity with Claude Sonnet 4.5 on several coding benchmarks.
  • Tool invocation: 67 on BrowseComp and 84.7 on the τ²-Bench interactive tool invocation benchmark (open-source SOTA in the document).
  • Reasoning: 42.8% on HLE (a reported 41% improvement vs. GLM-4.6), and stated to surpass GPT-5.1 on that metric.
  • Coding benchmarks: top open-source scores on SWE-bench (73.8% Verified, a +5.8% lift), SWE-bench Multilingual (66.7%, +12.9%), LiveCodeBench V6 (open-source SOTA 84.9), and Terminal Bench 2.0 (41%, +16.5%).
  • Code Arena: ranked first among open-source and domestic models in blind tests cited in the documentation.

All metric values and comparisons are taken from the source documentation.

Developer quick start

The documentation provides ready examples for calling the chat/completion endpoint using cURL, the official Python and Java SDKs, and the OpenAI-style Python SDK. The calls typically demonstrate enabling the thinking mode by setting a thinking parameter to "enabled" and optionally using streaming (stream: true) for incremental responses.
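Putting those two pieces together, here is a stdlib-only sketch of a streaming call with thinking enabled. The endpoint path is an assumption (check the API reference for the exact URL), and the SSE chunk shape follows the common OpenAI-style `data: {...}` convention:

```python
import json
import urllib.request

API_URL = "https://api.z.ai/api/paas/v4/chat/completions"  # assumed endpoint path
API_KEY = "YOUR_API_KEY"

body = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Write a Python hello-world."}],
    "thinking": {"type": "enabled"},  # thinking parameter per the docs
    "stream": True,                   # request incremental responses
}


def stream_completion() -> None:
    """POST the request and print content deltas as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    # Streamed responses arrive as "data: {...}" server-sent-event lines.
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.strip()
            if line.startswith(b"data: ") and line != b"data: [DONE]":
                chunk = json.loads(line[len(b"data: "):])
                delta = chunk["choices"][0]["delta"].get("content", "")
                print(delta, end="", flush=True)
```

The same body works without streaming by dropping `"stream": True`, in which case the full completion returns in one response object.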

A developer-focused API reference is provided for implementation details: https://docs.z.ai/api-reference/llm/chat-completion. The GLM-4.7 quick-start sections include sample cURL requests, Python and Java snippets, and SDK installation instructions.

Resources

Original source: https://docs.z.ai/guides/llm/glm-4.7
