Zed has published a case for local AI models in a new blog post, arguing that running models on local hardware can offer "absolute certainty" on privacy, more predictable costs, and less dependence on provider pricing or policy changes. The company also points to its own usage data, noting that local model use in Zed’s agent has grown 3x over the last 10 weeks.
The post presents local models as a practical choice for many tasks, while still acknowledging that frontier systems remain the better option when maximum capability is needed. It also frames Zed as platform-agnostic on the AI front, noting support for Codex over ACP, a user’s own API key, or a direct Zed Pro subscription. Zed links that position to its earlier post, “We’re Not Building AI Features for the Money”.
On the plus side, the company argues that local models can keep data on hardware under direct control, reduce exposure to unexpected pricing changes, and give more direct control over system prompts, feature flags, and context windows. The post also suggests that local models are "always available," unlike cloud services that can become harder to rely on if pricing or access changes.
The limits are just as clear. Zed notes that the hardware needed for frontier models remains out of reach for most consumers, and that models suitable for local use are generally less capable and slower. Even so, the post claims that a modern developer laptop can still produce "good results," though not frontier-level ones.
For setup, the post points to free and open source runtimes including LM Studio, Ollama, and llama.cpp, all of which Zed supports. It uses Qwen 3.6 35B A3B as an example model and breaks down the naming: 35B refers to parameter count, A3B to an MoE setup with about 3 billion active parameters per token, and Q4 to quantization that reduces memory use. The post estimates that model at roughly 17.5GB of VRAM, plus additional overhead.
Once a runtime is set up, Zed says the local provider can be added in its config, including an LM Studio endpoint at [http://localhost:1234/api/v0,](http://localhost:1234/api/v0`,) with the runtime started via lms server start. For Ollama, llama.cpp, and other OpenAI-compatible systems, the post says Zed’s built-in Ollama provider can be used. After that, downloaded models should appear in the agent’s model selector.
Zed also advises a more careful workflow when using smaller local models. According to the post, they are less "clever" than frontier systems and usually operate with smaller context windows, so it recommends editing earlier messages rather than piling on corrections, leaning more on subagents for minor work, and experimenting with different models, temperatures, and context-window sizes. The post ends by inviting readers to share setups in the company’s Discord.
Source: Zed


