The Qwen 3.5 Small Model Series is now out in four sizes (Qwen3.5-0.8B, 2B, 4B, and 9B), and the release is framed around a familiar developer tradeoff: getting more capability out of fewer parameters and less compute. Alongside the chat-oriented releases, Qwen says it is also shipping Base models at the same sizes, positioning the lineup for research and fine-tuning work rather than only drop-in assistant usage.
What Qwen is shipping
The series is presented as a set of “small models” built on the same Qwen3.5 foundation, with native multimodal support, an improved architecture, and scaled reinforcement learning (RL).
Qwen’s own sizing guidance is straightforward:
- 0.8B / 2B: positioned as tiny and fast, aimed at edge device scenarios
- 4B: described as a strong multimodal baseline for lightweight agents
- 9B: framed as compact, while “closing the gap” with much larger models
Distribution links were shared for both major model hubs: Hugging Face and ModelScope.
Local-first tooling shows up quickly
Notably, the models also landed in Ollama the same day. In a separate post, Ollama said the Qwen 3.5 small models are available there and that all models support native tool calling, thinking, and multimodal capabilities in Ollama, with run commands for each size:
```shell
ollama run qwen3.5:9b
ollama run qwen3.5:4b
ollama run qwen3.5:2b
ollama run qwen3.5:0.8b
```
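To make the “native tool calling” claim concrete, here is a minimal sketch of a chat request with a tool definition sent to Ollama’s local REST API. This assumes a server running at the default `localhost:11434` and uses the 4B tag from the commands above; the `get_weather` tool schema is a hypothetical example for illustration, not something from the announcement.

```python
import json
import urllib.request

# Request body following Ollama's /api/chat format. The tool schema
# below (get_weather) is a hypothetical example, not from the release notes.
payload = {
    "model": "qwen3.5:4b",
    "messages": [
        {"role": "user", "content": "What's the weather in Helsinki?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool name
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,  # return a single JSON response instead of a stream
}

def send(payload: dict, url: str = "http://localhost:11434/api/chat") -> dict:
    """POST the chat request to a locally running Ollama server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# resp = send(payload)  # requires `ollama serve` to be running locally
```

If the model decides to call the tool, the response message carries a `tool_calls` entry rather than plain text; the caller then executes the tool and sends the result back in a follow-up message.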
Early community signals: edge, agents, and practical throughput
The initial replies and quotes quickly centered on local hosting and agent workflows, especially around the 4B and 9B sizes. One developer called out the 9B as “solid for local hosting,” while another highlighted the 4B as an “embedded agent” sweet spot for multi-step tool calls.
There were also early performance anecdotes. Petri Kuittinen reported running Qwen3.5-9B at ~30 tokens/s on an “AMD Ryzen™ AI Max+ 395” using Q4_K_XL quantization with a 256k context window, adding that the setup required less than 16 GB of VRAM. Separately, Unsloth AI said the Qwen3.5 Small models can run locally on a Mac or other device with as little as 6 GB of RAM via their GGUFs, linking here: https://t.co/7Jmp13uYfU.
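As a rough sanity check on those numbers, the weight footprint of a 4-bit-class quantization can be estimated back-of-the-envelope. The ~4.5 bits/parameter figure below is an assumption for Q4_K-style quants (they mix quantization levels, so the average sits a bit above 4 bits); KV-cache memory is left out since the model’s layer/head geometry isn’t given in the thread.

```python
def quantized_weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight size in decimal GB for a quantized model."""
    return n_params * bits_per_param / 8 / 1e9

# Assumption: Q4_K-style quants average roughly 4.5 bits per parameter.
print(quantized_weight_gb(9e9, 4.5))  # ~5.06 GB for the weights alone
```

That leaves roughly 10 GB of headroom in a sub-16 GB budget for the KV cache and runtime overhead, which makes the reported figure plausible even at a long context.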
As often happens with small-model drops, the thread also filled with practical questions—minimum hardware, GPU speed expectations, and Mac support—suggesting immediate interest in deploying these models outside the datacenter.
Source: https://x.com/Alibaba_Qwen/status/2028460046510965160
