Qwen launches Qwen3.5 Small models: 0.8B to 9B

With the launch of Qwen3.5 Small, Qwen is betting on more capability with less compute across four sizes, including Base models for fine-tuning. The lineup hits Ollama on day one with tool calling, “thinking,” and multimodal support for local-first builds.


TL;DR

  • Qwen3.5 Small models released: 0.8B, 2B, 4B, 9B, plus matching Base models for research/fine-tuning
  • Core features: native multimodal support, improved architecture, scaled reinforcement learning (RL) on the Qwen3.5 foundation
  • Sizing guidance: 0.8B/2B for edge; 4B for lightweight agents; 9B aims to narrow the gap with larger models
  • Ollama day-one support: native tool calling, thinking, multimodal; ollama run qwen3.5:{9b|4b|2b|0.8b}
  • Early reports: 9B at ~30 tokens/s on a Ryzen AI Max+ 395 (Q4_K_XL, 256k context, <16GB VRAM); Unsloth GGUFs for Macs and other devices with 6GB of RAM

The Qwen3.5 Small model series is now out in four sizes (Qwen3.5-0.8B, 2B, 4B, and 9B), framing the release around a familiar developer tradeoff: getting more capability out of fewer parameters and less compute. Alongside the chat-oriented models, Qwen says it is shipping Base variants in the same sizes, positioning the lineup for research and fine-tuning work rather than only drop-in assistant use.

What Qwen is shipping

The series is presented as a set of “small models” built on the same Qwen3.5 foundation, with native multimodal support, an improved architecture, and scaled reinforcement learning.

Qwen’s own sizing guidance is straightforward:

  • 0.8B / 2B: positioned as tiny and fast, aimed at edge device scenarios
  • 4B: described as a strong multimodal baseline for lightweight agents
  • 9B: framed as compact, while “closing the gap” with much larger models

Distribution links were shared for both major model hubs: Hugging Face and ModelScope.

Local-first tooling shows up quickly

Notably, the models landed in Ollama the same day. In a separate post, Ollama said the Qwen3.5 Small models are available there, that all of them support native tool calling, thinking, and multimodal capabilities in Ollama, and shared run commands for each size:

  • ollama run qwen3.5:9b
  • ollama run qwen3.5:4b
  • ollama run qwen3.5:2b
  • ollama run qwen3.5:0.8b
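To make the tool-calling support concrete, here is a minimal sketch of the request body you would POST to a local Ollama server's /api/chat endpoint. The model tag comes from the run commands above; the get_weather tool, its schema, and the prompt are hypothetical examples for illustration, not anything Qwen or Ollama published.

```python
import json

# Hypothetical tool definition (OpenAI-style function spec, as Ollama's
# /api/chat "tools" field expects). Name and schema are made up here.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request payload for POST http://localhost:11434/api/chat
payload = {
    "model": "qwen3.5:4b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather_tool],
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

If the model decides to use the tool, the response's message carries a tool_calls list; your code runs the function and sends the result back as a "tool" role message for the final answer.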

Early community signals: edge, agents, and practical throughput

The initial replies and quotes quickly centered on local hosting and agent workflows, especially around the 4B and 9B sizes. One developer called out the 9B as “solid for local hosting,” while another highlighted the 4B as an “embedded agent” sweet spot for multi-step tool calls.

There were also early performance anecdotes. Petri Kuittinen reported running Qwen3.5-9B at ~30 tokens/s on an AMD Ryzen AI Max+ 395 using Q4_K_XL quantization and a 256k context window, adding that the setup needed less than 16 GB of VRAM. Separately, Unsloth AI said the Qwen3.5 Small models can run locally via their GGUFs on a Mac or other device with 6 GB of RAM, linking here: https://t.co/7Jmp13uYfU.
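A quick back-of-envelope check makes the sub-16 GB report plausible. The sketch below estimates only the weight memory of a 9B-parameter model; the ~4.5 bits per parameter figure is an assumption typical of Q4_K-style schemes (4-bit weights plus per-block scales), not an official number for this release.

```python
# Rough estimate of weight memory for a 9B model under 4-bit-family
# quantization. 4.5 bits/param is an assumed effective rate for
# Q4_K-style schemes, not a published figure.
params = 9e9            # 9B parameters
bits_per_param = 4.5    # assumed: 4-bit weights + per-block scale overhead

weight_bytes = params * bits_per_param / 8
weight_gib = weight_bytes / 2**30

print(f"Estimated weight memory: {weight_gib:.1f} GiB")  # roughly 4.7 GiB
```

The rest of a 16 GB budget would go to the KV cache (which grows with that 256k context), activations, and runtime overhead, so the headroom is tighter than the weight figure alone suggests.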

As often happens with small-model drops, the thread also filled with practical questions—minimum hardware, GPU speed expectations, and Mac support—suggesting immediate interest in deploying these models outside the datacenter.

Source: https://x.com/Alibaba_Qwen/status/2028460046510965160

Continue the conversation on Slack

Did this article spark your interest? Join our community of experts and enthusiasts to dive deeper, ask questions, and share your ideas.

Join our community