Qwen debuts 3.5 Medium models with Flash and 1M context

Qwen has just rolled out its Qwen 3.5 Medium Model Series, featuring open-weight releases and a hosted Flash option built for production. Highlights include a 1M context window, built-in tools, and “medium” sizes aimed at local inference.

TL;DR

  • Introduced Qwen 3.5 Medium Model Series: Qwen3.5-Flash, 35B-A3B, 122B-A10B, 27B
  • Positioning: more capability at lower compute, plus hosted option for production deployments
  • Qwen3.5-35B-A3B claimed to surpass Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B
  • Distribution: open weights on Hugging Face and ModelScope; Qwen3.5-Flash API https://t.co/82ESSpaqAF
  • Qwen3.5-Flash: hosted production variant aligned with 35B-A3B, 1M context by default, official built-in tools

One week after the announcement of the Qwen Plus model, Qwen has introduced the Qwen 3.5 Medium Model Series, a set of new models positioned around a familiar developer-friendly theme: more capability at lower compute, plus a hosted option aimed at production deployments.

The lineup includes Qwen3.5-Flash, Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B. In its announcement, Qwen highlighted that Qwen3.5-35B-A3B surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B, framing it as evidence that architecture, data quality, and RL can matter as much as scaling parameter counts.

What’s in the Qwen 3.5 Medium series

The emphasis this time is split between two tracks:

  • Open-weight releases via model hubs
  • A hosted “Flash” variant meant to be used directly as an API-backed model

Qwen links to distribution on both Hugging Face and ModelScope, alongside a dedicated Qwen3.5-Flash API endpoint at https://t.co/82ESSpaqAF.

Qwen3.5-Flash: long context and built-in tools

For teams thinking in terms of agents and tool-using workflows, Qwen3.5-Flash is described as the hosted production version aligned with 35B-A3B. Two details stand out in the published bullet points:

  • 1M context length by default
  • Official built-in tools
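
Neither the endpoint details nor the tool names have been documented in the announcement, but hosted Qwen models have historically been reachable through OpenAI-compatible chat APIs. As a minimal sketch, assuming such compatibility, a request opting into a built-in tool might look like this (the URL, the `qwen3.5-flash` model id, and the `web_search` tool type are all placeholders, not confirmed values):

```python
import json

# Assumed OpenAI-compatible endpoint -- not confirmed for Qwen3.5-Flash.
API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload that opts into a built-in tool."""
    return {
        "model": "qwen3.5-flash",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        # Built-in tools are advertised but not yet documented;
        # "web_search" here is a hypothetical placeholder.
        "tools": [{"type": "web_search"}],
        "max_tokens": 1024,
    }

payload = build_request("What changed in the Qwen 3.5 Medium series?")
print(json.dumps(payload, indent=2))
```

With a 1M-token default context, the practical difference is that the `messages` array can carry entire codebases or long transcripts without chunking, which is what makes the tool-using agent framing plausible for this tier.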

Qwen also provided links to run the models in Qwen Chat, including Flash (https://t.co/UkTL3JZxIK), 27B (https://t.co/haKxG4lETy), 35B-A3B (https://t.co/Oc1lYSTbwh), and 122B-A10B (https://t.co/hBMODXmh1o).

Why developers are paying attention: “medium” models for local inference

A notable thread in the replies centers on practicality: sizes that fit into local workflows without abandoning ambitious tasks. One response from Unsloth AI claims Qwen3.5-35B-A3B can run locally via GGUFs on a Mac or other machine with 24GB of RAM, pointing to https://t.co/5G4DYTCGtL. Others asked whether GGUFs could be shipped alongside the main release to shorten the gap between official weights and widely usable local quantizations.
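
The 24GB figure is plausible from back-of-envelope math. A 4-bit GGUF quantization averages roughly 4.8 to 4.9 bits per weight (an assumed average; real quant mixes vary per tensor), so 35B total parameters come to about 21 GB of weights, leaving a few GB for the KV cache, while the A3B design touches only ~3B active parameters per token:

```python
def gguf_size_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in decimal gigabytes."""
    return total_params_billions * 1e9 * bits_per_weight / 8 / 1e9

# ~4.85 bits/weight is an assumed average for a 4-bit GGUF mix.
weights_gb = gguf_size_gb(35, 4.85)
print(f"~{weights_gb:.1f} GB of weights")  # about 21 GB, inside a 24 GB budget
```

The mixture-of-experts split matters here: memory must hold all 35B parameters, but per-token compute scales with the 3B active ones, which is why a model of this total size can still feel responsive on consumer hardware.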

There were also repeated questions about smaller “tiny” variants (1B/3B/7B), along with general curiosity about how these models compare in agentic coding and tool-heavy settings—areas where Qwen explicitly says the 122B-A10B and 27B are “narrowing the gap” with frontier models, “especially in more complex agent scenarios.”

Source: https://x.com/Alibaba_Qwen/status/2026339351530188939

Continue the conversation on Slack

Did this article spark your interest? Join our community of experts and enthusiasts to dive deeper, ask questions, and share your ideas.
