Qwen 3.5 Max Preview climbs leaderboards, hits #3 in Math

Alibaba’s Qwen 3.5 Max Preview is surfacing in new leaderboard chatter, including #3 in Math and a Top 15 overall placement. Developers are pressing for real-world availability in Qwen Chat/Code, API access, benchmarks, and faster coding performance.

TL;DR

  • Leaderboard placements: #3 in Math, Top 10 in Arena Expert, Top 15 overall for Qwen 3.5 Max Preview
  • Positioning: Preview build; Qwen says it is “optimizing the preview experience,” with “sharper performance” promised later
  • Access questions: Requests for availability in Qwen Chat, Qwen Code, Alibaba API, and Alibaba Cloud
  • Evidence requests: Calls for a blog post and detailed benchmarks alongside leaderboard ranks
  • Developer focus: Questions on improved coding logic, complex API integrations, and nested function calls
  • Operational concern: A complaint that token generation in a paid coding plan is slow

Qwen 3.5 Max Preview is starting to show up in leaderboard chatter again, with the Qwen team pointing to fresh placements that—at least on paper—suggest a meaningful step up in reasoning-heavy evaluations.

In a post on X, the official Qwen account said the “Max Preview” variant recently reached #3 in Math, landed in the Top 10 in “Arena Expert”, and placed Top 15 overall. The same post framed the release as a preview build and noted that work is already underway on “optimizing the preview experience,” with “sharper performance” promised later.

What the rankings are (and what people are asking for)

The announcement itself focused on rank positions rather than detailed methodology, but the replies quickly converged on practical questions—especially around where this model actually shows up in day-to-day tooling.

Several developers and users asked whether Qwen 3.5 Max Preview is available in Qwen Chat and Qwen Code, and whether it’s already accessible through an Alibaba API or Alibaba Cloud. Others asked for a blog post and benchmarks to accompany the leaderboard placements.

Coding performance comes up immediately

Even with a strong Math placement, multiple replies pressed on whether the preview model improves “actual coding logic” over prior versions. Another thread asked about performance on complex API integrations and nested function calls, pointing to context switching and multi-step coding workflows as the place where many models still stumble.

One reply also flagged a more operational concern, token generation speed, complaining that output throughput in a paid coding plan felt “ridiculously slow.”

Open source strategy remains a recurring question

A notable slice of the replies wasn’t about math or arena scores at all—it was about distribution and licensing. People asked whether Qwen will continue open-sourcing models, and whether there will be an open source Qwen 4. The thread, as posted, doesn’t include answers to those questions.

A cautious read on “Math #3” in a preview

Not everyone treated the overall rank as the headline. One reply argued that #3 in Math is the part to watch, since math-style evaluations can correlate with structured reasoning that tends to matter in coding, analysis, and multi-step tasks—while also noting that “Preliminary” leaderboard labels and confidence intervals can shift.
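To make the confidence-interval caveat concrete, here is a minimal Python sketch of how overlapping score intervals translate into a range of plausible ranks rather than one fixed position. The model names and score bounds are hypothetical placeholders, not real leaderboard data, and `rank_range` is an illustrative helper, not any leaderboard's actual method.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound of a model's score


def rank_range(name: str, intervals: Dict[str, Interval]) -> Tuple[int, int]:
    """Return (best, worst) plausible rank for `name` given score intervals."""
    lo, hi = intervals[name]
    others = [iv for n, iv in intervals.items() if n != name]
    # Models whose entire interval sits above ours are clearly ahead.
    clearly_better = sum(1 for o_lo, _ in others if o_lo > hi)
    # Models whose interval reaches above our lower bound could be ahead.
    possibly_better = sum(1 for _, o_hi in others if o_hi > lo)
    return 1 + clearly_better, 1 + possibly_better


# Hypothetical score intervals, for illustration only.
scores = {
    "model_a": (1500.0, 1520.0),
    "model_b": (1490.0, 1512.0),
    "model_c": (1485.0, 1505.0),
    "model_d": (1440.0, 1460.0),
}

for model in scores:
    best, worst = rank_range(model, scores)
    print(f"{model}: rank between {best} and {worst}")
```

In this toy data the top three intervals all overlap, so each of those models could plausibly sit anywhere from first to third. Only the fourth is pinned to a single position. A preliminary #3 can move in the same way as more votes arrive and the intervals tighten.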

For now, what’s concrete is limited to those reported placements and the fact that Qwen is iterating on the preview.

Source: https://x.com/Alibaba_Qwen/status/2034658901321560549
