Hugging Face cofounder touts Qwen3.6 27B on a MacBook Pro

Hugging Face cofounder Julien Chaumond says running Qwen3.6 27B locally via Llama.cpp inside the Pi coding agent felt “pretty magical,” coming close to Claude Opus on real coding tasks. Replies quickly homed in on RAM, speed, and battery-life trade-offs.


TL;DR

  • Local coding stack: Qwen3.6 27B in the Pi coding agent via Llama.cpp on a MacBook Pro
  • Offline coding claim: close to the latest Opus in Claude Code on non-trivial Hugging Face tasks, in “full airplane mode”
  • “Second revolution of AI” framing: powerful local models for efficiency, security, privacy, sovereignty
  • User questions: RAM, quantization choice, throughput, battery life, and specific MacBook Pro configuration
  • Reported trade-offs: 27B slow on 128GB M4 Max Studio; multi-file refactors hit context limits quickly
  • Community performance datapoint: ~7 tokens/sec on a 32GB M4 Mac Mini; usable if time permits

In a post on X, Hugging Face cofounder Julien Chaumond claimed that running Qwen3.6 27B inside the Pi coding agent via Llama.cpp on a MacBook Pro felt “pretty magical,” and that, for non-trivial tasks on Hugging Face codebases, it came “very, very close” to the latest Opus in Claude Code while operating in “full airplane mode.”
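
For readers curious what a comparable local setup might look like, here is a minimal sketch of a single chat call through the llama-cpp-python bindings to Llama.cpp. The GGUF filename, quantization, and context size are illustrative assumptions, not the configuration Chaumond described.

    # Minimal local chat call via llama-cpp-python (Llama.cpp bindings).
    # The GGUF path, quantization, and context size below are assumptions
    # for illustration, not Chaumond's actual setup.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen-27b-q4_k_m.gguf",  # hypothetical local GGUF file
        n_gpu_layers=-1,                    # offload all layers to Apple Metal
        n_ctx=32768,                        # context window; larger values need more RAM
    )

    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": "Explain what this function does and simplify it: ..."},
        ],
        max_tokens=512,
    )
    print(response["choices"][0]["message"]["content"])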

Chaumond went on to describe this as part of what he called the “second revolution of AI,” centered on “powerful local models for efficiency, security, privacy, sovereignty.” That assertion drew a wave of replies from users who questioned the laptop setup, RAM requirements, quantization choice, battery life, and throughput. One commenter joked that the battery would be “drained in 20min max,” while others asked what MacBook Pro configuration was being used.
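
The RAM question largely comes down to arithmetic: weight memory scales with parameter count times bits per weight, and the runtime KV cache comes on top. A rough back-of-the-envelope sketch, using approximate average bit-widths for common GGUF quantizations:

    # Rough lower-bound estimate of weight memory for a 27B-parameter model.
    # Bit-widths are approximate per-weight averages for common GGUF formats;
    # real files add metadata, and inference needs extra RAM for the KV cache.
    PARAMS = 27e9

    for name, bits in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
        gib = PARAMS * bits / 8 / 2**30
        print(f"{name:>7}: ~{gib:.0f} GiB of weights")

    # FP16   : ~50 GiB -> out of reach for most laptops
    # Q8_0   : ~27 GiB -> needs a 48-64GB machine to leave headroom
    # Q4_K_M : ~15 GiB -> fits even in 32GB of unified memory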

Several replies also pointed to the trade-offs that still appear to limit local coding models. One user mentioned that a 27B model felt slow even on a 128GB M4 Max Studio, though it might be acceptable on a plane without network access. Another noted that local models can be useful for quick iterations, but that sustained multi-file refactors still hit context limits quickly. A different commenter reported roughly 7 tokens per second on a 32GB M4 Mac Mini, calling it usable if time is not an issue.
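
Throughput figures like the roughly 7 tokens per second quoted in the thread are straightforward to check on one's own hardware. A small sketch, reusing the same hypothetical GGUF file as above, that times one completion and reports generation speed:

    # Time a single completion and report tokens per second.
    # Uses the same hypothetical GGUF file as the earlier sketch.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="qwen-27b-q4_k_m.gguf", n_gpu_layers=-1, n_ctx=8192)

    start = time.perf_counter()
    out = llm("Write a Python function that parses a CSV file into dicts.", max_tokens=256)
    elapsed = time.perf_counter() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")

This is a coarse end-to-end number for one short prompt; processing long, multi-file contexts of the kind mentioned in the replies will slow things down further.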

The thread reflects renewed interest in local AI coding setups and cautious optimism about where they are headed, but the replies to Chaumond’s post also make clear that throughput, battery life, and session length remain the practical open questions.

Source: Julien Chaumond on X
