Qwen 3 now supports ARM and MLX (www.alizila.com)

🤖 AI Summary
Alibaba's Qwen3 family, the company's first "hybrid reasoning" models, has expanded platform support to Apple's MLX framework (for Apple silicon) and Arm CPUs, with 32 open-source variants released in 4-bit, 6-bit and 8-bit quantizations plus BF16. These lightweight builds let developers run Qwen3 on Mac Studio, MacBook and iPhone devices, cutting memory use, power draw and latency for on-device inference.

Major chip partners have also integrated Qwen3: NVIDIA demonstrated up to 16.04x higher throughput for Qwen3-4B using TensorRT-LLM in BF16 (alongside frameworks such as Ollama, SGLang and vLLM); AMD supports the large Qwen3-235B, 32B and 30B models on Instinct MI300X; Arm optimized the small 0.6B, 1.7B and 4B models with KleidiAI and Alibaba's MNN; and MediaTek's Dimensity 9400+ uses SpD+ to boost agent inference by about 20%.

The practical effect is broader, cheaper edge AI and smoother scaling to data centers: quantized Qwen3 variants enable faster, lower-cost deployment for multimodal reasoning, tool calling and multilingual tasks, while the ecosystem integrations unlock optimized inference stacks across GPUs, CPUs and mobile SoCs. Enterprise uptake, including Lenovo's Baiying agent, FAW's OpenMind and roughly 290,000 Model Studio customers, shows real-world adoption in consumer electronics, automotive, healthcare and robotics, lowering the barrier to production-grade LLMs on-device and in hybrid cloud/edge deployments.
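Since the MLX builds target local inference on Apple silicon, a minimal sketch of what running one looks like with the open-source mlx-lm package follows. The model repository ID is an illustrative assumption, not a name confirmed by the article, so check the published Qwen3 MLX variants before use.

```python
# Minimal sketch: running a quantized Qwen3 build on Apple silicon via mlx-lm
# (pip install mlx-lm). The repo ID below is a hypothetical example; the
# article mentions 4-bit, 6-bit, 8-bit and BF16 variants but not their names.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # hypothetical repo ID

# Chat-style models expect a templated prompt; the tokenizer's chat template
# builds one from plain messages.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain hybrid reasoning in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```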