Running Claude Code Offline on an M3 Pro with Qwen3.6 (har-ki.github.io)

🤖 AI Summary
This write-up describes running Claude Code fully offline on an Apple M3 Pro, backed by the Qwen3.6 model. At first the model could not complete tasks because requests timed out; after four software fixes, it carried out an incident investigation and produced a pull request with the machine air-gapped throughout, so no data left the device. That matters most in regulated environments, where sensitive data must stay under local control and cloud access is impractical or not permitted. The hardware sets the limits: the M3 Pro's 18 GPU cores and 36 GiB of unified memory shape both prefill time and how much model and context fit in memory. Qwen3.6 is a mixture-of-experts (MoE) model, so although it has 35B parameters in total, only about 3B are active for any given token, which keeps per-token compute low. Session length is dominated by prefill, which accounts for over 90% of total processing time. The result is a workable path to private, offline, agent-driven incident investigation on a laptop.
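The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration rather than the author's method: the parameter counts and the 36 GiB figure come from the summary, while the quantization widths, token counts, and throughput numbers are hypothetical placeholders chosen only to show the shape of the calculation. Note that an MoE model still needs all of its weights resident in memory, since any expert may be selected for a given token; the 3B active parameters reduce compute per token, not storage.

```python
def weight_footprint_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given quantization width."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def prefill_share(prompt_tokens: int, prefill_tps: float,
                  output_tokens: int, decode_tps: float) -> float:
    """Fraction of total processing time spent on prefill."""
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s / (prefill_s + decode_s)


# Figures taken from the summary.
TOTAL_PARAMS_B = 35.0
ACTIVE_PARAMS_B = 3.0
UNIFIED_MEMORY_GIB = 36.0

# Share of parameters doing work on each token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

if __name__ == "__main__":
    print(f"active fraction per token: {active_fraction:.1%}")

    # Hypothetical quantization widths; the source does not state which was used.
    for bits in (16, 8, 4):
        size = weight_footprint_gib(TOTAL_PARAMS_B, bits)
        verdict = "fits" if size < UNIFIED_MEMORY_GIB else "does not fit"
        print(f"{bits:>2}-bit weights: {size:5.1f} GiB ({verdict}, weights only)")

    # Hypothetical workload: a long agent prompt and a short reply.
    share = prefill_share(prompt_tokens=30_000, prefill_tps=100.0,
                          output_tokens=500, decode_tps=20.0)
    print(f"prefill share of processing time: {share:.0%}")
```

The "fits" check counts weights only; the KV cache, the operating system, and other applications also draw on the same unified memory, so the real headroom is smaller than this suggests.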