Running a Local LLM Coding Server on MacBook Pro M5 Pro 48 GB (blog.kulman.sk)

🤖 AI Summary
A tech enthusiast successfully set up a local AI coding server on their MacBook Pro M5 Pro with 48 GB of memory, circumventing cloud dependencies and API costs. They aimed to deploy an OpenAI-compatible API endpoint using the Qwen 3.6 model, which achieved performance metrics comparable to leading cloud models while running entirely on local hardware. Initial attempts with the mlx-lm server led to frequent crashes due to a known memory management issue in long conversations, prompting a switch to the more stable Ollama platform. The final setup utilized the Qwen 3.6 35B model with a Metal-optimized mxfp8 quantization, resulting in a significant improvement in code generation quality over earlier attempts. With Ollama’s fixed context size to manage memory more effectively, the server operated smoothly without crashes, making it a reliable option for developers. This experience highlights the potential for effectively harnessing local resources in AI applications while encountering and overcoming challenges in memory management and model optimization on consumer hardware.
Loading comments...
loading comments...