Running a 35B MoE model on a 2017 AMD RX 580 8GB via Vulkan (no ROCm/CUDA) (github.com)

🤖 AI Summary
AIVisionsLab has run a 35-billion-parameter mixture-of-experts (MoE) model on a 2017 AMD RX 580 8GB GPU using the Vulkan API instead of CUDA or ROCm. The result is notable because the mainstream AI ecosystem has largely dropped support for older AMD GPUs such as the RX 580, and users are often discouraged from running AI workloads on them. By building the inference software with Vulkan support, the team reports around 17 tokens per second for large language model (LLM) inference and roughly 72 seconds per image with Stable Diffusion. For the AI/ML community, the project shows that software choices can extract useful work from older hardware without an expensive modern GPU. The write-up describes a dual-path architecture that routes workloads between the GPU and the CPU to work around the card's limited VRAM. The repository documenting the effort serves as a technical guide and encourages hobbyists and budget-conscious developers to explore alternatives to mainstream stacks on hardware they already own.
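To make the VRAM constraint concrete, here is a minimal sketch of one common way to split a model between GPU and CPU: place as many layers as fit in video memory and leave the rest in system RAM. This is not taken from the repository. The function name `plan_layer_split`, the equal-sized-layer simplification, and every number in the example (40 layers, 0.5 GiB per layer, 2 GiB reserve) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Placement:
    gpu_layers: int      # layers kept in VRAM
    cpu_layers: int      # layers left in system RAM
    gpu_bytes_used: int  # VRAM consumed by the GPU layers


def plan_layer_split(n_layers: int, bytes_per_layer: int,
                     vram_bytes: int, reserve_bytes: int) -> Placement:
    """Hypothetical helper: fit equally sized layers into VRAM after a reserve.

    The reserve stands in for everything that is not layer weights
    (context cache, scratch buffers, the desktop itself).
    """
    if n_layers < 0 or bytes_per_layer <= 0:
        raise ValueError("n_layers must be >= 0 and bytes_per_layer > 0")
    usable = max(vram_bytes - reserve_bytes, 0)
    gpu_layers = min(n_layers, usable // bytes_per_layer)
    return Placement(
        gpu_layers=gpu_layers,
        cpu_layers=n_layers - gpu_layers,
        gpu_bytes_used=gpu_layers * bytes_per_layer,
    )


# Illustrative numbers only: an 8 GiB card with 2 GiB held back.
plan = plan_layer_split(n_layers=40, bytes_per_layer=GIB // 2,
                        vram_bytes=8 * GIB, reserve_bytes=2 * GIB)
print(plan)
```

With these made-up inputs, 12 of the 40 layers land on the GPU and the remaining 28 stay on the CPU. A real MoE model has layers and experts of differing sizes, so an actual split would need per-tensor sizes rather than a single `bytes_per_layer`.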