RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8 (imil.net)


🤖 AI Summary

In a recent write-up, an AI enthusiast paired an RTX 5080 with a refurbished RTX 3090 to run the Qwen 3.6 27B model at Q8 quantization locally. Using an Asus Prime X570-Pro motherboard, which can host both cards at once, together with the right BIOS settings and kernel parameters, the build reached throughput above 80 tokens per second (tok/s).

The result matters for local LLM deployment because it shows that GPUs from different generations can work together in one machine: an older, cheaper card can extend a newer one instead of being retired. The configuration details, such as enabling Above 4G Decoding in the BIOS and setting the PCIe link mode to Gen 4, are useful for anyone trying to get the most out of existing hardware. Running a model of this size well on mixed GPUs also makes a case for further work on multi-GPU setups that cut cost while keeping inference fast.
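Part of the reason for pairing two cards is memory. As a rough estimate, assuming "Q8" means llama.cpp's Q8_0 format at about 8.5 bits per weight (an assumption; the source does not name the format or runtime), 27B weights come to roughly 28.7 GB. That is more than the RTX 3090's 24 GB alone but fits within the combined 40 GB of the 16 GB RTX 5080 and the 3090, with headroom for the KV cache. The sketch below splits layers in proportion to each card's VRAM; the function names and the 64-layer count are hypothetical placeholders, not the model's real architecture.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign whole layers to GPUs in proportion to VRAM (largest-remainder rounding)."""
    total = sum(vram_gb)
    shares = [n_layers * v / total for v in vram_gb]
    counts = [int(s) for s in shares]
    # Give the layers lost to rounding down to the GPUs with the largest fractional parts.
    leftover = n_layers - sum(counts)
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    gb = weight_bytes(27e9, 8.5) / 1e9
    print(f"~{gb:.1f} GB of weights")          # assumed Q8_0 at 8.5 bits/weight
    print(split_layers(64, [16.0, 24.0]))      # hypothetical 64 layers: [26, 38]
```

A purely proportional split ignores that the KV cache and output layers also need room, so real splits are usually tuned by hand. If the runtime were llama.cpp, its `--tensor-split` option accepts comparable per-GPU proportions.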
