🤖 AI Summary
Nexa AI demonstrated in a short video that the 20-billion-parameter GPT-OSS model can run locally on a Snapdragon Gen 5 phone through its Nexa Studio Android app. According to Nexa, GPT-OSS 20B delivers results comparable to OpenAI's o3-mini on common benchmarks, and the on-device run requires at least 16 GB of RAM. What makes this possible is Snapdragon's ability to share system RAM between the CPU and GPU, similar to Apple Silicon's unified memory, so the device can dedicate a large enough pool of memory to host the model; by contrast, the iPhone 17 Pro Max's 12 GB of RAM is likely insufficient.
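The RAM figures line up with simple back-of-envelope arithmetic. The sketch below is illustrative only: the bit widths are assumed quantization levels, not anything Nexa has published, but they show why a 20B model is plausible inside a 16 GB unified-memory budget and tight at 12 GB.

```python
# Rough memory footprint of a 20B-parameter model at common
# quantization widths. Assumed values, not Nexa's configuration.
PARAMS = 20e9

def weights_gb(bits_per_weight: float) -> float:
    # bits -> bytes -> gigabytes for the weight tensors alone
    return PARAMS * bits_per_weight / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4-class", 4.5)]:
    print(f"{label:>10}: {weights_gb(bits):5.1f} GB of weights")

# FP16  -> 40.0 GB: no phone fits this.
# INT8  -> 20.0 GB: still over a 16 GB budget.
# INT4-class (~4.5 bits/weight, in the range of common GGUF Q4
# schemes) -> ~11 GB, leaving headroom for the KV cache and the OS
# in 16 GB of shared memory, but very little at 12 GB.
```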
This is significant because it pushes a 20B LLM into true edge-device territory, pointing toward more private, lower-latency, and cost-efficient on-device inference without constant server calls. The key technical takeaways are that unified memory architectures and adequate RAM on mobile SoCs are what make hosting large models feasible, and that mainstream 20B models can run on phones once the hardware exposes enough shared memory. Constraints remain: 16 GB is still a high bar, and the demonstration doesn't detail quantization or runtime optimizations. Still, it signals rapid progress toward practical, high-quality LLM inference on consumer devices.
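For context on what quantization involves, here is a minimal NumPy illustration of symmetric 4-bit quantization with a single per-tensor scale. This is a deliberate simplification: production runtimes use per-group scales and more elaborate codebooks, and nothing here reflects Nexa's actual pipeline. The storage math is the point: roughly half a byte per weight instead of two.

```python
import numpy as np

def quantize_int4(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric 4-bit quantization: codes in [-8, 7] plus one scale.

    Real schemes quantize per block of weights to limit error; a
    single per-tensor scale keeps this sketch short.
    """
    scale = np.abs(weights).max() / 7.0   # map the largest magnitude to 7
    codes = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct approximate float weights for inference
    return codes.astype(np.float32) * scale

# Toy demo: quantize a random weight matrix and measure the error.
w = np.random.randn(4096, 4096).astype(np.float32)
codes, scale = quantize_int4(w)
w_hat = dequantize(codes, scale)
print(f"mean abs error: {np.abs(w - w_hat).mean():.4f}")
print(f"bytes/weight: {codes.itemsize / 2:.1f} (two 4-bit codes per int8 after packing)")
```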