Show HN: iPhone ANE holds LLM tok/s while MLX and LiteRT thermal-throttle (github.com)

0 points 2 hours ago | visit original

🤖 AI Summary

A new on-device benchmark for local large language models (LLMs) on Apple Silicon has been announced, covering the latest iPhone 17 Pro, iPad, and Mac models. The benchmark, hosted in the apple-silicon-llm-bench repository, aims to compare runtimes on neutral terms, including MLX Swift, llama.cpp, CoreML, and Apple's Foundation Models. In one standout result, Google's LiteRT-LM, optimized for Apple's hardware, decoded faster than MLX Swift while using significantly less memory. The benchmark reports measurements taken on real devices rather than idealized server-based figures, and the results show that runtime efficiency varies dramatically with the model and the conditions. For instance, MLX Swift and LiteRT-LM led in raw decode speed, while CoreML used the Neural Engine to minimize memory usage, illustrating the trade-off between throughput and resource efficiency. For developers, the results suggest choosing a runtime according to whether an application prioritizes raw performance or resource efficiency on mobile devices.
