OMLX v0.3.9 Stable Merges Native MTP (Multi-Token Prediction) (github.com)

🤖 AI Summary
OMLX v0.3.9 adds native Multi-Token Prediction (MTP) for Qwen3.5/3.6, Gemma 4, and DeepSeek-V4. With MTP, a model predicts several tokens per decoding step instead of one, which speeds up decoding. The feature is off by default and is enabled per model in the admin settings. For Gemma 4, MTP also applies on the vision path, so requests that combine images and text are accelerated as well. The release also addresses earlier stability problems, particularly on low-memory devices: a new memory enforcer helps prevent server crashes under heavy load, and improved caching and prefill handling keep concurrent requests running smoothly under demanding workloads. Other additions include one-command coding agents and multi-tasking in the chat interface.
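The summary describes MTP only as predicting several tokens at once, so a toy sketch may help show why that can cut the number of decoding steps. What follows is a generic draft-and-verify loop, not OMLX's implementation: `mtp_decode`, `greedy_decode`, the `target` and `draft` callables, and the parameter `k` are all hypothetical names, and the "models" are plain functions over integer tokens. A real MTP setup typically drafts with extra prediction heads inside the model and verifies all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]  # maps a context to its next token


def greedy_decode(target: NextFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call (one step) per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def mtp_decode(
    target: NextFn, draft: NextFn, prompt: List[Token], n_new: int, k: int
) -> Tuple[List[Token], int]:
    """Draft k tokens per step, keep the prefix the target agrees with.

    Returns the sequence and the number of verification steps. Each step
    stands in for a single batched target pass (an assumption of this toy).
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    steps = 0
    while len(seq) < goal:
        # 1. Cheaply propose k tokens.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. Verify the proposal against the target model.
        steps += 1
        ctx = list(seq)
        accepted = []
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Every draft was right; the same pass yields one more token.
            accepted.append(target(ctx))
        seq.extend(accepted)
    return seq[:goal], steps


def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[Token]) -> Token:
    # Agrees with the target except at every fourth position.
    return 0 if len(seq) % 4 == 0 else toy_target(seq)


if __name__ == "__main__":
    out, steps = mtp_decode(toy_target, toy_draft, [1], n_new=12, k=3)
    print(out == greedy_decode(toy_target, [1], 12), steps)
```

Because every accepted token is one the target itself would have chosen, the output matches plain greedy decoding. Any speedup comes from needing fewer target steps when the drafts are mostly correct.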