14× faster embeddings: how we rebuilt the ONNX path in Manticore (manticoresearch.com)

🤖 AI Summary
Manticore Search 27.1.5 ships a rebuilt ONNX Runtime backend for Auto Embeddings, the feature that automatically converts text columns into vector embeddings. The new path is about 14 times faster on average than the previous SentenceTransformers/Candle setup: 70–230 documents per second, up from 5–11. It makes full use of the CPU on affordable server hardware and improves both single and concurrent processing. The upgrade matters because database ingest throughput is tied directly to embedding speed, a crucial factor for applications that rely on real-time data processing. Manticore removed locking bottlenecks, reduced batching overhead, and turned off unnecessary CPU spinning during calls, adopting a new concurrency model that preserves responsiveness and maximizes resource utilization. The result is faster insert operations, and it positions ONNX as a more suitable choice for production scenarios involving small encoder models, with better embedding quality at lower computational cost.