Show HN: IgniteMS – batch text embeddings at 253K msg/s on 8x A100 (github.com)

🤖 AI Summary
IgniteMS is a new batch text embedding engine that reports 253,000 messages per second on an 8x A100 GPU setup, up to three times faster than Hugging Face's Text Embeddings Inference (TEI) on the same hardware. It is designed for high-throughput jobs such as vector database reindexing and large-scale text processing, and is reported to sustain 357,893 messages per second in a real production environment with workload-specific optimizations. That speed matters when millions of texts must be embedded or re-embedded, for example when the embedding model is updated frequently.

Key technical features include a Rust-based architecture that removes Python from the runtime path, and TensorRT compilation for optimized GPU execution. The engine uses bucketed batching, grouping texts of similar length so that less work is wasted on padding, and a CPU-side pipeline that synchronously handles tokenization and GPU dispatch. Multi-GPU support spreads the load across GPUs to raise throughput further.

IgniteMS puts its cost at around $0.01 per million messages embedded, significantly lower than hosted options such as OpenAI's text-embedding-3-small (which is billed per token, so the exact gap depends on message length). Overall, IgniteMS claims a substantial throughput improvement over existing text embedding servers for batch AI/ML workloads.
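To make "bucketed batching to minimize padding waste" concrete, here is a minimal, generic sketch of the idea. It is not IgniteMS's code, and the bucket edges, batch size, and message lengths are hypothetical values chosen for illustration. Each message is assigned to the smallest bucket length that fits it, and batches are formed within a bucket. The sketch then counts pad tokens compared with naive arrival-order batching.

```python
import bisect
from collections import defaultdict


def bucketed_batches(lengths, edges, max_batch):
    """Group message indices so every batch is padded to one bucket edge.

    lengths: token count per message.
    edges: ascending padded lengths (hypothetical bucket sizes).
    Each message goes to the smallest edge >= its length.
    Returns a list of (padded_len, [message indices]).
    """
    buckets = defaultdict(list)
    for i, n in enumerate(lengths):
        k = bisect.bisect_left(edges, n)  # index of the first edge >= n
        if k == len(edges):
            raise ValueError(f"message {i} has {n} tokens, above the largest bucket {edges[-1]}")
        buckets[edges[k]].append(i)
    batches = []
    for padded_len in sorted(buckets):
        idx = buckets[padded_len]
        # Split a large bucket into chunks of at most max_batch messages.
        for start in range(0, len(idx), max_batch):
            batches.append((padded_len, idx[start:start + max_batch]))
    return batches


def naive_batches(lengths, max_batch):
    """Baseline: arrival-order batches, each padded to its longest member."""
    batches = []
    for start in range(0, len(lengths), max_batch):
        idx = list(range(start, min(start + max_batch, len(lengths))))
        batches.append((max(lengths[i] for i in idx), idx))
    return batches


def padding_tokens(lengths, batches):
    """Pad tokens processed: padded slots minus real tokens, over all batches."""
    return sum(p * len(idx) - sum(lengths[i] for i in idx) for p, idx in batches)


lengths = [10, 200, 12, 300, 15, 250, 8, 400]
edges = [16, 64, 256, 512]
print(padding_tokens(lengths, naive_batches(lengths, 4)))            # 1605
print(padding_tokens(lengths, bucketed_batches(lengths, edges, 4)))  # 405
```

On this toy input, both strategies process the same 1,195 real tokens. Naive batching adds 1,605 pad tokens, while bucketing adds 405. Fixed bucket lengths also suit engines compiled for a small set of input shapes, though the source does not say how IgniteMS picks its buckets.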
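A "$ per million messages" figure follows from sustained throughput and the hourly price of the hardware. The sketch below shows that arithmetic. The summary does not state which GPU price or which of the two throughput numbers the $0.01 figure assumes, so the function names and the implied prices below are illustrative back-calculations, not numbers from the source.

```python
def seconds_per_million(msgs_per_sec):
    """Wall-clock seconds to embed one million messages at a sustained rate."""
    return 1_000_000 / msgs_per_sec


def cost_per_million(msgs_per_sec, node_usd_per_hour):
    """Dollars per million messages for a node billed at node_usd_per_hour."""
    return node_usd_per_hour * seconds_per_million(msgs_per_sec) / 3600


def implied_node_price(msgs_per_sec, usd_per_million):
    """Hourly node price at which the given rate yields the given cost per million."""
    return usd_per_million * msgs_per_sec * 3600 / 1_000_000


print(round(seconds_per_million(253_000), 2))       # 3.95
print(round(implied_node_price(253_000, 0.01), 2))  # 9.11
print(round(implied_node_price(357_893, 0.01), 2))  # 12.88
```

At 253K msg/s, one million messages take about 4 seconds. A $0.01 cost per million would then correspond to roughly $9.11 per hour for the whole 8x A100 node, or about $1.14 per GPU-hour, if that was the rate used.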