🤖 AI Summary
Sweep AI released bpe-qwen, a Rust-implemented BPE tokenizer tuned for Qwen models that claims major speed wins over HuggingFace’s tokenizers. By leveraging the rust-gems BPE crate, a linear-time tokenization algorithm, and an optimized two-pass pretokenization tailored to Qwen’s pattern, bpe-qwen achieves about 6x faster encoding out of the box and up to ~12x faster with parallelism (6.4M vs 1.02M tokens/sec sequential; 33.08M vs 2.64M tokens/sec parallel). It also reports ~2x faster decoding and 100% token consistency across a comprehensive test suite (including special tokens), making it a low-friction, drop-in replacement for HF tokenizers via Python bindings (PyO3) and an AutoLinearTokenizer that works with transformers.
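To make the "drop-in replacement" claim concrete, here is a minimal usage sketch. It assumes the `bpe_qwen` Python module exposes `AutoLinearTokenizer` with a Hugging Face-style `from_pretrained`/`encode`/`decode` interface, as the summary describes; the exact import path and method names beyond `AutoLinearTokenizer` are assumptions, not confirmed API.

```python
# Minimal sketch of drop-in usage; names other than AutoLinearTokenizer are
# assumed to mirror Hugging Face's AutoTokenizer-style interface.
from bpe_qwen import AutoLinearTokenizer  # assumed module path

# Load from a model directory (or hub name) that provides vocab.json + merges.txt
tokenizer = AutoLinearTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

ids = tokenizer.encode("Hello, Qwen!")   # list of token ids
text = tokenizer.decode(ids)             # should round-trip back to the input
print(ids, text)
```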
Technically, the project reads the native BPE format (vocab.json + merges.txt), supports batch processing, and applies explicit SIMD intrinsics, custom allocators, early stopping, and other production-oriented optimizations to reduce latency and CPU/memory overhead in large-scale ML pipelines. Integration is straightforward (pip install bpe-qwen, or build from source with maturin), but it requires vocab.json/merges.txt rather than tokenizer.json, and there is a known caveat: some multi-byte UTF-8 characters aren't handled correctly yet. For teams facing tokenization bottlenecks in inference or preprocessing, it offers a practical, high-performance alternative with verified token consistency.
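For the batch-processing path mentioned above, a hedged sketch follows. The summary only states that batch processing is supported; any dedicated batch method name (e.g. an `encode_batch`-style call) is an assumption modeled on Hugging Face's tokenizers API, so the sketch falls back to a plain per-document loop.

```python
# Hypothetical batch-encoding sketch; a dedicated batch method is assumed,
# not confirmed, so a per-document loop is shown as the safe fallback.
docs = ["first document", "second document", "third document"]

batch_ids = [tokenizer.encode(d) for d in docs]  # always-available fallback
# If a native batch call is exposed, it could parallelize internally, e.g.:
# batch_ids = tokenizer.encode_batch(docs)  # assumed name, HF-style

print(sum(len(ids) for ids in batch_ids), "tokens total")
```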