SymSpell C99: Building the Fastest Spell Checker in Pure C (suman-pokhrel.com.np)

🤖 AI Summary
SymSpell C99 is presented as the first pure C99 implementation of Wolf Garbe’s SymSpell algorithm, released as open source to provide low-latency spell checking with zero dependencies and POSIX portability. The library is compact (~700 lines) and ships with an 86,060-word dictionary, from which it precomputes ~688,710 deletion keys and stores them in a custom hash table (xxHash3 with open addressing and linear probing), so that expensive candidate generation becomes a few average-case O(1) lookups. In practice, correctly spelled words take a 0.7µs fast path, corrections of misspelled words complete in ~30µs in the worst case, and the real-world average lookup time is ~5µs; the memory footprint is ~45MB, and measured correction accuracy is ~82–84% across standard misspelling corpora. Technically, SymSpell C99 trades memory for lookup speed: it stores the deletions of every dictionary term and ranks candidates by edit distance, frequency, and an IWF score. The project includes a production-ready toolchain (dictionary builder, benchmarks, and tests that are Valgrind-clean and compile under -Wall/-Wextra/-Werror) and a documented C API suited to FFI bindings from Python, Rust, or Go. Notable engineering work includes a portable POSIX compatibility layer, ARM64 inline-assembly fixes that improved performance by ~34.6%, and load-factor tuning of the hash table. The repository and usage examples are available at github.com/sumanpokhrel-11/symspell-c99.
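The deletion-key idea summarized above can be illustrated with a minimal C99 sketch. It is not taken from the SymSpell C99 repository: the names `MAX_WORD`, `single_deletes`, and `shares_delete_key` are hypothetical, the maximum edit distance is assumed to be 1, and plain string comparison stands in for the library's hash-table lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64 /* hypothetical cap on word length for this sketch */

/*
 * Write every distinct string obtained by deleting exactly one character
 * of `word` into `out`. Returns the number of strings written.
 * A real SymSpell index would store these keys in a hash table.
 */
size_t single_deletes(const char *word, char out[][MAX_WORD], size_t out_cap)
{
    size_t len = strlen(word);
    size_t count = 0;

    assert(len < MAX_WORD);
    for (size_t i = 0; i < len && count < out_cap; i++) {
        /* Deleting either of two equal adjacent characters yields the
         * same string, so skip the repeat. */
        if (i > 0 && word[i] == word[i - 1])
            continue;
        memcpy(out[count], word, i);
        /* Copies the tail after position i, including the terminating NUL. */
        memcpy(out[count] + i, word + i + 1, len - i);
        count++;
    }
    return count;
}

/*
 * Returns 1 if `a` and `b` are equal, if one is a single delete of the
 * other, or if they share a single-delete key. This is only a candidate
 * filter: a real checker still verifies the edit distance afterwards.
 */
int shares_delete_key(const char *a, const char *b)
{
    char da[MAX_WORD][MAX_WORD];
    char db[MAX_WORD][MAX_WORD];
    size_t na = single_deletes(a, da, MAX_WORD);
    size_t nb = single_deletes(b, db, MAX_WORD);

    if (strcmp(a, b) == 0)
        return 1;
    for (size_t i = 0; i < na; i++)
        if (strcmp(da[i], b) == 0)
            return 1;
    for (size_t j = 0; j < nb; j++)
        if (strcmp(db[j], a) == 0)
            return 1;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (strcmp(da[i], db[j]) == 0)
                return 1;
    return 0;
}
```

In this sketch, `shares_delete_key("hallo", "hello")` returns 1 because both words reduce to the key "hllo" after one deletion. Precomputing such keys for every dictionary word is what lets a lookup replace candidate generation with a handful of table probes, at the cost of the extra memory the summary mentions.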