Awesome Streaming Collection (github.com)

🤖 AI Summary
Awesome Streaming Collection is a curated, regularly updated directory of open-source streaming engines, libraries, databases, tooling and readings, published at manuzhang.github.io/awesome-streaming. It spans battle-tested JVM systems (Apache Flink, Spark Streaming, Kafka Streams), messaging platforms (Apache Kafka, Pulsar), newer Rust engines (Arroyo, RisingWave), Python frameworks (Bytewax, Faust), edge/IoT runtimes (Kuiper, YoMo), streaming databases (HStreamDB, RisingWave) and specialized tooling such as CocoIndex for real-time AI indexing. Categories cover engines, libraries, DSLs, streaming SQL, online ML, toolkits, benchmarks and closed-source enterprise offerings.

For AI/ML practitioners, the list works as a practical cheat-sheet for building low-latency, stateful pipelines and real-time feature/ETL infrastructure. It surfaces the technical patterns that matter when comparing systems: stateful operations (windows, joins), checkpointing and exactly-once semantics, CDC and sub-second analytics (RisingWave), timely dataflow models, Kubernetes-native platforms (Numaflow), CPU/GPU hybrid processing (SABER), and edge-first runtimes for geographically distributed inference. It is useful when selecting a stack for fresh feature stores, online learning, streaming SQL analytics or LLM-serving pipelines, because teams can compare language ecosystems (Java/Scala/Python/Rust/Go/C++), runtime guarantees and deployment targets (cloud, edge, k8s) against their latency, throughput and operational constraints.
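To make "stateful operations (windows)" concrete, here is a minimal, library-free sketch of a tumbling-window count in plain Python. It is not taken from any project in the collection. The `Event` shape, the `tumbling_window_counts` helper and the sample data are all assumptions made for illustration. Real engines layer watermarks, late-data handling and checkpointed state on top of this idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event-time in seconds, key).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_size: float
) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for ts, key in events:
        # Each event belongs to exactly one window: [start, start + window_size).
        window_start = (ts // window_size) * window_size
        # The per-window counter is the "state" a streaming engine would persist.
        counts[(window_start, key)] += 1
    return dict(counts)


# Illustrative data only.
events = [(0.5, "a"), (1.2, "b"), (4.9, "a"), (5.0, "a"), (9.9, "b"), (10.1, "a")]
result = tumbling_window_counts(events, 5.0)
for (start, key), n in sorted(result.items()):
    print(f"window [{start}, {start + 5.0}) key={key} count={n}")
```

The sketch processes a finite list in one pass. A streaming engine would instead keep `counts` as managed state, emit a window's result once it decides the window is complete, and snapshot that state so it can recover after a failure.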