🤖 AI Summary
Blackbird is an open-source, RDMA-first distributed cache announced on Show HN that targets low-latency ML training/inference, HPC, real-time analytics and feature stores. It combines UCX (RoCE/InfiniBand) zero-copy transfers with a multi-tier cache (GPU memory → CPU DRAM → NVMe), a minimal control plane (Keystone) backed by etcd for discovery and leader election, and topology-aware placement/load balancing. The project aims to deliver sub-microsecond fast paths and high throughput via batched APIs while reducing CPU overhead through RDMA offload, positioning itself as a lightweight alternative to Redis/Memcached (which lack RDMA and tiering) and to heavier systems such as Alluxio.
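The GPU → DRAM → NVMe tiering described above can be pictured as a fastest-first placement loop that spills to the next tier when the current one is full. The following is a minimal illustrative sketch, not Blackbird's actual API; the names `Tier`, `TierState`, and `TierManager` are hypothetical:

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of fastest-first tier placement with spill:
// try GPU memory, then CPU DRAM, then NVMe. Eviction policy is omitted.
enum class Tier { Gpu, Dram, Nvme };

struct TierState {
    Tier tier;
    std::size_t capacity_bytes;
    std::size_t used_bytes = 0;
};

class TierManager {
public:
    // Tiers are supplied ordered fastest-first.
    explicit TierManager(std::vector<TierState> tiers)
        : tiers_(std::move(tiers)) {}

    // Place an object in the fastest tier with room, spilling downward.
    // Returns the chosen tier, or nullopt if every tier is full
    // (a real system would evict or reject at that point).
    std::optional<Tier> place(std::size_t object_bytes) {
        for (auto& t : tiers_) {
            if (t.used_bytes + object_bytes <= t.capacity_bytes) {
                t.used_bytes += object_bytes;
                return t.tier;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<TierState> tiers_;
};
```

A real placement policy would also weigh network topology and load, which the summary notes Blackbird handles separately via topology-aware placement.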
Key technical points: UCX (RDMA) with TCP fallback and registered memory/region descriptors for remote keys; Keystone provides object metadata, placement, TTL/soft-pin semantics, garbage collection and health heartbeats; metrics and observability are Prometheus-compatible. The stack is C++20 (GCC ≥10/Clang ≥12), CMake ≥3.20, UCX ≥1.12 and etcd ≥3.4. It is currently alpha (APIs may change) and targets 100+ node scale; roadmap items include placement policies, GPU/DRAM/NVMe tier managers, a benchmark suite, and security (mTLS/ACLs/encryption). For ML teams this means potential SLO gains, lower CPU load, and cost/space tradeoffs via policy-driven tiering, at the cost of requiring RDMA-capable networking for the full benefit.
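One reading of the TTL/soft-pin semantics mentioned above: an expired entry is collected by routine garbage collection unless it is soft-pinned, in which case it survives until memory pressure forces reclamation. This is a hedged sketch of that interpretation, not Blackbird's Keystone implementation; `Entry`, `MetadataStore`, and both `gc` methods are illustrative names:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative model: expired + unpinned entries are collected normally;
// soft-pinned entries are only reclaimed under memory pressure.
struct Entry {
    std::int64_t expires_at;  // logical timestamp for simplicity
    bool soft_pinned;
};

class MetadataStore {
public:
    void put(const std::string& key, std::int64_t expires_at, bool soft_pinned) {
        entries_[key] = Entry{expires_at, soft_pinned};
    }

    // Routine GC pass: drop entries that are expired and not soft-pinned.
    // Returns the number of entries removed.
    std::size_t gc(std::int64_t now) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.expires_at <= now && !it->second.soft_pinned) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    // Under memory pressure, soft-pins no longer shield expired entries.
    std::size_t gc_under_pressure(std::int64_t now) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.expires_at <= now) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::unordered_map<std::string, Entry> entries_;
};
```

In the real system this metadata would live in Keystone and be coordinated through etcd; the sketch only captures the lifecycle distinction between hard TTL expiry and soft-pinning.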