Achieving 1.2 TB/s Aggregate Bandwidth by Optimizing Distributed Cache Network (juicefs.com)

🤖 AI Summary
JuiceFS Enterprise Edition 5.2 ships a suite of network and I/O optimizations for its distributed cache that pushed aggregate read bandwidth to 1.2 TB/s on a 100-node cluster of 100 Gbps GCP instances while drastically cutting CPU overhead: client CPU fell by more than 50%, and cache-node CPU dropped to roughly one third. That result effectively saturates NIC bandwidth over plain TCP/IP at scale, which matters for ML training and inference workloads where many clients repeatedly read the same hot data, and it signals that careful software-level network engineering can unlock much higher real-world throughput without exotic interconnects.

The improvements are concrete and technical. In Go, the team implemented connection multiplexing with dynamic connection scaling and small-packet merging (the sender batches writes up to 4 KiB) to work around per-connection throughput limits and cut syscall counts. They tame epoll by setting SO_RCVLOWAT (for example, 512 KiB), cutting edge-triggered wakeups by about 90% to roughly 40k/sec and yielding stable CPU usage of about one core per GB/s. Zero-copy transfers use splice/sendfile paths to keep data in kernel space and avoid user-space copies, and CRC overhead shrinks by reusing the on-disk 32 KiB-segment CRCs, merged via a lookup table into a single whole-packet CRC so the sender skips recomputation. Together these measures improve NIC utilization and reduce network-stack CPU pressure; JuiceFS plans RDMA support next to further exploit 200/400 Gbps links.