TernFS – An exabyte scale, multi-region distributed filesystem (www.xtxmarkets.com)

🤖 AI Summary
XTX has open‑sourced TernFS, its in‑house distributed filesystem built for exabyte‑scale ML and trading workloads. Development began in 2022, it has been in production since 2023, and it now holds more than 500PB across 30,000 HDDs, 10,000 flash drives, and three datacenters, serving multiple TB/s at peak with zero reported data loss. It is hardware‑agnostic, depends only on C++, Go, and a handful of libraries (notably RocksDB), exposes a TCP/UDP API plus a Linux kernel module, and deliberately targets the large, mostly immutable files common in ML pipelines.

Architecturally, metadata is split into 256 logical shards (each with a leader and four followers) backed by LogsDB, a Raft‑like consensus layer, and RocksDB; directories are assigned to shards round‑robin, which keeps horizontal scaling simple and avoids rebalancing. Cross‑shard operations (directory create, remove, and move) go through a stateful Cross‑Directory Coordinator (CDC), which lets the shards run independently but makes directory operations a throughput bottleneck. File contents are split into blocks served by per‑drive block services (Go processes writing to the local filesystem), and a registry provides service discovery over IPv4.

Multi‑region replication is asynchronous, with a primary location for metadata (metadata writes originating in secondary locations incur extra latency); proactive and on‑demand content replication ensure eventual global convergence. Key tradeoffs: files are immutable, tiny files are a poor fit (the median file size is ~2MB), directory‑operation throughput is constrained, and there are no built‑in permissions. The open‑source release gives the community a battle‑tested reference for large, geo‑distributed storage tuned for ML workflows.
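To make the fixed‑shard metadata layout concrete, here is a minimal Go sketch of round‑robin directory‑to‑shard assignment with the owning shard embedded in the directory ID. All names are hypothetical and this is not the actual TernFS API; it only illustrates why a fixed shard count chosen at directory‑creation time means adding hardware never forces existing metadata to move.

```go
package main

import "fmt"

// shardAllocator hands out shard IDs round-robin as directories are created.
// Hypothetical illustration only, not TernFS code; the shard count is fixed at 256.
type shardAllocator struct {
	next uint8 // uint8 wraps at 256, matching the fixed shard count
}

func (a *shardAllocator) assign() uint8 {
	s := a.next
	a.next++ // overflow wraps back to 0, giving round-robin over all 256 shards
	return s
}

// makeDirID embeds the shard in the low byte of the directory ID, so any later
// operation on the directory can be routed to its shard without a lookup table.
func makeDirID(shard uint8, seq uint64) uint64 {
	return seq<<8 | uint64(shard)
}

// shardOfDir recovers the owning shard from a directory ID.
func shardOfDir(dirID uint64) uint8 {
	return uint8(dirID & 0xff)
}

func main() {
	alloc := &shardAllocator{}
	for seq := uint64(1); seq <= 4; seq++ {
		shard := alloc.assign()
		id := makeDirID(shard, seq)
		fmt.Printf("dir seq %d -> shard %d (id %#x)\n", seq, shardOfDir(id), id)
	}
}
```

Under a scheme like this, a directory's shard never changes after creation, which matches the summary's claim that horizontal scaling needs no rebalancing; the corresponding cost is that operations spanning directories on different shards must be coordinated by a separate component, here the CDC.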