How to reach 100M ops/sec with a REST API and Python on AWS (www.rondb.com)

🤖 AI Summary
RonDB’s team demonstrated a single-cluster REST API pipeline that sustained over 100 million primary-key lookups/sec on AWS: 104.5M/sec for records with five integer features and 96.4M/sec for ten-feature records, driven by Python clients (Locust) issuing batched REST requests (a minimal client sketch follows below). They reported dramatic cost advantages over DynamoDB and latencies suitable for real-time AI: in a reduced-load latency test (61.5M lookups/sec), average latency was 1.93 ms, with p95 at 2 ms and p99 at 3 ms. The tests delivered up to 125 Gbit/s (15.6 GB/s) of JSON to clients, using 4,221 active client threads and 150-row batch requests (100-row batches for the latency runs).

The engineering work combined system-level tuning, a new C++ REST API using simdjson (≈30× faster JSON parsing), and careful AWS provisioning. The cluster comprised 6 data nodes (64 vCPUs each) and 36 REST servers (16 vCPUs each), with Graviton4 instances delivering the best throughput; the table partition count was raised to 384 to remove table-level contention. Key optimizations:

- raise REST server thread counts (16→64);
- disable CPU-heavy response compression;
- tune REST-to-cluster connections (a 16-vCPU REST node with 2 cluster connections handles ~3.5M lookups/sec);
- spread Network Load Balancers across AZs to avoid per-AZ NIC limits.

Practical bottlenecks noted include per-instance network caps (c8g.16xlarge tops out at 30 Gbit/s) and client/tool scaling. Overall, the results show that a well-tuned feature-store REST stack can deliver millions of low-latency batch fetches per second for online recommendation and inference systems.
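For a concrete picture of the client side, here is a minimal Locust sketch of the kind of batched primary-key lookup workload described above. The `/0.1.0/batch` endpoint path, the `pk-read` payload shape, and the `feature_db`/`features`/`id` names are illustrative assumptions, not confirmed details from the article; consult the RonDB REST API documentation for the actual schema.

```python
# Minimal Locust sketch of a batched primary-key lookup client.
# ASSUMPTIONS: endpoint path, payload shape, and table/column names are
# hypothetical; only the 150-row batch size comes from the article.
import random

from locust import HttpUser, constant, task

BATCH_SIZE = 150        # throughput runs used 150-row batches (100 for latency runs)
NUM_KEYS = 10_000_000   # hypothetical keyspace for the feature table


class FeatureStoreUser(HttpUser):
    # No think time: each simulated user issues batch requests back to back.
    wait_time = constant(0)

    @task
    def batch_pk_read(self):
        # One batch request bundles BATCH_SIZE primary-key reads, so a
        # single HTTP round trip fetches 150 feature rows.
        ops = [
            {
                "method": "POST",
                "relative-url": "feature_db/features/pk-read",
                "body": {
                    "filters": [
                        {"column": "id", "value": random.randrange(NUM_KEYS)}
                    ]
                },
            }
            for _ in range(BATCH_SIZE)
        ]
        self.client.post(
            "/0.1.0/batch", json={"operations": ops}, name="batch-pk-read"
        )
```

A single Locust process would not approach the published numbers; reaching thousands of active client threads (the article cites 4,221) implies running Locust in distributed mode (`locust --master` plus many `--worker` processes) across a fleet of client VMs.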