A New Wave: From Big Data to Small Data (www.fabi.ai)

🤖 AI Summary
At Small Data SF, organizers and speakers from MotherDuck, Turso, and Ollama laid out a practical counterpoint to the "Big Data" era: most enterprises do not need massive distributed stacks. The Small Data manifesto argues that local machines, or single large cloud nodes, paired with modern, efficient libraries can handle the vast majority of real-world analytics. The evidence cited includes a Redshift fleet analysis showing that the median query scans ~100 MB (and even the 99.9th percentile only ~300 GB), modern laptops shipping with many cores and tens of gigabytes of RAM, and cheap large cloud VMs (e.g., 32 vCPUs at a low hourly cost). Tools such as DuckDB and other columnar engines, plus Polars (reported ~10× faster than pandas), let analysts process gigabytes comfortably on a single node.

For the AI/ML community this shifts engineering and tooling priorities: fewer heavyweight distributed systems, more focus on fast, local, ad-hoc analysis workflows and AI-assisted exploration. AI is making one-off analyses over patchy data far easier, which reduces the need for rigid BI semantic layers in many use cases. Practically, teams can cut cost and complexity by embracing columnar local compute, semi-local compute over object stores (Iceberg-style storage), and interactive AI/SQL tooling, while still reserving massive distributed infrastructure for the true scale outliers. Expect a migration over the next 5-10 years toward small-data-first analytics augmented by AI.
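To make the single-node pattern concrete, here is a minimal sketch of the kind of workflow the summary describes, using DuckDB's in-process Python API to aggregate over a local Parquet file. The file name `events.parquet` and its columns are hypothetical stand-ins for a multi-gigabyte local dataset:

```python
import duckdb

# DuckDB runs in-process: no cluster, no server, just a library call.
con = duckdb.connect()  # in-memory database

# Scan a (hypothetical) local Parquet file directly; the columnar
# engine reads only the columns the query touches.
result = con.execute("""
    SELECT user_id,
           count(*)         AS events,
           avg(duration_ms) AS avg_ms
    FROM 'events.parquet'
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
""").fetchdf()  # materialize the result as a pandas DataFrame

print(result)
```

On a modern laptop with tens of GBs of RAM, queries like this over a few gigabytes of Parquet typically finish in seconds, which is the manifesto's core point.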
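The Polars speedup quoted above comes largely from its multi-threaded columnar engine and lazy query optimizer, which prunes columns and pushes filters down to the file scan. A sketch of the same (hypothetical) query in that idiom:

```python
import polars as pl

# Lazy scan: nothing is read until .collect(), so Polars can optimize
# the whole plan and parallelize it across all local cores.
top_users = (
    pl.scan_parquet("events.parquet")        # lazy, no data read yet
      .filter(pl.col("duration_ms") > 100)   # predicate pushdown
      .group_by("user_id")
      .agg(pl.len().alias("events"))         # row count per group
      .sort("events", descending=True)
      .limit(10)
      .collect()                             # execute the optimized plan
)
print(top_users)
```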
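For the "semi-local compute over object stores" pattern, the same in-process engine can query Parquet sitting in S3 with no warehouse in between. This sketch assumes DuckDB's httpfs and aws extensions and ambient AWS credentials; the bucket path is hypothetical:

```python
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read directly from S3/HTTP object storage;
# the aws extension supplies the credential_chain provider.
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL aws; LOAD aws;")
# Pick up credentials from the environment/instance profile.
con.execute("CREATE SECRET (TYPE s3, PROVIDER credential_chain);")

# Hypothetical bucket/prefix: the local machine pulls only the
# column chunks the query touches, not the whole dataset.
daily = con.execute("""
    SELECT date_trunc('day', ts) AS day, count(*) AS n_rows
    FROM read_parquet('s3://my-bucket/events/*.parquet')
    GROUP BY 1
    ORDER BY 1
""").fetchdf()
print(daily)
```

Iceberg-style table formats extend this idea with transactional metadata over the same object-store files, so the "small data" engine and the occasional heavyweight system can share one copy of the data.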