Supporting our AI overlords: Redesigning data systems to be Agent-first (muratbuffalo.blogspot.com)

🤖 AI Summary
A Berkeley systems-group paper argues that LLM-driven agents will soon dominate database workloads and that their behavior, which it calls "agentic speculation," looks nothing like a human analyst's: bursts of schema poking, partial aggregates, speculative joins, many redundant attempts, and heavy rollbacks. Experiments on BIRD with DuckDB and models such as GPT-4o-mini show success rates rising 14–70% as agents issue more queries, yet fewer than 10–20% of subplans are unique; a multi-database case study shows metadata exploration overlapping messily with partial and full queries, and that simple grounding hints can cut query volume by more than 20%. Together these results expose both the problem (flooding and repetition) and the opportunity (recognizable patterns to exploit).

Technically, the paper proposes an "agent-first" stack: agents send probes (bundled SQL plus natural-language briefs) to an agentic interpreter and a probe optimizer that prioritizes satisficing, approximate answers, multi-query optimization, and incremental, early-stop evaluation. A semantic agentic memory store (cached results, embeddings, column encodings) and a speculative, branch-aware transaction manager enable reuse and rollback-heavy workflows; the database can also return proactive feedback (cost hints, schema nudges) and embed similarity operators for semantic discovery.

Open questions remain around multi-tenancy, privacy, staleness, distribution, and whether we should instead train agents to be more schema-aware. The paper reframes database design around LLM interaction, trading strict exactness for faster, shared, approximate interactions tuned to agent behavior.
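To make the probe idea concrete, here is a minimal sketch (not the paper's actual design; the `Probe` shape, the normalization scheme, and `max_runs` cutoff are all illustrative assumptions) of an interpreter that bundles SQL variants with a natural-language brief, dedupes redundant subplans against a toy memory store, and early-stops to return a satisficing partial answer:

```python
from dataclasses import dataclass

@dataclass
class Probe:
    # Hypothetical probe shape: a natural-language brief plus candidate SQL variants.
    brief: str
    sql_variants: list

class AgenticMemoryStore:
    """Toy semantic memory: caches results keyed by a normalized subplan string.
    (The paper's store also holds embeddings and column encodings; this sketch
    only models exact reuse after normalization.)"""
    def __init__(self):
        self._cache = {}

    @staticmethod
    def normalize(sql: str) -> str:
        # Crude normalization: lowercase and collapse whitespace.
        return " ".join(sql.lower().split())

    def get(self, sql: str):
        return self._cache.get(self.normalize(sql))

    def put(self, sql: str, result) -> None:
        self._cache[self.normalize(sql)] = result

def execute_probe(probe: Probe, run_sql, memory: AgenticMemoryStore, max_runs: int = 2):
    """Satisficing interpreter: serve cached subplans for free, skip duplicate
    variants within the probe, and stop after max_runs fresh executions,
    returning a partial (approximate) answer instead of running everything."""
    results, fresh, seen = [], 0, set()
    for sql in probe.sql_variants:
        key = memory.normalize(sql)
        if key in seen:
            continue  # redundant variant inside this probe
        seen.add(key)
        cached = memory.get(sql)
        if cached is not None:
            results.append((sql, cached, "cached"))
            continue
        if fresh >= max_runs:
            break  # early stop: satisfice with what we have so far
        out = run_sql(sql)
        memory.put(sql, out)
        results.append((sql, out, "fresh"))
        fresh += 1
    return results
```

Usage: with a stub `run_sql`, a probe carrying `["SELECT 1", "select  1", "SELECT 2", "SELECT 3"]` executes only two queries (the second variant normalizes to the first, the fourth hits the early-stop cutoff), and a later probe reusing `"SELECT 1"` is answered from the memory store without touching the database, mirroring the paper's observation that most agent subplans are non-unique and exploitable.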