What Makes 5% of AI Agents Work in Production? (www.motivenotes.ai)

πŸ€– AI Summary
At a San Francisco panel called Beyond the Prompt (600+ attendees), engineers from Uber, WisdomAI, EvenUp, and others argued that the hard work in production AI isn't prompting; it's context selection. Panelists estimated that roughly 95% of agent deployments fail not because models are weak, but because the scaffolding (retrieval and semantic layers, memory orchestration, governance, and model routing) is missing or naive.

Practical takeaways: RAG often suffices but must be curated (index everything and you get noise; index too little and the model is starved); treat context as versioned, testable feature engineering (selective pruning, context validation, embedding metadata); and combine semantic vector search with metadata filters such as timestamps, document type, and access policies. Text-to-SQL failures highlight the need for business glossaries, constrained templates, and validation layers rather than blind schema dumps.

Operationally, teams must design memory as an architectural layer (user/team/org scopes, composable and versioned), implement lineage and row-level access so outputs vary by permission, and orchestrate multiple models by task, latency, cost, or regulation (local models for trivial queries, frontier models for complex reasoning, judge-plus-responder fallbacks). Missing primitives are where builders can win: context observability, portable and composable memory, domain-aware DSLs, and latency-aware UX. The common pattern across the successful 5%: human-in-the-loop workflows, auditable context pipelines, and adaptive model routing that together create trustworthy, scalable agent behavior.
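To make the "semantic search plus metadata filters" point concrete, here is a minimal sketch of hybrid retrieval. Everything in it (the `Doc` dataclass, the `retrieve` helper, the field names) is hypothetical; a production system would sit on a real vector store, but the idea of filtering by access policy, document type, and freshness before semantic ranking is the same.

```python
"""Hypothetical sketch: metadata filters first, semantic ranking second."""
from dataclasses import dataclass
from datetime import datetime, timedelta
import math

@dataclass
class Doc:
    text: str
    embedding: list[float]      # precomputed embedding vector
    doc_type: str               # e.g. "runbook", "ticket", "spec"
    updated_at: datetime        # freshness timestamp
    allowed_roles: set[str]     # doc-level access policy

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb, corpus, *, role, doc_types, max_age_days, k=5):
    # 1. Metadata filters: access policy, document type, freshness window.
    cutoff = datetime.utcnow() - timedelta(days=max_age_days)
    eligible = [
        d for d in corpus
        if role in d.allowed_roles
        and d.doc_type in doc_types
        and d.updated_at >= cutoff
    ]
    # 2. Semantic ranking only over documents that passed the filters,
    #    so noise never reaches the context window in the first place.
    ranked = sorted(eligible, key=lambda d: cosine(query_emb, d.embedding),
                    reverse=True)
    return ranked[:k]
```

The design choice worth noting is the ordering: filtering on structured metadata before similarity search keeps both noise and unauthorized rows out of the candidate set, which is also what makes "outputs vary by permission" auditable.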
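The user/team/org memory scoping could look something like the sketch below. The scope names, the append-only versioning, and the compose order (narrower scopes override broader ones) are assumptions for illustration, not the panel's actual design.

```python
"""Hypothetical sketch: scoped, versioned agent memory."""
from dataclasses import dataclass, field

@dataclass
class MemoryScope:
    name: str                                            # "org", "team", or "user"
    versions: list[dict] = field(default_factory=list)   # append-only history

    def write(self, facts: dict) -> int:
        # Every write creates a new version, so memory stays auditable
        # and can be rolled back or diffed like any other artifact.
        self.versions.append(dict(facts))
        return len(self.versions) - 1

    def read(self, version: int | None = None) -> dict:
        if not self.versions:
            return {}
        return self.versions[-1 if version is None else version]

def compose(org: MemoryScope, team: MemoryScope, user: MemoryScope) -> dict:
    # Narrower scopes override broader ones when keys collide.
    context: dict = {}
    for scope in (org, team, user):
        context.update(scope.read())
    return context
```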
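Finally, a sketch of cost- and task-aware routing with a judge-plus-responder fallback. The `call_local_model`, `call_frontier_model`, and `judge_answer` functions are stubs standing in for real inference clients, and the thresholds are illustrative.

```python
"""Hypothetical sketch: route by complexity, escalate when the judge objects."""

def call_local_model(prompt: str) -> str:
    # Cheap local model for trivial queries; stubbed out for the sketch.
    return f"[local-model answer to] {prompt}"

def call_frontier_model(prompt: str) -> str:
    # Expensive frontier model reserved for complex reasoning; stubbed out.
    return f"[frontier-model answer to] {prompt}"

def judge_answer(prompt: str, answer: str) -> float:
    # A judge model would score the responder's answer; fixed score here.
    return 0.9

def route(prompt: str, *, complexity: str, quality_floor: float = 0.7) -> str:
    if complexity == "trivial":
        answer = call_local_model(prompt)
        # Judge + responder fallback: keep the cheap answer only if it passes.
        if judge_answer(prompt, answer) >= quality_floor:
            return answer
    # Complex queries, regulated workloads, or rejected local answers escalate.
    return call_frontier_model(prompt)

print(route("What is our refund window?", complexity="trivial"))
```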