Building Enterprise Agents 90x Cheaper with Prompt Optimization (www.databricks.com)

🤖 AI Summary
Databricks announced Agent Bricks’ automated prompt optimization toolkit (including a new optimizer, GEPA) alongside empirical results showing it can dramatically improve enterprise agent performance while slashing serving costs. On IE Bench, a realistic information-extraction suite with 100+ page documents, hierarchical schemas, and more than 70 fields, GEPA-optimized gpt-oss-120b matches or exceeds closed-source frontier models: it edges past Claude Sonnet 4 and Claude Opus 4.1 in accuracy while remaining roughly 20–22x and 90x cheaper to serve, respectively. Applying the same optimization to proprietary models raises the ceiling further (Claude Opus 4.1 +6.4 points, Sonnet 4 +4.8), establishing new quality-cost Pareto frontiers for production deployments.

Technically, prompt optimization here is an iterative, structured search that uses feedback signals and an “optimizer” LLM to mutate and select better prompts for a target “student” model. Databricks evaluated MIPROv2, SIMBA, and GEPA; GEPA, which combines language-based reflection with evolutionary search, performed best. Using a stronger model as the optimizer (Claude Sonnet 4 optimizing prompts for the gpt-oss-120b student) gave the biggest lift, +4.3 points over baseline. GEPA’s search carries higher runtime overhead (~2–3 hours and roughly 3x more LLM calls) but yields superior quality-cost tradeoffs: it matches or outperforms supervised fine-tuning while cutting serving costs by about 20%, and it can be combined with SFT.

For enterprises, the takeaway is that frontier-level IE accuracy can be achieved more cost-effectively in production using optimized open-source models.
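The summary describes prompt optimization as an iterative search in which an optimizer LLM reflects on a student model's failures, proposes mutated prompts, and keeps the ones that score better. The sketch below illustrates that reflect-mutate-select loop in Python; it is a minimal illustration under stated assumptions, not GEPA's or Agent Bricks' actual API, and every function name (call_student, call_optimizer, score) is a hypothetical placeholder supplied by the caller.

```python
# Minimal sketch of reflective, evolutionary prompt optimization (GEPA-style loop).
# All callables below are hypothetical placeholders, not a real library API.
import random
from typing import Callable


def optimize_prompt(
    seed_prompt: str,
    train_set: list[dict],                    # [{"input": ..., "target": ...}, ...]
    call_student: Callable[[str, str], str],  # (prompt, input) -> student model output
    call_optimizer: Callable[[str], str],     # reflection/mutation LLM
    score: Callable[[str, dict], float],      # (output, example) -> metric in [0, 1]
    rounds: int = 10,
    population: int = 4,
) -> str:
    """Iteratively mutate and select prompts for a target 'student' model."""

    def evaluate(prompt: str) -> tuple[float, list[str]]:
        """Score a prompt on the training set and collect textual failure feedback."""
        scores, feedback = [], []
        for ex in train_set:
            out = call_student(prompt, ex["input"])
            s = score(out, ex)
            scores.append(s)
            if s < 1.0:  # keep failures as natural-language feedback for reflection
                feedback.append(
                    f"Input: {ex['input']}\nGot: {out}\nWanted: {ex['target']}"
                )
        return sum(scores) / len(scores), feedback

    best_prompt = seed_prompt
    best_score, best_feedback = evaluate(seed_prompt)

    for _ in range(rounds):
        # Language-based reflection: ask the optimizer LLM to rewrite the prompt,
        # conditioning on a sample of observed failures.
        sampled = random.sample(best_feedback, min(3, len(best_feedback)))
        reflection_request = (
            "Improve this prompt so the model avoids the failures below.\n\n"
            f"PROMPT:\n{best_prompt}\n\nFAILURES:\n" + "\n---\n".join(sampled)
        )
        mutations = [call_optimizer(reflection_request) for _ in range(population)]

        # Evolutionary selection: keep whichever candidate scores best so far.
        for cand in mutations:
            cand_score, cand_feedback = evaluate(cand)
            if cand_score > best_score:
                best_prompt, best_score, best_feedback = cand, cand_score, cand_feedback

    return best_prompt
```

In a loop like this, the optimizer and evaluation calls dominate a one-time optimization cost, which is consistent with the summary's note that GEPA's ~2–3 hour, ~3x-more-LLM-calls overhead is paid once up front rather than at serving time.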