Show HN: DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 (leebutterman.com)

🤖 AI Summary
A Show HN case study demonstrates cheap, effective prompt optimization for chat-to-SQL on a Raspberry Pi using DSPy, GEPA, synthetic data, and small Qwen3 models. Starting from a tiny production model (Qwen3 0.6B or Qwen3 4‑bit) and a simple starter prompt, the team generated synthetic training examples with a large model, ran the production model on those inputs, and scored each SQL output by executing it against a read‑only paper_authorships table (the returned rows and columns must match the reference; a timeout counts as a failure). A midsize dense model (Qwen3 4B Thinking 2507 at 4‑bit) then iteratively rewrote and merged prompts. GEPA samples execution traces, reflects on failures, proposes prompt edits, and explores the Pareto frontier of prompt variants; a run typically needs 6–18 full evaluations of the dataset.

In under 16 hours on a Pi, this pipeline raised the success rate from 7.3% to 28.5%. By contrast, the larger 4B model starts at ≈60% and can exceed 85% with further refinement.

Why it matters: automated, data-driven prompt evolution can materially improve small on‑device LLMs, which makes low-cost, privacy-friendly local deployment feasible for structured‑query tasks.

Key technical takeaways: separate the task signature (DSPy) from the prompt text; generate and sanity‑check synthetic labels with a large model; score on grounded execution rather than token overlap alone; prefer dense midsize models for refinement (Mixture‑of‑Experts models are unusable on slow SD storage); and use GEPA’s reflection-and-merging strategy to discover complementary prompt heuristics. The approach generalizes to other sub‑10B models and edge hardware, lowering the barrier to production LLM services that do not need a massive inference footprint.
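The execution-based scoring described in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes SQLite as the database, and the table columns (`paper_id`, `author`, `year`), the helper names (`make_demo_db`, `run_query`, `score`), the order-insensitive row comparison, and the 2-second default timeout are all hypothetical choices made for the example. Only the table name `paper_authorships` and the "match or fail, timeouts fail" rule come from the summary.

```python
import sqlite3
import time


def make_demo_db():
    """Build a tiny in-memory table, then make the connection read-only.

    The column names and rows are invented for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE paper_authorships (paper_id INTEGER, author TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO paper_authorships VALUES (?, ?, ?)",
        [(1, "Ada", 2024), (1, "Grace", 2024), (2, "Ada", 2023)],
    )
    conn.commit()
    # After this pragma, any statement that tries to write raises an error.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_query(conn, sql, timeout_s=2.0):
    """Return (column_names, rows), or None if the query errors or times out."""
    deadline = time.monotonic() + timeout_s

    def over_deadline():
        # A non-zero return value makes SQLite abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(over_deadline, 1000)
    try:
        cur = conn.execute(sql)
        rows = cur.fetchall()
        cols = [d[0] for d in cur.description] if cur.description else []
        return cols, rows
    except sqlite3.Error:
        return None
    finally:
        conn.set_progress_handler(None, 0)


def score(conn, predicted_sql, reference_sql, timeout_s=2.0):
    """1.0 if both queries return the same column count and rows, else 0.0."""
    pred = run_query(conn, predicted_sql, timeout_s)
    ref = run_query(conn, reference_sql, timeout_s)
    if pred is None or ref is None:
        return 0.0
    same_width = len(pred[0]) == len(ref[0])
    # Assumption: row order is ignored; repr() keeps mixed types sortable.
    same_rows = sorted(pred[1], key=repr) == sorted(ref[1], key=repr)
    return 1.0 if same_width and same_rows else 0.0
```

A metric like this could be wrapped to serve as the feedback signal for a prompt optimizer: a query that parses but returns the wrong rows scores the same as one that fails outright, which keeps the score tied to what the user would actually see.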