🤖 AI Summary
Yichao “Peak” Ji’s post distills practical lessons from building Manus, arguing that production AI agents should prioritize context engineering over fine-tuning. After rebuilding their agent framework four times (a process they jokingly call “Stochastic Graduate Descent”), the Manus team concluded that KV-cache hit rate is the single most important production metric: agent loops produce very lopsided input-to-output ratios (~100:1), so keeping the prompt prefix stable (no timestamps), making contexts append-only with deterministic serialization, and inserting explicit cache breakpoints drastically cut latency and cost (cached tokens are ~10x cheaper in Ji’s example). He also warns against dynamically removing tools mid-task: tool definitions sit near the front of the context, so any change there invalidates the KV-cache for everything after it and leaves past actions referencing tools that no longer exist, which confuses the model.
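As a rough illustration of the cache-friendly discipline the post describes, here is a minimal Python sketch assuming a generic chat-messages API; `SYSTEM_PROMPT`, `serialize_observation`, and `append_step` are illustrative names, not Manus internals:

```python
import json

# Stable prefix: no timestamps, request IDs, or anything that varies per call.
# A single differing token invalidates the KV-cache from that point onward.
SYSTEM_PROMPT = "You are an agent that completes tasks using tools."

def serialize_observation(obs: dict) -> str:
    # Deterministic serialization: many JSON libraries don't guarantee key
    # order, and a silently reordered key breaks the byte-identical prefix.
    return json.dumps(obs, sort_keys=True, ensure_ascii=False)

def append_step(messages: list[dict], action: str, observation: dict) -> None:
    # Append-only: never edit or delete earlier messages, only extend them,
    # so every iteration shares the previous iteration's prefix verbatim.
    messages.append({"role": "assistant", "content": action})
    messages.append({"role": "user", "content": serialize_observation(observation)})

messages = [{"role": "system", "content": SYSTEM_PROMPT}]
append_step(messages, "list_files(path='/tmp')", {"files": ["a.txt", "b.txt"]})
```

The invariant being maintained is that the token sequence up to any earlier step is byte-identical on the next request, so prefix caching can actually hit.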
Manus’s other core patterns are practical: mask unavailable tools by constraining token logits during decoding (via response prefill, which yields Auto, Required, and Specified modes in the Hermes function-calling format) instead of removing their definitions; treat the file system as persistent, restorable external memory (store URLs and file paths rather than full content, so compression is always reversible); and use “recitation” (e.g., a todo.md rewritten at each step) to pull the global objective into the model’s recent attention span and prevent goal drift. Finally, keep failed actions and error traces in context so the model updates its priors, and beware few-shot patterns that tempt the model into blindly repeating past behavior. Together these techniques make agents cheaper, more robust, and easier to iterate on, which matters now as models keep scaling but agentic complexity explodes.
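The logit-masking idea can be sketched with response prefill: because Hermes-style function calls begin with a predictable text prefix, constraining how the assistant turn starts constrains which tools remain reachable, without touching the cached tool definitions. A hedged sketch, assuming a completion-style API that accepts an assistant-turn prefill (`prefill_for` and the mode names are illustrative):

```python
# Opening text of a Hermes-format function call, up to the tool name.
HERMES_CALL_OPEN = '<tool_call>{"name": "'

def prefill_for(mode: str, name_prefix: str = "") -> str:
    if mode == "auto":
        # Model may reply in text or call a tool: no forced prefill.
        return ""
    if mode == "required":
        # Model must call *some* tool: prefill up to the call opener.
        return HERMES_CALL_OPEN
    if mode == "specified":
        # Model must call a tool from a given group (e.g. all browser_* tools):
        # prefill through the shared name prefix so decoding can only continue
        # with tokens that complete a matching tool name.
        return HERMES_CALL_OPEN + name_prefix
    raise ValueError(f"unknown mode: {mode}")

# e.g. force a browser action while leaving every tool definition in place:
assistant_prefill = prefill_for("specified", name_prefix="browser_")
```

Because the definitions never leave the context, the KV-cache prefix stays intact while the set of reachable actions shrinks.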
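Recitation needs no special machinery: the agent rewrites its plan file, then re-appends the contents to the tail of the context each step. A minimal sketch under the same message-list assumption (`recite_plan` and the file name are hypothetical):

```python
from pathlib import Path

TODO = Path("todo.md")  # hypothetical plan file the agent maintains

def recite_plan(messages: list[dict]) -> None:
    # Re-appending the plan on every step keeps the global objective inside
    # the model's most recent attention span, instead of letting it drift
    # dozens of actions into the past.
    if TODO.exists():
        messages.append({"role": "user",
                         "content": "Current plan:\n" + TODO.read_text()})
```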