🤖 AI Summary
A recent benchmark run on MemBench (2025) with gpt-5-nano compared a plain long-context baseline against two popular "memory" systems, Mem0 (vector-based) and Zep (Graphiti, graph-based), across 4,000 conversational cases. The baseline achieved 84.6% precision at 7.8s average latency and $1.98 total cost for the full 4,000 cases. Mem0 fell to 49.3% precision at 154.5s average latency and $24.88 total cost. Zep delivered a similar 51.6% precision, but its run had to be aborted after 1,730 cases: those cases averaged ~1,028 LLM calls and ~1.17M tokens each, for roughly $152.60 in total. The author has open-sourced the benchmark harness.
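The overhead factors follow directly from those figures; a quick back-of-the-envelope check (all numbers taken from the summary above, Zep's totals from its partial 1,730-case run):

```python
# Amplification factors implied by the reported benchmark figures.
baseline = {"precision": 0.846, "latency_s": 7.8, "cost_usd": 1.98}
mem0     = {"precision": 0.493, "latency_s": 154.5, "cost_usd": 24.88}

print(f"Mem0 latency: {mem0['latency_s'] / baseline['latency_s']:.1f}x baseline")  # ~19.8x
print(f"Mem0 cost:    {mem0['cost_usd'] / baseline['cost_usd']:.1f}x baseline")    # ~12.6x

# Zep's partial run: 1,730 cases at ~1.17M tokens per case.
zep_tokens_total = 1_730 * 1.17e6
print(f"Zep tokens consumed: ~{zep_tokens_total / 1e9:.1f}B in the partial run")   # ~2.0B
```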
The root cause is an "LLM-on-Write" architecture: every incoming message spawns background LLM jobs (summarization, fact extraction, contradiction checks) and graph traversals, producing N+1 query patterns and recursive inference storms. That design multiplies latency and cost and introduces non-deterministic write-time hallucinations that corrupt stored facts before retrieval even begins. The takeaway for ML engineers: there is no universal memory. Semantic memory (fuzzy, long-term preferences) and working memory (lossless, temporal agent state) have different requirements, so don't substitute a semantic-memory system for execution-state storage; treat the two as distinct systems to avoid prohibitive cost, latency, and reliability failures.
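A minimal sketch of the failure mode, with hypothetical names (`llm_on_write_ingest`, `log_on_write_ingest`, a stub `llm` call) standing in for what a Mem0/Zep-style ingest pipeline does; the point is that the write amplification is architectural, not an implementation bug:

```python
# Hypothetical sketch of an "LLM-on-Write" ingest path: every stored message
# fans out into multiple LLM calls before a single retrieval ever happens.

def llm(prompt: str) -> str:
    """Stand-in for a model call; each invocation costs tokens and latency
    and can non-deterministically hallucinate into the store."""
    return f"<llm output for: {prompt[:40]}...>"

def llm_on_write_ingest(message: str, store: list[dict]) -> None:
    summary = llm(f"Summarize: {message}")            # call 1
    facts = llm(f"Extract facts: {message}")          # call 2
    for fact in facts.split(";"):                     # N+1: one check per extracted
        llm(f"Contradicts existing memory? {fact}")   # fact, each against the store
    store.append({"raw": message, "summary": summary, "facts": facts})

def log_on_write_ingest(message: str, log: list[str]) -> None:
    # Working memory as a lossless append-only log: zero LLM calls on write.
    # Summarization or extraction, if needed, happens lazily at read time.
    log.append(message)
```

Under that framing, the baseline's result is unsurprising: deferring inference to read time avoids both the write-time cost blow-up and the opportunity to hallucinate corrupted facts into storage.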