🤖 AI Summary
Cursor announced that adding semantic search to its coding agents, alongside traditional tools like grep, substantially improves their ability to understand and modify large codebases. Using a custom embedding model and fast indexing pipelines, semantic retrieval raised question-answering accuracy by an average of 12.5% (6.5%–23.5% depending on the model), produced code changes that are more likely to be retained, and reduced the number of iterations users need to reach correct solutions. In an A/B test, enabling semantic search raised code retention 0.3% overall and 2.6% for repositories with 1,000+ files, while disabling it increased dissatisfied follow-up requests by 2.2%.
Technically, Cursor built an evaluation suite (Cursor Context Bench) to compare agents with and without semantic search across popular coding models (including their Composer model), and found consistent gains for semantic retrieval. Their embedding model is trained from agent session traces: when agents search and open files during real tasks, those traces are used to generate retrospective rankings via an LLM; the embedding model is then trained to align similarity scores with those LLM rankings. The result is a retrieval system tuned to how agents actually solve problems, and the company reports the best outcomes come from combining semantic search with grep rather than replacing it.
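To make the training idea concrete, here is a minimal, hypothetical sketch of aligning a bi-encoder's similarity scores with LLM-derived relevance rankings over files from an agent trace. The names (`EmbeddingModel`, `ranking_alignment_loss`) and the KL-based listwise objective are illustrative assumptions, not Cursor's actual architecture or loss.

```python
# Hypothetical sketch: train an embedding model so that query-file cosine
# similarities agree with an LLM's retrospective ranking of which files
# actually helped the agent. Names and the loss are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingModel(nn.Module):
    """Toy bi-encoder: maps token-id sequences to L2-normalized vectors."""
    def __init__(self, vocab_size=30_000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # mean-pool over tokens
        return F.normalize(self.proj(pooled), dim=-1)

def ranking_alignment_loss(query_vec, file_vecs, llm_scores, temperature=0.05):
    """KL divergence between the model's similarity distribution over candidate
    files and the distribution implied by LLM relevance scores for those files."""
    sims = file_vecs @ query_vec / temperature       # (num_files,)
    model_dist = F.log_softmax(sims, dim=-1)
    target_dist = F.softmax(llm_scores, dim=-1)      # LLM retrospective ranking
    return F.kl_div(model_dist, target_dist, reduction="sum")

# One illustrative training step on a single (synthetic) agent-session trace.
model = EmbeddingModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

query_ids = torch.randint(0, 30_000, (1, 64))    # the agent's search query
file_ids = torch.randint(0, 30_000, (8, 256))    # files the agent opened or considered
llm_scores = torch.tensor([3.0, 0.5, 2.0, 0.0, 0.0, 1.0, 0.0, 0.5])  # LLM-judged usefulness

query_vec = model(query_ids)[0]
file_vecs = model(file_ids)
loss = ranking_alignment_loss(query_vec, file_vecs, llm_scores)
loss.backward()
optimizer.step()
```

In a setup like this, the embedding model learns to rank candidate files the way the LLM judged them in hindsight, which is one plausible way a retrieval system could be tuned to how agents actually solve problems.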