Run Agents Twice (futuresearch.ai)

🤖 AI Summary
Researchers have found that running large language model (LLM) agents twice on the same question and averaging their forecasts measurably improves accuracy. On the BTF-2 forecasting benchmark, single runs of Claude Opus 4.6 scored a consistent 0.130 (lower is better). Averaging the outputs of multiple agents, including Gemini 3.1 Pro and GPT-5.4, brought the score down to 0.125, a roughly 4% relative improvement over the single-run score.

This "wisdom of the crowd" effect suits forecasting in particular: individual runs take varying research paths and capture different aspects of a problem, so synthesizing their findings yields a more complete analysis. The approach is also cheap. A second run costs approximately $0.55 per question, which is negligible under a subscription plan or in any context where repeated queries are feasible.

Because each run makes partially independent errors, averaging cancels some of the random noise and surfaces insights that a single run may overlook. The practical takeaway: before investing in larger models or more sophisticated scaffolding, try the simple fix of running the agent more than once and aggregating the results.
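The aggregation step is simple enough to sketch. Below is a minimal illustration, assuming each agent is a callable that takes a question and returns a probability in [0, 1]; the agent functions here are hypothetical stand-ins, not the benchmark's actual harness, and the scoring rule shown is the standard Brier score.

```python
import statistics

def brier_score(forecast: float, outcome: int) -> float:
    # Squared error between a probability forecast and the realized
    # 0/1 outcome; lower is better.
    return (forecast - outcome) ** 2

def averaged_forecast(question: str, agents, runs_per_agent: int = 2) -> float:
    # Query every agent several times on the same question and average
    # the probability estimates; partially independent errors cancel.
    forecasts = [
        agent(question)
        for agent in agents
        for _ in range(runs_per_agent)
    ]
    return statistics.mean(forecasts)

if __name__ == "__main__":
    # Hypothetical stand-ins for real agent calls, each of which would
    # run an LLM research loop and return a probability in [0, 1].
    agent_a = lambda q: 0.62
    agent_b = lambda q: 0.55

    p = averaged_forecast("Will X happen by 2026?", [agent_a, agent_b])
    print(f"ensemble forecast: {p:.3f}")
    print(f"Brier score if X happens: {brier_score(p, 1):.3f}")
```

A plain mean in probability space is the simplest aggregation rule; the same loop extends naturally to more runs per agent, which is where the noise-cancellation benefit comes from.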