AI agents failed at real-world consulting tasks — but Mercor's CEO says they're still on track to replace consultants (www.businessinsider.com)

0 points 130 days ago | visit original

🤖 AI Summary

New research from Mercor finds that AI agents currently struggle with real-world consulting tasks, though their rapid improvement suggests they could replace human consultants in the near future. In the study, which used the APEX-Agents benchmark, leading AI models, including OpenAI's GPT 5.2 and Anthropic's Opus 4.6, achieved task completion rates of only 23% and 33% respectively on their first attempts. The models excelled at research and data analysis but struggled with long-horizon tasks that typically require human judgment and multi-step planning. Mercor's CEO, Brendan Foody, remains optimistic, predicting that with ongoing improvements in training, these models could reach a 50% success rate by year's end. He emphasized that the consulting sector is ripe for disruption as AI agents begin to handle complex tasks previously thought too nuanced for automation. Firms like McKinsey are already integrating AI into their operations, with reports indicating a significant portion of their workforce is now AI-driven. The next iteration of Mercor's benchmarks aims to evaluate AI's impact across the entire professional services value chain, an impact that could reshape the future of consulting jobs.
