AI Agents Are Terrible Freelance Workers (www.wired.com)

🤖 AI Summary
Researchers from Scale AI and the Center for AI Safety released the Remote Labor Index, a new benchmark that tests frontier AI agents on real freelance tasks sourced from verified Upwork workers. The jobs included graphic design, video editing, game development and administrative work such as data scraping. Agents were given job descriptions, project files and human example outputs, and performance was scored by potential earnings. The agents tested (Manus, the top performer, plus Grok, Claude, ChatGPT and Gemini) collectively completed under 3% of the available work, earning $1,810 out of a possible $143,991. That shows current systems struggle to complete economically meaningful, multi-step freelance projects end-to-end. The result is a corrective to optimistic claims that AI will soon automate large swathes of white-collar work (for example, OpenAI’s GDPval benchmark and public statements forecasting rapid coding automation). The study highlights concrete technical limits: fragile tool use and orchestration, poor long-term memory and continual learning, difficulty acquiring on-the-job skills, and trouble executing complex, multi-stage workflows. The authors acknowledge that the benchmark isn’t exhaustive and that humans will likely use AI as a productivity tool, not a direct replacement. For the AI/ML community, the takeaway is pragmatic: robust tool integration, persistent memory, error recovery and multi-step planning should be priorities before anyone claims broad economic substitution.
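To put the dollar figures in proportion, here is a minimal Python sketch of the arithmetic. It assumes the earnings score is simply dollars earned divided by dollars available; the function name `earnings_share` is hypothetical and is not taken from the benchmark's own code.

```python
def earnings_share(earned: float, available: float) -> float:
    """Return `earned` as a percentage of `available`."""
    if available <= 0:
        raise ValueError("available must be positive")
    return 100.0 * earned / available


# Dollar figures quoted in the summary above.
share = earnings_share(1_810, 143_991)
print(f"{share:.2f}% of available earnings")  # prints "1.26% of available earnings"
```

On this reading, the agents captured roughly one and a quarter percent of the money on offer, which is consistent with the "under 3%" figure quoted in the summary.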