The Remote Labor Index: Measuring the Automation of Work (scale.com)

🤖 AI Summary
Scale and the Center for AI Safety have launched the Remote Labor Index (RLI), the first public benchmark and leaderboard that tests whether AI agents can autonomously complete paid freelance projects end to end. The RLI comprises 240 real-world projects across 23 domains (median human completion time ≈11.5 hours, median value $200), so each task is a complete, economically meaningful unit of work. The best-performing agent, Manus, successfully automated just 2.5% of projects, earning $1,720 against the $143,991 paid to the original human contractors, which provides a concrete, data-driven baseline for tracking AI's real-world impact on work.

The RLI also diagnoses why automation fails: 45.6% of failed submissions were low quality, 35.7% produced incomplete or malformed deliverables, 17.6% had file-integrity problems, and 14.8% contained inconsistencies. The percentages overlap because many failures combined several of these faults (e.g., ignoring an input file and submitting amateurish, mismatched outputs). Successful tasks skewed toward creative generative work (images, audio, simple data and writing tasks), while agents struggled with complex editing, tool use, and multi-step briefs.

The takeaway: current models are strong at creative generation but lack the reliability and procedural competence for broad professional automation, suggesting near-term impact will be augmentation rather than mass job displacement. The RLI offers an economically grounded yardstick for measuring future progress in capability, reliability, and scale.
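For context, here is a minimal back-of-the-envelope sketch in Python using only the figures reported above; the variable names are illustrative, not the RLI's actual schema, and the "share of value" metric is a derived comparison, not a headline number from the benchmark itself.

```python
# Back-of-the-envelope check of the RLI headline numbers.
# All inputs come from the summary above.

TOTAL_PROJECTS = 240        # real freelance projects in the RLI
AUTOMATED_SHARE = 0.025     # best agent (Manus): 2.5% of projects automated
AGENT_EARNINGS = 1_720      # dollars the best agent "earned"
HUMAN_EARNINGS = 143_991    # dollars paid to the original human contractors

# Per-project success rate vs. value-weighted rate.
projects_automated = round(TOTAL_PROJECTS * AUTOMATED_SHARE)  # ~6 projects
value_share = AGENT_EARNINGS / HUMAN_EARNINGS                 # ~1.2% of value

print(f"Projects automated: {projects_automated} of {TOTAL_PROJECTS}")
print(f"Share of project value captured: {value_share:.1%}")
```

Note that the value-weighted rate (~1.2%) comes out lower than the per-project rate (2.5%), which is consistent with agents succeeding mainly on lower-value projects.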