Remote Labor Index: Measuring AI Automation of Remote Work (arxiv.org)

🤖 AI Summary
Researchers introduced the Remote Labor Index (RLI), a multi-sector benchmark built from real-world, economically valuable projects and designed to measure how well AI agents can automate remote work end to end. Unlike benchmarks made of isolated academic tasks, RLI evaluates complete project workflows (planning, tool use, information retrieval, and execution), so success requires finishing each project to an applied standard. Current agents perform near the floor on RLI: the top system attained an automation rate of only 2.5% (the fraction of projects an agent can fully complete autonomously), highlighting a large gap between performance on standard research benchmarks and real-world economic automation. RLI is significant because it provides an empirical, comparable baseline for tracking AI's labor impacts across sectors and over time, informing stakeholders from policymakers to product teams. Technically, the results imply that advances in model knowledge and reasoning do not yet translate into the reliable long-horizon orchestration, robust tool integration, and dependable error handling that end-to-end work requires. By focusing on project-level outcomes, RLI shifts evaluation toward practical capabilities and exposes priorities for research and deployment: improving multi-step planning, tool chaining, contextual grounding, and safety and verification mechanisms. The index can guide investment, risk assessment, and policy decisions as AI systems mature.
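As a rough illustration of the headline metric, the automation rate described above is a simple fraction: projects fully completed divided by projects attempted. The sketch below assumes a hypothetical `ProjectResult` record with a single pass/fail flag per project; the real RLI evaluation protocol (how completion is judged, how many projects there are) is not reproduced here, and the 1-of-40 demo data is invented purely to show the arithmetic behind a 2.5% figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectResult:
    # Hypothetical record: one entry per benchmark project (not the RLI data format).
    project_id: str
    fully_completed: bool  # True only if the agent finished the whole project to standard


def automation_rate(results: List[ProjectResult]) -> float:
    """Return the fraction of projects the agent fully completed, in [0, 1]."""
    if not results:
        raise ValueError("need at least one project result")
    completed = sum(1 for r in results if r.fully_completed)
    return completed / len(results)


# Invented example: 1 fully completed project out of 40 attempted.
demo = [ProjectResult(f"p{i}", i == 0) for i in range(40)]
print(f"{automation_rate(demo):.1%}")  # → 2.5%
```

Because partial progress counts as zero under this definition, a project-level metric like this can sit near the floor even when agents complete many individual subtasks.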