🤖 AI Summary
A modern take on Moravec’s paradox argues that the next frontier for AGI, and its biggest shortfall, is not math or coding but computer-use agents (CUAs): software that acts autonomously inside real apps and websites. In just two years, CUAs have moved from niche research to a battleground for OpenAI, Google, Anthropic, Microsoft, and startups, yet practical performance lags expectations. Benchmarks (OSWorld, WebArena, Online-Mind2Web) show typical end-to-end task success of around 30–60%, far below the roughly 95% or higher reliability that many real-world flows require. The gap matters because pretraining on static internet artifacts is running out of signal; CUAs offer the next Internet-scale data source by exposing the cognitive processes (planning, iterative problem-solving, error recovery) that produced those artifacts. Digital agents also avoid the physical constraints of robotics, promising large economic value by automating many high-friction tasks that were previously uneconomic to automate.
Technically, computer use is far deeper than “clicks and typing.” CUAs must infer the hidden program semantics behind GUIs, maintain long-horizon hierarchical plans, perform perceptual grounding and state estimation, draw on episodic memory and tacit procedural knowledge, and cope with extreme idiosyncrasy (naming drift, divergent UI grammars, flow variation). These demands make monolithic next-token models brittle. Researchers are pursuing richer RL gyms and universal reward modeling, but the argument here is that scalable solutions will need continual on-the-job learning, modular and adaptive systems, and datasets that capture the messy cognition of real users rather than only polished web outputs. Solving CUAs could therefore be the critical pathway toward robust, generalizable AGI.