Smuggled Intelligence (every.to)

🤖 AI Summary
AI capabilities are clearly advancing: GPT-5 Pro recently produced a clean proof for Yu Tsumura's 554th abstract-algebra problem in 15 minutes and reportedly contributed a key step to a Scott Aaronson proof.

OpenAI's new GDPval benchmark frames expert-level tasks across 44 occupations (e.g., a wholesale sales analyst auditing an Excel file with explicit business rules and column definitions). It found that GPT-5 reached or exceeded human professional performance 40.6% of the time, and Anthropic's Claude Opus 4.1 did so 49% of the time. Those results spawned headlines claiming AI is catching up to human experts.

But the deeper takeaway is that much of this capability depends on "smuggled intelligence": humans choosing the tasks, designing detailed prompts, specifying data structures and deliverables, and evaluating outputs. Benchmarks require careful curation and prompt engineering, often encoding domain knowledge and boundary conditions that make tasks tractable for models. That means AI is likely to complement rather than replace many roles: new human work will focus on scoping problems, crafting prompts, judging ambiguous outcomes, and managing shifting contexts. Practically, organizations should invest in AI-native teams and workflows that leverage models' strengths while preserving human oversight, evaluation, and context inference.
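The "smuggled intelligence" point can be made concrete with a toy sketch: a GDPval-style task prompt assembled from human-authored scaffolding. Everything below the task statement (the column definitions, business rules, and deliverable spec) is domain knowledge a person supplied before the model saw anything. The function, field names, and task text here are hypothetical illustrations, not the benchmark's actual prompt format.

```python
def build_task_prompt(task, column_definitions, business_rules, deliverable):
    """Assemble an expert-task prompt from human-supplied scaffolding.

    The model receives a problem that has already been scoped, structured,
    and constrained by a person; that curation is part of the "intelligence"
    behind the final result.
    """
    lines = [f"Task: {task}", "", "Column definitions:"]
    lines += [f"- {name}: {meaning}" for name, meaning in column_definitions.items()]
    lines += ["", "Business rules:"]
    lines += [f"- {rule}" for rule in business_rules]
    lines += ["", f"Deliverable: {deliverable}"]
    return "\n".join(lines)

# Hypothetical wholesale-sales audit, loosely echoing the article's example.
prompt = build_task_prompt(
    task="Audit the order spreadsheet for pricing errors.",
    column_definitions={
        "unit_price": "price per unit in USD, before discounts",
        "qty": "units ordered; must be a positive integer",
    },
    business_rules=[
        "Flag any row where qty * unit_price differs from line_total.",
        "Orders over $10,000 require a manager approval code.",
    ],
    deliverable="A list of flagged row numbers with a one-line reason each.",
)
print(prompt)
```

Only the first line states the goal; every line after it narrows the search space for the model, which is exactly the human contribution the article argues gets smuggled into benchmark scores.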