We let OpenAI’s “Agent Mode” surf the web for us—here’s what happened (arstechnica.com)

🤖 AI Summary
OpenAI this week unveiled Atlas, a web browser with built-in ChatGPT integration that lets you “chat with a page” and, in a prominent new addition, a preview “Agent Mode” that can actively browse for you: clicking, scrolling, and reading across multiple tabs. Agent Mode isn’t a novel research idea (OpenAI previewed a web-browsing Operator agent in January and a generalized ChatGPT agent in July), but folding these agentic capabilities into a consumer-facing browser signals OpenAI’s push to put automated web action in front of everyday users.

The article’s hands-on test plan is straightforward: give Agent Mode a task-oriented prompt, watch it interact with live pages, and score its success on a 1–10 scale. The immediate technical implication is practical web automation driven by a large language model that interprets page content and issues DOM-like actions, a step toward delegating tedious internet work such as data collection, form-filling, and multi-tab research. The author’s first trial (getting a high score on the 2048 web game) is a small functional test of perception and control; later experiments aim to probe productivity gains and failure modes.

For the AI/ML community this matters because mainstreaming agentic interfaces raises questions about reliability, safety, and the robustness of LLM-driven UI control, while also offering a fertile use case for research into grounding, action planning, and human-agent workflows.
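The observe-decide-act shape of such an agent can be sketched in a few lines. This is a minimal, hypothetical illustration of the general pattern, not OpenAI's actual Agent Mode implementation: `propose_action` is a stub standing in for an LLM call, and `Page` is a toy stand-in for a live DOM.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Toy stand-in for a browser page the agent can observe and act on."""
    text: str
    form: dict = field(default_factory=dict)
    submitted: bool = False

def propose_action(observation: str) -> dict:
    """Stub for the LLM policy: map an observation to one DOM-like action.
    A real agent would send the observation to a model and parse its reply."""
    if "name:" not in observation:
        return {"op": "fill", "field": "name", "value": "Ada"}
    return {"op": "click", "target": "submit"}

def run_agent(page: Page, max_steps: int = 5) -> Page:
    """Observe -> decide -> act loop: the core shape of a browsing agent."""
    for _ in range(max_steps):
        if page.submitted:
            break
        # Serialize what the agent can "see" into one observation string.
        observation = page.text + "".join(
            f" {k}: {v}" for k, v in page.form.items()
        )
        action = propose_action(observation)
        # Execute the chosen DOM-like action against the page.
        if action["op"] == "fill":
            page.form[action["field"]] = action["value"]
        elif action["op"] == "click" and action["target"] == "submit":
            page.submitted = True
    return page
```

In a real system the stub policy would be an LLM prompted with the page's accessibility tree or screenshot, and the actions would drive an actual browser; the loop structure stays the same.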