Speed Matters: How We Achieve the Fastest Web Agent (browser-use.com)

0 points 5 hours ago

🤖 AI Summary

Browser Use (BU) 1.0 was announced with 65.7% accuracy on OnlineMind2Web, matching Gemini 2.5 Computer Use and beating other baselines, but its real headline is speed. The team adapted the evaluation to supply both screenshots and DOM state, since DOM-based agents can read elements that are not visible on screen. Under that setup, BU 1.0 completes tasks in 68 seconds on average (≈3s per step), far faster than Gemini 2.5 (225s) and other top models. In one concrete example, BU completed "find the most recent PR" on a GitHub repo in 15s versus Gemini's 75s, bringing agent latency into the range of human workers.

The team credits four optimizations:
(1) a prompt structure that places agent history before the fresh browser state, so KV cache hits reduce both latency and cost;
(2) capturing screenshots only when necessary, since each screenshot adds ≈0.8s of inference time;
(3) a targeted extract tool that queries the page markdown with a separate LLM call and returns only the relevant tokens from long pages, avoiding context bloat;
(4) an extremely concise action space that minimizes output tokens. In their measurements, processing 1,000 input tokens took ≈29.1ms while generating 10 output tokens took ≈62.6ms, so per token, generating output is roughly 215× slower than processing input (≈6.26ms vs ≈0.029ms).

BU 1.0 is available via Browser Use 0.8.0 and an LLM gateway, with pricing that charges less for input and cached tokens than for costly output tokens. Together, these changes point to a practical shift toward DOM-first, low-latency web agents.
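The latency figures quoted for optimization (4) can be checked with a little arithmetic. The Python sketch below derives per-token costs from the two quoted measurements and plugs them into a simple linear latency model. The linear model, the function names, and the step sizes in the example are assumptions for illustration only, not Browser Use's own cost model.

```python
# Figures quoted in the summary: processing 1,000 input tokens took ~29.1 ms,
# while generating 10 output tokens took ~62.6 ms.
INPUT_MS_PER_TOKEN = 29.1 / 1_000   # ~0.0291 ms per input token
OUTPUT_MS_PER_TOKEN = 62.6 / 10     # ~6.26 ms per output token


def output_to_input_ratio() -> float:
    """How many times slower one output token is than one input token."""
    return OUTPUT_MS_PER_TOKEN / INPUT_MS_PER_TOKEN


def estimate_step_ms(input_tokens: int, output_tokens: int) -> float:
    """Rough per-step latency under a simple linear model (an assumption):
    latency = input_tokens * cost_in + output_tokens * cost_out.
    Ignores network overhead, KV-cache hits and screenshot processing."""
    return input_tokens * INPUT_MS_PER_TOKEN + output_tokens * OUTPUT_MS_PER_TOKEN


if __name__ == "__main__":
    print(f"output/input per-token ratio: {output_to_input_ratio():.0f}x")
    # Hypothetical step sizes, chosen only to illustrate the trade-off:
    # the same 10k-token context, with a verbose vs. a concise action.
    verbose = estimate_step_ms(input_tokens=10_000, output_tokens=200)
    concise = estimate_step_ms(input_tokens=10_000, output_tokens=40)
    print(f"verbose action: {verbose:.0f} ms, concise action: {concise:.0f} ms")
```

The ratio works out to about 215x, which is why trimming a few dozen output tokens from each action can matter more than shaving thousands of input tokens.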

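Why the ordering in optimization (1) matters can be illustrated with a toy model of prefix caching: a provider's KV cache can only reuse the leading run of tokens that consecutive prompts share. The Python sketch below assumes whitespace-split words as stand-in tokens; the functions build_prompt, shared_prefix_len and compare_orderings, and the example prompts, are hypothetical and not Browser Use code.

```python
def build_prompt(system: str, history: list[str], browser_state: str,
                 history_first: bool) -> list[str]:
    """Assemble one step's prompt as a list of pseudo-tokens.

    Whitespace-split words stand in for real tokenizer output (an assumption).
    """
    if history_first:
        sections = [system, *history, browser_state]
    else:
        sections = [system, browser_state, *history]
    return " ".join(sections).split()


def shared_prefix_len(prev: list, curr: list) -> int:
    """Count the leading tokens two consecutive prompts have in common.

    A prefix (KV) cache can only reuse work for this leading run; everything
    after the first differing token must be recomputed.
    """
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def compare_orderings() -> dict[str, int]:
    system = "You are a browser agent. Reply with one action."
    history = ["step 1: open github.com", "step 2: click 'Pull requests'"]
    # The fresh page state changes on every step (hypothetical values).
    state_before = "url=github.com/org/repo elements=[search, tabs, readme]"
    state_after = "url=github.com/org/repo/pulls elements=[filters, pr-list]"
    new_action = "step 3: click first PR"

    result = {}
    for name, first in (("history_first", True), ("state_first", False)):
        prev = build_prompt(system, history, state_before, first)
        curr = build_prompt(system, history + [new_action], state_after, first)
        result[name] = shared_prefix_len(prev, curr)
    return result


if __name__ == "__main__":
    print(compare_orderings())
```

Running it prints {'history_first': 18, 'state_first': 9}: with history first, the system prompt and all prior history form a reusable prefix, while with the state first, reuse stops as soon as the page state changes.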