Snag web pages like a polite robot with a browser (github.com)

0 points 21 hours ago | visit original

🤖 AI Summary

Snag is a new CLI utility that fetches web pages through a real Chromium-based browser and returns clean, token-efficient output designed for AI agents. By default it renders the page and outputs Markdown (claimed to use ~70% fewer tokens than raw HTML); it also supports raw HTML, plain text, PDF, and PNG screenshots. It handles JavaScript, lazy loading, and dynamic SPA content (via a --wait-for selector), can run headless or open a visible browser, and can auto-detect or launch Chromium, Chrome, Edge, or Brave. It can be installed with Homebrew, with go install, or by building from source. Its standout features for the AI/ML community are session-aware authenticated fetching (reusing a logged-in browser session or a dedicated profile), tab management (--list-tabs, -t, --all-tabs), timeouts and verbose debugging, and safe handling of binary outputs (auto-generated timestamped filenames). That makes snag useful for building knowledge bases from private docs, feeding cleaned docs into LLM pipelines, capturing dynamic docs for analysis, and producing CI-friendly documentation snapshots. Because it integrates with real browser DevTools and outputs agent-friendly Markdown, it simplifies reliable web scraping for downstream model consumption and automated agent workflows.
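As a rough illustration of the "feed cleaned docs into an LLM pipeline" use case, here is a minimal Python sketch that shells out to snag. It is not taken from the project's documentation: the helper names `build_snag_command` and `fetch_markdown` are hypothetical, and it assumes that the URL is passed as a positional argument, that `--wait-for` takes the selector as its next argument, and that the default Markdown is written to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def build_snag_command(url: str, wait_for: Optional[str] = None) -> List[str]:
    """Build an argv list for a snag call (hypothetical helper)."""
    cmd = ["snag"]
    if wait_for:
        # --wait-for is named in the summary; passing the selector as the
        # following argument is an assumption.
        cmd += ["--wait-for", wait_for]
    # Assumption: the page URL is a positional argument.
    cmd.append(url)
    return cmd


def fetch_markdown(url: str, wait_for: Optional[str] = None,
                   timeout_s: float = 60.0) -> str:
    """Run snag and return whatever it prints to stdout.

    Assumption: the default Markdown output goes to stdout.
    """
    if shutil.which("snag") is None:
        raise RuntimeError("snag was not found on PATH")
    result = subprocess.run(
        build_snag_command(url, wait_for),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout_s,    # guard against a page that never settles
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

A caller might use `fetch_markdown("https://example.com", wait_for="#content")` and pass the returned text to a chunking or embedding step; check the project's README for the actual invocation syntax before relying on this.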
