🤖 AI Summary
SiteOne Crawler is a cross‑platform website analyzer, cloner and converter. It can crawl entire sites, produce HTML/JSON/text reports (SEO, performance, accessibility, security), and create offline static copies. Importantly for AI workflows, it can also convert full sites into clean, structured Markdown optimized for LLM/RAG ingestion. It ships as both a GUI and a feature-rich CLI, runs on Windows, macOS and Linux (x64 and arm64), and is distributed as prebuilt binaries on GitHub.
Technically it’s built for scale and integration: a fast PHP implementation that leverages Swoole coroutines for high concurrency. It respects robots.txt and supports device simulation (mobile/desktop/tablet), sitemap inputs, framework-aware exports for Next.js/Nuxt/SvelteKit/Astro, and extensibility via a Crawler\Analyzer interface. The Markdown exporter detects code blocks and tables, removes duplicate headers/footers, accepts CSS selectors to strip unwanted elements, optionally bundles images/PDFs, and can merge all pages into one large Markdown file, which is ideal for feeding consolidated context to LLMs. CLI options cover workers, rate limits, proxies, auth, include/ignore regexes, URL transforms, replacement rules, max depth, and report/email outputs. That makes it useful for DevOps (cache warmups, stress tests), backups, documentation and automated RAG pipelines.
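To make the "remove duplicate headers/footers and merge into one file" idea concrete, here is a minimal Python sketch. It is not SiteOne Crawler's actual algorithm or API: the function names `shared_edge_lines` and `merge_pages`, the `<!-- source: ... -->` marker, and the rule "lines identical at the top or bottom of every page count as boilerplate" are all assumptions made for illustration.

```python
def shared_edge_lines(pages: list[str]) -> tuple[int, int]:
    """Count lines that every page shares at its top and at its bottom."""
    split = [p.splitlines() for p in pages]
    if len(split) < 2:
        return 0, 0  # nothing to compare against
    first = split[0]
    shortest = min(len(s) for s in split)
    head = 0
    while head < shortest and all(s[head] == first[head] for s in split):
        head += 1
    tail = 0
    # Never let the footer overlap lines already counted as header.
    while tail < shortest - head and all(
        s[-1 - tail] == first[-1 - tail] for s in split
    ):
        tail += 1
    return head, tail


def merge_pages(pages: dict[str, str]) -> str:
    """Merge per-URL Markdown into one document (hypothetical helper).

    The shared header is emitted once at the start and the shared
    footer once at the end; each page body is tagged with its URL.
    """
    head, tail = shared_edge_lines(list(pages.values()))
    out: list[str] = []
    footer: list[str] = []
    for i, (url, text) in enumerate(pages.items()):
        lines = text.splitlines()
        end = len(lines) - tail
        if i == 0:
            out.extend(lines[:head])   # keep one copy of the header
            footer = lines[end:]       # remember one copy of the footer
        out.append(f"<!-- source: {url} -->")
        out.extend(lines[head:end])    # page-specific content only
        out.append("")
    out.extend(footer)
    return "\n".join(out)


if __name__ == "__main__":
    demo = {
        "https://example.com/a": "Site Nav\n\n# A\nAlpha text\n\nFooter text",
        "https://example.com/b": "Site Nav\n\n# B\nBeta text\n\nFooter text",
    }
    print(merge_pages(demo))
```

A real exporter would likely need fuzzier matching (navigation that varies slightly between pages, for example), but the sketch shows why a merged file without repeated boilerplate gives an LLM more useful context per token.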