Dynamic Denial of Crawlers (overengineer.dev)

🤖 AI Summary
A site operator observed a mysterious, low-volume but massively distributed crawling pattern over roughly 206 hours: about 2.7M requests (≈3.6 req/s on average), with distinct plateaus at roughly 5, 10, and 15 req/s. During peaks there were ~1.3M distinct client IPs, most making only 1–2 requests each; the crawlers fetched full page responses (so this is not a classic DDoS) and followed redirects. User-Agent strings are common browser signatures but are likely spoofed. Traffic comes from ~176 countries and ~15k ISPs, dominated by Brazil, Vietnam, Argentina, Ecuador, and Indonesia, and largely from residential and mobile networks. The post includes the top logged User-Agents with raw counts, and the operator found no coherent URL pattern, just random user content and login redirects, suggesting wide, indiscriminate scraping.

This matters because it looks like a global, noisy crawler/botnet, probably harvesting web content for LLMs or other large-scale scraping tasks, and it is built to evade simple defenses through IP diversity, UA spoofing, and low per-node request rates. Operationally it wastes bandwidth, pollutes analytics, and makes automated defenses harder; technically it underlines the limits of IP/UA-based blocking and the need for behavioral detection, anomaly-based rate limiting, challenge pages, and coordinated telemetry sharing. A security researcher consulted by the author also thinks it is likely an LLM-oriented crawling network, highlighting the growing tension between site operators and large-scale model data collection.
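Because each node stays at 1–2 requests, per-IP rate limits never trigger; detection has to look at aggregate behavior instead. Below is a minimal sketch (not from the original post) of one such behavioral check: flag time windows where the total request rate is elevated while almost every distinct IP contributes only a request or two. All field names and thresholds here are illustrative assumptions.

```python
"""Sketch of a behavioral detector for distributed low-rate crawling.

Flags windows where aggregate load is high but no single IP explains it,
the signature described in the summary above. Thresholds are made up.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    ts: float          # UNIX timestamp (seconds)
    ip: str            # client IP as logged
    user_agent: str    # raw User-Agent header
    path: str          # requested path


def flag_distributed_crawl(requests: list[Request],
                           window_s: int = 300,
                           min_rate: float = 4.0,
                           max_reqs_per_ip: float = 2.0):
    """Return (window_start, req_per_s, avg_reqs_per_ip) for suspicious windows.

    A window is flagged when the aggregate rate exceeds ``min_rate`` req/s
    while the average request count per distinct IP stays at or below
    ``max_reqs_per_ip`` -- load that no individual client accounts for.
    """
    if not requests:
        return []

    requests = sorted(requests, key=lambda r: r.ts)
    flagged = []
    i = 0
    t = requests[0].ts
    end = requests[-1].ts
    while t <= end:
        # Collect requests falling inside the window [t, t + window_s).
        per_ip: Counter[str] = Counter()
        j = i
        while j < len(requests) and requests[j].ts < t + window_s:
            per_ip[requests[j].ip] += 1
            j += 1
        if per_ip:
            total = sum(per_ip.values())
            rate = total / window_s
            avg_per_ip = total / len(per_ip)
            if rate >= min_rate and avg_per_ip <= max_reqs_per_ip:
                flagged.append((t, rate, avg_per_ip))
        i = j
        t += window_s
    return flagged
```

In practice the output of a check like this would feed an anomaly-based rate limiter or a challenge page for the affected paths, rather than per-IP blocking, since the IP set churns constantly.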