Zip-Bombs vs. Aggressive AI Crawlers: Defensive Tactics for Sites (jsdev.space)

🤖 AI Summary
Websites are seeing massive, AI-driven scraping spikes as Retrieval-Augmented Generation (RAG) crawlers and scrapers aggressively harvest web content. Fastly data shows peaks of up to ~40,000 requests per minute and an ~87% year-over-year rise in scraper traffic in 2025; roughly 80% of AI-bot traffic now comes from AI crawlers. Real-world examples of over-sampling include ClaudeBot issuing millions of requests to single sites and GPTBot downloading terabytes from small sites.

Problematic behaviours include rotating user-agents, ignoring robots.txt, fetching from unexpected IP ranges, refusing compressed content (a sign of zip-bomb probing), and sustained high request rates, all of which put real cost and reliability pressure on servers, CDNs, and caches. Defenders are combining conventional mitigations (rate limits, CAPTCHAs, user-agent/IP reputation, JS/browser checks) with heavier measures designed to shift cost back onto the crawler: proof-of-work client puzzles (Hashcash-style SHA-256 nonce challenges), fingerprinting heuristics, and even serving highly compressible "zip-bomb" payloads that expand massively on decompression (e.g. ~10 MB gzip files expanding to ~10 GB).

Tools like Anubis reportedly raise crawler compute costs and have seen broad adoption, but these tactics can be bypassed by distributed fleets, break legitimate tooling, carry legal and environmental risk, and harm third-party infrastructure. Operators should prefer non-destructive controls first, reserve aggressive tactics for measurable, persistent bot load (e.g. more than 50% of traffic), and document safeguards to avoid collateral damage.
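To make the proof-of-work idea concrete, here is a minimal sketch of a Hashcash-style SHA-256 nonce challenge of the kind the summary describes. The challenge format, difficulty measure, and function names are illustrative assumptions, not the actual protocol used by Anubis or any specific tool:

```ts
import { createHash, randomBytes } from "node:crypto";

// Server side: hand the client a random challenge and a difficulty
// (number of leading zero hex characters the hash must have).
function issueChallenge() {
  return { challenge: randomBytes(16).toString("hex"), difficulty: 5 };
}

// Client side: brute-force a nonce until SHA-256(challenge:nonce) meets the target.
// This is the cost that gets shifted onto the requester.
function solve(challenge: string, difficulty: number): number {
  const prefix = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256")
      .update(`${challenge}:${nonce}`)
      .digest("hex");
    if (digest.startsWith(prefix)) return nonce;
  }
}

// Server side: verification is a single hash, so it stays cheap for the defender
// while a high-volume crawler pays the solving cost on every request.
function verify(challenge: string, difficulty: number, nonce: number): boolean {
  const digest = createHash("sha256")
    .update(`${challenge}:${nonce}`)
    .digest("hex");
  return digest.startsWith("0".repeat(difficulty));
}

const { challenge, difficulty } = issueChallenge();
const nonce = solve(challenge, difficulty);
console.log(verify(challenge, difficulty, nonce)); // true
```

Difficulty is the tuning knob: each extra leading zero roughly multiplies the expected solving work by 16, so it can be raised only for clients that already look suspicious.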
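The "zip-bomb" figures (~10 MB on the wire, ~10 GB decompressed) come from how well runs of identical bytes compress. As a rough sketch of how such a decoy could be generated with Node's built-in zlib, under assumed file names and sizes:

```ts
import { createGzip } from "node:zlib";
import { createWriteStream } from "node:fs";

// Stream ~10 GiB of zero bytes through gzip. The file on disk ends up only a
// few megabytes, but a client that naively decompresses it must materialise
// the full 10 GiB.
const TOTAL_BYTES = 10 * 1024 ** 3;
const chunk = Buffer.alloc(1024 * 1024); // 1 MiB of zeros, reused for every write

const gzip = createGzip({ level: 9 });
gzip.pipe(createWriteStream("decoy.gz"));

let written = 0;
function writeMore(): void {
  while (written < TOTAL_BYTES) {
    written += chunk.length;
    // Respect backpressure: pause until the gzip stream drains its buffer.
    if (!gzip.write(chunk)) {
      gzip.once("drain", writeMore);
      return;
    }
  }
  gzip.end();
}
writeMore();
```

The file would be served with `Content-Encoding: gzip` only to traffic already classified as abusive; as the summary notes, this is a destructive last resort that can break legitimate tooling and intermediaries, not a first-line control.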