🤖 AI Summary
The Atlantic has implemented a data-driven “scorecard” to decide which AI crawlers may access its site: only bots that demonstrably drive referral traffic or subscriptions get through. Using Cloudflare’s bot controls and a custom dashboard, CEO Nick Thompson and CPO Gitesh Gohel track hits from AI platforms (Anthropic, ChatGPT, Perplexity, Mistral, etc.), flag headless/third‑party scrapers, and block those that provide no measurable value — including one crawler that recrawled the site 564,000 times in a week. The publisher keeps its threshold flexible (e.g., 1,000 subscribers would represent ~$80k in annual revenue) and retains a licensing deal with OpenAI while blocking non‑performing crawlers to push AI companies toward paid access or negotiation.
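At its core, the "scorecard" reduces to scoring each crawler's attributed value against its crawl load and blocking the ones that fall below a cutoff. The Python sketch below illustrates that idea; the log schema, per-visit value, cutoff, and bot names are all assumptions for illustration, not The Atlantic's actual pipeline.

```python
# A minimal sketch of metrics-driven crawler blocking: score each AI bot by
# the referral value it returns per crawl, then emit a block/allow decision.
# All numbers and field names below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CrawlerStats:
    name: str
    crawl_hits: int        # requests made by the bot's crawler
    referral_visits: int   # human visits the bot's platform sent back
    subscriptions: int     # conversions attributed to those referrals

SUBSCRIPTION_VALUE = 80.0       # ~$80/yr per subscriber, per the summary above
VISIT_VALUE = 0.01              # assumed ad/engagement value per referred visit
MIN_VALUE_PER_1K_CRAWLS = 1.0   # illustrative cutoff: $1 per 1,000 crawl hits

def should_block(bot: CrawlerStats) -> bool:
    """Block crawlers whose attributed value doesn't justify their crawl load."""
    value = bot.referral_visits * VISIT_VALUE + bot.subscriptions * SUBSCRIPTION_VALUE
    value_per_1k_crawls = value / max(bot.crawl_hits, 1) * 1000
    return value_per_1k_crawls < MIN_VALUE_PER_1K_CRAWLS

bots = [
    CrawlerStats("ChatGPT-User", 120_000, 9_500, 40),
    CrawlerStats("HeavyScraper", 564_000, 12, 0),  # cf. the 564k-hits-a-week case
]
for bot in bots:
    print(bot.name, "BLOCK" if should_block(bot) else "ALLOW")
```

In practice the decision layer would feed Cloudflare's bot rules rather than a print statement, but the threshold-on-attributed-value logic is the essence of the scorecard.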
The move matters because publishers are trying to reclaim leverage over content used to train LLMs and to avoid fueling competitors without compensation. Technical levers include robots.txt, Cloudflare's audit/define/enforce workflow, and the new Content Signals Policy (a robots.txt extension for requesting limits on how scraped content is used), but enforcement limits remain, most notably with Google: the Google-Extended token only opts a site out of Gemini training, while AI Overviews are fed by the ordinary Googlebot, so a publisher cannot block Google's AI features without also hurting search indexing. Compliance with Content Signals is likewise voluntary and hard to verify. Industry telemetry backs the concern: DataDome reports a 4x rise in AI traffic (Q1–Q3 2025) and some agents issuing billions of requests without returning traffic, underscoring why publishers are adopting selective, metrics-driven blocking.
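As a concrete sketch of those robots.txt-level levers: Content Signals adds machine-readable usage preferences alongside the usual allow/deny rules. The signal names (search, ai-input, ai-train) follow the published Content Signals Policy; the user-agent list, placement, and paths below are illustrative assumptions, and, as noted, these directives are requests that crawlers may ignore.

```
# Content Signals Policy: a machine-readable request, not an enforcement
# mechanism. search = appearing in search results; ai-input = retrieval/
# grounding for AI answers; ai-train = model training.
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /

# Per-bot blocks for crawlers that return no measurable traffic
# (user agents here are illustrative).
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Caveat from above: Google-Extended only opts out of Gemini training;
# AI Overviews ride on the ordinary Googlebot, so blocking them would
# also mean dropping out of search indexing.
User-agent: Google-Extended
Disallow: /
```

Hard enforcement, by contrast, happens at the edge, e.g., Cloudflare bot rules that act on verified-bot identity rather than trusting user-agent strings, which is why the publisher pairs the robots.txt signals with Cloudflare's blocking controls.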