🤖 AI Summary
Cloudflare announced a new Content Signals Policy for robots.txt that gives site owners a simple, machine-readable way to state how fetched content may be used by bots. It adds a human-readable comment block explaining three narrow signals: search (building a search index and listing results), ai-input (feeding content to AI for live grounding/RAG), and ai-train (training or fine-tuning models). Alongside existing User-agent/Allow/Disallow rules sits a single comma-delimited line such as "Content-Signal: search=yes, ai-train=no". The policy does not change crawl permissions, and omitting a signal neither grants nor denies that use. Cloudflare will inject the comment block plus a default "Content-Signal: search=yes, ai-train=no" into its managed robots.txt across 3.8M+ domains (free zones get the comments only), and the spec is released under CC0 to encourage broader adoption.
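To make the format concrete, a robots.txt combining the new line with ordinary directives might look like the sketch below. The `Content-Signal` line matches the example given above; the comment wording and the specific Allow rule are illustrative, not the official policy text.

```
# Content signals state how fetched content may be used:
#   search   - building a search index and showing results
#   ai-input - using content as AI input (e.g. live grounding/RAG)
#   ai-train - training or fine-tuning AI models
# A missing signal neither grants nor denies that use.

Content-Signal: search=yes, ai-train=no

User-agent: *
Allow: /
```

Note that the signal line sits beside, not inside, the User-agent groups: it expresses permitted uses of content, while Allow/Disallow continue to govern what may be crawled at all.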
The move matters because it gives publishers a pragmatic, low-friction way to distinguish classic search indexing from AI uses (live answers and model training), supporting SEO strategies, licensing or pay-per-crawl models, and revenue protection. It is a signal, not an enforcement mechanism; Cloudflare recommends pairing signals with WAF rules and Bot Management (for example, blocking known AI crawlers by user-agent, or allowlisting trusted search bots). Deploy via ContentSignals.org or Cloudflare's managed robots.txt, monitor logs and traffic, and iterate; many publishers will likely keep search=yes and ai-train=no while deciding on ai-input.
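On the consuming side, a crawler operator choosing to honor these signals would need to parse the line before deciding how fetched content may be used. A minimal sketch, assuming only the comma-delimited `name=yes|no` format shown above (this parser is illustrative, not an official implementation):

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse a 'Content-Signal:' robots.txt line into {signal: allowed}.

    Signals absent from the line are absent from the result:
    per the policy, omission neither grants nor denies that use.
    """
    # Everything after the first colon is the signal list.
    _, _, value = line.partition(":")
    signals: dict[str, bool] = {}
    for part in value.split(","):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty or malformed entries
        name, _, setting = part.partition("=")
        signals[name.strip().lower()] = setting.strip().lower() == "yes"
    return signals

print(parse_content_signal("Content-Signal: search=yes, ai-train=no"))
# {'search': True, 'ai-train': False}
```

Because ai-input is simply missing from this example line, a well-behaved consumer would treat that use as neither granted nor denied, rather than defaulting it to yes or no.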