🤖 AI Summary
Cloudflare announced a new Content Signals Policy that extends robots.txt-style controls to state explicitly how AI systems may access and use web content. Built on Cloudflare's existing bot management (used by roughly 3.8M domains, about 20% of the web), the policy lets site owners signal separate permissions for traditional search, AI input/inference, and AI training. In effect it functions as a machine-readable license: crawlers that scrape material for AI summarization or model training (including those feeding Google's AI Overviews) can be disallowed, and the signals can be enforced automatically at the edge. Cloudflare says the signals carry "legal significance."
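Based on Cloudflare's public description, the signals are expressed as an additional directive alongside familiar robots.txt rules. The sketch below assumes the signal names search, ai-input, and ai-train, which is how Cloudflare has characterized the three categories; treat it as illustrative rather than the normative syntax and check the published policy text for the authoritative form.

```
# Illustrative robots.txt sketch (assumed directive syntax, not the official policy text)
User-Agent: *
# Permit traditional search indexing, but disallow use of the content as AI input
# (e.g. answer engines / AI summaries) and disallow use for AI model training
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```

A crawler that honors these signals would keep indexing the site for search but skip collection for AI answers and training; per the announcement, Cloudflare can additionally enforce the preference at the edge by blocking non-compliant bots.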
For the AI/ML community this is significant because it pushes back against the large-scale web scraping that powers answer engines and pre-training datasets, and it challenges the advantage Google gains from using a single crawler for both search and AI features. If major sites opt out, companies must either stop crawling those domains or separate their crawling infrastructure and compliance processes for search versus AI uses. That has practical and legal consequences for dataset curation, model-training pipelines, and retrieval-augmented systems, raising the bar for responsible data collection and potentially changing how training corpora and web-based retrieval indexes are assembled.