X tells ChatGPT and Claude no – only Grok eats (x.com)

🤖 AI Summary
X’s robots.txt has been tightened: the file explicitly allows a limited set of endpoints for Googlebot and Facebook’s preview crawler, places broad blocks on many other agents (including Bingbot, Discordbot and various extended/third‑party bots), and ends with a blanket "User-agent: * Disallow: /" that prevents general crawling. The file also sets a 1‑second crawl-delay, marks some paths Noindex (e.g., /i/u), explicitly blocks indexing of links in notification emails, and still publishes sitemaps for x.com/twitter.com.

For the AI/ML community this is significant because robots.txt is the primary signaling mechanism websites use to opt out of automated scraping; by selectively permitting only a few crawlers, X is effectively throttling open‑web harvests of its real‑time social stream. That complicates how models like ChatGPT or Claude could gather X content for training unless they obtain explicit access or rely on other sources, and it creates an advantage for xAI’s Grok if X grants it privileged ingestion (e.g., a firehose or API).

Technically, robots.txt is voluntary: respecting it is a legal/compliance best practice rather than a technical enforcement mechanism, and it won’t stop content obtained via APIs, user uploads or scraping that ignores the file. But it does raise the bar for lawful, large‑scale data collection and may accelerate licensing deals or access-based model differentiation.
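To make the "voluntary signal" point concrete, here is a minimal sketch using Python's standard urllib.robotparser against a simplified stand-in for the file described above. It is not X's actual robots.txt: the blanket "Allow: /" for Googlebot and the test URL are placeholders (the real file allows only specific endpoints and lists more blocked agents), and GPTBot/ClaudeBot are simply the user-agent tokens OpenAI and Anthropic publish for their crawlers.

```python
import urllib.robotparser

# Simplified stand-in for the described robots.txt, built only from the
# directives mentioned above. Assumption: the real file allows Googlebot
# only specific endpoints, not the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Crawl-delay: 1
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical post URL, used purely for illustration.
url = "https://x.com/someuser/status/123"

# GPTBot (OpenAI) and ClaudeBot (Anthropic) fall through to the catch-all
# group and are denied; Googlebot matches its own, more permissive group.
for agent in ("Googlebot", "GPTBot", "ClaudeBot"):
    print(agent, "allowed:", rp.can_fetch(agent, url))

# The catch-all group also carries the 1-second crawl-delay.
print("crawl delay for *:", rp.crawl_delay("*"))
```

A compliant crawler runs exactly this kind of check (and honors the crawl-delay) before fetching; nothing in the protocol technically prevents a client from skipping it, which is why the summary frames the change as a legal/compliance bar rather than an enforcement mechanism.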