Reddit is winning the AI game (www.cjr.org)

🤖 AI Summary
Reddit has quietly become a major data supplier for AI companies, striking high-value licensing deals (reportedly ~$60M/year with Google and ~$70M/year with OpenAI) and emerging as one of the most-cited sources in AI-generated answers. Platform changes and a Google search algorithm update nearly tripled monthly visitors, from 132M to 346M, between Aug 2023 and Apr 2024. Analytics firms found Reddit to be the top-cited domain for Google AI Overviews and Perplexity, and the second most cited for ChatGPT, between Aug 2024 and Jun 2025. To court publishers and lock in content, Reddit has rolled out publisher tools (AMAs, analytics, better embeds, and beta import/tracking features) and is testing Reddit Answers, a conversational search function powered by Google Gemini.

For the AI/ML community this matters on several fronts: Reddit’s licensed, real-time forum data is now both a strategic asset and a liability for model training and downstream search. The platform is tightening crawler access, suing alleged scrapers (Anthropic), and restricting archives to push licensing, while backing Really Simple Licensing (RSL), a clearinghouse model for standardized payments and attribution akin to ASCAP/BMI. That shifts negotiation from opportunistic scraping to contract-driven access, enabling dynamic pricing tied to a site’s utility for AI answers. But it also raises risks: AI systems may prioritize forum content over originals, amplify low-quality or adversarial posts (“parasite SEO”), and inherit community-specific errors or misinformation. That makes careful dataset curation, provenance tracking, and licensing compliance more critical than ever.