The Pot, the Kettle, and the Elephant (nik.art)

0 points 275 days ago | visit original

🤖 AI Summary

Reddit has sued AI chatbot maker Perplexity, alleging the company used disguised accounts and third parties to scrape millions of Reddit posts and feed them into its models without a licensing deal. Perplexity countered on Reddit, saying its system summarizes public threads and links back to the originals rather than training on them.

The clash spotlights a broader tension: high-quality human-generated text is becoming scarcer, as studies suggest much new web content is AI-written, so model builders are hungry for authentic forum data. Reddit itself monetizes its corpus (reportedly charging major buyers like Google) and runs its own AI features, making the dispute as much about business strategy and access as about legality.

Technically and legally, the case raises core issues for ML pipelines: whether scraping public content to build training sets violates platform rights or user ownership, how to distinguish summarization from model training, and the need for provenance, consent and licensing in large-scale datasets. The lawsuit could set precedents on permissible data collection, contractor scraping, API economics (Reddit’s 2023 API paywall already reshaped the ecosystem) and compensation for creators.

For the AI community, the outcome will affect data sourcing practices, model auditability, and incentives for platforms and users. It also reframes debates over who should profit from the human-generated data that powers generative AI.
