🤖 AI Summary
Reddit has sued Perplexity and three data-scraping firms (SerpApi, Oxylabs, AWMProxy), alleging “industrial-scale” theft of Reddit content to fuel AI training and Perplexity’s answer engine. The complaint claims Perplexity bought data from scrapers that intentionally evade protections (masking identities, routing through proxies, and scraping Google search results to sidestep robots.txt) and that Perplexity kept increasing its Reddit citations even after a cease-and-desist. Reddit points to an experiment in which it made a post crawlable only by Google and says Perplexity reproduced that post within hours. Reddit alleges this shows the company scraped Google SERPs rather than negotiating a license, as some competitors have done.
The case matters because it targets the ecosystem that supplies human-generated training data to LLMs and search/answer engines: scrapers, resellers of scraped datasets, and the buyers who rely on them. Technically, the suit highlights common evasion tactics (robots.txt circumvention, proxy networks, search-result harvesting) and frames them as “data laundering.” If Reddit prevails or secures broad remedies, AI developers may face stronger legal pressure to license content, adopt provenance controls, or rely on curated/paid datasets, shifting the economics and engineering practices around data ingestion, model auditing, and compliance. Perplexity denies wrongdoing and frames the dispute as a defense of public access. Whatever the outcome, the lawsuit could set an important legal precedent for how copyright and web-scraping rules apply to model training.
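
To make the robots.txt point concrete, here is a minimal sketch of how a compliant crawler would read a policy that admits only Google's crawler, roughly the situation Reddit's experiment describes. The rules, the `ExampleBot` agent name, the `example.com` URL, and the `can_crawl` helper are all assumptions for illustration; the summary does not say how Reddit actually restricted the post. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy (not Reddit's actual file): only Googlebot may crawl,
# every other user agent is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/test/post"  # placeholder URL
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, can_crawl(agent, url))
```

robots.txt is advisory: it only constrains crawlers that choose to consult it. A scraper that pulls the same content from Google's search results never requests the origin site at all, which is the bypass the complaint alleges.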
        