One-Third of the Internet Is Bots Now (www.vice.com)

🤖 AI Summary
Cloudflare reports that roughly one-third of all internet traffic now comes from bots, and cybersecurity firm Imperva estimates that automated traffic may be approaching half. Much of this activity comes from large-scale scrapers and “ingestion” bots run by AI companies and data aggregators, alongside malicious actors; these bots crawl websites to collect text, images, and other assets for training models. The result isn’t just more automated requests: entire pockets of online interaction increasingly look like algorithms talking to algorithms, and the content being surfaced carries only tiny attribution snippets or no clear provenance at all. For the AI/ML community this matters on several levels. Training-data provenance and quality are undermined as models ingest content that is already derivative or generated by other models, creating reinforcement loops and signal degradation (the so-called “copies of copies” problem). That raises the risk of model drift, poorer factuality, and amplified biases, and it erodes the economic incentives for human creators whose work is scraped without clear compensation. Technical and policy responses include stricter crawler controls, dataset provenance tracking, watermarking and generated-content detection, rate limiting, and better attribution standards, all crucial for keeping training corpora robust and diverse and for avoiding an accelerating “dead internet” feedback loop.
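As one illustration of what "crawler controls" can look like in practice, here is a minimal sketch using robots.txt rules and Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the `example.com` URLs are hypothetical and not taken from the article, and robots.txt is advisory only: it helps with crawlers that choose to honor it and does nothing against ones that ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI ingestion bot entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("SomeBrowser", "https://example.com/articles/1"),
        ("SomeBrowser", "https://example.com/private/notes"),
    ]:
        verdict = "allowed" if may_crawl(parser, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

A well-behaved crawler would run a check like `may_crawl` before each request. Enforcing limits against crawlers that do not cooperate requires server-side measures such as the rate limiting mentioned above.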