🤖 AI Summary
A personal family story — a man’s tongue tumour that inexplicably shrank after traditional herbal treatment — frames a broader warning: generative AI (GenAI) is amplifying pre-existing digital knowledge imbalances and risks accelerating a global “knowledge collapse.” The author, a responsible-AI researcher, argues that because large language models are trained on what’s already digitised, they privilege Western, institutional epistemologies while marginalising oral traditions, embodied practices and “low‑resource” languages. That matters not only for fairness and representation, but for the resilience of human knowledge — local ecological know-how, Indigenous building techniques, and community water-management practices often exist only orally and are being left out of the AI record.
Key technical facts underline the problem: public corpora like Common Crawl are heavily skewed (English ≈45% of content despite being spoken by ~19% of people; Hindi ≈0.2% of the crawl despite ~7.5% of the world's speakers; Tamil ≈0.04% though spoken by ~86M people). Around 97% of the world's languages are "low‑resource," and a 2020 study found 88% face severe neglect in AI research. For ML practitioners this implies concrete actions: diversify and curate training data, fund community-led digitisation and oral-to-text initiatives, develop multimodal corpora, build localized evaluation metrics, and engage epistemically with communities to avoid entrenching cultural hegemony and losing vital, place‑based knowledge.
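The skew figures above can be made concrete with a simple representation ratio (corpus share divided by speaker share) — a minimal sketch, not an official metric; the Tamil speaker share (~1.1% of roughly 8B people) is an assumption derived from the ~86M figure:

```python
# Illustrative sketch: quantify corpus-vs-speaker representation skew
# using the approximate figures cited above. A ratio > 1 means a
# language is over-represented in the corpus; a ratio far below 1
# means it is severely under-represented.

# language: (share of web corpus in %, share of world speakers in %)
LANGUAGE_SHARES = {
    "English": (45.0, 19.0),
    "Hindi": (0.2, 7.5),
    "Tamil": (0.04, 1.1),  # ~86M speakers of ~8B people (assumed share)
}

def representation_ratio(corpus_pct: float, speaker_pct: float) -> float:
    """Corpus share divided by speaker share."""
    return corpus_pct / speaker_pct

for lang, (corpus, speakers) in LANGUAGE_SHARES.items():
    print(f"{lang}: ratio = {representation_ratio(corpus, speakers):.3f}")
```

On these numbers English comes out over-represented by more than 2x, while Hindi and Tamil land below 0.05 — the two-orders-of-magnitude gap the summary describes.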