State media control influences large language models
Recent studies reveal that government control over media can significantly influence the output of large language models (LLMs). Research shows that LLMs trained on text in the languages of countries with restricted media freedom tend to reflect pro-government sentiments, suggesting that state-controlled narratives are embedded in their training data. For instance, a multi-part case study of Chinese state media indicates that such content is prevalent in LLM training datasets and yields more favorable responses about Chinese political institutions when models are prompted in Chinese rather than in English.
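The bilingual comparison described above can be illustrated with a minimal probe: ask the same question in both languages and score how favorable each answer is. This is a hypothetical sketch, not the study's actual methodology; `query_model` is a stand-in for any real LLM API, here stubbed with canned responses, and the keyword-based favorability score is a deliberately crude toy metric.

```python
# Hypothetical bilingual favorability probe (illustrative only).
# Assumptions: query_model stands in for a real LLM API call; the canned
# responses and keyword lists are invented for demonstration.

PROMPTS = {
    "en": "Describe the role of state media in China.",
    "zh": "请描述中国官方媒体的作用。",
}

POSITIVE = {"stability", "development", "harmony", "prosperity"}
NEGATIVE = {"censorship", "propaganda", "repression", "restriction"}


def query_model(prompt: str) -> str:
    # Placeholder: a real probe would call an LLM API here.
    canned = {
        PROMPTS["en"]: "State media is often criticized for censorship and propaganda.",
        PROMPTS["zh"]: "Official media promotes stability and development.",
    }
    return canned[prompt]


def favorability(text: str) -> int:
    # Toy metric: net count of positive minus negative keywords.
    words = {w.strip(".,").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)


def probe() -> dict:
    # Compare favorability of the model's answer per prompt language.
    return {lang: favorability(query_model(p)) for lang, p in PROMPTS.items()}


if __name__ == "__main__":
    print(probe())  # e.g. {'en': -2, 'zh': 2}
```

A divergence like the one in the stubbed output (negative in English, positive in Chinese) is the kind of language-dependent asymmetry the research describes; a real evaluation would of course use many prompts and a validated sentiment model rather than keyword counts.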
This phenomenon has critical implications for the AI/ML community, as it highlights the biases that training sources can embed in LLMs. The findings suggest that governments may exploit their control of media to shape AI outputs, underscoring the importance of transparency and accountability in model development. As global reliance on LLMs for information grows, the strategic seeding of biased training data by state actors could threaten information integrity and democratic discourse. The study calls for a reevaluation of how LLMs are trained and regulated so that they provide balanced, accurate information across diverse linguistic and political contexts.