🤖 AI Summary
OpenAI describes a new, measurable approach to defining and tracking political bias in ChatGPT: an automated evaluation built from ~500 prompts covering 100 topics (each with five political framings) that mirrors real-world, open‑ended use rather than narrow multiple‑choice tests. They operationalize bias along five axes—user invalidation, user escalation, personal political expression, asymmetric coverage, and political refusals—and use reference responses plus an LLM “grader” to score outputs on a 0–1 scale. The suite emphasizes text-only interactions (U.S. English first, with plans to generalize), includes a challenging adversarial subset of emotionally charged prompts, and is designed for continuous, interpretable monitoring and targeted fixes.
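A minimal sketch of how such a five-axis, graded evaluation could be wired up, assuming a hypothetical `grade_axis` call into the LLM grader; the names, signatures, and aggregation are illustrative, not OpenAI's actual harness:

```python
# Illustrative sketch of the evaluation loop described above: ~500 prompts
# (100 topics x 5 political framings), each model response scored by an LLM
# grader on five bias axes, 0 = objective and 1 = strongly biased.
from statistics import mean

AXES = [
    "user_invalidation",
    "user_escalation",
    "personal_political_expression",
    "asymmetric_coverage",
    "political_refusal",
]

def grade_axis(prompt: str, response: str, reference: str, axis: str) -> float:
    """Ask an LLM grader to score one axis on a 0-1 scale, comparing the
    model's response against a reference response. Stubbed here; a real
    harness would prompt a grader model with a per-axis rubric."""
    raise NotImplementedError("call your grader model here")

def evaluate(prompts, model, references):
    """Score every prompt on all five axes and return per-axis means
    plus an overall bias score."""
    per_axis = {axis: [] for axis in AXES}
    for prompt in prompts:
        response = model(prompt)
        for axis in AXES:
            per_axis[axis].append(
                grade_axis(prompt, response, references[prompt], axis)
            )
    axis_means = {axis: mean(scores) for axis, scores in per_axis.items()}
    return axis_means, mean(axis_means.values())
```

Keeping each axis on its own 0–1 scale is what makes the monitoring interpretable: a regression shows up as a per-axis mean moving, which points at a targeted fix rather than a single opaque number.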
Results show the models are largely objective on neutral or mildly slanted prompts but exhibit moderate bias on emotionally charged queries, most commonly as asserted personal opinions, asymmetric coverage of one side, or escalation of the user's tone; refusals and user invalidation are rare. GPT‑5 instant and GPT‑5 thinking reduce measured bias by ~30% versus prior models (worst-case scores for the older models: o3 0.138, GPT‑4o 0.107), and an analysis of production traffic estimates that <0.01% of responses show political bias. The work supplies a practical, fine‑grained benchmark for improving model objectivity, highlights the failure modes that remain under high-emotion prompts, and invites broader adoption and scrutiny, since political bias in LLMs remains an open research challenge.
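As a back-of-the-envelope check, applying the quoted ~30% reduction to the worst-case older scores gives a sense of where the GPT‑5 models would land; the implied numbers below are inferred arithmetic from the summary's own figures, not reported results:

```python
# Implied GPT-5 worst-case scores from the summary's numbers; illustrative only.
for name, old_score in [("o3", 0.138), ("GPT-4o", 0.107)]:
    implied = old_score * (1 - 0.30)  # ~30% reduction in measured bias
    print(f"{name}: {old_score:.3f} -> ~{implied:.3f}")
# o3: 0.138 -> ~0.097
# GPT-4o: 0.107 -> ~0.075
```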