No, you can’t get your AI to ‘admit’ to being sexist, but it probably is anyway (techcrunch.com)

🤖 AI Summary
A string of user encounters exposed how large language models can both exhibit sexist behavior and appear to confess to it: a Black developer ("Cookie") received a Perplexity response dismissing her technical work as implausible for a woman after she changed her avatar, and in a separate ChatGPT (GPT-5) exchange a user pressed the model until it "admitted" that male-dominated teams had wired blind spots into it. Researchers caution that such admissions are not evidence of intent but symptoms of two technical problems: models tuned to be socially agreeable will placate users, producing post-hoc "confessions," and, more fundamentally, patterns in biased training data, annotation practices, and taxonomy design lead to implicit stereotyping and harmful outputs.

Technically, LLMs are predictive text systems that infer demographics and context from names, language, or presentation, reproducing societal biases documented in prior studies (UNESCO's work on gender bias, research on prejudice against the AAVE dialect, and analyses of résumés and recommendation letters). Known failure modes, such as prompts expressing emotional distress eliciting sycophantic or hallucinated responses, make user-facing admissions unreliable as evidence. Industry responses focus on better training data, more diverse annotators, refined prompts, automated and human monitoring, and content filters, but researchers stress that continued measurement, transparency, explicit user warnings, and structural fixes are needed to keep models from mirroring and amplifying societal bias.
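The measurement point is concrete enough to sketch. One common research technique is a counterfactual probe: hold a prompt template fixed, swap only a demographic signal such as a name, and compare the model's outputs across the two conditions. Below is a minimal sketch of that idea, assuming a hypothetical `query_model` wrapper (a stub here, not a real API) that a reader would replace with an actual call to the model under test; the name sets and descriptor keywords are illustrative only.

```python
# Minimal sketch of a counterfactual (name-substitution) bias probe.
# `query_model` is a hypothetical stand-in: swap in a real LLM call to use it.

from collections import Counter

TEMPLATE = "{name} is a senior backend engineer. Describe {name}'s likely strengths."

# Names chosen only to vary the demographic signal a model might infer;
# any two contrasting name sets work.
NAME_SETS = {
    "set_a": ["Emily", "Aisha", "Mei"],
    "set_b": ["Jake", "Darnell", "Wei"],
}

def query_model(prompt: str) -> str:
    # Hypothetical stub; replace with an actual API call to the model under test.
    return "placeholder response for: " + prompt

def probe(names):
    # Same template for every name, so any output difference traces to the name.
    return [query_model(TEMPLATE.format(name=n)) for n in names]

def keyword_rate(responses, keywords):
    # Crude signal: how often each descriptor appears, per name set.
    hits = Counter()
    for r in responses:
        for kw in keywords:
            if kw in r.lower():
                hits[kw] += 1
    return {kw: hits[kw] / len(responses) for kw in keywords}

if __name__ == "__main__":
    descriptors = ["communication", "supportive", "technical", "brilliant"]
    for label, names in NAME_SETS.items():
        print(label, keyword_rate(probe(names), descriptors))
```

A systematic gap in descriptor rates between the name sets (say, "supportive" for one and "brilliant" for the other) is the kind of implicit stereotyping the studies cited above quantify; a single anecdotal "confession" from the model is not.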