How 'overworked, underpaid' humans train Google's AI to seem smart (www.theguardian.com)

🤖 AI Summary
In spring 2024, dozens of contracted workers (journalists, teachers and PhD holders hired through Hitachi's GlobalLogic and other vendors) found their jobs were less about writing and more about policing: rating, editing and moderating outputs from Google's chatbots (the Gemini series) and AI Overviews. Workers like Rachael Sawyer report being asked to screen traumatic text and images, judge responses for factuality and groundedness, "stump" models, and even enter domain-specific content under tight timers. GlobalLogic split staff into generalist raters (~$16/hr) and "super raters" (~$21/hr), scaling to almost 2,000 English-language raters in the US. Staff say guidance shifted frequently, time limits tightened from 30 minutes to 10–15, and a December rule barred skipping unfamiliar medical prompts, all while mental-health support and informed consent were lacking. Technically, these raters form a middle layer of the AI pipeline: their judgments on truthfulness, safety and sensitivity are used as aggregated quality signals to curb hallucinations and harmful outputs, though Google says the ratings don't directly update its models. The system faces problems (inconsistent guidelines, social biases in consensus meetings, and throughput pressures that may trade quality for speed), illustrated by public failures such as AI Overviews suggesting putting glue on pizza. The story highlights a structural reliance on invisible human labor to make large language models usable, raising ethical and reliability concerns about outsourcing, worker wellbeing, transparency, and how much human oversight can realistically scale with increasingly capable models.