AI-generated 'participants' can lead social science experiments astray (www.science.org)

🤖 AI Summary
A new arXiv preprint by metascientist Jamie Cummins warns that using LLMs as "silicon samples" in behavioral research can produce wildly different, and sometimes contradictory, results depending on model choice and researcher decisions. Cummins took real data from 85 participants on two psychological measures (a gut-level racial preference and belief in a just world), fed the participant profiles to various LLMs (e.g., ChatGPT, DeepSeek), and systematically varied demographic prompts and model settings such as temperature. Across 252 combinations he assessed how well the simulated responses matched the human data on rank ordering, means and distributions, and the correlation between the two measures. Some configurations matched some metrics, but no single setup reproduced the human data across the board.

The paper signals a major methodological and ethical caution for the AI/ML and social-science communities: apparently defensible choices about model, prompt, and hyperparameters can materially change findings, threatening reproducibility and risking harm if LLMs are used to simulate underrepresented or vulnerable groups, who may also be underrepresented in training data. Researchers suggest silicon samples may be useful for pilot studies if carefully validated against human data, but the field needs standards, transparency about configuration choices, and ethical guidelines before replacing human participants.
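To make the "252 combinations" idea concrete, here is a minimal sketch of that kind of configuration sweep. It is illustrative only: `query_llm` is a hypothetical stand-in for prompting a real model with each participant profile, the "human" data is random placeholder data rather than the study's 85 participants, the grid of models/temperatures/prompt details is assumed, and the metrics (Spearman rank correlation, a KS test, Pearson r between the two measures) are plausible stand-ins for the comparisons described, not necessarily the ones the preprint used.

```python
# Illustrative sketch of a "silicon sample" configuration sweep (assumptions noted above).
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder "human" scores on the two measures (implicit racial preference,
# belief in a just world); the real study used data from 85 participants.
human = {
    "implicit_pref": rng.normal(0.3, 0.5, size=85),
    "just_world": rng.normal(3.5, 1.0, size=85),
}

def query_llm(model: str, temperature: float, prompt_detail: str) -> dict:
    """Hypothetical stand-in for prompting an LLM with each participant profile.

    A real implementation would send each profile to the model and parse its
    answers; here we return noise so the sketch runs end to end.
    """
    return {
        "implicit_pref": rng.normal(0.3, 0.5, size=85),
        "just_world": rng.normal(3.5, 1.0, size=85),
    }

# Assumed grid of researcher choices (models, temperatures, prompt granularity).
models = ["chatgpt", "deepseek"]
temperatures = [0.0, 0.7, 1.0]
prompt_details = ["age_only", "age_gender", "full_demographics"]

results = []
for model, temp, detail in itertools.product(models, temperatures, prompt_details):
    sim = query_llm(model, temp, detail)
    results.append({
        "config": (model, temp, detail),
        # Rank similarity: do simulated participants order like the real ones?
        "rank_rho": stats.spearmanr(human["implicit_pref"], sim["implicit_pref"]).statistic,
        # Distributional match: are simulated scores distributed like human scores?
        "ks_p": stats.ks_2samp(human["just_world"], sim["just_world"]).pvalue,
        # Inter-measure correlation: is the human correlation between the two
        # measures reproduced in the simulated sample?
        "human_r": stats.pearsonr(human["implicit_pref"], human["just_world"]).statistic,
        "sim_r": stats.pearsonr(sim["implicit_pref"], sim["just_world"]).statistic,
    })

# Different configurations can look "good" on different metrics, which is the
# reproducibility worry the preprint raises.
for row in results[:3]:
    print(row["config"], round(row["rank_rho"], 2), round(row["ks_p"], 2))
```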