LLM Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation (arxiv.org)

🤖 AI Summary
A recent study identifies a critical and underexplored risk in using large language models (LLMs) for text annotation in social science research, which it terms "LLM hacking." The researchers systematically replicated 37 annotation tasks from 21 studies using 18 different LLMs, analyzing over 13 million labels to measure how choices such as model selection, prompt wording, and temperature settings influence downstream statistical conclusions. They found that these variations lead to erroneous conclusions for roughly one-third of hypotheses with state-of-the-art models, and up to half with smaller models, showing that even high-performing LLMs do not eliminate the risk of bias or error propagation.

The work matters for the AI/ML community because it quantifies how subtle, often overlooked deployment choices affect reproducibility and scientific integrity. The risk of LLM hacking decreases as effect sizes grow, but it remains substantial for results near significance thresholds, which is precisely where careful verification is most needed. Common correction methods, such as regression-based adjustments, prove ineffective at mitigating these risks, whereas incorporating human annotations and improved model selection strategies offer stronger safeguards.

Most alarmingly, the authors show that intentionally manipulating results through minimal prompt tweaks and model choices is surprisingly easy, raising ethical and methodological concerns about the reliability of LLM-annotated data in research contexts.
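To make the core idea concrete, here is a minimal sketch (not the paper's actual pipeline) of how one might estimate an "LLM hacking" rate: for each plausible annotation configuration (model, prompt, temperature), rerun the same hypothesis test on that configuration's labels and count how often the statistical conclusion differs from the one reached with human gold-standard labels. All data, error rates, and function names below are synthetic and illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic "documents": a binary covariate (group A vs. B) and
# gold-standard human labels with a small true group difference.
n = 2000
group = rng.integers(0, 2, size=n)              # 0 = A, 1 = B
gold = rng.binomial(1, 0.30 + 0.05 * group)     # true effect: +5 points for B

def conclusion(labels, group, alpha=0.05):
    """Return (significant, sign) for a two-sample proportion comparison."""
    a, b = labels[group == 0], labels[group == 1]
    p_pool = labels.mean()
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / len(a) + 1 / len(b)))
    z = (b.mean() - a.mean()) / se
    p = 2 * stats.norm.sf(abs(z))               # two-sided p-value
    return p < alpha, np.sign(b.mean() - a.mean())

gold_sig, gold_sign = conclusion(gold, group)

# Hypothetical configurations: each flips gold labels at its own error
# rate, standing in for a particular (model, prompt, temperature) choice.
config_error_rates = rng.uniform(0.05, 0.25, size=50)

wrong = 0
for err in config_error_rates:
    flips = rng.random(n) < err
    llm_labels = np.where(flips, 1 - gold, gold)   # noisy LLM annotations
    sig, sign = conclusion(llm_labels, group)
    # "Wrong conclusion": significance status or effect direction differs
    # from the gold-label analysis.
    if sig != gold_sig or (sig and sign != gold_sign):
        wrong += 1

print(f"Gold labels: significant={gold_sig}, direction={gold_sign:+.0f}")
print(f"Estimated LLM-hacking risk: {wrong / len(config_error_rates):.0%} "
      f"of configurations reach a different conclusion")
```

In this toy setup the "risk" is simply the share of label-noise configurations whose test result disagrees with the gold-label result; the study's finding that risk concentrates near significance thresholds corresponds to cases where the true effect leaves the test with little margin for added annotation error.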