🤖 AI Summary
Researchers introduce semantic similarity rating (SSR), a technique that uses large language models to simulate consumer survey responses: the model is prompted for a free-text justification, which is then mapped to a Likert-style rating via embedding similarity against curated reference statements. Tested on 57 personal-care product surveys with 9,300 human responses from an industry partner, SSR produces realistic response distributions (Kolmogorov–Smirnov similarity > 0.85) and reaches roughly 90% of human test–retest reliability. The method also yields a rich qualitative explanation alongside each synthetic rating, addressing a major shortcoming of direct numeric prompting, which tends to produce implausible rating distributions.
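The distribution check can be made concrete with a short sketch. Here "KS similarity" is assumed to mean one minus the two-sample Kolmogorov–Smirnov statistic, and the rating arrays are random placeholders, not the study's data:

```python
# Hedged sketch of the distribution comparison, assuming "KS similarity"
# = 1 - two-sample Kolmogorov-Smirnov statistic. Arrays are placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
human_ratings = rng.integers(1, 6, size=500)      # placeholder 1-5 ratings
synthetic_ratings = rng.integers(1, 6, size=500)  # placeholder 1-5 ratings

result = ks_2samp(human_ratings, synthetic_ratings)
ks_similarity = 1.0 - result.statistic  # higher = closer distributions
print(f"KS similarity: {ks_similarity:.3f}")  # the paper reports > 0.85
```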
SSR’s significance lies in offering a scalable, interpretable alternative to expensive and biased human panels: it preserves traditional survey metrics while adding depth through model-generated rationales. Technically, the approach hinges on embedding-based semantic matching between model outputs and a set of reference statements corresponding to Likert points, rather than coercing models to output numbers directly (see the sketch below). Immediate implications include lower-cost pretesting, rapid scenario exploration, and richer mock-consumer insights. Caveats include open questions about generalizability beyond the tested product category, potential amplification of model biases, and the need to carefully calibrate both the reference statements and the representativeness of the simulated population.
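A minimal sketch of that matching step, assuming a sentence-transformers embedding model and hypothetical purchase-intent reference statements for a 1–5 scale; the paper's actual embedding model, prompts, and curated statements may differ:

```python
# Minimal sketch of SSR's embedding-matching step. Assumptions (not from
# the paper): the sentence-transformers model "all-MiniLM-L6-v2" and
# these hypothetical reference statements for a 1-5 Likert scale.
import numpy as np
from sentence_transformers import SentenceTransformer

REFERENCE_STATEMENTS = {  # hypothetical; the paper curates its own
    1: "I would definitely not buy this product.",
    2: "I probably would not buy this product.",
    3: "I might or might not buy this product.",
    4: "I would probably buy this product.",
    5: "I would definitely buy this product.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
points = sorted(REFERENCE_STATEMENTS)
ref_embs = model.encode([REFERENCE_STATEMENTS[p] for p in points])

def ssr_rating(free_text: str) -> int:
    """Map a model's free-text justification to the Likert point whose
    reference statement is most similar in embedding space (cosine)."""
    text_emb = model.encode([free_text])[0]
    sims = ref_embs @ text_emb / (
        np.linalg.norm(ref_embs, axis=1) * np.linalg.norm(text_emb)
    )
    return points[int(np.argmax(sims))]

print(ssr_rating("The scent is pleasant, but I already have a brand I trust."))
```

Taking a softmax over the similarities instead of the argmax would yield a probability distribution over Likert points, which is one plausible way such a method could produce the realistic response distributions reported above.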