LLM-Powered Relevance Assessment for Pinterest Search (medium.com)

🤖 AI Summary
Pinterest describes using large language models (LLMs) to assess search relevance, making A/B tests of its search ranking more sensitive. Previously, relevance measurement relied on a limited supply of human annotations, which made it hard to detect small changes in user experience. By fine-tuning multilingual LLMs on human-annotated data, Pinterest reduces labeling costs, accelerates evaluation, and gets finer-grained feedback from experiments: minimum detectable effects (MDEs) fall from about 1.5% to 0.25%. For the AI/ML community, the work is an example of LLMs efficiently handling a complex task, namely relevance prediction in a personalized search environment. The approach uses a cross-encoder architecture, draws on a diverse set of textual features, and applies stratified sampling so that metrics are representative across user queries. Validation showed strong alignment between LLM-generated labels and human judgments, with Kendall’s τ and Spearman’s ρ values suggesting high reliability. Possible future enhancements include incorporating Visual Language Models (VLMs) and expanding assessment of non-English queries, which would extend the method's impact in international markets.
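To make the link between label volume, stratified sampling, and MDE concrete, here is a minimal sketch using the textbook two-sample z-test approximation. It is not Pinterest's actual computation: the function names, the 5% significance level, the 80% power, and every number in the example strata are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mde(std_dev, n_per_arm, alpha=0.05, power=0.8):
    """Approximate minimum detectable effect (absolute units) for a
    two-sided, two-sample z-test with equal-sized arms."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * std_dev * sqrt(2 / n_per_arm)


def pooled_std(strata):
    """Std dev of a single draw under simple random sampling.
    strata: list of (weight, mean, std) tuples; weights sum to 1."""
    overall_mean = sum(w * m for w, m, _ in strata)
    within = sum(w * s ** 2 for w, _, s in strata)
    between = sum(w * (m - overall_mean) ** 2 for w, m, _ in strata)
    return sqrt(within + between)


def stratified_std(strata):
    """Effective per-sample std dev under proportionally allocated
    stratified sampling: only within-stratum variance remains."""
    return sqrt(sum(w * s ** 2 for w, _, s in strata))


# Hypothetical query segments: (traffic share, mean relevance, std of relevance).
example_strata = [(0.5, 0.80, 0.20), (0.3, 0.60, 0.30), (0.2, 0.40, 0.35)]

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        simple = mde(pooled_std(example_strata), n)
        strat = mde(stratified_std(example_strata), n)
        print(f"n={n:>7}: MDE simple={simple:.4f}  stratified={strat:.4f}")
```

Under this approximation the MDE shrinks in proportion to 1/sqrt(n), so cheaper labels (more labeled samples per arm) tighten it directly. Stratification helps further by removing the between-stratum share of the variance.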