LLM-based Relevance Assessment for Web-Scale Search Evaluation at Pinterest

arXiv (cs.LG) · Wednesday, November 12, 2025 at 5:00:00 AM
Pinterest has automated relevance evaluation for its search system, changing how search results are assessed. Human annotation, the traditional method, is expensive and slow, so the company now uses fine-tuned large language models (LLMs) to produce relevance judgments. By rigorously validating the alignment between LLM-generated judgments and human assessments, Pinterest shows that LLMs can provide reliable relevance measurements while making evaluation more efficient. LLM-based labeling also makes it practical to expand the query set and optimize the sampling design, so a wider range of search experiences can be assessed at scale. As a result, the new method significantly reduces the Minimum Detectable Effect (MDE) in online experiments, meaning smaller changes in relevance can be detected. This development is crucial for improving personalized search systems, ensuring that users receive results that better align with their queries a…
— via World Pulse Now AI Editorial System
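
To make the link between a larger labeled sample and a smaller Minimum Detectable Effect (MDE) concrete, here is a minimal sketch. It does not reproduce Pinterest's method, which this summary does not describe; it assumes the textbook MDE formula for a two-sided, two-sample z-test with equal arm sizes and a shared standard deviation. The function name, the standard deviation, and the sample sizes are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(std_dev, n_per_arm, alpha=0.05, power=0.8):
    """Textbook MDE for a two-sided, two-sample z-test (assumed setup).

    std_dev   -- per-observation standard deviation of the metric (hypothetical)
    n_per_arm -- number of labeled observations in each experiment arm
    alpha     -- significance level of the test
    power     -- desired probability of detecting a true effect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # Standard error of the difference between two equal-sized arms.
    std_error = std_dev * sqrt(2 / n_per_arm)
    return (z_alpha + z_power) * std_error


if __name__ == "__main__":
    # Hypothetical numbers: the same metric measured with more labels per arm.
    for n in (100, 400, 1600):
        print(n, round(minimum_detectable_effect(1.0, n), 4))
```

Under these assumptions the MDE shrinks with the square root of the sample size, so labeling four times as many query-result pairs halves the smallest effect an experiment can reliably detect. That is one plausible reason cheap LLM labels translate into more sensitive experiments.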
