Evaluating job search ranking with LLM judged NDCG (corvi.careers)

🤖 AI Summary
A new evaluation method, LLM judged NDCG, has been introduced for assessing job search ranking. It is designed to handle a wide range of queries, from broad roles like "software engineer" to niche skills such as "Haskell." A large language model (LLM) scores each returned job from 0 to 100 for how well it matches the user's query, and those scores feed NDCG (Normalized Discounted Cumulative Gain). Because the scores are graded rather than binary, NDCG can distinguish rankings by finer differences between job matches than traditional precision and recall can, since those metrics reduce every result to simply relevant or not relevant.

The method preserves the full resolution of the relevance scores and rewards correct rank ordering, which matters most when several jobs are similarly good matches. It uses linear gain rather than exponential gain, so a small score difference counts as a small difference instead of being amplified. A recent evaluation produced a mean NDCG@10 of 0.8799. The metric serves both as a performance indicator and as a debugging tool that shows where the product's job ranking needs work, supporting continuous refinement of the search ranking.
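A minimal Python sketch of NDCG@k over LLM relevance scores, to make the linear versus exponential gain point concrete. It assumes the standard log2 position discount and that the ideal ordering is the same judged jobs re-sorted by score (only returned results carry LLM judgments). The function names, the example scores, and the exponential variant (2^rel - 1 applied directly to raw 0-100 scores) are hypothetical illustrations, not corvi.careers' actual implementation.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence

GainFn = Callable[[float], float]


def linear_gain(score: float) -> float:
    # Linear gain: the LLM's 0-100 relevance score is the gain, unchanged.
    return float(score)


def exponential_gain(score: float) -> float:
    # Hypothetical comparison: classic 2**rel - 1 applied to the raw 0-100 score.
    # Each extra point doubles the gain, so tiny differences get amplified.
    return 2.0 ** score - 1.0


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Standard log2 position discount; position 1 is divided by log2(2) = 1.
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(llm_scores: Sequence[float], k: int = 10,
              gain: GainFn = linear_gain) -> float:
    # llm_scores: one 0-100 score per returned job, in the system's ranked order.
    gains = [gain(s) for s in llm_scores]
    # Assumption: the ideal ranking is the same judged jobs sorted by gain.
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # nothing relevant was returned; treat NDCG as 0
    return dcg_at_k(gains, k) / ideal_dcg


# Two near-equal jobs ranked in the "wrong" order (84 placed above 85).
swapped = [84, 85]
linear_swap = ndcg_at_k(swapped)
exp_swap = ndcg_at_k(swapped, gain=exponential_gain)
print(f"linear gain:      {linear_swap:.4f}")
print(f"exponential gain: {exp_swap:.4f}")

# Hypothetical per-query LLM scores in ranked order; averaging gives mean NDCG@10.
runs: Dict[str, List[float]] = {
    "software engineer": [92, 88, 90, 75, 80, 60, 55, 70, 40, 35],
    "Haskell": [30, 85, 80, 20, 65],
}
per_query = {query: ndcg_at_k(scores, k=10) for query, scores in runs.items()}
print(per_query)
print(f"mean NDCG@10: {mean(list(per_query.values())):.4f}")
```

With linear gain, swapping an 84 and an 85 barely moves NDCG (about 0.997), while the exponential variant drops to roughly 0.86 for the same one-point inversion, which is the kind of exaggeration linear gain avoids.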