How Scoring Works in Quepid
To measure your search quality, we need an Evaluation Measure, which in Quepid parlance is a Scorer.
Quepid ships with some scorers (which we call communal) out of the box, and the default is AP@10. However, it's easy to customize them or to write your own scorers from scratch. The binary (i.e., relevant/not relevant) scorers are P@10 and AP@10. The graded (i.e., 0, 1, 2 or 3) scorers are CG@10, DCG@10, and NDCG@10. More details on how to use that grading scale are available on the Judgement Rating Best Practices wiki page.
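As a rough illustration of what the binary and graded scorers compute, here is a minimal Python sketch. It is not Quepid's own scorer code, and the helper names are hypothetical. Ratings are listed in result order. Two conventions are assumptions, since implementations differ: P@10 always divides by 10 (even if fewer results came back), and AP@10 averages the precision at each rank holding a relevant document, normalized by the number of relevant documents found in the top 10.

```python
# Minimal sketch of the binary and graded scorers' math (hypothetical helpers,
# not Quepid's scorer implementation). Ratings are listed in result order.

def precision_at_k(relevant, k=10):
    """P@k: share of the top-k slots holding a relevant (1) document.
    Assumption: always divides by k, even if fewer than k results came back."""
    return sum(relevant[:k]) / k


def average_precision_at_k(relevant, k=10):
    """AP@k: average of P@i taken at each rank i (1-based) where a relevant doc appears.
    Assumption: normalized by the number of relevant docs found in the top k."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevant[:k], start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def cumulative_gain_at_k(grades, k=10):
    """CG@k: plain sum of the graded (0-3) ratings in the top k, ignoring position."""
    return sum(grades[:k])


binary = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
print(precision_at_k(binary))              # 0.4
print(average_precision_at_k(binary))      # about 0.704 (169/240)
print(cumulative_gain_at_k([3, 2, 0, 1]))  # 6
```

Note that CG@10 ignores position entirely, which is why DCG@10 and NDCG@10 add a rank-based discount.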
Note: Quepid supports only a single scorer per evaluation, in contrast to tools like RRE that can provide multiple metrics.
Below are some notes about the nuances of the various scorers.
When scoring your documents, you may see that it's possible to score a perfect 100 even though all of your documents are rated with the lowest score.
One quirk of NDCG is that it compares your results against the ideal ordering of the rated documents. For example, 4,3,2,1 and 3,3,1 and 1,1,1 all score the same, because each is already in ideal order. This means that, regardless of the ratings, if the final results match the ideal ordering, the score will be 100. So if you rate ten documents as 1,4,1,1,1,1,1,1,1,1, you get a 72. Tweak your algorithm to move that 4 to the top so that the results are sorted as 4,1,1,1,1,1,1,1,1,1 and, boom, you are back to 100 (even though 90% of the documents were rated as irrelevant)!
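The ideal-ordering effect can be reproduced with a short Python sketch. It assumes an exponential gain (2^grade - 1) and a log2(rank + 1) discount, a common NDCG formulation chosen here because it reproduces the 72 quoted above; Quepid's actual formula may differ. The ideal ordering is simply the same grades sorted in descending order, and the function names are hypothetical.

```python
import math


def dcg(grades, k=10):
    """DCG@k, assuming gain = 2^grade - 1 and discount = log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg(grades, k=10):
    """NDCG@k where the ideal ordering is these same grades sorted descending."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


for grades in ([4, 3, 2, 1], [3, 3, 1], [1, 1, 1],
               [1, 4, 1, 1, 1, 1, 1, 1, 1, 1],
               [4, 1, 1, 1, 1, 1, 1, 1, 1, 1]):
    # Prints 100 for every list already in ideal order, and 72 for 1,4,1,...
    print(grades, round(100 * ndcg(grades)))
```

With a linear gain (gain = grade) the 1,4,1,1,1,1,1,1,1,1 list would instead score about 85, so the exact number depends on the gain function, but the perfect score for any ideally ordered list does not.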
It would be nice if Quepid warned about this!
NDCG often comes in two variants, Local and Global. Local looks only at the documents that the search engine returns for your query: if you have rated 20 docs but return only the first 10, the second 10 are NOT included in the scoring. The Global variant, in contrast, takes into account all the documents you have rated for a query. So if you know there are other highly relevant documents that should be returned but aren't, and you use the Explain Other feature to find and rate them, they will contribute to the score.
Note: Quepid's NDCG@10 implements the Global algorithm. The topRatings(k) line in the scorer fetches all the rated documents to compare against what the search engine returns. Change this line to XXX to switch to Local scoring.
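To make the Local/Global distinction concrete, here is a hedged Python sketch. It is not the Quepid scorer and does not use topRatings(k); the doc ids, the `ratings` dict and the `mode` parameter are hypothetical. The only difference between the two modes is which ratings are used to build the ideal ordering. Unrated returned docs are treated as grade 0, and the gain is assumed to be 2^grade - 1 with a log2(rank + 1) discount.

```python
import math


def dcg(grades, k=10):
    """DCG@k, assuming gain = 2^grade - 1 and discount = log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg(returned_ids, ratings, k=10, mode="global"):
    """NDCG@k for one query.

    returned_ids: doc ids in the order the search engine returned them.
    ratings: dict of doc id -> grade for every doc rated on this query
             (including any found via Explain Other).
    """
    # Grades of what was actually returned; unrated docs count as 0 (assumption).
    grades = [ratings.get(doc_id, 0) for doc_id in returned_ids[:k]]
    if mode == "local":
        pool = grades                  # ideal built only from the returned docs
    elif mode == "global":
        pool = list(ratings.values())  # ideal built from every rated doc
    else:
        raise ValueError("mode must be 'local' or 'global'")
    ideal = dcg(sorted(pool, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


# "c" is highly relevant (found via Explain Other) but the engine never returned it.
ratings = {"a": 1, "b": 1, "c": 3}
returned = ["a", "b"]
print(round(100 * ndcg(returned, ratings, mode="local")))   # 100
print(round(100 * ndcg(returned, ratings, mode="global")))  # 20
```

Local reports a perfect score because the returned docs are in ideal order among themselves; Global penalizes the missing doc c.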
Another limitation is that it's easy to lose track of what you have scored using the Explain Other feature. We need a screen in Explain Other that lists all the documents that have been scored ;-)