nDCG of the fine-tuned MonoT5 models diverge from paper #1

philipphager · 2023-06-28T13:06:39Z

Hey all,

First of all, thank you for this interesting work (I enjoyed reading the paper a lot)! After cloning the project and running run_monot5.py, we obtained the following results from the cached run files in monoT5/runs:

| name | P@1 | P@5 | P@10 | nDCG@10 | nDCG@20 | RR | AP |
|---|---|---|---|---|---|---|---|
| MonoT5 fine-tuned title+url | 0.8412 | 0.5991 | 0.3914 | 0.6858 | 0.7087 | 0.9025 | 0.7396 |
| MonoT5 fine-tuned title+url+text | 0.8581 | 0.5945 | 0.3910 | 0.7034 | 0.7268 | 0.9132 | 0.7462 |

These nDCG values are notably higher than the ones reported in the paper (which are ≈0.45). A student of mine also re-ran the T5 models published on Hugging Face without the run caching and reported similarly diverging values.

I just wanted to highlight this finding. Do you have any idea where these values are coming from?

Cheers,
Philipp

seanmacavaney · 2023-06-29T09:51:34Z

Hey @philipphager -- thanks for reporting!

It looks like the nDCG results reported in the paper used an old version of the qrels file that mistakenly included duplicate entries. trec_eval (which provides P@k, MAP, etc.) handles these properly, but gdeval (which provides nDCG(dcg="exp-log2")@20) doesn't and counts the duplicates against the ideal DCG. When I run evaluation of MonoT5 using the official qrels and the result files included in this repository, my results match the ones you list above.
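To make the effect concrete, here is a minimal sketch of how duplicate qrels rows can deflate nDCG when the ideal ranking is built from every row. It is not gdeval's actual code: the functions `dcg`, `ndcg`, and `dedupe` and the toy judgments are hypothetical, and the sketch simply assumes exponential gain with a log2 discount and an ideal DCG that counts duplicated rows.

```python
import math


def dcg(gains, k):
    """DCG@k with exponential gain (2^rel - 1) and a log2 rank discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(gains[:k], start=1))


def ndcg(ranking, judgments, k):
    """nDCG@k where `judgments` is a list of (doc_id, rel) rows.

    The ideal ranking is built from every row, so duplicated rows
    inflate the ideal DCG (the failure mode assumed in this sketch).
    """
    rel_of = dict(judgments)
    gains = [rel_of.get(doc, 0) for doc in ranking]
    ideal = sorted((rel for _, rel in judgments), reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def dedupe(judgments):
    """Keep only the first row seen for each doc_id."""
    seen = {}
    for doc, rel in judgments:
        seen.setdefault(doc, rel)
    return list(seen.items())


# Hypothetical toy data for a single query.
clean = [("d1", 2), ("d2", 1), ("d3", 0)]
duplicated = clean + [("d1", 2), ("d2", 1)]  # two rows repeated by mistake
ranking = ["d1", "d2", "d3"]                 # a perfect ranking

score_clean = ndcg(ranking, clean, k=3)
score_dup = ndcg(ranking, duplicated, k=3)
score_fixed = ndcg(ranking, dedupe(duplicated), k=3)

print(f"clean qrels:      {score_clean:.4f}")
print(f"duplicated qrels: {score_dup:.4f}")
print(f"after dedupe:     {score_fixed:.4f}")
```

In this toy case the perfect ranking scores 1.0 against the clean judgments but less than 1.0 against the duplicated ones, because the repeated relevant rows make the ideal DCG unreachable. Deduplicating the judgments before evaluation restores the clean score.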

This issue did not affect the 𝜆-Mart results in the paper, since they were all computed using the correct qrels file.

From what I can tell, this problem doesn't change the conclusions in the paper, since the nDCG values of MonoT5 are still a cut below the 𝜆-Mart results. I'll try to prepare a corrected version of Table 4 that we can put in this repository.

Does this help?

philipphager · 2023-07-03T15:28:35Z

Hey @seanmacavaney!

Thanks for the quick follow-up! Hmm, that makes sense. Agreed, the conclusions do not change, just the intuition of how far apart the two methods are. So it'd definitely make sense to publish an updated table in the repository 👍

Thanks again for all the help, cheers!

philipphager changed the title ~~nDCG for the fine-tuned M5 models diverge from paper~~ nDCG of the fine-tuned MonoT5 models diverge from paper Jun 28, 2023
