Could you share the retrieval code for the fullwiki setting?


Deming Ye

unread,
May 8, 2019, 12:26:33 AM
to HotpotQA

Hi,
Thanks for releasing the code; it's been very helpful to me.

In the paper's appendix, you introduce your retrieval strategy for the fullwiki setting. I want to retrieve more than 10 articles, so I used DrQA's retriever (unigram+bigram hash table). But its performance is far from your reported results for the top 10 paragraphs (Hit@10).

Could you share the retrieval code for the fullwiki setting?

Thanks!

Best Regards,
Deming Ye

saizhen...@gmail.com

unread,
May 8, 2019, 1:46:29 AM
to HotpotQA
Hi,

The algorithm implemented in this paper is a truncated, approximate version of DrQA's retriever, in which similarity is only measured on a subset of wiki articles S_cand (|S_cand| < 5000, as explained in the Appendix) given the question. Also, when a gold paragraph is not in that candidate subset S_cand, we set the rank of that gold paragraph to |S_cand| + 1, which is an upper bound on its true rank (as explained in the Appendix). This could explain why the result you got from the DrQA retriever differs from the paper's Hit@10 numbers.
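In other words, the evaluation is roughly the following (a minimal sketch with illustrative names, not the actual released code):

```python
import numpy as np

def gold_rank(question_vec, cand_vecs, cand_ids, gold_id):
    """Rank of the gold article within the candidate subset S_cand.

    question_vec: tf-idf vector of the question, shape (d,)
    cand_vecs:    tf-idf vectors of the candidates, shape (|S_cand|, d)
    cand_ids:     article ids aligned with the rows of cand_vecs
    """
    if gold_id not in cand_ids:
        # Gold article missed by candidate generation: report the
        # upper bound |S_cand| + 1 instead of its true rank.
        return len(cand_ids) + 1
    scores = cand_vecs @ question_vec   # similarity only within S_cand
    order = np.argsort(-scores)         # indices by descending score
    ranked = np.array(cand_ids)[order]
    return int(np.where(ranked == gold_id)[0][0]) + 1  # 1-based rank
```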

In practice, the DrQA retriever should be a good starting point.

Best,

Deming Ye

unread,
May 8, 2019, 10:11:50 AM
to HotpotQA
Thanks for your kind advice. I found that building the subset of wiki articles S_cand with the algorithm in the Appendix helps:

1) When I use DrQA with unigram+bigram on the whole set, I got Hit@10: 0.330
2) When I use DrQA with unigram+bigram on S_cand, I got Hit@10: 0.415 (a rough sketch of this setup is below)
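For reference, this is roughly how I restrict DrQA's ranker to S_cand (a sketch only: the model path and pool size are my choices, not from the paper, and candidates that fall outside the retrieved pool are simply dropped):

```python
from drqa import retriever

# 'wiki-tfidf.npz' stands in for a prebuilt DrQA tf-idf model.
ranker = retriever.get_class('tfidf')(tfidf_path='wiki-tfidf.npz')

def topk_within_cand(question, s_cand, k=10, pool_size=5000):
    """Pull a large pool from the whole-wiki ranker, then keep
    only articles in S_cand, preserving the score order."""
    doc_names, _ = ranker.closest_docs(question, k=pool_size)
    in_cand = [d for d in doc_names if d in s_cand]
    return in_cand[:k]
```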
 
But there is still a gap compared to the paper's retriever (Hit@10: 0.557).

Should I rerank S_cand using only bigram tf-idf, without unigrams, as the paper did?

And is the tf-idf value computed as "tfidf = log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5))", as in DrQA?
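Concretely, the per-term score I mean is the following (my reading of DrQA's code, offered as a sketch; the argument names are mine):

```python
import math

def drqa_tfidf(tf, N, Nt):
    """Per-term score: log(tf + 1) * log((N - Nt + 0.5) / (Nt + 0.5)),
    where tf is the term count in the document, N the total number of
    documents, and Nt the number of documents containing the term."""
    return math.log(tf + 1) * math.log((N - Nt + 0.5) / (Nt + 0.5))

# e.g. a term occurring twice, in 100 of 5,000,000 documents:
# drqa_tfidf(2, 5_000_000, 100) ≈ 11.88
```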

Thank you!

Best Regards,
Deming Ye

On Wednesday, May 8, 2019 at 1:46:29 PM UTC+8, saizhe...@gmail.com wrote: