You can evaluate IR performance by comparing the retrieved documents against the supporting facts: the gold paragraphs are easy to infer from the supporting facts, since each supporting fact names the title of the paragraph it comes from.
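As a rough sketch of that idea (not the code used for the reported numbers): this assumes each example follows the released HotpotQA JSON layout, where `supporting_facts` is a list of `[title, sentence_id]` pairs. The function names and the `retrieved_titles` argument are hypothetical; the latter stands for the list of paragraph titles your IR model returns for a question.

```python
def gold_titles(example):
    """Collect the gold paragraph titles from an example's supporting facts."""
    # Each supporting fact is assumed to be a [title, sentence_id] pair.
    return {title for title, _sent_id in example["supporting_facts"]}


def retrieval_scores(example, retrieved_titles):
    """Return (precision, recall, all_gold_found) for one question.

    `retrieved_titles` is a hypothetical input: the titles of the paragraphs
    returned by the IR model under test.
    """
    gold = gold_titles(example)
    retrieved = set(retrieved_titles)
    hits = len(gold & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    # 1.0 only when every gold paragraph was retrieved.
    all_gold_found = float(gold <= retrieved)
    return precision, recall, all_gold_found


# Toy example with made-up titles, only to show the expected input shape.
example = {"supporting_facts": [["A", 0], ["A", 1], ["B", 2]]}
print(retrieval_scores(example, ["A", "C"]))
```

Averaging these per-question scores over the dev set gives a corpus-level figure, so other IR models can be compared without replacing the paragraphs in the dev file.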
On Tuesday, December 4, 2018 at 3:34:39 PM UTC-5, jiangk...@gmail.com wrote:
Could you please upload the code and any related material for the retrieval performance part? I would like to test the performance of other IR models on HotpotQA. If not, is replacing the paragraphs in the dev set the only way to evaluate other IR models? I hope to hear back from you and would really appreciate your help.