Answer grounding in Context

Nitish Gupta


Oct 22, 2018, 5:27:03 PM

to HotpotQA

Hi,

There are quite a few training/dev examples where the answer string is not exactly found in the contexts.

1. The answer word doesn't appear exactly. E.g., answer: "Ghana", and the closest word in context is "Ghanian" (id: 5ac275e755429921a00aaf81).

2. The answer string doesn't appear as a separate token / span. For example, answer: "1989", and the closest token is "1989-was". (id: 5ab341f755429969a97a8114)

What should be done in this case?

Thanks,

Nitish

zhi...@google.com


Oct 27, 2018, 1:05:38 PM

to HotpotQA

Hi Nitish,

(In your first example, the context word should be "Ghanaian".)

The only guarantee is that the answer is a substring of the context. As your examples show, the answer can therefore be just part of a word. For one way of handling these cases, see our code at https://github.com/hotpotqa/hotpot, in particular prepro.py. This happens rarely, so in general it should not affect your model much.
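
For readers who want something concrete, here is a minimal sketch of one way to turn a substring match into a token span. It is not taken from prepro.py: the whitespace tokenizer and the names `tokenize_with_offsets` and `answer_token_span` are assumptions made for illustration. The idea is to find the answer by character offset, then widen the span to every token it overlaps and record whether the match lined up with token boundaries.

```python
import re


def tokenize_with_offsets(text):
    """Whitespace tokenizer (an assumption); returns (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def answer_token_span(context, answer):
    """Locate the first occurrence of `answer` in `context` as a substring.

    Returns (first_token, last_token, exact), where the indices cover every
    token the answer overlaps and `exact` is True only if the answer starts
    and ends on token boundaries. Returns None if there is no usable match.
    """
    if not answer:
        return None
    start = context.find(answer)
    if start < 0:
        return None
    end = start + len(answer)

    tokens = tokenize_with_offsets(context)
    # A token is covered if its character range overlaps [start, end).
    covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
    if not covered:
        return None

    first, last = covered[0], covered[-1]
    exact = tokens[first][1] == start and tokens[last][2] == end
    return first, last, exact


# The two cases from this thread, with made-up context sentences.
print(answer_token_span("a Ghanaian player", "Ghana"))
print(answer_token_span("released in 1989-was a hit", "1989"))
```

With this approach a sub-word answer such as "Ghana" is widened to the whole token "Ghanaian", and the `exact` flag lets you count or filter such examples if you prefer.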

Hope it helps,

Zhilin

zhi...@google.com


Oct 27, 2018, 1:11:43 PM

to HotpotQA

In fact, we did use some tokenization during data collection to prevent sub-word answers, but our tokenizer is not always correct.
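
As a rough illustration of this kind of check, the sketch below tests whether an answer occurs in the context without a word character on either side. The thread does not describe the tokenizer used during data collection, so the regex rule and the name `occurs_as_whole_word` are assumptions.

```python
import re


def occurs_as_whole_word(context, answer):
    """True if `answer` appears with no word character (\\w) directly before or after it."""
    pattern = r"(?<!\w)" + re.escape(answer) + r"(?!\w)"
    return re.search(pattern, context) is not None


# "Ghana" inside "Ghanaian" is a sub-word match under this rule.
print(occurs_as_whole_word("a Ghanaian player", "Ghana"))
# "1989" in "1989-was" passes, because "-" is not a word character.
print(occurs_as_whole_word("released in 1989-was a hit", "1989"))
```

The second case shows why such checks can disagree: a whitespace tokenizer keeps "1989-was" as one token, while a word-character rule treats "1989" as standalone.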
