Answer grounding in Context


Nitish Gupta

Oct 22, 2018, 5:27:03 PM
to HotpotQA
Hi,

There are quite a few training/dev examples where the answer string is not found exactly in the contexts:
1. The answer word doesn't appear exactly. E.g., answer: "Ghana", and the closest word in the context is "Ghanian" (id: 5ac275e755429921a00aaf81).
2. The answer string doesn't appear as a separate token/span. E.g., answer: "1989", and the closest token is "1989-was" (id: 5ab341f755429969a97a8114).

What should be done in this case?

Thanks,
Nitish


zhi...@google.com

Oct 27, 2018, 1:05:38 PM
to HotpotQA
Hi Nitish,

(In your first example, the context word should be "Ghanaian").

The only guarantee is that the answer is a substring of the context. As your examples show, the answer can therefore be just part of a word. You can refer to our code at https://github.com/hotpotqa/hotpot for an example of handling these cases; see prepro.py in particular. This rarely happens, so in general it should not affect your model much.
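
To illustrate the substring guarantee, here is a minimal Python sketch of one way to turn a character-level match into a token span. It is not the logic in prepro.py; the function name char_span_to_token_span, the whitespace tokenization, and the example sentence are all assumptions made for illustration. The idea is to find the answer by character offset and then take every token that overlaps that range, so a sub-word answer such as "1989" maps to the enclosing token "1989-was".

```python
import re


def char_span_to_token_span(context, answer):
    """Map the first occurrence of `answer` in `context` to a token span.

    Returns (tokens, first_token_index, last_token_index), or None if the
    answer is empty or is not a substring of the context.
    """
    if not answer:
        return None
    start = context.find(answer)
    if start < 0:
        return None
    end = start + len(answer)

    # Whitespace tokenization with character offsets (an assumption;
    # a real pipeline would use its own tokenizer's offsets instead).
    spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", context)]

    # Keep every token whose character range overlaps the answer's range.
    covering = [i for i, (_, s, e) in enumerate(spans) if s < end and e > start]
    if not covering:
        return None
    return [tok for tok, _, _ in spans], covering[0], covering[-1]


# Made-up sentence combining the two cases from this thread.
context = "The album 1989-was released by a Ghanaian artist."
tokens, first, last = char_span_to_token_span(context, "1989")
print(tokens[first:last + 1])  # the enclosing token(s) for the sub-word answer
```

With this approach the training target is the enclosing token span, which is slightly longer than the gold answer. Whether to train on that span or skip such examples is a modeling choice.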

Hope it helps,
Zhilin

zhi...@google.com

Oct 27, 2018, 1:11:43 PM
to HotpotQA
In fact, we did use some tokenization during data collection to prevent sub-word answers, but our tokenizer is not always correct.
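
For anyone who wants to find the affected examples in their own copy of the data, a rough check along these lines might help. This is a hypothetical sketch, not the check used during data collection, which this thread does not describe; the function name is_word_aligned and the regex-based notion of a word boundary are assumptions.

```python
import re


def is_word_aligned(context, answer):
    """Return True if some occurrence of `answer` in `context` is not
    directly preceded or followed by a word character (letter, digit, _)."""
    if not answer:
        return False
    pattern = r"(?<!\w)" + re.escape(answer) + r"(?!\w)"
    return re.search(pattern, context) is not None


print(is_word_aligned("a Ghanaian artist", "Ghana"))  # sub-word match only
print(is_word_aligned("born in Ghana.", "Ghana"))     # standalone word
```

Note that under this definition "1989" inside "1989-was" counts as aligned, because the hyphen is not a word character. A whitespace tokenizer would still see it as part of a larger token, so the result depends on which tokenizer the model uses.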