Processed QANTA dataset is up

Chen Zhao

Nov 16, 2018, 3:30:40 PM

to Human Computer Question Answering

Hi all,

We have posted a processed version of the QANTA dataset, which makes it more similar to the trivia QA setting, at https://sites.google.com/view/qanta/resources.

We split the questions into individual sentences. For each sentence, we first retrieve the top-10 Wikipedia articles from all of Wikipedia using TF-IDF scoring. Then, within these articles, we retrieve the top-10 paragraphs, again with TF-IDF scoring, as candidates. Finally, we use TAGME to extract all entities linked to Wikipedia titles in each retrieved paragraph.
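
For anyone who wants to reproduce something similar on their own data, here is a rough sketch of the two-stage retrieval in plain Python. It is not the script used to build the release: the function names (tokenize, tfidf_rank, retrieve_paragraphs), the smoothed IDF formula, the cosine scoring, and the choice to pool paragraphs across the retrieved articles before ranking are all assumptions made for illustration. The TAGME entity-linking step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, docs, k=10):
    """Return indices of the top-k docs by cosine similarity of TF-IDF vectors."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)

    # Document frequency of each term, then a smoothed IDF (an assumed variant).
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(tokens):
        # Terms outside the corpus vocabulary are ignored.
        tf = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    scores = [cosine(q, vec(toks)) for toks in doc_tokens]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order[:k]


def retrieve_paragraphs(sentence, articles, k_articles=10, k_paragraphs=10):
    """Two-stage retrieval: top articles first, then top paragraphs within them.

    `articles` maps a title to its list of paragraphs (hypothetical layout).
    """
    titles = list(articles)
    article_texts = [" ".join(articles[t]) for t in titles]
    top_titles = [titles[i] for i in tfidf_rank(sentence, article_texts, k_articles)]

    # Assumption: paragraphs from all retrieved articles are ranked in one pool.
    pool = [(t, p) for t in top_titles for p in articles[t]]
    top = tfidf_rank(sentence, [p for _, p in pool], k_paragraphs)
    return [pool[i] for i in top]
```

Calling retrieve_paragraphs(sentence, articles) on one question sentence returns (title, paragraph) pairs, which could then be passed to an entity linker. For all of Wikipedia you would want an inverted index or a sparse-matrix library rather than these dictionaries.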

Hopefully this resource will help in building machine-reading-based models!

Best,

Chen
