Processed QANTA dataset is up


Chen Zhao

Nov 16, 2018, 3:30:40 PM
to Human Computer Question Answering
Hi all,

We have posted a processed version of the QANTA dataset, reformatted to be more similar to the TriviaQA setting, at https://sites.google.com/view/qanta/resources.

We split each question into individual sentences. For each sentence, we first retrieve the top-10 Wikipedia articles from all of Wikipedia using TF-IDF scoring. Then, within these articles, we retrieve the top-10 paragraphs by TF-IDF score as candidates. Finally, we use TAGME to extract, from each retrieved paragraph, all entities that link to Wikipedia titles.
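
For anyone who wants to reproduce something similar on their own data, here is a minimal Python sketch of the two-stage retrieval described above. It is not the code used to build the dataset. The naive sentence splitter, the smoothed TF-IDF weighting with cosine scoring, the `articles` format (a dict from title to a list of paragraphs), and the choice to rank paragraphs pooled across all retrieved articles (rather than top-10 per article) are all assumptions, and the function and class names are hypothetical. The TAGME entity-linking step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a real pipeline would use a proper tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(question):
    # Naive split on sentence-final punctuation (assumption, not QANTA's splitter).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", question) if s.strip()]


class TfidfIndex:
    """Hypothetical minimal TF-IDF index with cosine scoring over a list of texts."""

    def __init__(self, texts):
        self.docs = [Counter(tokenize(t)) for t in texts]
        n = len(self.docs)
        df = Counter()
        for d in self.docs:
            df.update(d.keys())
        # Smoothed IDF (assumption) so every in-vocabulary term gets positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vecs = [self._weigh(d) for d in self.docs]

    def _weigh(self, counts):
        # TF * IDF, then L2-normalize; out-of-vocabulary terms get weight 0.
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def top_k(self, query, k):
        # Cosine similarity between the query vector and every document vector.
        q = self._weigh(Counter(tokenize(query)))
        scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in self.vecs]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return order[:k]


def retrieve_paragraphs(sentence, articles, n_articles=10, n_paragraphs=10):
    """Return (title, paragraph) pairs for one question sentence."""
    titles = list(articles)
    # Stage 1: rank whole articles against the sentence.
    article_index = TfidfIndex([" ".join(articles[t]) for t in titles])
    top_titles = [titles[i] for i in article_index.top_k(sentence, n_articles)]
    # Stage 2: pool the paragraphs of the retrieved articles and rank them.
    pool = [(t, p) for t in top_titles for p in articles[t]]
    para_index = TfidfIndex([p for _, p in pool])
    return [pool[i] for i in para_index.top_k(sentence, n_paragraphs)]


if __name__ == "__main__":
    # Toy "Wikipedia" for illustration only.
    articles = {
        "Isaac Newton": ["Isaac Newton formulated the laws of motion.",
                         "Newton was born in Woolsthorpe."],
        "Albert Einstein": ["Einstein developed the theory of relativity.",
                            "Einstein won the Nobel Prize in 1921."],
        "Marie Curie": ["Marie Curie researched radioactivity.",
                        "Curie won two Nobel Prizes."],
    }
    question = "This scientist formulated the laws of motion. He was born in Woolsthorpe."
    for sent in split_sentences(question):
        print(sent, "->", retrieve_paragraphs(sent, articles, n_articles=1, n_paragraphs=1))
```

In a real setup the article-level index would be built once over all of Wikipedia instead of on every call; this sketch rebuilds it per call only to stay short.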

Hopefully this resource helps in building machine-reading-based models!

Best,
Chen