[trec-kba] official CCR query targets

John R. Frank

Oct 28, 2014, 8:22:10 AM

to trec...@googlegroups.com

KBAers,

A couple of people have pointed out that the 07-11 truth data had fewer
entities meeting the CCR query criterion than the 10-15 truth data. The
criterion is simply that an entity have at least 5 vitals in the time range
of the corpus. 20% of those vitals are used as training data, which in the
minimal case is just one of the five.
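To make the criterion concrete, here is a minimal Python sketch. It assumes each entity's vitals are given as a list of timestamps, that the training slice is the earliest 20% of the vitals by time, and that 20% is rounded up; the helper names are hypothetical and not part of the KBA tools.

```python
# Hypothetical sketch of the CCR query-target criterion described above.
MIN_VITALS = 5          # at least 5 vitals in the corpus time range
TRAINING_PERCENT = 20   # 20% of the vitals are used as training data

def is_ccr_query_target(vital_times):
    """True if the entity has enough vitals to be a CCR query target."""
    return len(vital_times) >= MIN_VITALS

def training_split(vital_times):
    """Split vitals into (training, evaluation), or return None for non-targets.

    Assumes training uses the earliest vitals and that 20% is rounded up;
    both choices are assumptions, not confirmed by this thread.
    """
    if not is_ccr_query_target(vital_times):
        return None
    ordered = sorted(vital_times)
    # ceil(n * 20 / 100) using integer arithmetic to avoid float rounding
    n_train = (len(ordered) * TRAINING_PERCENT + 99) // 100
    return ordered[:n_train], ordered[n_train:]
```

For example, `training_split([50, 10, 40, 20, 30])` returns `([10], [20, 30, 40, 50])`: in the minimal case, one of the five vitals is training data and four remain for evaluation.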

In the email linked below, I said "If the training_time_range_end is
``null``, then that entity is not a CCR query target." I should have said
something different, like "they are not scored yet, but use them anyway."

https://groups.google.com/d/msg/trec-kba/dQNWPxBLmfs/oxOavFh0aQQJ

Given that miscommunication, the official scoring for TREC this year will
use only the CCR entities that met the criterion in the 07-11 truth data.
Anyone using the truth data for new systems in the future should consider
using all of the entities.
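The selection rule for official scoring can be sketched as a set comprehension. This assumes each truth-data file has already been loaded into a hypothetical mapping from entity ID to its `training_time_range_end` value (`None` where the JSON has `null`); the real file layout may differ.

```python
def ccr_targets(topics):
    """Entities whose training_time_range_end is non-null.

    `topics` is a hypothetical dict: entity ID -> training_time_range_end
    (None where the JSON file has null).
    """
    return {entity for entity, end in topics.items() if end is not None}

def dropped_targets(topics_07_11, topics_10_15):
    """Entities that qualify in 10-15 but not in 07-11, so are not scored officially."""
    return ccr_targets(topics_10_15) - ccr_targets(topics_07_11)
```

Under these assumptions, `ccr_targets(topics_07_11)` is this year's official target set, while a future system would use `ccr_targets(topics_10_15)` to include all qualifying entities.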

Here are the counts of entities matching the criterion in the two sets of
truth data:

$ grep training_time trec-kba-2014-10-15-ccr-and-ssf-query-topics.json | grep -v null | wc -l
74

$ grep training_time trec-kba-2014-07-11-ccr-and-ssf-query-topics.json | grep -v null | wc -l
71
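For anyone without a shell handy, the pipeline above can be mirrored in a few lines of Python. Like the `grep` version, this is line-based: it counts lines containing `training_time` that do not contain `null`, so it relies on each such field sitting on its own line of the JSON file. The function name is hypothetical.

```python
import sys

def count_ccr_targets(lines):
    """Mirror `grep training_time FILE | grep -v null | wc -l` over an iterable of lines."""
    return sum(1 for line in lines if "training_time" in line and "null" not in line)

if __name__ == "__main__":
    # Usage: python count_targets.py trec-kba-2014-10-15-ccr-and-ssf-query-topics.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(count_ccr_targets(f), path)
```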

jrf

John R. Frank

Oct 28, 2014, 1:14:09 PM

to trec...@googlegroups.com

KBAers,

Following this change, which dropped a couple of entities that were not
part of the 07-11 truth data, I have re-run all the scoring. The resulting
changes appear to be minor. The Dropbox download links circulated earlier
still work and now point to new files containing the updated scorer output.

jrf

Jingtian Jiang

Oct 28, 2014, 11:05:47 PM

to trec...@googlegroups.com

Hi John,

I did not get the Dropbox download links. Could you post them again?

Jingtian
