> So, does this mean that Training Data can be used when doing Evaluations?
CCR systems must use the ...before-cutoff... file.
I say "must" because I don't think it is possible for a CCR system to
execute the task without looking at that data.
> In the final submission, we just submit the Evaluation Time Range
> results, but not Training Time Range?
Excellent question: please include results from the entire time range.
When we do pooled assessing, we will probably look at data from both ETR
and TTR.
> Here, I have another question: 'trec-kba-2014-07-11-truth-data' provides
> us with Truth Data After-Cutoff (Evaluation Time Range), so in the final
> submission, should our systems exclude items that appear in the
> After-Cutoff truth data?
No need to exclude anything. Automatic CCR systems should not look at the
evaluation truth data, trec-kba-2014-07-11-ccr-and-ssf.after-cutoff.tsv.
For scoring and pooling purposes, we might filter the runs in various
ways, but you do not need to do anything like that when generating a run.
In pseudocode, a CCR system can be just these high-level steps:

    # train only on the before-cutoff truth data
    my_filter = FancyFilter(path_to_before_cutoff_truth_data)
    my_run_submission = TSVwriter(open('teamId-fancyFilter.tsv', 'w'))
    # walk the whole corpus in time order, covering both TTR and ETR
    for item in TimeOrderedCorpus:
        rating, confidence, etc = my_filter.judge(item)
        my_run_submission.add(rating, confidence, etc)
    upload(my_run_submission)
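
For anyone who wants something they can actually run, here is a rough
Python sketch of the same loop. It is only an illustration, not the
official KBA tooling or run format: Item, KeywordFilter, and generate_run
are hypothetical names, the (target_id, name) training rows and the
four-column output are assumed layouts, and the rating and confidence
values are placeholders.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, TextIO, Tuple


class Item(NamedTuple):
    """Hypothetical corpus item: an id, a timestamp, and its text."""
    stream_id: str
    epoch_ticks: int
    text: str


class KeywordFilter:
    """Toy stand-in for FancyFilter; it sees before-cutoff data only."""

    def __init__(self, training_rows: Iterable[Tuple[str, str]]):
        # Assumed layout: (target_id, name) pairs derived from the
        # before-cutoff truth file. The real file has its own columns.
        self.names = {}
        for target_id, name in training_rows:
            self.names.setdefault(target_id, set()).add(name.lower())

    def judge(self, item: Item) -> Iterator[Tuple[str, int, int]]:
        """Yield (target_id, rating, confidence) for each matched target."""
        text = item.text.lower()
        for target_id, names in sorted(self.names.items()):
            hits = sum(1 for name in names if name in text)
            if hits:
                # Placeholder scoring: fraction of known names found.
                yield target_id, 1, int(1000 * hits / len(names))


def generate_run(corpus: Iterable[Item], my_filter: KeywordFilter,
                 out: TextIO) -> int:
    """Judge every item in time order and write one TSV row per judgment."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    rows_written = 0
    # No filtering by time range here: the run covers both TTR and ETR.
    for item in sorted(corpus, key=lambda it: it.epoch_ticks):
        for target_id, rating, confidence in my_filter.judge(item):
            writer.writerow([item.stream_id, target_id, rating, confidence])
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    demo_filter = KeywordFilter([("entity-A", "Alice Example")])
    demo_corpus = [
        Item("s2", 200, "Later news about Alice Example."),
        Item("s1", 100, "Alice Example spoke today."),
    ]
    buffer = io.StringIO()
    generate_run(demo_corpus, demo_filter, buffer)
    print(buffer.getvalue(), end="")
```

The point of the sketch is the shape of the loop: the filter is built
from before-cutoff data only, the after-cutoff truth file is never
opened, and nothing is excluded from the run by time range.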
A system could also use limited elements of the profiles YAML file; see
other discussion threads.
John