An updated tarball is now available:
It includes important updates on:
- plots describing the kba-streamcorpus data volumes per hour
- NEW: training data for KBA SSF
- assessor guidelines for both KBA SSF and CCR
- statistics on KBA SSF evaluation data
- fixed stream_ids for KBA CCR training data
Here is an excerpt from the updated README.rst in the tarball:
KBA 2013 has two tasks:
1) Cumulative Citation Recommendation (CCR), and
2) Streaming Slot Filling (SSF)
CCR is a document filtering task, and SSF is a slot filling task.
In studying CCR, many people realized that a large fraction of "vital"
documents can be explained with a sentence of the form:
"The entity's _____ attribute acquired this value: ____."
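
As a rough illustration of the sentence pattern above, a profile change can be modeled as an (entity, attribute, value) record and rendered back into that sentence. This is only a minimal sketch: the SlotChange class, the explain function, and the example values are hypothetical names chosen for illustration, not part of the KBA tools or data formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotChange:
    """Hypothetical record of one entity attribute taking a new value."""
    entity: str     # entity whose profile changed (illustrative only)
    attribute: str  # slot name; made up here, not taken from the KBA slot list
    value: str      # the newly acquired slot fill


def explain(change: SlotChange) -> str:
    """Render the change using the sentence template quoted above."""
    return (f"The entity's {change.attribute} attribute "
            f"acquired this value: {change.value}.")


# Made-up example values, purely for demonstration.
example = SlotChange(entity="Example Person",
                     attribute="employer",
                     value="Example Corp")
print(explain(example))
```

A vital document that cannot be summarized by any such record would be one of the cases that falls outside the pattern.
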
In fact, it is an interesting research question to identify vital
documents that do not fit this pattern. These changing entity profiles
reflect real-world events, which often appear as spikes in this time
series plot of all vital documents across the entire seventeen-month
time range, covering both training and evaluation ground truth data:
The underlying corpus time series is plotted here:
The CCR task requires coreference resolution of entity mentions. The SSF
task requires coreference resolution of both entities and slot fills.
See the assessor guidelines for details.
One of KBA's goals is to attract researchers from both information
retrieval and natural language understanding. CCR naturally caters to IR,
and SSF to NLU. By weaving the two together, we hope to foster
The KBA Organizers