> In the 'trec-kba-2014-07-11-ccr-and-ssf.before-cutoff.tsv', the last
> column which represents "byte range". What does "byte range" really
> means?
Thanks for asking! We need to update the docs to say byte *or* character
range; the type is indicated by prepending a "b" or a "c" to the range.
You can also include multiple ranges from a single document, joined by
commas. See trec-kba-2014-07-11-ccr-and-ssf.{before,after}-cutoff.tsv for
examples like this:
c1236-1245,c1246-1249,c1250-1261
which is three character ranges.
If you do not include a "b" or "c" before the "%d-%d" string, then we will
assume it is a byte range.
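
As a rough sketch of the format described above (this is not the scorer's
code, and the name parse_offsets is made up for illustration), parsing the
last column could look like this in Python:

```python
import re
from typing import List, Tuple

# Optional "b" or "c" prefix, then "%d-%d".
_RANGE_RE = re.compile(r"^([bc]?)(\d+)-(\d+)$")


def parse_offsets(field: str) -> List[Tuple[str, int, int]]:
    """Split a comma-joined offsets field into (kind, first, last) tuples.

    kind is "b" for byte ranges and "c" for character ranges; a range
    with no prefix is treated as a byte range.
    """
    ranges = []
    for part in field.split(","):
        match = _RANGE_RE.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized range: {part!r}")
        kind, first, last = match.groups()
        ranges.append((kind or "b", int(first), int(last)))
    return ranges


print(parse_offsets("c1236-1245,c1246-1249,c1250-1261"))
```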
In the CCR training and evaluation data, the offsets are for *mentions* of
the target entity as identified by Serif named entity recognition. These
mentions are part of what was shown to assessors during judging.
> How it is used in SSF?
For SSF, this character or byte range specifies the slot fill that your
system is asserting for the given target_id and slot_name. The offset (or
offsets) identify the portion(s) of the clean_visible text of the document
identified by the stream_id. The SSF scorer will compare the text at
these offsets to the text in the slot fills created by the assessors.
We're working on an updated SSF scorer that will clarify this further.
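
To illustrate how byte and character ranges pick out different parts of
the same text, here is a minimal sketch. It assumes clean_visible is
available as UTF-8 bytes and that the end offset is exclusive; neither
assumption is confirmed above, and extract_spans is a hypothetical helper,
not part of the scorer.

```python
from typing import Iterable, List, Tuple


def extract_spans(
    clean_visible: bytes, ranges: Iterable[Tuple[str, int, int]]
) -> List[str]:
    """Return the text selected by each (kind, first, last) range.

    Assumptions (not confirmed by the task docs): clean_visible is UTF-8
    and `last` is exclusive, as in Python slicing.
    """
    text = clean_visible.decode("utf-8")
    spans = []
    for kind, first, last in ranges:
        if kind == "c":
            # Character range: index into the decoded string.
            spans.append(text[first:last])
        else:
            # Byte range: index into the raw bytes, then decode.
            chunk = clean_visible[first:last]
            spans.append(chunk.decode("utf-8", errors="replace"))
    return spans


# "é" is one character but two bytes in UTF-8, so the two kinds of
# offset diverge after it.
doc = "café au lait".encode("utf-8")
print(extract_spans(doc, [("c", 5, 7), ("b", 6, 8)]))
```

Both ranges in the usage line select the same word, which shows why the
"b" or "c" prefix matters whenever the text contains non-ASCII characters.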
John