Questions about SSF Task and "Byte Range"

Wallace

Aug 1, 2014, 1:30:39 AM
to trec...@googlegroups.com
Hi John,

I have several questions about the second task, SSF.
In the 'trec-kba-2014-07-11-ccr-and-ssf.before-cutoff.tsv', the last column represents a "byte range".
What does "byte range" really mean?
How is it used in SSF?


Thanks,
Wallace

John R. Frank

Aug 3, 2014, 8:11:31 PM
to trec...@googlegroups.com

> In the 'trec-kba-2014-07-11-ccr-and-ssf.before-cutoff.tsv', the last
> column represents a "byte range". What does "byte range" really
> mean?

Thanks for asking! We need to update the docs to say byte *or* character
range, indicated by prepending a "b" or a "c" to the range. Also, you
can include multiple ranges from a single document joined by commas. See
the trec-kba-2014-07-11-ccr-and-ssf.{before,after}-cutoff.tsv for examples
like this:

c1236-1245,c1246-1249,c1250-1261

That is three character ranges.

If you do not include a "b" or "c" before the "%d-%d" string, then we will
assume it is a byte range.
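As a rough illustration of these rules, here is a minimal Python sketch of parsing the offset column. The names `OffsetRange` and `parse_offsets` are hypothetical, not part of any KBA tool; the sketch only encodes what is stated here: an optional "b" or "c" prefix, "%d-%d" ranges, commas between ranges, and bytes as the default when no prefix is given.

```python
import re
from typing import List, NamedTuple


class OffsetRange(NamedTuple):
    unit: str   # "b" = byte range, "c" = character range
    start: int
    end: int


# One range: an optional "b" or "c" prefix followed by "<start>-<end>".
_RANGE_RE = re.compile(r"([bc]?)(\d+)-(\d+)")


def parse_offsets(spec: str) -> List[OffsetRange]:
    """Parse an offset column such as "c1236-1245,c1246-1249,c1250-1261".

    Ranges are joined by commas; a range without a prefix is a byte range.
    """
    ranges = []
    for part in spec.split(","):
        m = _RANGE_RE.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"malformed offset range: {part!r}")
        prefix, start, end = m.groups()
        ranges.append(OffsetRange(prefix or "b", int(start), int(end)))
    return ranges


print(parse_offsets("c1236-1245,c1246-1249,c1250-1261"))
print(parse_offsets("1866-1871"))
```

The first call yields three character ranges; the second has no prefix, so it comes back as a single byte range.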

In the CCR training-&-evaluation data, the offsets are for *mentions* of
the target entity as identified by Serif named entity recognition. This
is part of what was shown to assessors during judging.


> How is it used in SSF?

For SSF, this character or byte range specifies the slot fill that your
system is asserting for the given target_id and slot_name. The offset (or
offsets) identify portion(s) of the clean_visible from the document
identified by the stream_id. The SSF scorer will compare the text from
these offsets to the text in the slot fills created by the assessors.
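A minimal sketch of that lookup, assuming `clean_visible` is UTF-8 encoded and that each range's end offset is exclusive (as in a Python slice); neither assumption is confirmed in this thread. `extract_fill` and `matches_assessor_fill` are hypothetical helpers, and the normalized exact-match comparison is only a stand-in for whatever the official SSF scorer actually does.

```python
from typing import Iterable, Tuple


def extract_fill(clean_visible: bytes, ranges: Iterable[Tuple[str, int, int]]) -> str:
    """Return the text covered by (unit, start, end) ranges, joined by single spaces.

    ASSUMPTIONS: clean_visible holds UTF-8 bytes and `end` is exclusive.
    """
    text = clean_visible.decode("utf-8")
    pieces = []
    for unit, start, end in ranges:
        if unit == "b":
            # Byte offsets index the raw UTF-8 bytes.
            pieces.append(clean_visible[start:end].decode("utf-8", errors="replace"))
        elif unit == "c":
            # Character offsets index the decoded Unicode string.
            pieces.append(text[start:end])
        else:
            raise ValueError(f"unknown range unit: {unit!r}")
    return " ".join(pieces)


def _normalize(s: str) -> str:
    # Hypothetical normalization: case-fold and collapse runs of whitespace.
    return " ".join(s.casefold().split())


def matches_assessor_fill(system_text: str, assessor_fills: Iterable[str]) -> bool:
    """Hypothetical check: exact match after normalization (the real scorer may differ)."""
    target = _normalize(system_text)
    return any(_normalize(fill) == target for fill in assessor_fills)


doc = "Café owner John Smith".encode("utf-8")
# "é" takes two bytes, so byte offsets after it are one larger than character offsets.
print(extract_fill(doc, [("c", 11, 15), ("c", 16, 21)]))  # John Smith
print(extract_fill(doc, [("b", 12, 16), ("b", 17, 22)]))  # John Smith
```

The example shows why the "b"/"c" prefix matters: for any text with multi-byte characters, the same span has different byte and character offsets.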

We're working on an updated SSF scorer that will clarify this further.



John

Wallace

Aug 4, 2014, 3:35:28 AM
to trec...@googlegroups.com
If several slot values exist in the same stream item (streamitem_id) for a specified entity_id,
should we use the following format?

team_id system_id  streamitem_id1    entity_ida ...... slotname1,   c1866-1871
team_id system_id  streamitem_id1    entity_ida ...... slotname2,   c1900-1925



On Friday, August 1, 2014 at 1:30:39 PM UTC+8, Wallace wrote:

John R. Frank

Aug 4, 2014, 6:49:13 AM
to trec...@googlegroups.com

Yes, that's right.

jrf
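For the layout Wallace proposes, a sketch that emits one tab-separated row per slot value found in a single stream item might look like this. `ssf_rows` is a hypothetical helper; its `middle_columns` parameter stands in for the fields elided as "......" in the question, which this sketch does not try to guess.

```python
from typing import List, Sequence, Tuple


def ssf_rows(team_id: str, system_id: str, stream_id: str, target_id: str,
             middle_columns: Sequence[str],
             fills: Sequence[Tuple[str, str]]) -> List[str]:
    """Build one tab-separated row per (slot_name, offsets) fill from one stream item.

    `middle_columns` is a placeholder for the fields elided as "......" in
    the question; they are passed through unchanged.
    """
    rows = []
    for slot_name, offsets in fills:
        fields = [team_id, system_id, stream_id, target_id, *middle_columns, slot_name, offsets]
        if any("\t" in f for f in fields):
            raise ValueError("fields must not contain tabs")
        rows.append("\t".join(fields))
    return rows


for row in ssf_rows("team_id", "system_id", "streamitem_id1", "entity_ida", [],
                    [("slotname1", "c1866-1871"), ("slotname2", "c1900-1925")]):
    print(row)
```

Repeating the stream and entity identifiers on every row keeps each line self-describing, which matches the one-row-per-slot-value format confirmed above.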

Wallace

Aug 4, 2014, 10:03:04 AM
to trec...@googlegroups.com
Regarding "trec-kba-2014-07-11-truth-data/trec-kba-2014-07-11-ccr-and-ssf.profiles.yaml":
Can we use the slot values in the YAML file?
If we find slot values that differ from those in the YAML file, how should we handle that case?
Should we exclude the slot values defined in the YAML file?


Thanks


John R. Frank

Aug 4, 2014, 10:11:33 AM
to trec...@googlegroups.com
> In
> "trec-kba-2014-07-11-truth-data/trec-kba-2014-07-11-ccr-and-ssf.profiles.yaml", can
> we use the slot values in the YAML file?


That's the truth data, so the only *automatic* use of it is in scoring.
If your system uses that info, then it is not an *automatic* run.

Automatic SSF systems can use all of the ...before-cutoff.tsv *and*
...after-cutoff.tsv data.

From the README.txt: "For SSF, this entity is defined by *all* of the
truth data that mentions him, and also the external profiles. It is valid
for an SSF system to use all of the truth data from before and after the
cutoff and also external profiles. (If we find that this is sufficient to
make SSF "easy," then there will be much rejoicing and we will make it
harder next year :-)"


The SSF scorer will compare your submitted .tsv file with the info in
the ...profiles.yaml file.

Does this clarify?

jrf

Wallace

Aug 4, 2014, 10:51:26 AM
to trec...@googlegroups.com
You say: "If your system uses that info, then it is not an *automatic* run."
Does this mean that a system that is not *automatic* can use it?
How is a "manual" system defined?


Thanks.


John R. Frank

Aug 4, 2014, 11:30:52 AM
to trec...@googlegroups.com
Great question:

In general, an "automatic" system should be able to run on new inputs and
achieve the same performance without a human helping it at its task. For
SSF this is slightly subtle, because we are allowing SSF systems to see
the rating levels assigned by human assessors. The specific task of an
SSF system is to generate slot filling suggestions.

So, the definition of the input to an SSF system is a set of documents,
possibly with relevance rating levels from humans. The output is a set of
strings from StreamItems (identified by stream_id) associated with
(target_id, slot_name, confidence).
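The input/output contract described above can be sketched with two small data types. Everything here is hypothetical: the class and field names, the literal-string `toy_fill_slots` extractor, the confidence values, the assumption that a positive rating marks a relevant document, and the assumption that range ends are exclusive. It shows only the shape of the task, not a real slot filler.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SSFInputDoc:
    stream_id: str
    clean_visible: str
    rating: Optional[int] = None  # human relevance rating level, if available


@dataclass(frozen=True)
class SSFOutput:
    stream_id: str    # StreamItem that contains the fill text
    target_id: str
    slot_name: str
    offsets: str      # character range, e.g. "c15-24" (end assumed exclusive)
    confidence: float


def toy_fill_slots(docs: List[SSFInputDoc], target_id: str,
                   slot_patterns: Dict[str, str]) -> List[SSFOutput]:
    """Toy extractor: emit a fill wherever a literal pattern first appears in a doc."""
    outputs = []
    for doc in docs:
        for slot_name, pattern in slot_patterns.items():
            start = doc.clean_visible.find(pattern)
            if start < 0:
                continue
            end = start + len(pattern)
            # Hypothetical: trust documents with a positive human rating more.
            confidence = 0.9 if doc.rating is not None and doc.rating > 0 else 0.5
            outputs.append(SSFOutput(doc.stream_id, target_id, slot_name,
                                     f"c{start}-{end}", confidence))
    return outputs


docs = [SSFInputDoc("s1", "Alice works at Acme Corp.", rating=2),
        SSFInputDoc("s2", "Nothing relevant here.")]
print(toy_fill_slots(docs, "ent_a", {"employer": "Acme Corp"}))
```

A real system would replace the literal search with actual extraction; the point is only that each output ties a text span in one StreamItem to a (target_id, slot_name, confidence) triple, and that human ratings may be used as an input signal.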

An automatic SSF system may also access the external_profile data from
any time period.

The thing that an automatic SSF system *cannot* do is access the slots in
the profiles.yaml file. You might first build an automatic SSF system,
and then manually investigate the profiles.yaml file to figure out how to
make your system better, and then the system gets classified as "manual"
because you would be tweaking-&-tuning it manually.
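One way to keep track of this distinction is to record which data sources a run touched and classify it by the rule stated in this thread. The source labels and the `run_type` function are hypothetical; the sketch only encodes what is said here (before- and after-cutoff .tsv data and external profiles are allowed; reading the profiles.yaml slots, or tuning by inspecting them, makes a run "manual") and is not an official check.

```python
from typing import Iterable

# Source labels are hypothetical names for the data discussed in this thread.
ALLOWED_FOR_AUTOMATIC = {"before-cutoff.tsv", "after-cutoff.tsv", "external_profiles"}
SCORING_ONLY = {"profiles.yaml slots"}


def run_type(sources_used: Iterable[str], tuned_by_inspecting_profiles: bool) -> str:
    """Classify a run as "automatic" or "manual" per the rule stated in this thread."""
    used = set(sources_used)
    # Reading the truth slots, or tuning the system by studying them, makes a run manual.
    if used & SCORING_ONLY or tuned_by_inspecting_profiles:
        return "manual"
    unknown = used - ALLOWED_FOR_AUTOMATIC
    if unknown:
        # Sources not discussed in the thread are left for the organizers to rule on.
        raise ValueError(f"sources not covered by this sketch: {sorted(unknown)}")
    return "automatic"


print(run_type(["before-cutoff.tsv", "after-cutoff.tsv", "external_profiles"], False))
print(run_type(["before-cutoff.tsv"], tuned_by_inspecting_profiles=True))
```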

Does that clarify?

John


On Mon, 4 Aug 2014, Wallace wrote:
