Training Data and Evaluation Data

Wallace

Jul 31, 2014, 3:02:22 AM

to trec...@googlegroups.com

Hi John,

I understand there are two time ranges of data: training data and evaluation data.

1. 'trec-kba-2014-07-11-truth-data/README.txt' says that the training data can be used both as the definition of the entities and as training data.

What does 'can be used as the definition of entities' mean?

Is the training data only for improving our Vital Filtering systems?

2. Can we use the training data (such as the truth stream items) during evaluation?

3. How do we score the results of our runs? Do we score them against the attached truth data?

Thanks,

Wallace

John R. Frank

Jul 31, 2014, 8:34:24 AM

to trec...@googlegroups.com

> 1. 'trec-kba-2014-07-11-truth-data/README.txt' says that the training
> data can be used both as the definition of the entities and as training
> data. What does 'can be used as the definition of entities' mean?

The entity is defined by its *mentions* in the truth-data documents, which
are sufficient for a human to recognize and understand the entity. So for
these entities, rather than resembling KBP Entity Linking (linking a
mention to an existing knowledge-base entry), the task is more like KBP
NIL Clustering, which is a more general form of coreference resolution.

> Is the training data only for improving our Vital Filtering systems?

You can use the documents with relevance rating 1 and those with rating 2
to identify the entity; both ratings indicate that the text contains
mentions that refer to the entity. The vital filtering task is to separate
the 1s from the 2s.
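A minimal sketch of that distinction, assuming the truth rows have already been parsed into (stream_id, target_id, rating) tuples and that rating 2 marks vital and rating 1 useful documents (the usual KBA convention; the TSV parsing and column layout are not shown). Both ratings go into the set of documents that mention the entity, and only the 2s form the vital set. The stream_ids and entity name are made up.

```python
from collections import defaultdict


def split_by_rating(rows):
    """Group stream_ids per entity.

    Assumed scale: 2 = vital, 1 = useful. Any rating of 1 or 2 means the
    document mentions the entity; only rating 2 puts it in the vital set.
    """
    mentions = defaultdict(set)
    vital = defaultdict(set)
    for stream_id, target_id, rating in rows:
        if rating in (1, 2):
            mentions[target_id].add(stream_id)
        if rating == 2:
            vital[target_id].add(stream_id)
    return mentions, vital


# Hypothetical parsed truth rows.
rows = [
    ("1001-aaa", "entity_A", 2),
    ("1002-bbb", "entity_A", 1),
    ("1003-ccc", "entity_A", 0),
]
mentions, vital = split_by_rating(rows)
print(sorted(mentions["entity_A"]))  # ['1001-aaa', '1002-bbb']
print(sorted(vital["entity_A"]))     # ['1001-aaa']
```

The mention set can help a system recognize the entity, while the vital set supplies the positive examples for the 1-versus-2 decision.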

> 2. Can we use the training data (such as the truth stream items) during
> evaluation?

The official metrics will be computed with only the ...after-cutoff...
data.
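As a simplified illustration only (kba-scorer computes its own, more detailed metrics), the sketch below scores a run's vital predictions against judgments taken from the after-cutoff file alone, so training-range judgments play no part. The dict-of-ratings structure, skipping predictions that were never judged, and reading rating 2 as vital are all assumptions.

```python
def score_vital(run, after_cutoff_truth):
    """Simplified precision/recall/F1 for vital predictions (NOT kba-scorer).

    run: set of (stream_id, target_id) pairs the system marked vital.
    after_cutoff_truth: dict (stream_id, target_id) -> rating, loaded from
    the after-cutoff truth file only (hypothetical structure).
    """
    # Count only predictions that were judged in the evaluation range.
    judged = {pair for pair in run if pair in after_cutoff_truth}
    true_vital = {pair for pair, r in after_cutoff_truth.items() if r == 2}
    tp = len(judged & true_vital)
    precision = tp / len(judged) if judged else 0.0
    recall = tp / len(true_vital) if true_vital else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


truth = {("2001-x", "e1"): 2, ("2002-y", "e1"): 1, ("2003-z", "e1"): 2}
run = {("2001-x", "e1"), ("2002-y", "e1"), ("1001-old", "e1")}
print(score_vital(run, truth))  # (0.5, 0.5, 0.5)
```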

> 3. How do we score the results of our runs? Do we score them against
> the attached truth data?

Using the scorer at https://github.com/trec-kba/kba-scorer/, you can put a
run in a directory. For example, to score the NIST assessors' own
after-cutoff judgments as if they were a run:

$ mkdir runs

$ cat trec-kba-2014-07-11-ccr-and-ssf.after-cutoff.tsv | gzip > runs/nist-assessors.gz

And score it like this:

$ python -m kba.scorer.ccr --any-up --require-positives=4 runs trec-kba-2014-07-11-ccr-and-ssf.after-cutoff.tsv
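For anyone scripting this from Python, here is a sketch of the same two steps: gzip the truth TSV into runs/nist-assessors.gz, then build the scorer invocation. The module name and flags are copied from the command above and their meanings are not described here; the helper names stage_run and scorer_command are hypothetical.

```python
import gzip
import shutil
import sys
from pathlib import Path

TRUTH = "trec-kba-2014-07-11-ccr-and-ssf.after-cutoff.tsv"


def stage_run(src, runs_dir="runs", name="nist-assessors.gz"):
    """Python equivalent of: mkdir runs; cat src | gzip > runs/name"""
    out_dir = Path(runs_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / name
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def scorer_command(runs_dir="runs", truth=TRUTH):
    """Argument list for the scorer call shown above (same module and flags)."""
    return [sys.executable, "-m", "kba.scorer.ccr",
            "--any-up", "--require-positives=4", runs_dir, truth]


# Usage (requires the truth file and an installed kba-scorer):
#   import subprocess
#   stage_run(TRUTH)
#   subprocess.run(scorer_command(), check=True)
```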

Does that clarify?

John

wallac...@gmail.com

Jul 31, 2014, 9:06:40 AM

to trec-kba

So this means the training data can be used during evaluation?

In the final submission, do we submit results only for the Evaluation Time Range, and not for the Training Time Range?

I have another question: 'trec-kba-2014-07-11-truth-data' provides us with the after-cutoff truth data (Evaluation Time Range), so in the final submission, should our systems exclude the items in the after-cutoff truth data?

Thanks,

Wallace

John R. Frank

Jul 31, 2014, 11:11:56 AM

to trec-kba

> So this means the training data can be used during evaluation?

CCR systems must use the ...before-cutoff... file.

I say "must" because I don't think it is possible for a CCR system to
execute the task without looking at that data.

> In the final submission, do we submit results only for the Evaluation
> Time Range, and not for the Training Time Range?

Excellent question: please include results from the entire time range.
When we do pooled assessing, we will probably look at data from both the
ETR (Evaluation Time Range) and the TTR (Training Time Range).

> I have another question: 'trec-kba-2014-07-11-truth-data' provides us
> with the after-cutoff truth data (Evaluation Time Range), so in the final
> submission, should our systems exclude the items in the after-cutoff
> truth data?

No need to exclude anything. Automatic CCR systems should not look at the
evaluation truth data, trec-kba-2014-07-11-ccr-and-ssf.after-cutoff.tsv.

For scoring and pooling purposes, we might filter the runs in various
ways, but you do not need to do anything like that when generating a run.

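In pseudocode, a CCR system can be as simple as these high-level steps:

    # FancyFilter, TSVwriter, TimeOrderedCorpus and upload are placeholders.
    my_filter = FancyFilter(path_to_before_cutoff_truth_data)

    with open('teamId-fancyFilter.tsv', 'w') as out:
        my_run_submission = TSVwriter(out)
        # Judge every item in time order and record each assertion.
        for item in TimeOrderedCorpus:
            rating, confidence, etc = my_filter.judge(item)
            my_run_submission.add(rating, confidence, etc)

    upload('teamId-fancyFilter.tsv')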
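To make that loop concrete, here is a toy version that actually runs. KeywordFilter, the (timestamp, stream_id, text) corpus tuples, the "announced" rule, the confidence values, and the four TSV columns are all hypothetical stand-ins, not the official run format or a recommended filter.

```python
import csv
import io


class KeywordFilter:
    """Hypothetical stand-in for FancyFilter: it only remembers one name per
    entity, as if that name had been learned from before-cutoff truth data."""

    def __init__(self, names):
        self.names = names  # target_id -> surface-form name

    def judge(self, text):
        """Yield (target_id, rating, confidence) for each entity named in text."""
        lowered = text.lower()
        for target_id, name in self.names.items():
            if name.lower() in lowered:
                # Toy rule on an assumed scale: 2 = vital, 1 = useful.
                rating = 2 if "announced" in lowered else 1
                confidence = 900 if rating == 2 else 500
                yield target_id, rating, confidence


def write_run(my_filter, corpus, out):
    """Judge items in time order; write one simplified TSV row per assertion."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for _timestamp, stream_id, text in sorted(corpus):
        for target_id, rating, confidence in my_filter.judge(text):
            writer.writerow([stream_id, target_id, confidence, rating])


corpus = [
    (1325376000, "1325376000-b", "Jane Doe announced a merger"),
    (1325375000, "1325375000-a", "A profile of Jane Doe"),
    (1325377000, "1325377000-c", "Unrelated news"),
]
buf = io.StringIO()
write_run(KeywordFilter({"e1": "Jane Doe"}), corpus, buf)
print(buf.getvalue(), end="")
```

In a real run, `out` would be a file opened with `open('teamId-fancyFilter.tsv', 'w', newline='')`, which is the mode the csv module recommends.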

A system could also use limited elements of the profiles YAML file; see
other discussion threads.

John

Wallace

Aug 5, 2014, 12:44:49 AM

to trec...@googlegroups.com

Hi John,

In the truth data (both the before-cutoff and after-cutoff files),

is the truth data only a subset of the relevant results? That is, does the truth data not contain all of the relevant items for a given entity?

If so, how many relevant items should we submit for each entity in the final submission?

Thanks

On Thursday, July 31, 2014 at 3:02:22 PM UTC+8, Wallace wrote:

John R. Frank

Aug 5, 2014, 12:52:29 AM

to trec...@googlegroups.com

> Is the truth data only a subset of the relevant results? That is, does
> the truth data not contain all of the relevant items for a given entity?
> If so, how many relevant items should we submit for each entity in the
> final submission?

For both CCR and SSF, you can submit any number of assertions from any
number of documents in the ...kba-filtered corpus.

For CCR, we will pool results from the entire time range and conduct
pooled assessing on stream_ids that are not yet in the truth data. We
will then *expand* the truth data with the results of this additional
assessing. The official results will be scores computed against this
expanded set of assessments. This means that your runs should consider
StreamItems from the entire ...kba-filtered corpus.
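The pooling step described above can be sketched as follows, under simplifying assumptions: runs and truth are keyed by (stream_id, target_id), the new assessor judgments arrive as a dict, and pool depth, sampling, and SSF are not modeled. The function names are hypothetical; this is not the organizers' actual tooling.

```python
def build_pool(runs, truth):
    """Pairs submitted by any run that are not yet in the truth data.

    These are the candidates sent for additional assessing.
    """
    submitted = set().union(*runs)
    return submitted - set(truth)


def expand_truth(truth, new_judgments):
    """Merge the additional assessments into the truth data.

    Existing judgments take precedence over new ones for the same pair.
    """
    expanded = dict(new_judgments)
    expanded.update(truth)
    return expanded


truth = {("s1", "e1"): 2, ("s2", "e1"): 1}
runs = [{("s1", "e1"), ("s3", "e1")}, {("s3", "e1"), ("s4", "e1")}]
pool = build_pool(runs, truth)
print(sorted(pool))  # [('s3', 'e1'), ('s4', 'e1')]
expanded = expand_truth(truth, {("s3", "e1"): 2, ("s4", "e1"): 0})
print(len(expanded))  # 4
```

Because the pool is drawn from every submitted pair, a run that covers the whole corpus can contribute documents that no one has judged yet.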

For SSF, we will evaluate the runs and figure out whether and how to pool
results for further assessing.

Does this answer your question?

jrf

Wallace

Aug 5, 2014, 1:32:15 AM

to trec...@googlegroups.com

Hi John, thanks very much. I understand CCR now.

Can we test the results of our SSF system using the scorer?

Thanks.

Wallace

On Thursday, July 31, 2014 at 3:02:22 PM UTC+8, Wallace wrote:

John R. Frank

Aug 5, 2014, 1:37:24 AM

to trec...@googlegroups.com

We're working on a major upgrade (really a replacement) to
kba.scorer.ssf. I'll have an update about it in a week or so.

jrf
