Official Scorer, StreamIds Map


Vibou

Oct 14, 2014, 7:27:37 AM
to trec...@googlegroups.com
Hello everyone,

I have two questions for John:
 - Would it be possible to have a map (like the one you provided last year) from stream-ids to xz files?
 - Maybe I missed it, but is the official scorer working for KBA14?

Thanks for your answers, can't wait to attend this conference :) 

Vincent.

John R. Frank

Oct 14, 2014, 8:27:06 AM
to trec...@googlegroups.com
>  - Would it be possible to have a map (like you did last year) with
> stream-ids => xz file ? 

The list of chunk files, kba-streamcorpus-2014-v0_3_0-kba-filtered.txt.xz, is
linked here:
http://s3.amazonaws.com/aws-publicdatasets/trec/kba/index.html


>  - May be i missed it but is the official scorer working for KBA14 ? 

Yes,
https://github.com/trec-kba/kba-scorer/commit/9a8cc11b6e0f55f1382b23fb4bd1aaea7b289d3c


There is also a form of SSF scorer there; see the later commits.

We will circulate updated truth data, the SSF scorer, and the output of
running the scorer on all runs soon.


jrf

Vibou

Oct 15, 2014, 6:55:16 AM
to trec...@googlegroups.com
Thanks for your quick response. 

I merged the before-cutoff and after-cutoff truth data and ran this command. Is this correct?

python -m  kba.scorer.ccr --cutoff-step 1 ../runs/ ../data/trec-kba-2014-07-11-ccr-and-ssf.before-and-after-cutoff.tsv >& ../2014-kba-scorer.log &

where ../runs/ contains my submitted runs.

John R Frank

Oct 15, 2014, 9:40:01 AM
to trec...@googlegroups.com
Official scoring uses only the after-cutoff truth data. Automatic runs should access only the before-cutoff truth data.

jrf
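
To illustrate the separation described above, here is a minimal sketch that partitions truth rows around a cutoff. It rests on several assumptions that are not confirmed in this thread: that each stream-id begins with an epoch timestamp followed by a hyphen, that a single cutoff value applies to all rows, and that the stream-id sits in the column passed as `stream_id_col`. The function names are hypothetical and are not part of kba-scorer.

```python
from typing import List, Sequence, Tuple


def stream_epoch(stream_id: str) -> int:
    """Return the leading epoch part of a stream-id.

    Assumption: stream-ids look like '<epoch_ticks>-<hash>'.
    """
    return int(stream_id.split("-", 1)[0])


def split_by_cutoff(
    rows: Sequence[Sequence[str]],
    cutoff_epoch: int,
    stream_id_col: int,
) -> Tuple[List[Sequence[str]], List[Sequence[str]]]:
    """Partition truth rows into (before_cutoff, after_cutoff).

    cutoff_epoch and stream_id_col are illustrative parameters; check the
    actual truth file for the real column layout and cutoff definition.
    """
    before, after = [], []
    for row in rows:
        # Skip blank rows and comment rows that start with '#'.
        if not row or row[0].startswith("#"):
            continue
        if stream_epoch(row[stream_id_col]) < cutoff_epoch:
            before.append(row)
        else:
            after.append(row)
    return before, after
```

Under these assumptions, an automatic system would train only on the first list, and scoring would use only the second.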

Vibou

Oct 15, 2014, 9:48:06 AM
to trec...@googlegroups.com
Does the list of chunk files, kba-streamcorpus-2014-v0_3_0-kba-filtered.txt.xz, contain only stream-ids from the filtered corpus?

If so, I need one with all stream-ids of the full corpus.

John R. Frank

Oct 15, 2014, 9:56:27 AM
to trec...@googlegroups.com
> List of chunk files
> kba-streamcorpus-2014-v0_3_0-kba-filtered.txt.xz contains only
> stream-ids from filtered corpus ?  If so I need one with all stream-ids
> of the full corpus.

If you make one of those before we get to it (not sure when that will be),
please let me know and we can put it in s3 for everyone.


jrf
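
For anyone building such a map themselves, the following is a rough sketch of one way to do it, not the method used for the official list. How stream-ids are read out of a chunk file is deliberately left to a caller-supplied function (`read_stream_ids`, a hypothetical name), since that depends on the corpus tooling and on decrypting or decompressing the chunks. The tab-separated, xz-compressed output format is also an assumption.

```python
import lzma
from typing import Callable, Dict, Iterable


def build_stream_id_map(
    chunk_paths: Iterable[str],
    read_stream_ids: Callable[[str], Iterable[str]],
) -> Dict[str, str]:
    """Map each stream-id to the first chunk file in which it appears.

    read_stream_ids is a hypothetical callable that yields the stream-ids
    stored in one chunk file.
    """
    mapping: Dict[str, str] = {}
    for path in chunk_paths:
        for stream_id in read_stream_ids(path):
            # Keep the first chunk seen for a stream-id.
            mapping.setdefault(stream_id, path)
    return mapping


def write_map_xz(mapping: Dict[str, str], out_path: str) -> None:
    """Write 'stream_id<TAB>chunk_path' lines to an xz-compressed text file."""
    with lzma.open(out_path, "wt", encoding="utf-8") as out:
        for stream_id in sorted(mapping):
            out.write(f"{stream_id}\t{mapping[stream_id]}\n")
```

Keeping the reader as a parameter means the same indexing code can be run over either the filtered or the full corpus.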

Vibou

Oct 20, 2014, 5:06:01 AM
to trec...@googlegroups.com
Hi John.

In the official scorer, how is annotator agreement handled? Do you consider the highest rank, the lowest, or both?

John R. Frank

Oct 20, 2014, 6:47:58 AM
to trec...@googlegroups.com
On Mon, 20 Oct 2014, Vibou wrote:

> In the official scorer how is considered annotators agreements in
> scoring tool ? do you consider highest rank ? lowest ? or both?


Since we are using the --any-up flag, we use the highest rating. You can
read more here:

https://github.com/trec-kba/kba-scorer/blob/master/src/kba/scorer/ccr.py#L254


jrf
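
The "highest rating" rule can be sketched as follows. This is an illustration of the idea only, not the code in ccr.py: the tuple layout and the function name are assumptions, and ratings are taken to be integers where a larger value means a stronger judgment.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical judgment layout: (stream_id, target_id, rating).
Judgment = Tuple[str, str, int]


def highest_rating(judgments: Iterable[Judgment]) -> Dict[Tuple[str, str], int]:
    """Collapse multiple annotator judgments to the highest rating.

    When several annotators rated the same (stream_id, target_id) pair,
    only the largest rating is kept.
    """
    best: Dict[Tuple[str, str], int] = {}
    for stream_id, target_id, rating in judgments:
        key = (stream_id, target_id)
        if key not in best or rating > best[key]:
            best[key] = rating
    return best
```

With this rule, a document that one annotator rated low and another rated high counts at the higher rating.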
