information on training

ashwin

Oct 20, 2013, 7:43:38 PM

to stream...@googlegroups.com

Hi Everybody,

I need information on the judgement of documents.

I see that assessor judgements for the StreamItems can be obtained from the StreamItem::ratings field.

What rule was followed regarding how many assessors must agree for a document to be judged vital/useful?

Is it a simple majority rule, or was there some other criterion?

Thanks,

Ashwin

John R. Frank

Oct 22, 2013, 8:10:08 PM

to stream...@googlegroups.com

> I see that assessor judgement for the StreamItems can be obtained from
> StreamItem::ratings field.

We have used the StreamItem::ratings map internally. However, I don't
think we have released any corpora with data in that field.

> What was the rule followed regarding the number of assessor agreement
> for a document to be judged vital/useful?
>

> Is it simple majority rule or there was some other criterion?

The scorer for KBA uses this rule: if any assessor rated an assertion
lower, then it gets the lower rating. An assertion in CCR is (stream_id,
target_id).
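
The rule above amounts to taking the minimum rating across assessors for each (stream_id, target_id) assertion. Below is a minimal Python sketch, not the KBA scorer's actual code. It assumes judgments arrive as (stream_id, target_id, rating) tuples and uses an assumed integer scale in which a lower number means a lower rating (for example -1 garbage, 0 neutral, 1 useful, 2 vital). The function name is hypothetical.

```python
# Hypothetical sketch: collapse multiple assessor judgments into one rating
# per assertion using "any lower rating wins" (i.e. the minimum).
# Assumed encoding: lower integer = lower rating (e.g. -1, 0, 1, 2).

def aggregate_min(judgments):
    """judgments: iterable of (stream_id, target_id, rating) tuples,
    one tuple per individual assessor judgment.

    Returns a dict mapping (stream_id, target_id) -> lowest rating seen.
    """
    combined = {}
    for stream_id, target_id, rating in judgments:
        key = (stream_id, target_id)  # an assertion in CCR
        if key in combined:
            # If any assessor rated this assertion lower, keep the lower rating.
            combined[key] = min(combined[key], rating)
        else:
            combined[key] = rating
    return combined


if __name__ == "__main__":
    example = [
        ("doc-1", "entity-A", 2),  # one assessor says vital
        ("doc-1", "entity-A", 1),  # another says useful -> useful wins
        ("doc-2", "entity-A", 2),
    ]
    print(aggregate_min(example))
```

A majority-vote variant would only change the body of the loop, which is one way to run the kind of voting-scheme comparison mentioned next.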

I've experimented with other voting schemes, and they have minimal effect on
max(F(macro_avg(P), macro_avg(R))) as a function of cutoff.
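
To make that metric concrete, here is a hedged Python sketch of max(F(macro_avg(P), macro_avg(R))) taken over a set of confidence cutoffs. Several details are assumptions: the input layout (a run as target_id -> {stream_id: confidence}, truth as target_id -> set of positive stream_ids), the function names, and the choice to score a target with no predictions at a cutoff as P = 0. The official kba-scorer may handle such details differently.

```python
# Hypothetical sketch of a macro-averaged F1 maximized over cutoffs.

def macro_f_at_cutoff(run, truth, cutoff):
    """run:   dict target_id -> dict stream_id -> confidence
    truth: dict target_id -> set of stream_ids judged positive
    Returns F1 computed from macro-averaged precision and recall."""
    precisions, recalls = [], []
    for target_id, positives in truth.items():
        scores = run.get(target_id, {})
        # Keep only predictions at or above the confidence cutoff.
        predicted = {sid for sid, conf in scores.items() if conf >= cutoff}
        tp = len(predicted & positives)
        # Assumption: a target with no predictions gets precision 0.
        precisions.append(tp / len(predicted) if predicted else 0.0)
        recalls.append(tp / len(positives) if positives else 0.0)
    p = sum(precisions) / len(precisions)  # macro_avg(P) over targets
    r = sum(recalls) / len(recalls)        # macro_avg(R) over targets
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def max_macro_f(run, truth, cutoffs):
    """Return (best F, cutoff that achieves it) over the given cutoffs."""
    return max((macro_f_at_cutoff(run, truth, c), c) for c in cutoffs)


if __name__ == "__main__":
    truth = {"e1": {"d1", "d2"}, "e2": {"d3"}}
    run = {"e1": {"d1": 900, "d2": 300, "d4": 800},
           "e2": {"d3": 700, "d5": 200}}
    print(max_macro_f(run, truth, [100, 500, 850]))
```

Because P and R are averaged per target before F is taken, small changes to how individual assertions are labeled tend to be diluted across targets.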

jrf

Ashwani Rao

Oct 22, 2013, 8:18:48 PM

to John R. Frank, stream...@googlegroups.com

Thanks, John, for the information.

How, then, do I get the judgement (rating) of a StreamItem?

I have not used the training data until now, but I now plan to use it.

I thought StreamItem::ratings had that information.

I have two more questions.

1. How were the documents sampled for judgement in the TTR period? Were they randomly sampled from the TTR period?

2. Is there any document describing the judgement procedure for both the TTR and ETR periods?

Ashwin


John R. Frank

Oct 22, 2013, 8:32:32 PM

to trec...@googlegroups.com

Hi Ashwani,

I think your questions pertain to TREC KBA and not the other tracks using
the streamcorpus data, so I'm moving this to the trec-kba forum.

> How do I then get judgement (rating) of StreamItem?

The probably-final training data is available in this git repo:

https://github.com/trec-kba/kba-scorer/blob/master/data/trec-kba-ccr-judgments-2013-09-26-expanded-with-ssf-inferred-vitals-plus-len-clean_visible.before-and-after-cutoff.filter-run.txt

Assuming we find no further issues that need remedying, we'll send this to
NIST, which will be the permanent distribution source, as the final update to the
tarball of training and evaluation data and assessing guidelines.

The penultimate such tarball is described in this post from July 19th:

https://groups.google.com/forum/#!searchin/trec-kba/tarball/trec-kba/nfBHzBa04y8/JHkgllQZ8igJ

> 1. How were the documents sampled for judgement in TTR period? Were they
> randomly sampled from TTR period.

We used high-recall surface form names for all 141 target entities
and exhaustively assessed all of the documents from the full time range
(TTR + ETR) that matched these strings.
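
The exhaustive assessment pool described above can be pictured as a simple string filter. The Python sketch below assumes case-insensitive substring matching of each target's surface form names against document text (such as clean_visible). The function name, the input layout, and the matching rule are hypothetical, because the actual matcher isn't described in this thread.

```python
# Hypothetical sketch: select every document whose text contains any of a
# target entity's surface form names (case-insensitive substring match).

def matching_documents(docs, surface_forms):
    """docs:          dict stream_id -> document text (e.g. clean_visible)
    surface_forms: dict target_id -> list of surface form names
    Returns dict target_id -> sorted list of matching stream_ids."""
    # Lowercase each document once instead of once per name.
    lowered_docs = {sid: text.lower() for sid, text in docs.items()}
    matches = {}
    for target_id, names in surface_forms.items():
        lowered_names = [n.lower() for n in names]
        hits = [sid for sid, text in lowered_docs.items()
                if any(name in text for name in lowered_names)]
        matches[target_id] = sorted(hits)
    return matches


if __name__ == "__main__":
    docs = {"d1": "Alice Smith spoke today",
            "d2": "nothing relevant here",
            "d3": "A. SMITH resigned"}
    forms = {"e1": ["Alice Smith", "A. Smith"]}
    print(matching_documents(docs, forms))
```

Favoring recall in the name list means every matched document in the full TTR + ETR range enters the pool, instead of drawing a random sample from it.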