Questions about gold standards


sara_tonelli

Mar 3, 2010, 7:20:52 AM
to SemEval2010-Keyphrase
Hi everybody,
I have a few questions and remarks for the organizers:
1. I checked the training files for evaluation (train.reader.final)
and I noticed there are some mistakes made during lemmatization. In
particular, some acronyms ending with "s" have been lemmatized; for
example, "gps" has become "gp". Are you going to manually check the final
gold standard provided for task evaluation, or do we have to take such
lemmatized acronyms into account?
2. In the gold standards, there are keywords such as "agent-oriented"
or "systems-topic" that are sometimes divided by a hyphen and sometimes
by a space. Wouldn't it be possible to accept both forms in the
evaluation (for example, in the form "agent-oriented+agent oriented")?
3. Can we submit two or more runs?

Thanks!
Sara.

Su Nam Kim

Mar 3, 2010, 5:00:12 PM
to semeval201...@googlegroups.com
Dear Sara

> 1. I checked the training files for evaluation (train.reader.final)
> and I noticed there are some mistakes made during lemmatization. In
> particular, some acronyms ending with "s" have been lemmatized; for
> example, "gps" has become "gp". Are you going to manually check the final
> gold standard provided for task evaluation, or do we have to take such
> lemmatized acronyms into account?

The lemmatization was done automatically using 'morpha' and confirmed manually.
Could you please send us the details of the problematic keywords?
We can take a look at them and fix them if they are incorrectly lemmatized.

> 2. In the gold standards, there are keywords such as "agent-oriented"
> or "systems-topic" that are sometimes divided by a hyphen and sometimes
> by a space. Wouldn't it be possible to accept both forms in the
> evaluation (for example, in the form "agent-oriented+agent oriented")?

Keywords such as 'agent-oriented' and 'systems-topic' are taken from
the content of the documents. Thus, you should be able to extract them as words.
At the moment, we do not intend to alter the answer set for such cases.
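
For participants who want to tolerate both spellings on their own side (for example when comparing system output against the training gold standard), a minimal sketch of hyphen-insensitive matching could look like the following. The helper names `normalize` and `matches_gold` are hypothetical, and this is not the official evaluation script, which, as stated above, does not accept alternative forms.

```python
import re
from typing import Iterable


def normalize(keyphrase: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    text = keyphrase.lower().replace("-", " ")
    return re.sub(r"\s+", " ", text).strip()


def matches_gold(candidate: str, gold: Iterable[str]) -> bool:
    """Return True if the candidate equals any gold keyphrase after normalization."""
    gold_normalized = {normalize(g) for g in gold}
    return normalize(candidate) in gold_normalized


if __name__ == "__main__":
    # Illustrative gold entries taken from the examples in this thread.
    gold = ["agent-oriented", "systems-topic"]
    print(matches_gold("agent oriented", gold))
    print(matches_gold("agent", gold))
```

Normalizing both sides keeps the comparison symmetric, so it does not matter which variant appears in the gold file and which in the system output.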

> 3. Can we submit two or more runs?

Good point.
We haven't mentioned this.
The organizers will discuss this matter and get back to you as soon as possible.

Thank you.

--
======================================
Su Nam Kim
CSSE dept., University of Melbourne
http://www.csse.unimelb.edu.au/~snkim
sn...@csse.unimelb.edu.au
======================================

Sara Tonelli

Mar 4, 2010, 4:11:15 AM
to semeval201...@googlegroups.com
Hi,
as for the mistakes in the gold standard, I noticed for example "hit algorithm" (given for H-42), whereas the correct term is HITS (Hyperlink-Induced Topic Search). Also, "vickrey-clarke-groves" in J-58 became "vickrey-clarke-grove". The same happened to "gps receiver", which was lemmatized as "gp receiver". Another small typo was "uva-based deploment", which was given as a keyword in C-44, while it should be "deployment".
Sara.
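
Cases like "gp receiver" and "hit algorithm" could be spotted semi-automatically. The following is a rough sketch under the assumption that the source document text is available alongside its gold keyphrases: it flags a gold token whenever that token plus "s" occurs in the text as an all-caps acronym. The function names and the sample text are hypothetical, and the heuristic would not catch proper names such as "vickrey-clarke-groves", so any hits still need a manual check.

```python
import re


def find_acronyms(text):
    """Return lowercase forms of all-caps tokens ending in S, such as GPS or HITS."""
    return {match.lower() for match in re.findall(r"\b[A-Z]{2,}S\b", text)}


def flag_suspect_lemmas(gold_keyphrases, text):
    """List (keyphrase, token, acronym) triples where a gold token looks like
    an acronym from the text that lost its final 's' during lemmatization."""
    acronyms = find_acronyms(text)
    suspects = []
    for phrase in gold_keyphrases:
        for token in phrase.split():
            restored = token + "s"
            if restored in acronyms:
                suspects.append((phrase, token, restored))
    return suspects


if __name__ == "__main__":
    # Made-up sentence for illustration only; not taken from the task data.
    sample_text = "The GPS receiver logs positions. We rank pages with the HITS algorithm."
    gold = ["gp receiver", "hit algorithm", "web search"]
    for phrase, token, restored in flag_suspect_lemmas(gold, sample_text):
        print(f"{phrase!r}: {token!r} may be the acronym {restored.upper()!r}")
```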



Su Nam Kim

Mar 4, 2010, 4:16:44 AM
to semeval201...@googlegroups.com
Dear Sara

Thank you for the information.
We'll check the test answer set once more to make sure the lemmas are correct.
