Are the target words lemmatized?

Joseph Turian

Mar 2, 2010, 1:09:21 AM

to SemEval2010_Cross-Lingual Word Sense Disambiguation

I see in the development data that the target-language translations are lemmatized.

For example:
<instance id="2">
<context>The BIS could conclude stand-by credit agreements with the creditor countries' central <head>banks</head> if they should so request.</context>

bank.n.fr 2 :: bancaire 1;bank 1;banque 5;caisse 1;institution 1;
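For anyone comparing their own output against these gold lines, here is a minimal Python sketch that splits one line into its parts. It assumes the layout visible in the example above: a `lemma.pos.language` key, the instance id, `::`, then `translation count` pairs separated by `;` with a trailing `;`. The function name `parse_gold_line` is hypothetical, not part of the official scorer.

```python
from typing import Dict, Tuple


def parse_gold_line(line: str) -> Tuple[str, str, Dict[str, int]]:
    """Hypothetical helper: split a gold line into (key, instance id, {translation: count})."""
    head, _, answers = line.partition("::")
    target, instance_id = head.split()  # e.g. "bank.n.fr", "2"
    counts: Dict[str, int] = {}
    for item in answers.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty field left by the trailing ';'
        # rsplit keeps multiword translations intact; the count is the last token
        translation, count = item.rsplit(" ", 1)
        counts[translation] = int(count)
    return target, instance_id, counts


key, inst, counts = parse_gold_line(
    "bank.n.fr 2 :: bancaire 1;bank 1;banque 5;caisse 1;institution 1;"
)
print(max(counts, key=counts.get))  # → banque
```

Because every translation in these lines is a lemma ("banque", not "banques"), a system answer will only match if it is lemmatized the same way.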

However, lemmatization is not described on the task description
website or in the task description publication
(http://www.aclweb.org/anthology/W/W09/W09-2413.pdf).

Are we expected to lemmatize our output in the target languages?

Thanks,
Joseph

Els

Mar 2, 2010, 10:12:01 AM

to SemEval2010_Cross-Lingual Word Sense Disambiguation

Hi Joseph,

it is indeed true that the output translations should be lemmatized.
All translations in the clusters have been lemmatized as well.

A detailed description of the task and the data formats we use is
given in the PDF documentation that you can download together
with the trial data and scorer.
You can also find a version of the detailed documentation on:
http://lt3.hogent.be/semeval/Trial/Task3_doc.pdf

Best,
Els

Joseph Turian

Mar 2, 2010, 12:57:54 PM

to cross-li...@googlegroups.com

> You can also find a version of the detailed documentation on:
> http://lt3.hogent.be/semeval/Trial/Task3_doc.pdf

Permission denied.

> it is indeed true that the output translations should be lemmatized.
> All translations in the clusters have been lemmatized as well.

What technique did you use for lemmatization? We don't have expert
knowledge of the five target languages and are concerned that our
translations will be correct but our lemmatization won't be. That would
make correct translations look wrong.

Other participants, what lemmatization techniques are you using?
For Task 2, where the target language is Spanish, Marine Carpuat is
using TreeTagger, but Pierpaolo Basile says FreeLing is better for
Spanish: http://www.lsi.upc.edu/~nlp/freeling/

Els

Mar 4, 2010, 10:58:55 AM

to SemEval2010_Cross-Lingual Word Sense Disambiguation

Hi Joseph,

1. I have changed the permissions for the documentation file,
so everybody should be able to access it now.

2. For lemmatisation, you can use the freely available TreeTagger
for most languages. TreeTagger does not include a pretrained
lemmatiser for Dutch, but for Dutch you can use Tadpole,
which you can download from:
http://ilk.uvt.nl/tadpole/.
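If you go the TreeTagger route, the lemmas still have to be pulled out of its output. Below is a minimal Python sketch, assuming TreeTagger was run with lemma output enabled (e.g. the `-token -lemma` options), so that each line holds word, POS tag and lemma separated by tabs. Two handling choices are assumptions rather than anything prescribed by the task: an `<unknown>` lemma falls back to the word form, and ambiguous lemmas written as `a|b` keep only the first alternative. The function name and the sample input are hypothetical, hand-written in that format rather than real tagger output.

```python
from typing import List


def lemmas_from_treetagger(output: str) -> List[str]:
    """Hypothetical helper: extract lemmas from tab-separated word/tag/lemma lines."""
    lemmas: List[str] = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # ignore blank lines or anything not in word/tag/lemma form
        word, _tag, lemma = fields
        if lemma == "<unknown>":
            lemma = word  # assumption: fall back to the surface form
        lemmas.append(lemma.split("|")[0])  # assumption: keep first of ambiguous lemmas
    return lemmas


sample = "banques\tNOM\tbanque\ncentrales\tADJ\tcentral\nBIS\tNAM\t<unknown>\n"
print(lemmas_from_treetagger(sample))  # → ['banque', 'central', 'BIS']
```

Falling back to the word form keeps unknown words in the output, at the risk of leaving them inflected; whether that is preferable to dropping them depends on how the scorer treats unmatched answers.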

Best,
Els