For example:
<instance id="2">
<context>The BIS could conclude stand-by credit agreements with the
creditor countries' central <head>banks</head> if they should so
request.</context>
bank.n.fr 2 :: bancaire 1;bank 1;banque 5;caisse 1;institution 1;
However, lemmatization is not described on the task description
website or in the task description publication
(http://www.aclweb.org/anthology/W/W09/W09-2413.pdf).
Are we expected to lemmatize our output in the target languages?
Thanks,
Joseph
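For readers who want to inspect answer lines like the one quoted above, here is a minimal sketch in Python. It assumes the layout seen in that single example: a lexical element, an instance id, `::`, then semicolon-terminated `translation number` pairs. The function name `parse_answer_line` is hypothetical, and the thread does not say what the numbers mean, so they are simply kept as integers.

```python
def parse_answer_line(line):
    """Split an answer line into (lexelt, instance_id, {translation: number}).

    Assumed layout, inferred from one example only:
        bank.n.fr 2 :: bancaire 1;bank 1;banque 5;
    """
    head, _, body = line.partition("::")
    lexelt, instance_id = head.split()
    translations = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        # rpartition keeps multi-word translations intact
        translation, _, number = item.rpartition(" ")
        translations[translation] = int(number)
    return lexelt, instance_id, translations


line = "bank.n.fr 2 :: bancaire 1;bank 1;banque 5;caisse 1;institution 1;"
lexelt, instance_id, translations = parse_answer_line(line)
```

Note that every translation in the example is a base form (banque, not banques), which is what prompts the lemmatization question.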
It is indeed true that the output translations should be lemmatized.
All translations in the clusters have been lemmatized as well.
There is a detailed description of the task and the data formats
we use in the documentation PDF, which you can download together
with the trial data and scorer.
You can also find a version of the detailed documentation on:
http://lt3.hogent.be/semeval/Trial/Task3_doc.pdf
Best,
Els
Permission denied.
> it is indeed true that the output translations should be lemmatized.
> All translations in the clusters have been lemmatized as well.
What technique did you use for lemmatization? We don't have expert
knowledge in the five target languages and are concerned that our
translations will be good but our lemmatization won't. This would make
our translations look bad, even though they aren't.
Other participants, what lemmatization techniques are you using?
For Task 2, where the target language is Spanish, Marine Carpuat is
using TreeTagger, but Pierpaolo Basile says that FreeLing is better for
Spanish: http://www.lsi.upc.edu/~nlp/freeling/
1. I have changed the permissions for the documentation file,
so everybody should be able to access it now.
2. For lemmatisation, you can use the freely available TreeTagger
for most languages. For Dutch, TreeTagger does not include a
pretrained lemmatiser, but you can use TadPole instead, which
you can download from:
http://ilk.uvt.nl/tadpole/
Best,
Els
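As a rough illustration of the TreeTagger suggestion, the sketch below pulls lemmas out of tagger output in Python. It assumes the tagger was run so that each line holds token, part-of-speech tag and lemma separated by tabs, and that words it cannot lemmatize are marked `<unknown>`. The function name, the sample rows and the fallback to the lowercased token are illustrative choices, not part of the task description.

```python
def lemmas_from_tagger_output(text, unknown="<unknown>"):
    """Return one lemma per token from tab-separated tagger output.

    Assumes three columns per line: token, POS tag, lemma.
    Lines with a different shape are skipped.
    """
    lemmas = []
    for row in text.splitlines():
        parts = row.split("\t")
        if len(parts) != 3:
            continue
        token, _pos, lemma = parts
        # Fallback (an assumption): use the lowercased surface form
        # when the tagger does not know the lemma.
        lemmas.append(token.lower() if lemma == unknown else lemma)
    return lemmas


# Made-up sample rows for illustration only
sample = "banques\tNOM\tbanque\ncentrales\tADJ\tcentral\nBIS\tNAM\t<unknown>\n"
result = lemmas_from_tagger_output(sample)
```

With lemmas extracted this way, a system's translations can be mapped to base forms before they are written to the answer file, so that they match the lemmatized clusters.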