Lemmatization


Joseph Turian

Feb 25, 2010, 12:28:17 AM
to SemEval2010.CrossLingualLexicalSubstitution
Question:

It would be unfortunate if someone's system performed poorly merely
because they didn't lemmatize their translation output correctly, but
they otherwise got the correct meaning. Can you provide a lemmatizer
for the source and target language? If not, how can we assume that our
systems have the appropriate lemmas?

ambidextrous

Feb 25, 2010, 1:58:11 PM
to SemEval2010.CrossLingualLexicalSubstitution
Hi again Joseph,

In tasks like this, it is evident that certain preprocessing
tools might give a system an advantage, but such tools are never
provided. Instead, you are free to use any resource or
tool of your choice. We have not compared the different
lemmatizers available, but if you make sure that you strip
the words down to the basic lemma [no inflection], all lemmatizers
should lead to the same output [the same basic lemmas with no
inflections].

Ravi

Joseph Turian

Feb 28, 2010, 4:34:11 PM
to SemEval2010.CrossLingualLexicalSubstitution
Ravi and other participants,

My concern is that the system will get a good translation of the word,
but choose the wrong lemma. In this case, the system would be unfairly
penalized.

What semi-automatic technique did the annotators use for
lemmatization?

We don't have any native Spanish-language speakers, so we are not sure
how to make sure we get the appropriate lemmas.
Other participants, do you mind sharing your lemmatization approach?

Thanks,
Joseph

Marine Carpuat

Mar 1, 2010, 10:05:02 AM
to clls...@googlegroups.com
Hi Joseph,

I am planning to use TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/, which I have found useful on other European languages. (But I am not a Spanish speaker either, and I cannot comment on how well it matches the semi-automatic technique used on the gold data.)

  Marine





Pierpaolo

Mar 2, 2010, 4:26:56 AM
to SemEval2010.CrossLingualLexicalSubstitution
Hi,
TreeTagger is not good for Spanish because of the low quality of its
Spanish training data.
I suggest using FreeLing (http://www.lsi.upc.edu/~nlp/freeling/). I
plan to use it for this task ;-).

Good luck,
Pierpaolo


ambidextrous

Mar 5, 2010, 9:32:16 PM
to SemEval2010.CrossLingualLexicalSubstitution
Hi all,

As for the semi-automatic method we used to lemmatize the
translations: we ran TreeTagger and then manually inspected the
output for lemmatization and/or spelling errors.

Best,
Ravi
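
For participants who want to approximate the workflow described above (TreeTagger followed by a manual check), here is a minimal sketch in Python. It assumes TreeTagger was run with its -token and -lemma options, so that each output line has the form token<TAB>tag<TAB>lemma, and that tokens it cannot lemmatize are marked <unknown>. The function names, the sample line, and the tag shown in it are hypothetical and for illustration only; this is not the organizers' actual script.

```python
# Sketch only: assumes TreeTagger output produced with -token -lemma,
# i.e. one "token<TAB>tag<TAB>lemma" triple per line.
UNKNOWN = "<unknown>"  # marker TreeTagger emits when it has no lemma


def parse_treetagger(output):
    """Return (token, tag, lemma) triples from tab-separated tagger output."""
    triples = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines or markup passed through unchanged
        triples.append(tuple(parts))
    return triples


def lemmas_for_review(triples):
    """Separate usable lemmas from tokens that need manual inspection."""
    accepted, review = [], []
    for token, _tag, lemma in triples:
        if lemma == UNKNOWN:
            review.append(token)  # a human decides the lemma (or fixes spelling)
        else:
            accepted.append(lemma.lower())  # lowercasing is our own choice here
    return accepted, review


# Hypothetical sample output, not taken from the task data.
sample = "casas\tNC\tcasa\nxyzzy\tNC\t<unknown>\n"
accepted, review = lemmas_for_review(parse_treetagger(sample))
print("accepted:", accepted)
print("needs review:", review)
```

Only the tokens in the review list would need a human look, which keeps the manual step small. Whether lowercasing matches the gold data is an open question the thread does not answer.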
