Finding word lemmas


Morten Minde Neergaard

Jan 29, 2012, 2:35:41 PM
to nltk-...@googlegroups.com
Hi,

I've been working on looking up single words in a dictionary as part
of a project. To accept input in non-canonical (inflected) form, I'm
using the method nltk.corpus.wordnet.morphy. This works very well for
English.
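
For reference, a minimal sketch of that English lookup, assuming NLTK and its WordNet corpus data are installed. The helper names lookup_forms and wordnet_forms are hypothetical, made up for this illustration; only wordnet.morphy itself comes from NLTK. It tries each WordNet part of speech and collects the distinct base forms:

```python
def lookup_forms(word, lemmatize, pos_tags=("n", "v", "a", "r")):
    """Return the distinct base forms of `word`, trying one part of speech at a time.

    `lemmatize` is any callable taking (word, pos) and returning a base form
    or None. The default tags are WordNet's noun, verb, adjective and adverb.
    """
    forms = []
    for pos in pos_tags:
        lemma = lemmatize(word, pos)
        if lemma is not None and lemma not in forms:
            forms.append(lemma)
    return forms


def wordnet_forms(word):
    """Look up `word` with WordNet's morphy (hypothetical convenience wrapper)."""
    # Imported here so the helper above stays usable without NLTK installed.
    from nltk.corpus import wordnet
    return lookup_forms(word, wordnet.morphy)
```

Calling wordnet_forms on an inflected English word should give the candidate dictionary headwords. An empty list means morphy found nothing for any part of speech.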

For other languages, the stemmers I've found in NLTK return truncated
stems rather than lemmas, so the results are unsuitable for dictionary
lookup, e.g. in Spanish:

>>> import nltk.stem.snowball
>>> nltk.stem.snowball.SpanishStemmer().stem('logicamente')
u'logic'

Does anyone have a good idea for how to handle this? The worst-case
scenario would be using lemmatizers not written in Python, but I'd like
to avoid this.

Cheers,
--
Morten Minde Neergaard

Alex Rudnick

Jan 29, 2012, 2:47:29 PM
to nltk-...@googlegroups.com
Hey Morten,

Morphological analysis is kind of a hard problem for many languages!
You may have to find a language-specific tool in a lot of cases, and
many of them may not be in Python.

But if you want to do Spanish, Mike Gasser (my advisor) has some
Python 3 software that works pretty well for Spanish verbs. In many
cases (don't know the precision/recall), it will find the infinitive
form of a verb, given the conjugated form. There are also morphological
analyzers for a few other languages here:

http://www.cs.indiana.edu/~gasser/Research/software.html

Hope this helps!

--
-- alexr

Morten Minde Neergaard

Feb 8, 2012, 2:09:42 PM
to nltk-...@googlegroups.com
At 14:47, Sun 2012-01-29, Alex Rudnick wrote:
> Hey Morten,

Hi! Sorry that I didn't remember to say “thank you” right away =)

> Morphological analysis is kind of a hard problem for many languages!
> You may have to find a language-specific tool in a lot of cases, and
> many of them may not be in Python.

Indeed, and lots of the tools are closed source and/or have rotten code
bases.

> But if you want to do Spanish, Mike Gasser (my advisor) has some
> Python 3 software that works pretty well for Spanish verbs. In many
> cases (don't know the precision/recall), it will find the infinitive
> form of a verb, given the conjugated form. There are also morphological
> analyzers for a few other languages here:

In case anyone cares, I ended up writing a small wrapper around
TreeTagger. It's giving me good results.
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
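
For anyone wanting to try the same approach, here is a rough sketch of what such a wrapper could look like. It is not the actual wrapper, just an illustration under two assumptions: that TreeTagger is installed with a language wrapper script on the PATH (the command name tree-tagger-spanish below is an assumption about the local install), and that it prints one token per line as token, tag and lemma separated by tabs. The function names are hypothetical.

```python
import subprocess


def parse_treetagger_output(text):
    """Parse lines of 'token<TAB>tag<TAB>lemma' into (token, tag, lemma) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything not in three-column form
        token, tag, lemma = parts
        # TreeTagger prints "<unknown>" when it has no lemma for a token;
        # fall back to the surface form so a lookup can still be attempted.
        rows.append((token, tag, token if lemma == "<unknown>" else lemma))
    return rows


def lemmatize(text, command="tree-tagger-spanish"):
    """Run TreeTagger on `text` and return the parsed rows.

    `command` is an assumption: adjust it to whatever script or binary
    your TreeTagger installation provides for the language in question.
    """
    result = subprocess.run(
        [command], input=text, capture_output=True, text=True, check=True
    )
    return parse_treetagger_output(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parsing can be tested without TreeTagger installed.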

--
Morten Minde Neergaard
