Stemming in gensim


Swapnajit Chakraborti

Apr 27, 2014, 1:38:30 AM
to gen...@googlegroups.com
While playing with basic examples, I found that gensim does not support stemming
of words. Is there anything that I am missing?

Regards,
Swapnajit

Valerio Maggio

Apr 27, 2014, 10:34:39 AM
to gen...@googlegroups.com


> On 27/apr/2014, at 07:38, Swapnajit Chakraborti wrote:
>
> While playing with basic examples, I found that gensim does not support stemming
> of words. Is there anything that I am missing?

Hi Swapnajit.

As already pointed out a few seconds ago in another thread, NLTK is what you need for these kinds of NLP-related tasks: http://www.nltk.org/api/nltk.stem.html

Best,
Valerio
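
A minimal sketch of what the NLTK pointer above amounts to, assuming NLTK is installed. `PorterStemmer` lives in `nltk.stem`; the sample sentence and the variable names are made up for illustration.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer is purely rule-based, so no NLTK corpus download is needed.
porter = PorterStemmer()

# Hypothetical input: a naive whitespace tokenization of a sample sentence.
tokens = "while playing with basic examples".split()

# Stem each token independently; the result keeps the original token order.
stems = [porter.stem(token) for token in tokens]

print(stems)
print(porter.stem("running"))  # prints "run"
```

The stemmed tokens can then be passed to gensim in place of the raw tokens.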

Swapnajit Chakraborti

Apr 27, 2014, 11:43:35 AM
to gen...@googlegroups.com
Thanks a lot for the pointer.

Regards,
Swapnajit

Christian Ledermann

Apr 28, 2014, 8:41:19 AM
to gensim
Stemming is part of the tokenization process; see:


I used the snowball stemmer with success:





Radim Řehůřek

Apr 28, 2014, 10:07:57 AM
to gen...@googlegroups.com
Hi Swapnajit,

there's a little-known module in gensim that does stemming, under `gensim.parsing.preprocessing`, contributed by a user. It uses the Porter stemming algorithm.

But like Valerio says, if you need more NLP, you're better off using a dedicated library such as NLTK. I actually plan to deprecate the `gensim.parsing` package -- its place is not in core gensim IMO, and no one's worked on it for ages anyway.

HTH,
Radim

Skipper Seabold

Apr 28, 2014, 10:32:01 AM
to gensim
> On Sun, Apr 27, 2014 at 1:38 AM, Swapnajit Chakraborti <fi13swa...@iimidr.ac.in> wrote:
> While playing with basic examples, I found that gensim does not support stemming
> of words. Is there anything that I am missing?


FWIW, I'm using PyStemmer quite a bit. It includes (wrappers for) a suite of stemmers in different languages.


Skipper

Swapnajit Chakraborti

Apr 28, 2014, 1:22:13 PM
to gen...@googlegroups.com
Hello All,

Thanks a lot for providing all the pointers. I shall check them
and get back if I find any difficulty.

Regards,
Swapnajit