What are the benefits of using "lemmatize=True" in gensim.corpora.wikicorpus.WikiCorpus(full_path, lemmatize=lemmatize)?

Willi S.

Jul 8, 2020, 12:26:18 PM
to Gensim
Hi,
I'm fairly new to gensim, and I'm trying to figure out how lemmatize=True differs from lemmatize=False. The documentation on this is very brief.
If I compare both outputs, I would say that the output generated by lemmatize=False looks a bit better.

So what can you tell me about the differences?

Best regards
Willi S.

Radim Řehůřek

Jul 9, 2020, 4:19:59 PM
to Gensim
Hi Willi,

`lemmatize=True` turns on lemmatization, which is the process of replacing each word with its lemma (its base form). For example, "models", "modelling" and "modelled" are all reduced to "model".
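
To illustrate the idea only, here is a toy sketch of lemmatization with a hand-written lookup table. The `LEMMAS` table and the `lemmatize_tokens` function are hypothetical names made up for this example; this is not how "pattern" or gensim implement it, since real lemmatizers rely on morphological rules and part-of-speech information.

```python
# Toy lemma lookup table (hypothetical, for illustration only).
LEMMAS = {
    "models": "model",
    "modelling": "model",
    "modelled": "model",
}


def lemmatize_tokens(tokens):
    """Lower-case each token, then replace it with its lemma if one is known."""
    return [LEMMAS.get(token.lower(), token.lower()) for token in tokens]


tokens = ["Model", "models", "modelling", "modelled", "corpus"]
print(lemmatize_tokens(tokens))
```

All four variants of "model" collapse to the same token, while an unknown word such as "corpus" passes through unchanged apart from lower-casing.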


Gensim uses a third-party library called "pattern" for lemmatization. There have been reports that "pattern" is broken on Python 3, so we may drop it altogether. If `lemmatize=False` works better for you, great.
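
If you want to decide at runtime, one possible sketch is to check whether a module named `pattern` can be found at all and only enable lemmatization in that case. The helper name `pattern_available` is an assumption made up for this example, and finding the module does not guarantee that it actually works on your Python version.

```python
import importlib.util


def pattern_available():
    """Return True if a top-level module named 'pattern' can be located."""
    return importlib.util.find_spec("pattern") is not None


# Only ask for lemmatization when "pattern" appears to be installed.
use_lemmatize = pattern_available()
print("lemmatize =", use_lemmatize)

# Assumption: the flag would then be passed on as in the call from the question,
# e.g. WikiCorpus(full_path, lemmatize=use_lemmatize)
```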

HTH,
Radim