Does gensim's LsiModel use a weighting factor?

Brian Barry

Sep 10, 2021, 4:00:25 PM

to Gensim

Hello,

I was looking at the documentation for the LsiModel and could not tell if/where a weighting scheme is applied to document vectors.

It seems like the following lines are where the main transformations of LSA take place, and no weighting occurs:

```
if not scipy.sparse.issparse(docs):
    docs = matutils.corpus2csc(docs)
ut, s, vt = sparsesvd.sparsesvd(docs, k + 30)
```

However, most definitions of the LSA algorithm include a step where some weighting scheme, such as TF-IDF, is applied. Do I have to apply gensim.models.TfidfModel to preprocess my corpus before training an LsiModel to fit this definition of LSA?

And out of curiosity, if the LsiModel does not implicitly have a weighting step, why not?

Thank you,

Brian

Radim Řehůřek

Sep 14, 2021, 7:15:16 AM

to Gensim

Hi Brian,

That is because in Gensim, TF-IDF is a separate transformation that you may choose (or not) to apply alongside LSI:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

HTH,

Radim

Brian Barry

Sep 14, 2021, 1:24:55 PM

to gen...@googlegroups.com

I see. So LSI in this case is essentially just SVD on a count vectorizer matrix.
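
To make that concrete, here is a small NumPy sketch of a truncated SVD on an unweighted count matrix. The matrix is a made-up toy example, and this only illustrates the idea: Gensim's own implementation is incremental and approximate rather than a single dense `numpy.linalg.svd` call.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents),
# made up for illustration.
counts = np.array([
    [1, 0, 1, 0],   # computer
    [1, 0, 1, 0],   # interface
    [1, 0, 0, 0],   # human
    [0, 1, 0, 1],   # graph
    [0, 1, 0, 2],   # trees
    [0, 0, 1, 0],   # system
], dtype=float)

k = 2  # number of latent topics to keep

# Plain SVD of the unweighted counts: counts = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Project each document (column) into the k-dimensional topic space.
doc_topics = U_k.T @ counts  # shape (k, n_docs)

# For the documents used in the decomposition, this equals diag(s_k) @ Vt_k.
assert np.allclose(doc_topics, s_k[:, None] * Vt_k)
```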

Thank you,

Brian

Radim Řehůřek

Sep 14, 2021, 4:15:51 PM

to Gensim

Yes. There are also other normalization / regularization schemes, besides TF-IDF, that you might want to use with LSI. For example, the Log-Entropy model:

https://radimrehurek.com/gensim/models/logentropy_model.html

Plus, of course, TF-IDF itself is a whole family of models, depending on which parameters / formula you use (see the smartirs constructor option of TfidfModel).

Hope that helps,

Radim

Brian Barry

Sep 14, 2021, 5:39:30 PM

to gen...@googlegroups.com

I see, thank you so much for the advice!

Best,

Brian
