Does gensim's LsiModel use a weighting factor?


Brian Barry

Sep 10, 2021, 4:00:25 PM
to Gensim
Hello,

I was looking at the documentation for the LsiModel and could not tell if/where a weighting scheme is applied to document vectors. 
It seems like the following lines are where the main transformations of LSA take place, and no weighting occurs:
```
if not scipy.sparse.issparse(docs):
    docs = matutils.corpus2csc(docs)
ut, s, vt = sparsesvd.sparsesvd(docs, k + 30)
```

However, most definitions of the LSA algorithm include a step where some weighting scheme, such as TF-IDF, is applied. Do I have to apply gensim.models.TfidfModel to preprocess my corpus before training an LsiModel to fit this definition of LSA?

And out of curiosity, if the LsiModel does not implicitly have a weighting step, why not?

Thank you,

Brian 

Radim Řehůřek

Sep 14, 2021, 7:15:16 AM
to Gensim
Hi Brian,

because in Gensim, TF-IDF is a separate transformation which you may choose (or not) to apply alongside LSI:

HTH,
Radim

Brian Barry

Sep 14, 2021, 1:24:55 PM
to gen...@googlegroups.com
I see. So LSI in this case is essentially just SVD on a count vectorizer matrix.

Thank you,

Brian


Radim Řehůřek

Sep 14, 2021, 4:15:51 PM
to Gensim
Yes. There are also other normalization / regularization schemes, besides TF-IDF, that you might want to use with LSI. For example, the Log-Entropy model:

Plus of course, TF-IDF itself is a bag of models depending on what parameters / formula you use (its smartirs constructor option).

Hope that helps,
Radim

Brian Barry

Sep 14, 2021, 5:39:30 PM
to gen...@googlegroups.com
I see, thank you so much for the advice!

Best,

Brian
