We want to assess the accuracy of LSI models of different dimensionality.
Of course we can do this by training models with different values for `num_topics`,
but that takes a lot of time for high values of `num_topics`. Shouldn't it be possible to
compute a model with very high dimensionality just once (let's say N = 5000) and then use only
the first M < N dimensions to assess the quality of a lower-dimensional model?
The matrices should be somewhere in the LSI model, but I could not find how to access them.
best, stephan
Thanks for your reply. I found the matrices; that was really easy.
I think this functionality would also be useful for other gensim users who
don't want to mess with the LsiModel internals.
I would like to add this to the class and now think about which interface to use.
Usually a dimensionality parameter like this would be passed to the function,
but the transformation is done in `__getitem__`, which should not receive another parameter.
So should it be set in another function like:
model.set_dim(50)
c = model[corpus]
or would this be confusing, because people might not remember afterwards that the dimensionality
was reduced permanently?
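To make the question concrete, here is a minimal sketch of the proposed stateful interface; `set_dim` and the class name are purely hypothetical and not part of gensim's API, and the projection is a plain NumPy matrix standing in for the real LSI factors:

```python
import numpy as np

class TruncatableModel:
    """Hypothetical sketch of the proposed interface: set_dim() truncates
    permanently, so __getitem__ keeps its single-argument signature."""

    def __init__(self, projection, num_topics):
        self.projection = projection   # full (num_terms, N) factor matrix
        self.num_topics = num_topics   # dimensions used by __getitem__

    def set_dim(self, m):
        # Permanently restrict subsequent transforms to the first m dims.
        self.num_topics = m

    def __getitem__(self, vec):
        # Project onto the first num_topics columns only.
        return vec @ self.projection[:, : self.num_topics]

model = TruncatableModel(np.eye(5), num_topics=5)
model.set_dim(2)
reduced = model[np.ones(5)]   # only the first two dimensions remain: [1. 1.]
```

The downside is exactly the one raised above: `set_dim` mutates the model, so a later `model[corpus]` silently uses the reduced dimensionality.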
stephan
this is already implemented in the latest version as far as I know.
You can just set the num_topics variable.
So you first create your model with e.g. 300 dimensions, and then
model.num_topics = 200
c = model[corpus]
will give you a transformation of the corpus using only the 200 dimensions with the largest singular values.
stephan
>If you want to make sure gensim got it right, you can always (at least
>for inputs that fit in RAM) compare the lsi.projection.s matrix to the
>result of `numpy.linalg.svd(gensim.matutils.corpus2dense(mycorpus))`.
>All singular values at the tail of the spectrum should be zero (or
>almost zero) for rank-deficient input.
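The rank-deficiency check quoted above can be illustrated with plain NumPy; the small matrix here stands in for the dense term-document matrix that `gensim.matutils.corpus2dense(mycorpus, num_terms)` would produce from a real corpus:

```python
import numpy as np

# A rank-deficient "term-document" matrix: row 2 = row 0 + row 1, so the
# matrix has rank 2 even though it has 3 rows.
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 3.0, 4.0]])

# Singular values in descending order; for a real model these would be
# compared against lsi.projection.s.
s = np.linalg.svd(A, compute_uv=False)

# The tail of the spectrum is numerically zero for rank-deficient input.
print(s)  # last singular value is ~0
```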
I think my problem will be solved.
Besides, I have learned some other things about Gensim from you.
Best Regards,
Shiva