Hello Alejandro,
This question has come up many times in the past, so I will just
copy and paste my previous email response:
>> Amber writes:
>> I am attempting to use gensim for part of my thesis work, and I'm
>> having a problem I hope you can help with. To test that I am using it
>> correctly, I have copied an example from a tutorial:
>>
>> www.engr.uvic.ca/~seng474/svd.pdf
> Radim writes:
> About document scaling: LSA in gensim builds the latent representation
> of any document x_q (a "pseudo-document" in the original Deerwester et
> al. terminology) by the formula d_q = s^-1 * u^T * x_q (from
> x = u * s * d). To compare the similarity of two documents, d_q1 and
> d_q2, Deerwester et al. suggest the formula d_q1 * s^2 * d_q2, that is,
> the dot product between the `d` vectors, each scaled by `s`. When the
> two formulas are combined, the `s` cancel out: s * d_q =
> s * s^-1 * u^T * x_q = u^T * x_q. That's why in lsa[query] I actually
> compute only d_q = u^T * x_q, and then only d_q1 * d_q2 in doc-doc
> similarity.
>
> So the difference is that calling lsa[corpus] already produces what your
> tutorial calls d_1, d_2 etc. (not just U_2), with the values already
> scaled by `s` (that is, s * d_i = u^T * x_i).
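
If you want to convince yourself that the `s` really cancel, here is a small sanity check. It is only a sketch in plain NumPy, not gensim's actual code: the random matrix X, the rank k and the helper names fold_in and project are made up for illustration.

```python
import numpy as np

# Toy term-document matrix (6 terms x 5 documents); the values are arbitrary.
rng = np.random.default_rng(0)
X = rng.random((6, 5))
k = 2  # number of latent dimensions kept (assumption for this sketch)

# Truncated SVD: X ~= U_k * diag(s_k) * Vt_k
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

def fold_in(x):
    """Deerwester-style pseudo-document: d_q = s^-1 * u^T * x_q."""
    return (U_k.T @ x) / s_k

def project(x):
    """Representation already scaled by s: u^T * x_q."""
    return U_k.T @ x

x_q1, x_q2 = X[:, 0], X[:, 1]
d_q1, d_q2 = fold_in(x_q1), fold_in(x_q2)

# Deerwester similarity: d_q1 * s^2 * d_q2
sim_deerwester = d_q1 @ np.diag(s_k ** 2) @ d_q2

# Plain dot product of the scaled vectors
sim_projected = project(x_q1) @ project(x_q2)

print(np.isclose(sim_deerwester, sim_projected))
```

Both similarities agree up to floating point error, and dividing the projected vectors by s_k recovers the unscaled `d` vectors from the tutorial.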
HTH,
Radim