When applying the LSI model during inference, the resulting LSI representation contains only 1499 topics (the vec_lsi variable in my inference code) instead of the expected 1500. My documents are very short, single lines of 3 to 5 words, and the queries are the same length.

It therefore appears that during the conversion from the bag-of-words representation to the LSI representation, one of the 1500 topics is being lost or omitted, leaving an LSI representation with only 1499 topics. Could you please explain why this might be happening and suggest potential solutions to address this issue?
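If this is gensim, a likely cause (an assumption, since the inference code is not shown) is that `lsi[bow]` returns a *sparse* list of `(topic_id, weight)` pairs and silently omits topics whose weight is near zero, so `len(vec_lsi)` can be 1499 even though `num_topics=1500`. gensim provides `gensim.matutils.sparse2full` to pad such a vector back to full length; the pure-Python sketch below illustrates the same idea without requiring gensim:

```python
def sparse2full_like(sparse_vec, length):
    """Expand a gensim-style sparse vector [(topic_id, weight), ...]
    into a dense list of `length` floats, filling omitted topics with 0.0."""
    dense = [0.0] * length
    for topic_id, weight in sparse_vec:
        dense[topic_id] = weight
    return dense

# Simulated LSI output: topic 7 was dropped because its weight was ~0.
vec_lsi = [(i, 0.5) for i in range(1500) if i != 7]
print(len(vec_lsi))   # 1499 -- one topic missing from the sparse form
dense = sparse2full_like(vec_lsi, 1500)
print(len(dense))     # 1500 -- fixed-length vector restored
```

So the missing topic is usually not lost by the model; it simply has a (near-)zero weight for that document and is not emitted in the sparse representation.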