Hi everyone,
I'm trying to use a model trained on one corpus to get topic assignments on new data.
Here is what I have so far:
import gensim

# The model is an LdaMulticore model that was trained on another corpus
model = gensim.models.LdaMulticore.load(model_file)
corpus_new = gensim.corpora.MmCorpus(corpus_file)
topic_assignments = model[corpus_new]
for doc_topics in topic_assignments:
    print(doc_topics)
This results in the following error:
Traceback (most recent call last):
  File "<ipython-input-17-be690e265876>", line 1, in <module>
    for i in t_doc:
  File "/home/frederic/anaconda/lib/python2.7/site-packages/gensim/interfaces.py", line 122, in __iter__
    yield self.obj[doc]
  File "/home/frederic/anaconda/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 921, in __getitem__
    return self.get_document_topics(bow, eps)
  File "/home/frederic/anaconda/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 908, in get_document_topics
    gamma, _ = self.inference([bow])
  File "/home/frederic/anaconda/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 432, in inference
    expElogbetad = self.expElogbeta[:, ids]
IndexError: index 8088 is out of bounds for axis 1 with size 7477
I suspect this is caused by a mismatch between the word IDs used by the model and those used by the new corpus: the new corpus contains word ID 8088, but the model's vocabulary has only 7477 entries (IDs 0 to 7476).
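A quick way to confirm the mismatch is to scan the new corpus for word IDs the model cannot know about. The sketch below is plain Python and assumes the corpus is an iterable of bag-of-words documents, each a list of `(word_id, count)` pairs, which is what gensim corpora such as `MmCorpus` yield. `num_terms` stands for the model's vocabulary size (7477 in the traceback above; as far as I know gensim LDA models expose it as `model.num_terms`). The function name `find_unknown_ids` is hypothetical.

```python
def find_unknown_ids(corpus, num_terms):
    """Return the sorted word IDs in `corpus` that fall outside 0..num_terms-1.

    corpus    -- iterable of bag-of-words documents, each a list of
                 (word_id, count) pairs (hypothetical helper, not a gensim API)
    num_terms -- size of the model's vocabulary
    """
    unknown = set()
    for doc in corpus:
        for word_id, _count in doc:
            if word_id >= num_terms:
                unknown.add(word_id)
    return sorted(unknown)


# Toy corpus: the model knows 5 terms (IDs 0-4), but two documents
# use IDs assigned by a different, larger dictionary.
toy_corpus = [
    [(0, 2), (3, 1)],
    [(4, 1), (7, 3)],
    [(9, 1), (1, 1)],
]
print(find_unknown_ids(toy_corpus, num_terms=5))  # [7, 9]
```

With the real objects this would be something like `find_unknown_ids(corpus_new, model.num_terms)`. An empty result does not prove the IDs match, though: IDs from a different dictionary that happen to be in range are silently misinterpreted instead of raising an error.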
I think I need to apply the old corpus's dictionary to the new data, but I'm not sure how to do this.
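If the new documents are still available as token lists, one common route is to load the dictionary saved with the training corpus (`gensim.corpora.Dictionary.load(...)`) and convert each new document with its `doc2bow` method, which ignores words the dictionary does not contain. If only the new `MmCorpus` and the dictionary it was built with are available, the IDs can instead be translated word by word. The sketch below does that translation in plain Python, assuming two ordinary dicts: `new_id2token` (new corpus ID to word) and `old_token2id` (word to the ID the model was trained with; a gensim `Dictionary` keeps this mapping in its `token2id` attribute). The function name `remap_bow` and the toy vocabularies are hypothetical. Words the model has never seen are dropped, since it has no topic weights for them.

```python
def remap_bow(bow, new_id2token, old_token2id):
    """Translate one bag-of-words document from new-dictionary IDs to model IDs.

    bow          -- list of (new_word_id, count) pairs
    new_id2token -- dict mapping new word IDs to words
    old_token2id -- dict mapping words to the IDs the model was trained with
    Words missing from old_token2id are dropped (hypothetical helper).
    """
    remapped = []
    for new_id, count in bow:
        old_id = old_token2id.get(new_id2token[new_id])
        if old_id is not None:  # skip words the model never saw
            remapped.append((old_id, count))
    return sorted(remapped)


# Toy vocabularies: the same words received different IDs in each dictionary.
old_token2id = {"topic": 0, "model": 1, "corpus": 2}
new_id2token = {0: "corpus", 1: "banana", 2: "topic"}
bow = [(0, 3), (1, 1), (2, 2)]
print(remap_bow(bow, new_id2token, old_token2id))  # [(0, 2), (2, 3)]
```

Applied to every document, e.g. `[remap_bow(doc, new_id2token, old_token2id) for doc in corpus_new]`, the result could then be passed to `model[...]` in place of `corpus_new`. The `new_id2token` mapping can be built by inverting the new dictionary's `token2id`.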
Any hints are appreciated.
Frederic