Hi,
I have built a corpus and trained an LDA model. Now I am trying to apply it to unseen documents and get the topic distribution for each new document.
Won't the new corpus have a totally different mapping of words into the vector space?
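>>> import gensim
>>> import pandas as pd
>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> from gensim.test.utils import common_texts, datapath
>>>
>>> # Create a corpus from a list of texts
>>> common_dictionary = Dictionary(common_texts)
>>> common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]
>>> # Train the model on the corpus.
>>> lda = LdaModel(common_corpus, num_topics=10)
>>> # Topic distribution for every training document, as a documents x topics table
>>> all_topics = lda.get_document_topics(common_corpus, minimum_probability=0.0)
>>> # num_terms pins the number of rows to the number of topics
>>> all_topics_csr = gensim.matutils.corpus2csc(all_topics, num_terms=lda.num_topics)
>>> all_topics_numpy = all_topics_csr.T.toarray()
>>> all_topics_df = pd.DataFrame(all_topics_numpy)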
>>> # Save model to disk.
>>> temp_file = datapath("model")
>>> lda.save(temp_file)
>>>
>>> # Load a potentially pretrained model from disk.
>>> lda = LdaModel.load(temp_file)
>>> # other_texts: the new, unseen documents, tokenized like common_texts
>>> # Convert them with the same dictionary that was used for training
>>> other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
>>> unseen_doc = other_corpus[0]
>>> vector = lda[unseen_doc]
>>> # Get the topic probability distribution for each unseen document
>>> all_topics = lda.get_document_topics(other_corpus, minimum_probability=0.0)
>>> all_topics_csr = gensim.matutils.corpus2csc(all_topics, num_terms=lda.num_topics)
>>> all_topics_numpy = all_topics_csr.T.toarray()
>>> all_topics_df = pd.DataFrame(all_topics_numpy)
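
To make the representation question concrete: as far as I understand, doc2bow only looks tokens up in the dictionary it is called on, and by default it skips tokens that dictionary has never seen. The sketch below is a plain-Python stand-in for that behaviour, not gensim's implementation; build_vocab, to_bow and the sample texts are hypothetical names made up for illustration, and the ids gensim assigns may differ.

```python
# Hypothetical stand-ins for illustration only; this is NOT gensim's code.
from collections import Counter


def build_vocab(texts):
    """Assign each distinct token an integer id in first-seen order."""
    vocab = {}
    for text in texts:
        for token in text:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(vocab, text):
    """Count the tokens known to `vocab`; tokens outside it are dropped."""
    counts = Counter(vocab[token] for token in text if token in vocab)
    return sorted(counts.items())


# Vocabulary built once, from the training texts only
train_texts = [["human", "computer", "system"], ["graph", "trees", "graph"]]
vocab = build_vocab(train_texts)

# "computer" and "graph" keep their training ids; "survey" is unknown and skipped
unseen_bow = to_bow(vocab, ["computer", "survey", "graph", "graph"])
print(unseen_bow)  # [(1, 1), (3, 2)]
```

If that is right, the unseen documents end up in the same vector space as the training corpus as long as the training dictionary is reused, and a different representation would only appear if a new Dictionary were built from the new texts.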