I want to get topic distributions for unseen documents using a previously tuned and saved LDA model


Subham Biswas

Jul 4, 2022, 8:10:12 PM
to Gensim
Hi,

I have built a corpus and trained an LDA model. Now I am trying to apply it to unseen documents and get the topic distribution for each new document.

Won't the new corpus have a totally different mapping of words to the vector space?

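>>> import gensim
>>> import pandas as pd
>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> from gensim.test.utils import common_texts, datapath

>>> # Create a corpus from a list of texts
>>> common_dictionary = Dictionary(common_texts)
>>> common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]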

>>> # Train the model on the corpus.
>>> lda = LdaModel(common_corpus, num_topics=10)

>>> # Topic distributions for the training documents, as a documents x topics table.
>>> all_topics = lda.get_document_topics(common_corpus, minimum_probability=0.0)
>>> all_topics_csr = gensim.matutils.corpus2csc(all_topics, num_terms=lda.num_topics)
>>> all_topics_numpy = all_topics_csr.T.toarray()
>>> all_topics_df = pd.DataFrame(all_topics_numpy)

>>> # Save model to disk.
>>> temp_file = datapath("model")
>>> lda.save(temp_file)
>>>
>>> # Load a potentially pretrained model from disk.
>>> lda = LdaModel.load(temp_file)

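>>> # other_texts must be a list of tokenized unseen documents.
>>> # Reusing common_dictionary keeps the word ids consistent with training.
>>> other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
>>> unseen_doc = other_corpus[0]
>>> vector = lda[unseen_doc]  # get topic probability distribution for a document

>>> # Get topic distributions for all unseen documents.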
>>> all_topics = lda.get_document_topics(other_corpus, minimum_probability=0.0)
>>> all_topics_csr = gensim.matutils.corpus2csc(all_topics, num_terms=lda.num_topics)
>>> all_topics_numpy = all_topics_csr.T.toarray()
>>> all_topics_df = pd.DataFrame(all_topics_numpy)
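
On the question of whether unseen documents end up in a different vector space: a minimal, library-free sketch of the idea, assuming the same token-to-id mapping built at training time is reused for the new documents. The helpers `build_token2id` and `doc2bow` below are hypothetical toy stand-ins, not gensim's implementation (gensim's `Dictionary` may assign ids in a different order), but they illustrate the same behaviour: known words keep their training ids, and words never seen in training are simply skipped.

```python
from collections import Counter


def build_token2id(texts):
    """Assign each distinct token a fixed integer id, in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            token2id.setdefault(token, len(token2id))
    return token2id


def doc2bow(token2id, text):
    """Map a tokenized document to sorted (id, count) pairs; unknown tokens are skipped."""
    counts = Counter(token2id[t] for t in text if t in token2id)
    return sorted(counts.items())


# The mapping is built once, from the training texts only.
train_texts = [["human", "computer", "interface"], ["graph", "trees", "computer"]]
token2id = build_token2id(train_texts)

# An unseen document is encoded with the SAME mapping, so ids stay consistent.
unseen = ["computer", "graph", "computer", "quantum"]
bow = doc2bow(token2id, unseen)
print(bow)  # [(1, 2), (3, 1)]; "quantum" was not in training, so it is dropped
```

So as long as the dictionary used for training is the one used to encode the new texts (saved and reloaded alongside the model if needed), the model sees the new documents in the same id space. Building a fresh dictionary from the new texts would assign different ids and make the resulting topic distributions meaningless.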