I collected 1000 news articles on "Drug" and trained an LDA model on them. Now I want to infer the topics of a new, unseen article using the trained model.
Should the bag-of-words corpus for the new article be built with a new dictionary created from that article, or with the old dictionary that was used for training?
from gensim.corpora import Dictionary
# Create a dictionary representation of the documents.
dictionary = Dictionary(docs)  # docs is the list of 1000 articles, already pre-processed (stopwords removed, lemmatized, etc.)
# Bag-of-words representation of the documents.
corpus = [dictionary.doc2bow(doc) for doc in docs] ## corpus for 1000 articles
Now, to generate the corpus for the new, unseen article:
# Create a dictionary from docs4, which holds the unseen article
dictionary4 = Dictionary(docs4)
# Bag-of-words representation of the unseen article, built with the new dictionary (dictionary4)
corpus4 = [dictionary4.doc2bow(doc4) for doc4 in docs4]
# Use the trained model to get the topic distribution for the new corpus
new_topic = ldamodel[corpus4]  # ldamodel was trained on the old corpus (corpus); here it is applied to corpus4
for a in new_topic:
    print(a)
[(1, 0.098211573818414402), (4, 0.028076146702543749), (8, 0.028542374478413981), (10, 0.18255508495305284), (15, 0.015144587592186069), (16, 0.18098460371101405), (26, 0.072777604705255586), (30, 0.012641000304970024), (33, 0.073507649419200294), (41, 0.088433736869442225), (45, 0.078896131665663172), (47, 0.085055118744042049), (48, 0.046267072369662939)]
Is this correct? The model was trained with 50 topics.
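To make the difference between the two options concrete, here is a minimal pure-Python sketch. It is not gensim's actual implementation: `build_vocab` and `to_bow` are hypothetical helpers that only mimic how a dictionary maps tokens to integer ids, and the toy documents are made up for illustration. It suggests that ids produced by a fresh dictionary may point at different words than the ids the trained model knows.

```python
from collections import Counter

def build_vocab(docs):
    """Hypothetical helper: give each distinct token an id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_bow(doc, vocab):
    """Hypothetical helper: (id, count) pairs, skipping tokens missing from vocab."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())

# Toy stand-ins for the pre-processed training articles and the unseen article.
train_docs = [["drug", "trial", "patient"], ["drug", "police", "arrest"]]
new_doc = ["police", "drug", "seizure"]

train_vocab = build_vocab(train_docs)  # ids the trained model would know
new_vocab = build_vocab([new_doc])     # fresh ids, unrelated to training

bow_new_vocab = to_bow(new_doc, new_vocab)      # option 1: new dictionary
bow_train_vocab = to_bow(new_doc, train_vocab)  # option 2: training dictionary

# Read both results the way the trained model would: through the training ids.
id_to_token = {i: t for t, i in train_vocab.items()}
print([(id_to_token[i], n) for i, n in bow_new_vocab])
# [('drug', 1), ('trial', 1), ('patient', 1)]
print([(id_to_token[i], n) for i, n in bow_train_vocab])
# [('drug', 1), ('police', 1)]
```

In this sketch, the bag-of-words built with the new vocabulary is read by the model as "drug, trial, patient", which is not what the article says, while the one built with the training vocabulary keeps the right words and drops the unseen token "seizure". In gensim terms, the second option would correspond to calling `dictionary.doc2bow(doc4)` with the training `dictionary` instead of `dictionary4`.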
My model is not accurate. In what ways can I improve it?