vec = dictionary.doc2bow(data.split())
topics_list = lda[vec]
However, the above code takes about 20 seconds to classify a new document. Is there any way to increase the speed?
Regards,
Kiran.
Are you using the Multicore version of LDA? How many cores are you using?
0:00:00.107246
above is loading time
0:00:00.064976
above is pre-processing time
0:00:00.043155
above is doc2bow converting time
6 is the length of the new document vector [len(dictionary.doc2bow(rec.split()))], where rec is each document.
0:00:29.283126
above is testing time
Regards,
Kiran.
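For what it's worth, the per-stage timings above can be collected without repeating the datetime bookkeeping by hand. Here is a minimal stdlib stopwatch sketch (Python 3 syntax; the stage label and the stand-in body are made up, not gensim calls):

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label):
    # Prints the wall-clock time spent inside the `with` body,
    # mirroring the a/b datetime deltas used in this thread.
    start = time.time()
    yield
    print("%s: %.6f sec" % (label, time.time() - start))

# Usage: wrap each pipeline stage to see where the 20 seconds go.
with stopwatch("doc2bow"):
    vec = [(0, 1), (1, 2)]  # stand-in for dictionary.doc2bow(...)
```

Wrapping load, preprocessing, doc2bow, and inference separately this way makes it easy to confirm that essentially all the time is spent in the inference step.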
dictionary = gensim.corpora.Dictionary.load('ldadictionary.dict')
model_gensim = gensim.models.LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, iterations=100)
model_gensim.expElogbeta[:] = mallet_model.wordtopics
0:02:53.144997
above is mallet model time
[0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113, 0.32368686868687113]
above is probabilities
[76, 76, 76, 76, 76, 76, 76, 76, 76, 76]
above is topics
0:00:00.438506
above is gensim model time
[0.3296713681035431, 0.32937899769094003, 0.3290911680265532, 0.33037637275912024, 0.3308946749415768, 0.3219509847307517, 0.330096109579114, 0.33630419366304487, 0.33079761620347337, 0.33273683751942856]
above is probabilities
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
above is topics
However, timing-wise, there is a significant improvement.
--
You received this message because you are subscribed to the Google Groups "gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
dictionary = gensim.corpora.Dictionary.load('ldadictionary.dict')
lda = gensim.models.wrappers.LdaMallet.load('ldamodel')
m_vals = []
m_inds = []
a = datetime.datetime.now()
for i in range(0, 10):
    # `line` and `preprocess` are defined earlier in the script (not shown here)
    processed_line = preprocess(line.encode('utf-8'))
    new_vec = dictionary.doc2bow(processed_line.split())
    topic_dist = lda[new_vec]
    temp = [float(x[1]) for x in topic_dist]
    max_val = max(temp)
    m_vals.append(max_val)
    index = temp.index(max_val)
    m_inds.append(index)
b = datetime.datetime.now()
print b - a
print "above is mallet model time"
print m_vals
print "above is probabilities"
print m_inds
print "above is topics"
model_gensim = gensim.models.LdaModel(id2word=lda.id2word, num_topics=lda.num_topics, alpha=lda.alpha, iterations=200)
model_gensim.expElogbeta[:] = lda.wordtopics
a = datetime.datetime.now()
g_vals = []
g_inds = []
for i in range(0, 10):
    processed_line = preprocess(line.encode('utf-8'))
    new_vec = dictionary.doc2bow(processed_line.split())
    topic_dist = model_gensim[new_vec]
    temp = [float(x[1]) for x in topic_dist]
    max_val = max(temp)
    g_vals.append(max_val)
    index = temp.index(max_val)
    g_inds.append(index)
b = datetime.datetime.now()
print b - a
print "above is gensim model time"
print g_vals
print "above is probability values"
print g_inds
print "above is topics"
model_gensim.print_topics(num_topics=50)
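As an aside, the max(temp) / temp.index(max_val) pair in both loops can be collapsed into a single max over the (topic_id, probability) pairs. Note that temp.index(max_val) returns a list position, which equals the actual topic id only when the model returns the full distribution in order; taking the max over the pairs themselves sidesteps that. A small sketch (the sample distribution below is made up):

```python
def top_topic(topic_dist):
    # topic_dist: list of (topic_id, probability) pairs, the shape
    # returned by model[bow] in the loops above.
    topic_id, prob = max(topic_dist, key=lambda pair: pair[1])
    return topic_id, prob

# Example with a made-up distribution:
dist = [(0, 0.10), (3, 0.72), (7, 0.18)]
# top_topic(dist) -> (3, 0.72)
```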