Batch-wise getTheta() and 'TopTokens' score differ from ARTM with artm.BatchVectorizer() and artm.fit_offline()


max H

Aug 13, 2019, 3:43:18 AM
to bigartm-users

Hi

I am trying to get the theta matrix (getTheta()) and the 'TopTokens' score from ARTM batch by batch in Python:

master = mc.MasterComponent(lib, scores=scores, cache_theta=True)
topic_names = ['topic_{}'.format(i) for i in range(num_topics)]

master.initialize_model(model_name=pwt, dictionary_name=dictionary_name)

# test_06_get_theta.py, option 3:
# getting the theta matrix online during iteration

master.normalize_model(pwt, nwt)
master.process_batches(pwt, nwt, num_document_passes, batches=[batch_filename])

After each batch, I get the theta matrix for that batch and then clear the theta cache using

_, theta_matrix = master.get_theta_matrix()
master.clear_theta_cache()

Then I want to get the 'TopTokens' score using

top_tokens_score = master.get_score('TopTokens')

But the results are not good: sometimes some of the topics contain only one token.

Please help me run topic modeling in a memory-efficient way.
