Batch-wise getTheta() and 'TopTokens' score differ from ARTM with artm.BatchVectorizer() and artm.fit_offline()

max H

Aug 13, 2019, 3:43:18 AM

to bigartm-users

I am trying to get the Theta matrix (getTheta()) and the 'TopTokens' score from ARTM using Python:

master = mc.MasterComponent(lib, scores=scores, cache_theta=True)
topic_names = ['topic_{}'.format(i) for i in range(num_topics)]
master.initialize_model(model_name=pwt, dictionary_name=dictionary_name)

# test_06_get_theta.py, option 3:
# getting the theta matrix online during iteration
master.normalize_model(pwt, nwt)
master.process_batches(pwt, nwt, num_document_passes, batches=[batch_filename])

After each batch, I get the Theta matrix for that batch and then clear the Theta cache:

_, theta_matrix = master.get_theta_matrix()
master.clear_theta_cache()

Then I want to get the 'TopTokens' score:

top_tokens_score = master.get_score('TopTokens')

But the results are poor, and they differ from what I get with artm.BatchVectorizer() and artm.fit_offline(): sometimes a topic contains only one token.

Please help me run topic modeling in a memory-efficient way.
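To make the symptom concrete, here is a minimal NumPy sketch. It does not call BigARTM at all, and whether this is the actual cause in the code above is only an assumption. It shows the general effect: if the token-topic counts are normalized from a single batch, a topic can end up with very few tokens, while a running sum of counts over batches gives fuller topics and still needs only one batch in memory at a time. The names normalize_columns, streaming_phi and decay are hypothetical, and the counts are toy data.

```python
import numpy as np


def normalize_columns(n_wt):
    """Convert token-by-topic counts into p(w|t); all-zero topics stay zero."""
    n_wt = np.asarray(n_wt, dtype=float)
    totals = n_wt.sum(axis=0, keepdims=True)
    return np.divide(n_wt, totals, out=np.zeros_like(n_wt), where=totals > 0)


def streaming_phi(batch_counts, decay=1.0):
    """Yield p(w|t) after each batch, built from a running (decayed) sum of counts."""
    running = None
    for n_wt_hat in batch_counts:
        n_wt_hat = np.asarray(n_wt_hat, dtype=float)
        # Hypothetical merge rule: old counts are scaled by `decay`, new ones added.
        running = n_wt_hat.copy() if running is None else decay * running + n_wt_hat
        yield normalize_columns(running)


# Toy counts: 3 tokens (rows) x 2 topics (columns), one array per batch.
batches = [
    [[4, 0], [0, 0], [0, 2]],
    [[0, 1], [3, 0], [0, 1]],
]

# Model rebuilt from each batch alone vs. model built from accumulated counts.
per_batch_phi = [normalize_columns(b) for b in batches]
accumulated_phi = list(streaming_phi(batches))

# Number of tokens with non-zero probability in each topic.
print((per_batch_phi[0] > 0).sum(axis=0))
print((accumulated_phi[-1] > 0).sum(axis=0))
```

In this toy data, each topic has a single non-zero token when the first batch is normalized on its own, and two once the counts from both batches are accumulated. If the same thing is happening with pwt and nwt here, merging the per-batch counters before normalizing would be the direction to look in.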
