Empty top_tokens_score.last_tokens


Max Statsenko

Feb 28, 2017, 5:21:25 AM2/28/17
to bigartm-users
I've run into a problem. Previously I worked with exports in the UCI format converted into the batch format, and everything was fine. Now I tried converting from Vowpal Wabbit and for some reason I get an empty top_tokens_score.last_tokens. Please tell me what I am doing wrong. Unfortunately, I have to hide the row values, since they may fall under an NDA.
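
For context, the Vowpal Wabbit conversion step would have looked roughly like this (a minimal sketch; the input file name and target folder are placeholders, not the actual ones):

import artm

# Each line of the VW file has the form "doc_title |modality token[:count] ...",
# e.g. "doc1 |class1 alpha bravo:3 |class2 tag1"
vw_vectorizer = artm.BatchVectorizer(data_path='collection_vw.txt',
                                     data_format='vowpal_wabbit',
                                     target_folder='test_batches')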


import artm

print artm.version()

# Learning settings
pass_count = 2
default_background_topics_count = 3
topics_num = 20
weight1 = 10.0  # modality weight for 'class2'
weight2 = 10.0  # modality weight for 'class3'
background_tau = 0.5
foreground_tau = 0.5
decorrelation_tau = 1e+5
background_topics_count = default_background_topics_count

print 'Loading train batches...'
train_batch_vectorizer = artm.BatchVectorizer(data_path='C:\\Users\\m.statsenko\\PycharmProjects\\DwarfCatcher\\test_batches',
                                              data_format='batches')
print 'Batches have been loaded'
print 'Loading dictionary...'
dictionary = artm.Dictionary(data_path='C:\\Users\\m.statsenko\\PycharmProjects\\DwarfCatcher\\test_batches')
print 'Dictionary loaded'

# Background topics are smoothed (tau > 0), foreground topics are sparsed (tau < 0)
background_tau = abs(background_tau)
foreground_tau = -abs(foreground_tau)
topics_num = int(round(topics_num))

model = artm.ARTM(num_topics=topics_num,
                  class_ids={'class1': 1.0, 'class2': weight1, 'class3': weight2},
                  dictionary=dictionary)
model.scores.add(artm.PerplexityScore(name='perplexity_score'))
model.scores.add(artm.TopTokensScore(name='top_tokens_score'))

# The last background_topics_count topics are background, the rest are foreground
foreground_topics = model.topic_names[0: -background_topics_count]
background_topics = model.topic_names[-background_topics_count:]

model.regularizers.add(artm.SmoothSparsePhiRegularizer(name='sparse_phi_regularizer_background',
                                                       tau=background_tau,
                                                       topic_names=background_topics))
model.regularizers.add(artm.SmoothSparsePhiRegularizer(name='sparse_phi_regularizer_foreground',
                                                       tau=foreground_tau,
                                                       topic_names=foreground_topics))
model.regularizers.add(artm.SmoothSparseThetaRegularizer(name='sparse_theta_regularizer_background',
                                                         tau=background_tau,
                                                         topic_names=background_topics))
model.regularizers.add(artm.SmoothSparseThetaRegularizer(name='sparse_theta_regularizer_foreground',
                                                         tau=foreground_tau,
                                                         topic_names=foreground_topics))
model.regularizers.add(artm.DecorrelatorPhiRegularizer(name='decorrelator_phi_regularizer',
                                                       tau=decorrelation_tau))

model.fit_offline(train_batch_vectorizer, num_collection_passes=pass_count)
saved_top_tokens = model.score_tracker['top_tokens_score'].last_tokens
result = model.score_tracker['perplexity_score'].last_value
print saved_top_tokens, result

Output:
0.8.2
Loading train batches...
Batches have been loaded
Loading dictionary...
Dictionary loaded
Widget Javascript not detected. It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension" (printed three times)

{} 1023.86988819

At the same time, calling:
model.get_phi()



returns:
                               topic_0       topic_1       topic_2
xxxxxxxxxxxxxxxxxxxxxx    2.176410e-06  6.831331e-08  0.000000e+00
xxxxxxxxxxxxxxxxxxxxxx    3.880483e-08  0.000000e+00  5.567056e-07
xxxxxxxxxxxxxxxxxxxxxx    0.000000e+00  2.700850e-08  2.053835e-07
xxxxxxxxxxxxxxxxxxxxxx    3.486495e-04  5.950908e-04  1.435101e-04
xxxxxxxxxxxxxxxxxxxxxx    2.816075e-06  1.750824e-04  1.173249e-04
                               ...           ...           ...
[XXXXXX rows x 20 columns]

Max Statsenko

Feb 28, 2017, 6:36:00 AM2/28/17
to bigartm-users
This is the English version of my question; excuse me for using Russian in this group, I did not know that rule. I ran into a strange problem. I had been working with data in UCI bag-of-words format converted into BigARTM batches and everything was fine. Now I tried the Vowpal Wabbit format, and since then I get an empty value of top_tokens_score.last_tokens. Please help me find my mistake. Excuse me for hiding some data; it may fall under an NDA.

The code is the same as above; this is the result:
{} 1023.86988819

It is strange: if I call model.get_phi(), I get the non-empty matrix shown above.
Thank you!


On Tuesday, February 28, 2017 at 1:21:25 PM UTC+3, Max Statsenko wrote:

Oleksandr Frei

Feb 28, 2017, 7:06:15 AM2/28/17
to Max Statsenko, bigartm-users
Hi,

Does it help to specify class_id for the TopTokensScore?

model.scores.add(artm.TopTokensScore(name='top_tokens_score', class_id='class1'))
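
As a side note, with several modalities it can also help to track top tokens for each modality separately; a minimal sketch, assuming the modality names from the code above (the score names here are illustrative):

for cid in ['class1', 'class2', 'class3']:
    # One TopTokensScore per modality, e.g. 'top_tokens_class1'
    model.scores.add(artm.TopTokensScore(name='top_tokens_' + cid,
                                         class_id=cid,
                                         num_tokens=10))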

Kind regards,
Alex


Max Statsenko

Feb 28, 2017, 8:53:13 AM2/28/17
to bigartm-users, maxsta...@gmail.com
Hi!
Yes, it works, thank you! It just takes a while for fit_offline to finish.
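
For later readers, reading the now-populated score might look like this (a sketch, assuming the class_id fix above and the score name 'top_tokens_score'):

model.fit_offline(train_batch_vectorizer, num_collection_passes=pass_count)

# last_tokens maps each topic name to its list of top tokens
saved_top_tokens = model.score_tracker['top_tokens_score'].last_tokens
for topic_name in model.topic_names:
    print topic_name, ':', ' '.join(saved_top_tokens.get(topic_name, []))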

On Tuesday, February 28, 2017 at 3:06:15 PM UTC+3, Oleksandr Frei wrote:
