Re: Empty top_tokens_score.last_tokens


Oleksandr Frei

Feb 28, 2017, 5:28:43 AM
to Max Statsenko, bigartm-users
Hi Max,

I guess it might be because you didn't specify the class_id argument when creating the top tokens score, so top tokens are retrieved for the @default_class modality. That would also explain why this only showed up after you switched formats: UCI input carries no modality information, so every token goes to @default_class, while in the Vowpal Wabbit format each |namespace becomes its own class_id, and @default_class may end up with no tokens at all. Try to use

model.scores.add(artm.TopTokensScore(name='top_tokens_score', class_id='class1'))
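
If you want top tokens for each of your modalities, you can add one score per class_id (the score names below are just an example, using the three class ids from your script):

# one TopTokensScore per modality; the score names are illustrative
for cls in ['class1', 'class2', 'class3']:
    model.scores.add(artm.TopTokensScore(name='top_tokens_' + cls,
                                         class_id=cls,
                                         num_tokens=10))

# after fit_offline(), last_tokens is a dict: topic_name -> list of top tokens
for cls in ['class1', 'class2', 'class3']:
    print model.score_tracker['top_tokens_' + cls].last_tokens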

Kind regards,
Alex

P.S. Let's aim to keep bigartm-users conversations in English. We have an internal group, artm...@googlegroups.com, for discussions in Russian; you are welcome to join :)

On Tue, Feb 28, 2017 at 11:21 AM, Max Statsenko <maxsta...@gmail.com> wrote:
I've run into a problem. Until now I worked with exports in UCI format converted to the batch format, and everything was fine. Now I tried converting from Vowpal Wabbit and for some reason I get an empty top_tokens_score.last_tokens. Please tell me what I'm doing wrong. Unfortunately, I have to hide the string values, since they may fall under an NDA.


import artm
print artm.version()
pass_count = 2
default_background_topics_count = 3
topics_num = 20
weight1 = 10.0
weight2 = 10.0 
background_tau = 0.5 
foreground_tau = 0.5 
decorrelation_tau = 1e+5
background_topics_count = default_background_topics_count

print 'Loading train batches...'
train_batch_vectorizer = artm.BatchVectorizer(data_path='C:\\Users\\m.statsenko\\PycharmProjects\\DwarfCatcher\\test_batches',
                                              data_format='batches')
print 'Batches have been loaded'
print 'Loading dictionary...'
dictionary = artm.Dictionary(data_path='C:\\Users\\m.statsenko\\PycharmProjects\\DwarfCatcher\\test_batches')
print 'Dictionary loaded'

background_tau = abs(background_tau)
foreground_tau = - abs(foreground_tau)
topics_num = int(round(topics_num))

model = artm.ARTM(num_topics=topics_num,
                  class_ids={'class1': 1.0, 'class2': weight1, 'class3': weight2},
                  dictionary=dictionary)
model.scores.add(artm.PerplexityScore(name='perplexity_score'))
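# NB: no class_id is given here, so top tokens are collected for the @default_class modality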
model.scores.add(artm.TopTokensScore(name='top_tokens_score'))
model.regularizers.add(artm.SmoothSparsePhiRegularizer(name='sparse_phi_regularizer_background',
                                                       tau=background_tau,
                                                       topic_names=model.topic_names[-background_topics_count:]))
model.regularizers.add(artm.SmoothSparsePhiRegularizer(name='sparse_phi_regularizer_foreground',
                                                       tau=foreground_tau,
                                                       topic_names=model.topic_names[: -background_topics_count]))
model.regularizers.add(artm.SmoothSparseThetaRegularizer(name='sparse_theta_regularizer_background',
                                                         tau=background_tau,
                                                         topic_names=model.topic_names[-background_topics_count:]))
model.regularizers.add(artm.SmoothSparseThetaRegularizer(name='sparse_theta_regularizer_foreground',
                                                         tau=foreground_tau,
                                                         topic_names=model.topic_names[: -background_topics_count]))
model.regularizers.add(artm.DecorrelatorPhiRegularizer(name='decorrelator_phi_regularizer',
                                                       tau=decorrelation_tau))

foreground_topics = model.topic_names[0: -background_topics_count]
background_topics = model.topic_names[-background_topics_count:]
model.fit_offline(train_batch_vectorizer, num_collection_passes=pass_count)
saved_top_tokens = model.score_tracker['top_tokens_score'].last_tokens
result = model.score_tracker['perplexity_score'].last_value
print saved_top_tokens, result

Output:
0.8.2
Loading train batches...
Batches have been loaded
Loading dictionary...
Dictionary loaded
Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"

{} 1023.86988819

At the same time,
model.get_phi()



returns:
                               topic_0       topic_1       topic_2  \
xxxxxxxxxxxxxxxxxxxxxx    2.176410e-06  6.831331e-08  0.000000e+00
xxxxxxxxxxxxxxxxxxxxxx    3.880483e-08  0.000000e+00  5.567056e-07
xxxxxxxxxxxxxxxxxxxxxx    4.182876e-07  0.000000e+00  0.000000e+00
xxxxxxxxxxxxxxxxxxxxxx    0.000000e+00  2.700850e-08  2.053835e-07
xxxxxxxxxxxxxxxxxxxxxx    3.486495e-04  5.950908e-04  1.435101e-04
xxxxxxxxxxxxxxxxxxxxxx    2.816075e-06  1.750824e-04  1.173249e-04
                               ...           ...           ...
...
[XXXXXX rows x 20 columns]

