Problem with understanding coherence score results.


Aleksandr Kuhhar

unread,
May 9, 2019, 2:16:41 AM5/9/19
to bigartm-users
Hello.
I need help understanding the meaning of the coherence scores in the BigARTM score tracker.
Every other metric works fine; only the coherence score output behaves strangely, and I have several questions about it.

1) Average coherence
Why do the scores tend to decrease?
Is lower better?
Output:
[84, 89, 85, 87, 91, 90, 89, 93, 99, 81, 89, 91, 80, 75, 69, 58, 57, 62, 53, 51, 49, 46, 37, 42, 42, 40, 40, 40, 40, 40]

2) Coherence:
Why are there metrics for only two topics? Where are the other 10 topics?
Why does the second topic's score list have only 4 values?

e:1
Topic_0 [34, 106, 146, 107, 39, 23, 91, 207]
Topic_1 [77, 69, 34, 75]
e:2
Topic_0 [34, 106, 146, 107, 85, 23, 91, 207]
Topic_1 [77, 69, 34, 94]
...
e:29
Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]
e:30
Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]


3) Last coherence
The questions are the same as in the second part.

Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]
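For what it's worth, here is a toy sketch of one way these per-topic lists could aggregate into the averages from part 1. The aggregation rule (mean within each topic, then mean over topics) is my own guess, not something I confirmed from the BigARTM documentation:

```python
# Toy aggregation sketch; the averaging rule here is an assumption,
# not BigARTM's documented behavior.

# Per-topic coherence lists copied from the last pass above.
per_topic = {
    "Topic_0": [0, 0, 117, 0, 37, 9, 96, 43],
    "Topic_1": [25, 10, 68, 79],
}

# Mean within each topic, then mean over topics.
topic_means = {name: sum(vals) / len(vals) for name, vals in per_topic.items()}
average = sum(topic_means.values()) / len(topic_means)

print(topic_means)  # per-topic means
print(average)      # one candidate "average coherence" for this pass
```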

My data:
BigARTM version: 0.9.0
num_collection_passes=30
num_tokens=8
num_topics=12

Dictionary init:

dictionary.gather(data_path=batches_folder,
                  cooc_file_path='out/flat_model/cooc_scores/',
                  vocab_file_path='out/flat_model/vocab.songs_lemma.txt',
                  symmetric_cooc_values=True)

TopTokensScore init:

model.scores.add(artm.TopTokensScore(name='TopTokensScore',
                                     class_id='@default_class',
                                     num_tokens=num_tokens,
                                     dictionary=dictionary,
                                     topic_names=topic_names))
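For reference, the three outputs above were read from the score tracker roughly like this. A stand-in object replaces the real tracker so the snippet runs without artm installed; with the real model the tracker would be model.score_tracker['TopTokensScore'], and the attribute names below are inferred from the output labels, not verified against the API:

```python
# Sketch of how the three outputs were read. StubTracker is a stand-in
# so this runs without artm; attribute names are assumptions.

class StubTracker:
    # One average value per collection pass (30 passes -> 30 values).
    average_coherence = [84, 89, 85]
    # Per pass: topic name -> list of per-topic coherence values.
    coherence = [
        {"Topic_0": [34, 106, 146, 107, 39, 23, 91, 207],
         "Topic_1": [77, 69, 34, 75]},
    ]
    # Values from the final pass only.
    last_average_coherence = 85

tracker = StubTracker()
print(tracker.average_coherence)       # part 1 output
print(tracker.coherence[-1])           # part 2 output (one pass shown)
print(tracker.last_average_coherence)  # part 3-style output
```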



