Hello.
I need help understanding the meaning of the coherence scores in the BigARTM score tracker.
Every other metric works fine; only the coherence score output behaves strangely, and I have several questions about it.
1) Average coherence
Why does the score tend to decrease? Is lower better?
Output:
[84, 89, 85, 87, 91, 90, 89, 93, 99, 81, 89, 91, 80, 75, 69, 58, 57, 62, 53, 51, 49, 46, 37, 42, 42, 40, 40, 40, 40, 40]
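To make the trend concrete, averaging the first and last ten values of this list (plain Python, numbers copied from above) shows the score drops by roughly half over the 30 passes:

```python
# Averaged coherence values copied from the score tracker output above
avg_coherence = [84, 89, 85, 87, 91, 90, 89, 93, 99, 81, 89, 91, 80, 75,
                 69, 58, 57, 62, 53, 51, 49, 46, 37, 42, 42, 40, 40, 40, 40, 40]

first = sum(avg_coherence[:10]) / 10   # mean over the first 10 passes
last = sum(avg_coherence[-10:]) / 10   # mean over the last 10 passes
print(first, last)  # 88.8 41.6
```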
2) Coherence
Why are there scores for only two topics — where are the other 10 topics? And why does the second topic's score list contain only 4 values?
e:1
Topic_0 [34, 106, 146, 107, 39, 23, 91, 207]
Topic_1 [77, 69, 34, 75]
e:2
Topic_0 [34, 106, 146, 107, 85, 23, 91, 207]
Topic_1 [77, 69, 34, 94]
...
e:29
Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]
e:30
Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]
3) Last coherence
The questions are the same as in part 2.
Topic_0 [0, 0, 117, 0, 37, 9, 96, 43]
Topic_1 [25, 10, 68, 79]
My setup:
BigARTM version: 0.9.0
num_collection_passes=30
num_tokens=8
num_topics=12
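As a quick sanity check on the numbers above (plain Python, values copied from my output): the averaged list has exactly one entry per collection pass, and Topic_0's per-topic list has exactly num_tokens entries, but Topic_1's does not:

```python
# Settings and values copied from my run above
num_collection_passes = 30
num_tokens = 8

avg_coherence = [84, 89, 85, 87, 91, 90, 89, 93, 99, 81, 89, 91, 80, 75,
                 69, 58, 57, 62, 53, 51, 49, 46, 37, 42, 42, 40, 40, 40, 40, 40]
topic_0_last = [0, 0, 117, 0, 37, 9, 96, 43]
topic_1_last = [25, 10, 68, 79]

print(len(avg_coherence) == num_collection_passes)  # True: one value per pass
print(len(topic_0_last) == num_tokens)              # True
print(len(topic_1_last) == num_tokens)              # False: only 4 values
```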
Dictionary init (note: copied from my script; a missing comma after cooc_file_path is fixed here):
dictionary.gather(data_path=batches_folder,
                  cooc_file_path='out/flat_model/cooc_scores/',
                  vocab_file_path='out/flat_model/vocab.songs_lemma.txt',
                  symmetric_cooc_values=True)
TopTokensScore init:
model.scores.add(artm.TopTokensScore(
    name='TopTokensScore',
    class_id='@default_class',
    num_tokens=num_tokens,
    dictionary=dictionary,
    topic_names=topic_names))