I have a question about Gensim's CoherenceModel

Emil Rijcken

Nov 22, 2021, 10:25:37 AM
to gen...@googlegroups.com
Hi, working with the Gensim package, I am encountering a problem that I haven't had before: 

I have a variable, data_words, which is my corpus and is a list of lists of strings (tokens).

Also, I have a variable, topics, which is a list of lists of strings (tokens).

Now, I want to find the 'c_v' score for my topics. To do so, I run the following code:


```
import gensim.corpora as corpora
from gensim.models.coherencemodel import CoherenceModel

id2word = corpora.Dictionary(data_words)
corpus = [id2word.doc2bow(text) for text in data_words]
coherence_score = CoherenceModel(topics=topics,
                                 texts=data_words,
                                 corpus=corpus,
                                 dictionary=id2word,
                                 coherence='c_v',
                                 topn=20).get_coherence()
```

However, when I run the above, I get the following errors:

```
Traceback (most recent call last):
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 448, in _ensure_elements_are_ids
    return np.array([self.dictionary.token2id[token] for token in topic])
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 448, in <listcomp>
    return np.array([self.dictionary.token2id[token] for token in topic])
KeyError: 'afgelopen'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-570-8aef06174d6c>", line 1, in <module>
    coherence_score = CoherenceModel(topics=topics,
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 215, in __init__
    self.topics = topics
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 430, in topics
    topic_token_ids = self._ensure_elements_are_ids(topic)
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 451, in _ensure_elements_are_ids
    return np.array([self.dictionary.token2id[token] for token in topic])
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 451, in <listcomp>
    return np.array([self.dictionary.token2id[token] for token in topic])
  File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 450, in <genexpr>
    topic = (self.dictionary.id2token[_id] for _id in topic)
KeyError: 'lamp'
```


The error seems to indicate that I am passing a str where I should have passed an id. However, the variables and their types match the formats described in the documentation.
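Going by the traceback, the first KeyError is raised by the `self.dictionary.token2id[token]` lookup, which suggests that 'afgelopen' is not a key in the dictionary. The second KeyError ('lamp') appears to come from the fallback that treats the topic elements as ids. A check along the following lines might show which topic tokens are missing. This is only a sketch: `find_missing_topic_tokens` is a hypothetical helper (not part of Gensim), and the toy data stands in for the real `topics` and `id2word.token2id`.

```python
def find_missing_topic_tokens(topics, token2id):
    """Return {topic_index: [tokens absent from token2id]} for topics with gaps.

    Hypothetical helper for illustration; not part of Gensim.
    """
    missing = {}
    for index, topic in enumerate(topics):
        # Collect every token in this topic that has no id in the mapping.
        absent = [token for token in topic if token not in token2id]
        if absent:
            missing[index] = absent
    return missing


# Toy stand-ins for id2word.token2id and topics (made-up data).
example_token2id = {"lamp": 0, "tafel": 1}
example_topics = [["lamp", "tafel"], ["lamp", "afgelopen"]]

print(find_missing_topic_tokens(example_topics, example_token2id))
# prints {1: ['afgelopen']}
```

With the real variables, the call would be `find_missing_topic_tokens(topics, id2word.token2id)`. An empty result would mean every topic token is in the dictionary, so the cause would have to be elsewhere.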

What can I do to get the coherence scores?
