I have a variable, `data_words`, which is my corpus and is a list of lists of strings (tokens).
I also have a variable, `topics`, which is likewise a list of lists of strings (tokens).
Now, I want to find the 'c_v' score for my topics. To do so, I run the following code:
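```
import gensim.corpora as corpora
from gensim.models.coherencemodel import CoherenceModel

# Build the token <-> id mapping from the tokenized corpus
id2word = corpora.Dictionary(data_words)
# Convert each document to its bag-of-words representation
corpus = [id2word.doc2bow(text) for text in data_words]

# Compute the c_v coherence over the top 20 tokens of each topic
coherence_score = CoherenceModel(topics=topics,
                                 texts=data_words,
                                 corpus=corpus,
                                 dictionary=id2word,
                                 coherence='c_v',
                                 topn=20).get_coherence()
```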
However, when I run the above, I get the following errors:
```
Traceback (most recent call last):
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 448, in _ensure_elements_are_ids
return np.array([self.dictionary.token2id[token] for token in topic])
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 448, in <listcomp>
return np.array([self.dictionary.token2id[token] for token in topic])
KeyError: 'afgelopen'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-570-8aef06174d6c>", line 1, in <module>
coherence_score = CoherenceModel(topics=topics,
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 215, in __init__
self.topics = topics
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 430, in topics
topic_token_ids = self._ensure_elements_are_ids(topic)
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 451, in _ensure_elements_are_ids
return np.array([self.dictionary.token2id[token] for token in topic])
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 451, in <listcomp>
return np.array([self.dictionary.token2id[token] for token in topic])
File "C:\Users\20200016\Anaconda3\lib\site-packages\gensim\models\coherencemodel.py", line 450, in <genexpr>
topic = (self.dictionary.id2token[_id] for _id in topic)
KeyError: 'lamp'
```
The error indicates that I am passing a `str` where I should have passed an id. However, the variables and their types match the formats described in the documentation.
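Since the first `KeyError` is raised on `self.dictionary.token2id[token]`, one thing I can check is whether every token in `topics` actually occurs in `data_words`. Below is a minimal sketch of that check on toy data (the lists are hypothetical stand-ins with the same shape as my real variables, and `find_missing_tokens` is just a helper name I made up). It assumes that the keys of `id2word.token2id` are exactly the tokens seen in `data_words`, which should hold as long as the dictionary has not been filtered.

```python
# Toy stand-ins for my real variables (hypothetical data, same shapes).
data_words = [["lamp", "tafel", "stoel"], ["stoel", "lamp"]]
topics = [["lamp", "afgelopen"], ["stoel", "tafel"]]

# Every token that occurs in the corpus; assumed to match the keys of
# id2word.token2id when the dictionary is built from data_words unfiltered.
vocabulary = {token for doc in data_words for token in doc}


def find_missing_tokens(topics, vocabulary):
    """Return the sorted topic tokens that never occur in the vocabulary."""
    return sorted({token for topic in topics for token in topic
                   if token not in vocabulary})


missing = find_missing_tokens(topics, vocabulary)
print(missing)  # ['afgelopen']
```

If this list is non-empty for my real data, the topics contain tokens that the dictionary has never seen, which would be consistent with the `KeyError: 'afgelopen'` above.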
What can I do to get the coherence scores?