Hi,
I've noticed a bug in the c_v coherence code. I'm trying to obtain the c_v coherence measure for various LDA models I've estimated, as follows:
from gensim.models import CoherenceModel
lda_1_c_v = CoherenceModel(model=lda_1, texts=texts, dictionary=dictionary, coherence='c_v')
print(lda_1_c_v.get_coherence())
Unfortunately, I kept getting a KeyError (a 'remark' KeyError, see screenshot attached). I did manage to get the u_mass coherence, which takes the corpus as an argument rather than texts. My texts are a list of documents, and each document is itself a list of tokens, i.e. a list of lists (the same type as used in the tutorials), so the texts aren't the problem either.
After trying many things (I posted a question here earlier but got no response), I noticed that I could obtain the c_v coherence once I stopped pruning the dictionary. I had pruned the dictionary via filter_extremes() and then called compactify().
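To make the suspected cause concrete, here is a minimal sketch that lists the tokens which occur in texts but are missing from a pruned vocabulary. It rests on an assumption I have not confirmed: that the KeyError is raised when c_v looks up such a token. The function name find_missing_tokens and the toy data are hypothetical; with gensim, the vocabulary could be taken from dictionary.token2id after pruning.

```python
def find_missing_tokens(texts, vocabulary):
    """Return the sorted tokens that occur in `texts` but not in `vocabulary`.

    `texts` is a list of documents, each a list of tokens.
    `vocabulary` is any iterable of kept tokens (for example the keys of a
    token -> id mapping).
    """
    vocab = set(vocabulary)
    return sorted({tok for doc in texts for tok in doc if tok not in vocab})


# Toy data (hypothetical): 'remark' was pruned from the vocabulary
# but is still present in the texts.
texts = [["topic", "model", "remark"], ["model", "coherence"]]
pruned_vocabulary = {"topic": 0, "model": 1, "coherence": 2}

missing = find_missing_tokens(texts, pruned_vocabulary)
print(missing)  # ['remark']
```

If the list is non-empty for the real data and contains the token named in the KeyError, that would support this explanation.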
Is there a way to fix this issue? Otherwise I would need to remove these extreme words from the texts before creating a dictionary and corpus, which is a lot more work than using filter_extremes(). I really hope there's a way to use both a pruned dictionary and c_v coherence.
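For reference, the workaround I would otherwise fall back on can be sketched as follows: keep the pruned dictionary, and drop from each document every token that the dictionary no longer contains. This is only a sketch under assumptions: prune_texts is a hypothetical helper, the data is made up, and I have not verified that it makes the KeyError go away.

```python
def prune_texts(texts, vocabulary):
    """Return a copy of `texts` keeping only tokens found in `vocabulary`.

    `texts` is a list of documents, each a list of tokens.
    `vocabulary` is any iterable of kept tokens (for example the keys of a
    token -> id mapping). Token order within each document is preserved.
    """
    vocab = set(vocabulary)
    return [[tok for tok in doc if tok in vocab] for doc in texts]


# Toy data (hypothetical): 'remark' is not in the pruned vocabulary.
texts = [["topic", "model", "remark"], ["model", "coherence"]]
kept = {"topic": 0, "model": 1, "coherence": 2}

pruned_texts = prune_texts(texts, kept)
print(pruned_texts)  # [['topic', 'model'], ['model', 'coherence']]
```

With gensim this would presumably be called as prune_texts(texts, dictionary.token2id) after filter_extremes(), and the result passed as texts to CoherenceModel.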
Many thanks in advance for your help!
Best wishes,
Myrthe