Evaluation with Topic Coherence umass // which is the best model?


Nils Denter

Jun 26, 2018, 7:47:18 AM
to gensim
Hello guys,

I evaluated my models to choose the best number of topics k. For the evaluation I chose topic coherence with umass.

The following graph was the output:


I assume I should choose the "lowest" value, so k=60 topics?

Thanks in advance.

Best regards,
Nils

Hiba Aleqabie

Jun 26, 2018, 11:05:09 AM
to gensim
Hello,
From what I have read, the best k is the one with the highest coherence value, i.e. wherever there is a peak. In your figure, the peak between 0 and 20 is the one.
Regards

Nils Denter

Jun 26, 2018, 11:37:39 AM
to gensim
Hello,

Are you sure about that? I thought the lower the value, the better the coherence?
Any other opinions on that topic?

Hiba Al-eqabie

Jun 26, 2018, 1:09:25 PM
to gen...@googlegroups.com
Hello sir,
Yes, I am. I'm working on topic modeling too.
You may google it!
You can also use perplexity, but it works the opposite way: there you look for the lowest values.
You can check this link


Good luck
Regards
Hiba


Derrick Wang

Jun 21, 2019, 12:23:32 AM
to Gensim
Topic coherence with umass decreases as the number of topics increases. Topic coherence with c_v goes up as the number of topics increases. I guess 'c_v' is the one that should be used instead of 'umass'.

Ryan Boch

Jun 21, 2019, 2:30:17 PM
to Gensim
You can use either umass or c_v. Best coherence for umass is typically the minimum. Best coherence for c_v is typically the maximum. Umass is faster than c_v, but in my experience c_v gives better scores for optimal number of topics. This is not a hard decision rule. It depends on the use case. If you're evaluating topics for human readability you would probably want to compare a few models with low umass to see how the top keywords look with something like pyLDAvis. Vice versa for c_v.
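
For readers trying to work out which direction of umass is "better", here is a minimal sketch of a UMass-style score for a single topic. It assumes the document co-occurrence form sum over word pairs of log((D(w_later, w_earlier) + 1) / D(w_earlier)), where D counts the documents containing the given words. The function name `umass_coherence` and the toy documents are made up for illustration, and this is not gensim's implementation (which may normalise or aggregate differently).

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (illustrative sketch only).

    top_words: topic words ordered from most to least probable.
    documents: list of tokenised documents (lists of strings).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for later_idx in range(1, len(top_words)):
        for earlier_idx in range(later_idx):
            w_later = top_words[later_idx]
            w_earlier = top_words[earlier_idx]
            denom = doc_freq(w_earlier)
            if denom == 0:
                continue  # word never occurs; skip rather than divide by zero
            score += math.log((doc_freq(w_later, w_earlier) + 1) / denom)
    return score


# Toy corpus, invented for this example.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["cat", "pet", "food"],
    ["stock", "market", "trade"],
    ["stock", "trade", "price"],
]

coherent = umass_coherence(["cat", "dog", "pet"], docs)   # words that co-occur
mixed = umass_coherence(["cat", "stock", "pet"], docs)    # words that rarely co-occur
print(coherent, mixed)
```

Under this particular formula, the topic whose words co-occur gets the larger (less negative) score, which would suggest reading values closer to zero as more coherent. Whether a given tool reports the score with the same sign and scaling is worth checking in its documentation before settling on "minimum" or "maximum".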



Derrick Wang

Jun 21, 2019, 5:48:22 PM
to gen...@googlegroups.com
Thanks Ryan.


Y. Zhou

Aug 23, 2019, 5:21:15 AM
to Gensim
If the best coherence scores for umass and c_v typically occur at the minimum and maximum number of topics, how can these coherence metrics help us pick the best number of topics?
Their results conflict with each other. If I only used umass, I would choose 1 topic; with c_v, the more topics the better.
I think we are looking for a crest in the plot (topic coherence vs. number of topics). The crest is the point with the best coherence.
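
The "crest" idea can be sketched as a search for local maxima in the coherence curve. This is only an illustration: the helper name `coherence_peaks` is hypothetical, and the scores below are made-up numbers, not results from any model in this thread.

```python
def coherence_peaks(scores_by_k):
    """Return the k values whose score is higher than both neighbours.

    scores_by_k: dict mapping number of topics -> coherence score,
    where a higher score is assumed to mean better coherence.
    """
    ks = sorted(scores_by_k)
    peaks = []
    # Slide a window of three consecutive k values over the curve.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if scores_by_k[k] > scores_by_k[prev_k] and scores_by_k[k] > scores_by_k[next_k]:
            peaks.append(k)
    return peaks


# Made-up coherence scores for illustration only.
example_scores = {5: 0.41, 10: 0.48, 15: 0.52, 20: 0.47, 30: 0.49, 40: 0.45}
peaks = coherence_peaks(example_scores)
print(peaks)
```

A curve that only rises or only falls has no interior crest, so this returns an empty list in that case. That is one way to notice that the tested range of k, or the metric itself, is not giving a usable signal.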