get_coherence() is not returning an output. It's been running for hours already

Hithesh Sankararaman M ee22s077

Oct 6, 2022, 9:19:51 AM

to Gensim

Hi,

My data consists of 1,24,196 sentences.

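from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaMulticore
from nltk.tokenize import word_tokenize

# df_topic_modeling is an existing pandas DataFrame whose
# 'stopwords_removed_str' column holds one pre-processed text per row.
dataset = [d.split() for d in df_topic_modeling['stopwords_removed_str']]
dictionary = Dictionary(dataset)  # maps each word to a unique integer ID
corpus = [dictionary.doc2bow(doc) for doc in dataset]  # bag-of-words per text
print("Building LDA Multicore model")
lda_multicore_model_using_gensim = LdaMulticore(corpus=corpus, id2word=dictionary, iterations=50, num_topics=5, passes=10)
print("Computing Coherence")
# c_v coherence needs the tokenized texts, not only the bag-of-words corpus
df_topic_modeling['stopwords_removed_str_tokenize'] = df_topic_modeling['stopwords_removed_str'].apply(word_tokenize)
cm = CoherenceModel(model=lda_multicore_model_using_gensim, texts=df_topic_modeling['stopwords_removed_str_tokenize'], corpus=corpus, dictionary=dictionary, coherence='c_v')
coherence_lda = cm.get_coherence()
print('\nCoherence Score: ', coherence_lda)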

I am trying to get the coherence score for my LDA multicore model, but the coherence score is never returned. Please help.
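One detail worth checking in this code: the dictionary and corpus come from `d.split()`, while the texts given to `CoherenceModel` come from NLTK's `word_tokenize`, which also splits off punctuation and contractions, so the two token streams can differ. The stdlib-only sketch below lists tokens that appear in the coherence texts but not in the dictionary's vocabulary. `tokens_missing_from_vocab` is a hypothetical helper, and a plain set of words stands in for the Gensim `Dictionary` (its `token2id` mapping could be passed the same way).

```python
def tokens_missing_from_vocab(texts, vocab):
    """Return the sorted set of tokens in `texts` that are absent from `vocab`."""
    vocab = set(vocab)
    return sorted({tok for tokens in texts for tok in tokens if tok not in vocab})


# Vocabulary built with str.split(), as in the code above.
split_texts = [s.split() for s in ["isn't it fine.", "topic model"]]
vocab = {tok for tokens in split_texts for tok in tokens}

# Tokens roughly as word_tokenize would produce them (hand-written here, not computed).
nltk_style_texts = [["is", "n't", "it", "fine", "."], ["topic", "model"]]
print(tokens_missing_from_vocab(nltk_style_texts, vocab))
```

Using one tokenization for the dictionary, the corpus and the coherence texts avoids this kind of drift altogether.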

Gordon Mohr

Oct 9, 2022, 4:31:44 PM

to Gensim

Are you sure it's the call to `.get_coherence()` that's hung? (If so, how did you determine that? Where is this code running?)

Does the system/Python-process seem to be keeping the CPU busy while you're waiting? (If not, there was probably some unreported process-ending error/crash.)

Do you have logging on, to at least the INFO level, to observe progress leading up to the problem point?
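Gensim reports its progress through Python's standard `logging` module, so a minimal way to turn on INFO-level output before building the models is sketched below. The format string is an arbitrary choice, and `force=True` (Python 3.8+) replaces any handlers an IDE may already have installed on the root logger.

```python
import logging


def enable_info_logging() -> None:
    """Send INFO-level log records (including Gensim's progress messages) to stderr."""
    logging.basicConfig(
        format="%(asctime)s : %(levelname)s : %(name)s : %(message)s",
        level=logging.INFO,
        force=True,  # drop handlers an IDE may have installed (Python 3.8+)
    )


enable_info_logging()
logging.getLogger("gensim").info("logging enabled")
```

With this in place, the last log lines printed before the apparent hang show which step was running.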

Does "1,24,196 sentences" mean ~124k separate texts? About how long is each individual text?
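A quick way to answer this, assuming the texts are already token lists like `dataset` in the code above, is to summarize their lengths; counting zero-length texts at the same time costs nothing. `summarize_lengths` is a hypothetical helper, not part of Gensim.

```python
from statistics import mean, median


def summarize_lengths(texts):
    """Return basic length statistics for a non-empty list of tokenized texts."""
    lengths = [len(tokens) for tokens in texts]  # assumes at least one text
    return {
        "num_texts": len(lengths),
        "empty_texts": sum(1 for n in lengths if n == 0),
        "min_len": min(lengths),
        "median_len": median(lengths),
        "mean_len": mean(lengths),
        "max_len": max(lengths),
    }


# Toy example; replace with the real list of token lists.
toy = [["topic", "model"], [], ["coherence", "score", "lda"]]
print(summarize_lengths(toy))
```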

- Gordon

Hithesh Sankararaman M ee22s077

Oct 10, 2022, 5:23:38 AM

to gen...@googlegroups.com

1) I am using Spyder.

Code:

# Apply Gensim's Dictionary object, which maps each word to a unique integer ID:

dataset = [d.split() for d in df_topic_modeling['stopwords_removed_str']]
dictionary = Dictionary(dataset)
corpus = [dictionary.doc2bow(doc) for doc in dataset]
print("Building LDA Multicore model")
lda_multicore_model_using_gensim = LdaMulticore(corpus=corpus, id2word=dictionary, iterations=50, num_topics=5, passes=10)
print("Computing Coherence")
df_topic_modeling['stopwords_removed_str_tokenize'] = df_topic_modeling['stopwords_removed_str'].apply(word_tokenize)

cm = CoherenceModel(model=lda_multicore_model_using_gensim, texts=df_topic_modeling['stopwords_removed_str_tokenize'], coherence='c_v')
print("Coherence model built")
print('\nCoherence Score: ', cm.get_coherence())

Output:

Please see screenshot attached.

When I use "u_mass", the output is obtained quickly, but when I use "c_v", it seems to run in an infinite loop. I didn't get any output from "cm.get_coherence()". Can you check the code?
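One plausible reason for the speed difference: in Gensim, `u_mass` estimates word probabilities from document-level co-occurrence in the bag-of-words corpus, whereas `c_v` slides a window over every tokenized text (assumed here to be Gensim's default of 110 tokens for `c_v`) and then compares word-context vectors, so its work grows roughly with the total number of tokens rather than the number of documents. The stdlib-only sketch below only counts how many window positions such a pass would visit; it is a simplified model of the idea, not Gensim's implementation, and `count_windows` is a hypothetical name.

```python
def count_windows(texts, window_size=110):
    """Count the window positions a boolean sliding-window pass would visit.

    Simplified model (an assumption, not Gensim's exact code): a text with
    n >= window_size tokens yields n - window_size + 1 windows; a shorter,
    non-empty text is treated as a single window; an empty text yields none.
    """
    total = 0
    for tokens in texts:
        n = len(tokens)
        if n >= window_size:
            total += n - window_size + 1
        elif n > 0:
            total += 1
    return total


# Document-level counting touches each of the 3 texts once;
# the windowed pass visits many more positions.
texts = [["w"] * 500, ["w"] * 50, []]
print(len(texts), "texts ->", count_windows(texts), "windows")
```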

Screenshot from 2022-10-10 14-48-48.png

Gordon Mohr

Oct 11, 2022, 2:44:44 PM

to Gensim

(1) Does the system/Python-process seem to be keeping the CPU busy while you're waiting? You could use a system-specific tool like `top` or `Activity Monitor` or `Task Manager` to check this. If not, there was probably some unreported process-ending error/crash/deadlock, rather than an "infinite loop".

(2) Do you have logging on, to at least the INFO level, to observe progress leading up to the problem point?

(3) Does "1,24,196 sentences" mean ~124k separate texts? About how long is each individual text?

(4) What's shown if you interrupt the seemingly-hung step with CTRL-C?
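Related to (4): if CTRL-C does not help, or the IDE swallows the traceback, Python's standard `faulthandler` module can periodically write every thread's stack in the current process to a file, showing where a seemingly hung call is waiting. A minimal sketch; the file name and interval are arbitrary choices, and a real file is used because an IDE console's stderr may not expose a file descriptor. It does not cover child processes started by multiprocessing.

```python
import faulthandler


def start_hang_dumps(path="hang_traceback.log", every_seconds=60.0):
    """Write all threads' tracebacks to `path` every `every_seconds` until cancelled.

    Returns the open file; keep it alive and pass it to stop_hang_dumps() when done.
    """
    log_file = open(path, "w")
    try:
        faulthandler.dump_traceback_later(every_seconds, repeat=True, file=log_file)
    except Exception:
        log_file.close()
        raise
    return log_file


def stop_hang_dumps(log_file):
    """Cancel the periodic dumps and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()


# Usage: arm the watchdog, run the slow step, then disarm it.
# log_file = start_hang_dumps()
# coherence = cm.get_coherence()
# stop_hang_dumps(log_file)
```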

- Gordon

Hithesh Sankararaman M ee22s077

Oct 12, 2022, 3:17:49 AM

to gen...@googlegroups.com

Sorry, the issue was on my side. After pre-processing, some lines in my text were empty, which is why I didn't get any output. Once I deleted the empty lines, I got the coherence score.

Thank you so much.
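For anyone hitting the same symptom, a minimal sketch of the fix described in this reply: drop texts that are empty after pre-processing before building the dictionary, corpus, and coherence texts, and reuse the same token lists for all three so they stay aligned. `tokenize_non_empty` is a hypothetical helper; the Gensim calls are shown only as comments.

```python
def tokenize_non_empty(raw_texts):
    """Split each pre-processed string on whitespace and drop texts with no tokens."""
    tokenized = (text.split() for text in raw_texts)
    return [tokens for tokens in tokenized if tokens]


raw = ["topic model coherence", "", "   ", "lda gensim"]
dataset = tokenize_non_empty(raw)
print(dataset)  # [['topic', 'model', 'coherence'], ['lda', 'gensim']]

# The same `dataset` would then feed every Gensim step, e.g.:
# dictionary = Dictionary(dataset)
# corpus = [dictionary.doc2bow(doc) for doc in dataset]
# cm = CoherenceModel(model=lda_multicore_model_using_gensim, texts=dataset,
#                     dictionary=dictionary, coherence='c_v')
```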

Gordon Mohr

Oct 12, 2022, 3:09:19 PM

to Gensim

Glad to hear, you're welcome!

If the Gensim code really is so fragile that a single empty text/line in a corpus is enough to cause a hang in a common operation, that's something we might want to armor against. So if you happen to have a small self-contained example that triggers it, it'd be interesting to consider it as a case to improve.

- Gordon