Different results in LdaModel and LdaMulticore


rboc...@gmail.com

Apr 17, 2019, 12:15:33 PM
to Gensim
I was experimenting with LdaMulticore and noticed the topics are not nearly as good as when I use LdaModel. Any particular reason this would happen? I set a random_state on both LdaModel and LdaMulticore, but otherwise the hyperparameters are left at their defaults.

David G

Aug 8, 2021, 3:01:18 AM
to Gensim
I am seeing the same thing. I have a corpus of 14,000 documents and a vocabulary of 81,000 words. 

lda_model_serial = gensim.models.ldamodel.LdaModel(
    corpus=word_frequency_map,
    id2word=id2word,
    num_topics=10,
    random_state=100,
    chunksize=100,
    passes=8,
    alpha='symmetric',
)
# For the parallel run, only the constructor was swapped:
# lda_model_parallel = gensim.models.ldamulticore.LdaMulticore(...)  # same arguments

Subjectively, the serial model's topics look better in pyLDAvis. It also scores higher on c_v coherence over the training corpus: 0.6 for the serial model versus 0.52 for the parallel one.

Anyone have an idea why LdaMulticore doesn't give as good results as LdaModel?

Thanks,
David

Gordon Mohr

Aug 10, 2021, 12:29:38 PM
to Gensim
Perhaps: the extra overhead of splitting and merging results across worker processes means the same number of passes doesn't provide as much actual model convergence.

What happens if you increase the `passes` in the `LdaMulticore` case? Can you match your quality-evaluations of the `LdaModel`, while still spending less wall-clock-time overall?

- Gordon
