Which corpus is used as reference for NPMI calculation in Gensim?

Ronald Benz Zhang

Feb 1, 2023, 9:20:59 AM
to Gensim
According to Hoyle et al. (2021), in their paper "Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence", coherence scores can vary depending on the reference corpus used:
"As a result, the choice of reference corpus determines the strength of human correlation (Lau et al., 2014; Röder et al., 2015)."
Can you tell me which reference corpus Gensim uses when calculating NPMI? Thank you.

Gordon Mohr

Feb 1, 2023, 2:53:21 PM
to Gensim
No external or fixed 'reference corpus' is used. Users supply their own text corpus to the Gensim operations that calculate NPMI: `CoherenceModel` (for example with `coherence='c_npmi'`), or the NPMI scoring mode of the `Phrases` functionality.

- Gordon

Ronald Benz Zhang

Feb 1, 2023, 9:14:01 PM
to Gensim
Thank you for your reply Gordon! :)