the doc of gensim.matutils.kullback_leibler is wrong?


周唐

Apr 23, 2017, 7:18:33 AM
to gensim
gensim.matutils.kullback_leibler(vec1, vec2, num_features=None)

A distance metric between two probability distributions. Returns a distance value in range <0,1> where values closer to 0 mean less distance (and a higher similarity) Uses the scipy.stats.entropy method to identify kullback_leibler convergence value. If the distribution draws from a certain number of docs, that value must be passed.

But I computed values larger than 1, and Wikipedia says:
 
  • The Kullback–Leibler divergence is always non-negative,

    D_KL(P‖Q) ≥ 0,

a result known as Gibbs' inequality, with D_KL(P‖Q) zero if and only if P = Q almost everywhere. The entropy H(P) thus sets a minimum value for the cross-entropy H(P,Q), the expected number of bits required when using a code based on Q rather than P; and the Kullback–Leibler divergence therefore represents the expected number of extra bits that must be transmitted to identify a value x drawn from X, if a code is used corresponding to the probability distribution Q, rather than the "true" distribution P.
Is the documentation wrong?
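The Wikipedia statement is easy to verify directly. As a sketch, the sum below implements the standard KL-divergence formula in nats, which is what scipy.stats.entropy(p, q) computes when given two distributions (gensim's kullback_leibler delegates to that function). For two sufficiently different distributions the result is well above 1, so the documented range <0, 1> cannot be right:

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    # Terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two valid probability distributions over the same two outcomes.
p = [0.99, 0.01]
q = [0.01, 0.99]

print(kl_divergence(p, q))  # ~4.50 -- far outside the documented <0, 1> range
print(kl_divergence(p, p))  # 0.0  -- zero if and only if P = Q
```

KL divergence is unbounded above: as q puts ever less mass where p is concentrated, the divergence grows without limit, so no fixed upper bound on it exists.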

Lev Konstantinovskiy

Apr 26, 2017, 8:40:08 PM
to gensim
Hi,

Thanks for the catch. It's indeed an error. Would appreciate a small PR on github.