Entropy of a function of a random variable

Golabi Doon

Dec 22, 2009, 6:58:03 AM

Hello everyone,

Simple entropy question: given a random variable X with entropy H(X)
and a function f(.), is there a relationship between H(X) and
H(f(X))?

I found an answer in Thomas Cover's book: H(X)>=H(f(X)).

But I don't see why, because I can think of counterexamples. For
example, say X has a normal distribution. Then
H(X)=0.5*log(2*pi*e*sigma^2).

Now define f(.) simply as y=f(x)=ax. Then the random variable Y also
has a normal distribution, with variance (a*sigma)^2, so
H(Y)=0.5*log(2*pi*e*(a*sigma)^2). It seems that when a>1, H(Y)>H(X),
and when 0<a<1, H(Y)<H(X).
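
A quick numerical sketch of this computation, assuming natural
logarithms and sigma=1 (the function names are illustrative only). For
the normal case the gap between the two values works out to log|a|:

```python
import math

def gaussian_differential_entropy(sigma):
    """Value of 0.5*log(2*pi*e*sigma^2), in nats, for standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def scaled_entropy_gap(sigma, a):
    """Entropy of Y = a*X minus entropy of X, for X normal with std sigma."""
    return (gaussian_differential_entropy(abs(a) * sigma)
            - gaussian_differential_entropy(sigma))

# Compare the gap with log|a| for a shrinking, neutral and stretching map.
for a in (0.5, 1.0, 2.0):
    print(a, scaled_entropy_gap(1.0, a), math.log(abs(a)))
```

The gap is negative for 0<a<1 and positive for a>1, which is exactly
the behaviour described above.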

I would appreciate it if you could let me know why what I see here
does not match Cover's book.

Regards

Golabi

Jussi Piitulainen

Dec 22, 2009, 9:02:46 AM

Golabi Doon writes:

> Simple entropy question... Given a random variable X with entropy
> H(X), and given a function f(.). Is there a relationship between H(X)
> and H(f(X))?
>
> I found an answer in Thomas Cover's book: H(X)>=H(f(X)).

As far as I can see, Cover and Thomas make that statement for a
discrete random variable X only. Its proof is exercise 5 in chapter 2
in the 1991 edition of their Elements of Information Theory. You may
have the newer edition or some other book in mind.
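
For the discrete case the inequality is easy to check numerically. The
following is only a sketch: the distribution and the helper names are
made up for illustration and are not from the book. A map that merges
outcomes lowers the entropy, and a one-to-one map leaves it unchanged.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf, f):
    """pmf of f(X): outcomes that f maps to the same value are merged."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[f(x)] += p
    return dict(out)

# Hypothetical example: X uniform on {-2, -1, 1, 2}.
pmf_x = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}

h_x = entropy(pmf_x)                                     # X itself
h_abs = entropy(pushforward(pmf_x, abs))                 # |X| merges +x and -x
h_triple = entropy(pushforward(pmf_x, lambda x: 3 * x))  # 3X is one-to-one

print(h_x, h_abs, h_triple)
```

Here H(|X|) is smaller than H(X), while H(3X) equals H(X): relabelling
the outcomes of a discrete variable does not change its entropy.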

> But I don't know why, because I can think of counter examples. For
> example, say X has normal distribution. Thus
> H(X)=0.5*log(2*pi*e*sigma^2).

A normally distributed random variable is not discrete.

Cover and Thomas (1991) define "differential entropy" h(X) for a
continuous random variable X, analogous to H(Y) for a discrete Y. They
warn in the beginning of chapter 9 that "there are some important
differences". I think this is one of the important differences. And I
don't see any statement about h(X) and h(f(X)) in the book.
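
For reference, the standard change-of-variables calculation shows where
the difference comes from. This is a sketch under the assumption that f
is invertible and differentiable and that X has density p_X; the
notation is mine, not quoted from the book.

```latex
% Density of Y = f(X) for invertible, differentiable f:
p_Y(y) = \frac{p_X\bigl(f^{-1}(y)\bigr)}{\bigl|f'\bigl(f^{-1}(y)\bigr)\bigr|}

% Differential entropy of Y:
h(Y) = -\int p_Y(y) \log p_Y(y)\, dy
     = h(X) + \mathbb{E}\bigl[\log |f'(X)|\bigr]

% Special case f(x) = ax with a \neq 0:
h(aX) = h(X) + \log |a|
```

So an invertible map can raise or lower h depending on the sign of
E[log|f'(X)|], whereas a one-to-one relabelling of a discrete variable
leaves H unchanged.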

[...]

Golabi Doon

Dec 22, 2009, 10:06:45 AM

On Dec 22, 8:02 am, Jussi Piitulainen <jpiit...@ling.helsinki.fi>
wrote:

> Cover and Thomas (1991) define "differential entropy" h(X) for a
> continuous random variable X, analogous to H(Y) for a discrete Y. They
> warn in the beginning of chapter 9 that "there are some important
> differences". I think this is one of the important differences. And I
> don't see any statement about h(X) and h(f(X)) in the book.

Thanks, you are right: H(X)>=H(f(X)) holds only in the discrete case,
not for continuous random variables.

But this is getting interesting now. For discrete X, H(X)>=H(f(X))
says that any fixed mapping f(.) from the original random variable X
to some other random variable Y can possibly kill some information,
but cannot create information. This makes complete sense to me.

Now, if I have a continuous random variable, I think it is reasonable
to expect the same statement to hold (forget about entropy or
differential entropy, and just think of it as a question about
information). Thus, again, Y=f(X) on a continuous random variable X
may kill some information, but should not create information.

Under this belief, is there any functional that takes a density
function (of a continuous variable) as input and outputs a number
that indicates the amount of information, yet respects the constraint
of killing but not creating information, similar to the discrete
entropy?

Thanks

Golabi

Graham Jones

Dec 22, 2009, 2:03:48 PM

"Golabi Doon" <golab...@gmail.com> wrote in message
news:f74b1ab5-ccd9-4ac0...@d21g2000yqn.googlegroups.com...

> Under this belief, is there any functional that can take a density
> function (of a continuous variable) as input, and then output a number
> that indicates the amount of information and yet respect the
> constraint of killing but not creating information, similar to the
> discrete entropy?

**********************************************

Golabi Doon

Dec 23, 2009, 12:23:17 AM

On Dec 22, 1:03 pm, "Graham Jones" <x...@x.x> wrote:
> You might be interested in this
>
> http://en.wikipedia.org/wiki/Hirschman_uncertainty
>

Thank you, Graham, for the pointer. I had a look at the definition,
and as I understand it, it gives the following inequality for a
density function p(x):

H(p)+H(q)>=log(e/2)

where q(x)=|F{sqrt(p(x))}|^2 and F{} is the Fourier transform.

However, I do not see its connection to the original question.
Specifically, if I have a random variable X with density p(x), where
does the arbitrary function f(.) enter this formula?

Thanks

Golabi

Graham Jones

Dec 23, 2009, 4:42:31 AM

"Golabi Doon" <golab...@gmail.com> wrote in message

news:dc2e0773-3d63-40f0...@r5g2000yqb.googlegroups.com...

> H(p)+H(q)>=log(e/2)

*************************

It was just a suggestion, and the following is from memory. I am no
expert on QM.

What I had in mind was that the minimum total uncertainty (for position and
momentum together) is achieved when the wave packet is a complex Gaussian,
in which case the densities in position space and in momentum space are both
normal. [If I remember correctly, there is a theorem which says this.] If you
stretch (x -> ax) in position space, you squeeze in momentum space, so the
total uncertainty remains constant. And I think that any transformation which
converts a normal into something else will increase the uncertainty.
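
The stretch/squeeze remark can be checked for the Gaussian case with a
short sketch. It rests on an assumption not stated in the thread: the
Fourier transform is taken with the kernel exp(-2*pi*i*x*xi), under
which q = |F{sqrt(p)}|^2 for a normal p with standard deviation sigma
is again normal, with variance 1/(16*pi^2*sigma^2). The function names
are illustrative.

```python
import math

def normal_entropy(variance):
    """Differential entropy in nats of a normal density with this variance."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def hirschman_sum_gaussian(sigma):
    """H(p) + H(q) for p normal with std sigma and q = |F{sqrt(p)}|^2.

    Assumes the exp(-2*pi*i*x*xi) Fourier convention, so that q is
    normal with variance 1 / (16 * pi^2 * sigma^2).
    """
    var_p = sigma ** 2
    var_q = 1.0 / (16 * math.pi ** 2 * sigma ** 2)
    return normal_entropy(var_p) + normal_entropy(var_q)

bound = math.log(math.e / 2)

# Stretching p (larger sigma) squeezes q; the sum should not move.
for sigma in (0.1, 1.0, 10.0):
    print(sigma, hirschman_sum_gaussian(sigma), bound)
```

Under that convention the sum equals log(e/2) for every sigma, so the
Gaussian sits exactly on the bound however much it is stretched.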

Graham