> (which defines and fits the term "entropy of N bits" within the
> summation here) ... where the source is, as you mentioned, i.i.d.,
> it is a perfect fit
>
> [2] In addition to [1] above, this summation definition is more
> general: it allows the source, e.g., to first 'randomly' choose an
> output length N (based on some probability over the variable
> length N), output exactly N symbols and then immediately stop,
> rather than continuously emitting further symbols (of course it
> may also choose to output an arbitrarily large number of symbols),
> and then, based on this 'randomly' chosen output length N, to use
> a distinct source probability distribution per length, e.g. a
> length of 16 would use 60/40 weighted coin tosses, a length of 32
> would use 70/30
>
> It appears Shannon's work did not encompass scenario [2] above; it
> only deals with sources that continuously produce 'random' output
> symbols
You don't understand what "source" means. A source is a black box
that generates an infinite stream of symbols from an alphabet, where
each symbol is chosen at random, with a probability that may depend
on the symbol, on the past symbols generated (i.e., the source may
have a memory), and on how many symbols it has generated so far
(i.e., the probabilities may change over time).
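To make this concrete: scenario [2] above is itself just such a
source. A minimal Python sketch, where the 60/40 and 70/30 biases
come from the quote and the equiprobable choice between lengths 16
and 32 is my own assumption for illustration:

    import random

    def two_stage_source():
        # Infinite symbol stream: repeatedly pick a block length at
        # random, then emit that many coin tosses whose bias depends
        # on the chosen length. The per-symbol probabilities depend
        # on the past (which block we are in), i.e. the source has
        # memory, exactly the black box described above.
        while True:
            n = random.choice([16, 32])        # assumed 50/50 length choice
            p_heads = 0.6 if n == 16 else 0.7  # biases from the quote
            for _ in range(n):
                yield 'H' if random.random() < p_heads else 'T'

    src = two_stage_source()
    print(''.join(next(src) for _ in range(20)))  # first 20 symbols

The "stop after N symbols" variant fits too: model it as emitting a
dedicated end-of-message symbol forever after the block, which is
still an infinite stream.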
It is a common misconception that you can "observe" probabilities.
You can't. Probabilities are *defined* into the model. Just because
the last ten dice rolls gave you ten sixes does not mean that the
probability of the six is one: under a fair-die model that run has
probability (1/6)^10, about 1.7e-8, small but not zero. Nor does it
mean that the next symbol must be something other than six to get
"correct probabilities".
Shannon talks about *models* of systems, and in these models the
probabilities are known. He does not talk about how you *get* the
probabilities. A model of a die with p(6)=1 does have an entropy (of
zero), and you can compute it, and it is "your model". It might be a
bad model, though, but that is nothing the theorems care about or
describe.
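That computation is deliberately trivial, which is the point: the
probabilities are an input you defined, not something measured. A
minimal sketch of the standard formula H = -sum p*log2(p):

    from math import log2

    def model_entropy(p):
        # Shannon entropy in bits per symbol of a memoryless model
        # whose symbol probabilities are given by the dict p.
        return sum(-q * log2(q) for q in p.values() if q > 0)

    print(model_entropy({6: 1.0}))                       # 0.0
    print(model_entropy({k: 1/6 for k in range(1, 7)}))  # 2.5849...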
Sure enough, in a real system you only ever have symbols, and you
may (or rather should) try to fit a probabilistic model to such a
real source in order to apply Shannon. But it is outright *wrong* to
say that you can replace "probabilities" by "relative frequencies"
and then compute "the entropy of the sequence".
Such a thing does not exist, nor does it make sense. Entropy is
always relative to a *model*, not to a specific source. You can
compute the entropy of a die model for which you defined(!) (yes,
really!) the probabilities, but you *cannot* compute the entropy of
"the green die on my desk that I rolled 100 times, writing down the
number of pips". The latter makes no sense.
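What you *can* do, and the distinction is the whole point, is to
define a model whose probabilities happen to equal the observed
relative frequencies, and then compute the entropy of that model. A
sketch, with made-up roll data:

    from collections import Counter
    from math import log2
    import random

    # 100 made-up rolls standing in for the green die; this is data,
    # and data by itself has no entropy.
    rolls = [random.randint(1, 6) for _ in range(100)]

    # Define(!) a memoryless model whose probabilities are the
    # observed relative frequencies.
    fitted = {k: c / len(rolls) for k, c in Counter(rolls).items()}

    # The result is the entropy of this fitted model, not "the
    # entropy of the sequence"; a different model fitted to the same
    # data (e.g. one with memory) would give a different number.
    print(sum(-q * log2(q) for q in fitted.values() if q > 0))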
A model is not reality. It is a model *of* reality. Of course, the
better the model describes the properties of your *real* source, the
more you can expect a compression algorithm to perform well on your
source, but Shannon's theorems make absolutely no statement about
how to model sources. The theorems *apply* only to models, not to
real sequences.
So long,
Thomas