> (which defines and fits the term "entropy of N bits" within the
> summation here) ... where the source is, as you mentioned, i.i.d.,
> it is a perfect fit
>
> [2] In addition to [1] above, this summation definition is more
> general: it allows the source, e.g., to first 'randomly' choose an
> output length N (based on some probability over the variable
> length N), output exactly N symbols and then immediately stop,
> rather than continuously emitting further symbols (of course it
> may also choose to output an arbitrarily large number of symbols),
> and then, based on this 'randomly' chosen output length N, to use
> a distinct source probability distribution per length, e.g. a
> length of 16 would use 60/40 weighted coin tosses, a length of 32
> would use 70/30
>
> It appears Shannon's work did not encompass scenario [2] above; it
> only deals with sources that continuously produce 'random' output
> symbols
You don't understand what "source" means. A source is a black box
that generates an infinite stream of symbols from an alphabet, where
each symbol is chosen at random, with a probability that may depend
on the symbol, on the past symbols generated (i.e., the source may
have a memory), and on how many symbols it has generated so far
(i.e., the probabilities may change over time).
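To make this concrete: scenario [2] above is itself just such a
source. A minimal Python sketch, where the 60/40 and 70/30 biases
come from the quote and the equiprobable choice between lengths 16
and 32 is my own assumption for illustration:

    import random

    def two_stage_source():
        # Infinite symbol stream: repeatedly pick a block length at
        # random, then emit that many coin tosses whose bias depends
        # on the chosen length. The per-symbol probabilities depend
        # on the past (which block we are in), i.e. the source has
        # memory, exactly the black box described above.
        while True:
            n = random.choice([16, 32])        # assumed 50/50 length choice
            p_heads = 0.6 if n == 16 else 0.7  # biases from the quote
            for _ in range(n):
                yield 'H' if random.random() < p_heads else 'T'

    src = two_stage_source()
    print(''.join(next(src) for _ in range(20)))  # first 20 symbols

The "stop after N symbols" variant fits too: model it as emitting a
dedicated end-of-message symbol forever after the block, which is
still an infinite stream.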
It is a common misconception that you can "observe" probabilities.
You can't. Probabilities are *defined* into the model. Just because
the last ten dice rolls gave you ten sixes does not mean that the
probability of the six is one: under a fair-die model that run has
probability (1/6)^10, about 1.7e-8, small but not zero. Nor does it
mean that the next symbol must be something other than six to get
"correct probabilities".
Shannon talks about *models* of systems, and in these models the
probabilities are known. He does not talk about how you *get* the
probabilities. A model of a die with p(6)=1 does have an entropy (of
zero), and you can compute it, and it is "your model". It might be a
bad model, though, but that is nothing the theorems care about or
describe.
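That computation is deliberately trivial, which is the point: the
probabilities are an input you defined, not something measured. A
minimal sketch of the standard formula H = -sum p*log2(p):

    from math import log2

    def model_entropy(p):
        # Shannon entropy in bits per symbol of a memoryless model
        # whose symbol probabilities are given by the dict p.
        return sum(-q * log2(q) for q in p.values() if q > 0)

    print(model_entropy({6: 1.0}))                       # 0.0
    print(model_entropy({k: 1/6 for k in range(1, 7)}))  # 2.5849...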
Sure enough, in a real system you only ever have symbols, and you
may (or rather should) try to fit a probabilistic model to such a
real source in order to apply Shannon. But it is outright *wrong* to
say that you can replace "probabilities" by "relative frequencies"
and then compute "the entropy of the sequence".
Such a thing does not exist, nor does it make sense. Entropy is
always relative to a *model*, not to a specific source. You can
compute the entropy of a die model for which you defined(!) (yes,
really!) the probabilities, but you *cannot* compute the entropy of
"the green die on my desk that I rolled 100 times, writing down the
number of pips". The latter makes no sense.
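What you *can* do, and the distinction is the whole point, is to
define a model whose probabilities happen to equal the observed
relative frequencies, and then compute the entropy of that model. A
sketch, with made-up roll data:

    from collections import Counter
    from math import log2
    import random

    # 100 made-up rolls standing in for the green die; this is data,
    # and data by itself has no entropy.
    rolls = [random.randint(1, 6) for _ in range(100)]

    # Define(!) a memoryless model whose probabilities are the
    # observed relative frequencies.
    fitted = {k: c / len(rolls) for k, c in Counter(rolls).items()}

    # The result is the entropy of this fitted model, not "the
    # entropy of the sequence"; a different model fitted to the same
    # data (e.g. one with memory) would give a different number.
    print(sum(-q * log2(q) for q in fitted.values() if q > 0))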
A model is not reality. It is a model *of* reality. Of course, the
better the model describes the properties of your *real* source, the
more you can expect a compression algorithm to perform well on your
source, but Shannon's theorems make absolutely no statement about
how to model sources. The theorems *apply* only to models, not to
real sequences.
So long,
Thomas