Entropy Definition : Request for Comments  LawCounsels  3/13/12 1:40 AM

A useful entropy definition for a variable-length output of N bits produced by a source:

SUM [ entropy of N bits * probability of the source producing N bits ]

where N = 1 to an arbitrarily large number of bits produced by the source.

Is this right?

LawCounsels
Re: Entropy Definition : Request for Comments  Thomas Richter  3/13/12 2:13 AM

No. Entropy is the entropy of a random process. If the random process generates symbols X_1, X_2, ..., X_n, ... in an alphabet \Omega of size M, the N-block entropy is defined as

H_N(X) = -\sum_{i_1,...,i_N=1}^M p(x_{i_1},...,x_{i_N}) \log_2 p(x_{i_1},...,x_{i_N})

that is, you look at the average information content of blocks of size N, where you sum over all possible blocks of this size, with their probability times log_2 of their probability. One can then see (with not too much trouble) that H_N lies between H_1, the one-block entropy, and N*H_1.

Then, the entropy rate is given as

h(X) = \lim_{N \to \infty} H_N / N

that is, the average number of bits to encode the state of an N-block, per symbol in that block, for the block size going to infinity. For example, if the source is an iid source, then H_N = N*H_1, N times the one-block entropy, and thus h(X) = H_1(X). Similar simple considerations hold for Markov sources of n-th order, where the limit is actually reached at the size of the memory of the source.

The entropy rate is relevant insofar as the Shannon theorems talk about this object (or the closely related co-information rate).

Greetings,
	Thomas
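[Editor's note: the definitions above can be sketched numerically. This is a minimal Python illustration, assuming an iid binary source; the function name `block_entropy` is mine, not from the thread.]

```python
import itertools
import math

def block_entropy(p_symbol, n):
    """N-block entropy H_N: sum over all length-n blocks of
    -p(block) * log2 p(block).  For an iid source, p(block) is simply
    the product of the per-symbol probabilities."""
    h = 0.0
    for block in itertools.product(p_symbol, repeat=n):
        p = math.prod(block)
        h -= p * math.log2(p)
    return h

# iid binary source with p(0) = 0.9, p(1) = 0.1
p = [0.9, 0.1]
h1 = block_entropy(p, 1)   # one-block entropy H_1
h3 = block_entropy(p, 3)   # three-block entropy H_3

# For an iid source H_N = N * H_1, so the entropy rate H_N / N is H_1:
print(h1, h3 / 3)
```

For a source with memory (e.g. a Markov chain), H_N / N would only approach the entropy rate in the limit, as described above.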
Re: Entropy Definition : Request for Comments  LawCounsels  3/13/12 10:03 AM

Thanks for the concise current definition .... highlighting the need for entropy to always be measured with reference to a particular chosen fixed block of size N, since the Shannon theorems talk about this object.

Some initial comments here:

[ 1 ] It seems that SUM [ entropy of N bits * probability of the source producing N bits ], where N = 1 to an arbitrarily large number of bits produced by the source, ALSO fits perfectly with the concise current definition (which defines, and fits, the term "entropy of N bits" within the SUMMATION here) .... where the source is iid, as you mentioned, it is a perfect fit.

[ 2 ] In addition to [ 1 ] above, this SUMMATION definition is more general: it ALLOWS the source to, e.g., first 'randomly' choose to output a certain length N (a variable length N chosen according to some probability) and immediately STOP, not continuously outputting further symbols [ of course it may also choose to output an arbitrarily large number N of symbols ], and then, based on this randomly chosen output length N, to use a distinct source probability distribution associated with that length .... e.g. a length of 16 would use 60/40 weighted coin tosses, a length of 32 uses 70/30.

It appears Shannon's work did not encompass scenario [ 2 ] above; it only deals with sources continuously producing 'random' output.

LawCounsels
Re: Entropy Definition : Request for Comments  Thomas Richter  3/13/12 11:25 AM

> ( which defines, and fits, the term "entropy of N bits" within the
> SUMMATION here ) .... where the source is iid, as you mentioned, it
> is a perfect fit
[...]
> an arbitrarily large number N of symbols ] , and then based on this
> 'random' chosen output length N to use a distinct source probability
> distribution associated with that length

You don't understand what "source" means. A source is a black box that generates an infinite stream of symbols from an alphabet, where each symbol is chosen at random, with a probability that depends possibly on the symbol and possibly on the past symbols generated, i.e. the source may have a memory; it may also depend on how many symbols it has generated, i.e. probabilities may change over time.

It is a common misconception that you can "observe" probabilities. You can't. Probabilities are *defined* into the model. Just because in the last ten dice rolls you got ten sixes does not mean that the probability of a six is one. Nor does it mean that the next symbol must be something other than six to get "correct probabilities". Shannon talks about *models* of systems, and these systems have probabilities you know. He does not talk about how you *get* the probabilities. A model of a die with p(6)=1 does have an entropy (of zero), and you can compute it, and it is "your model". It might be a bad model, though, but that's nothing the theorems care about or describe.

Sure enough, in a real system you only have symbols, and you may try, or should try, to fit a probabilistic model to such a real source in order to apply Shannon. But it is outright *wrong* to say that you can replace "probabilities" by "relative frequencies" and then compute "the entropy of the sequence". Such a thing does not exist, nor does it make sense. Entropy is always relative to a *model*, and not to a specific source. You can compute the entropy of a dice model for which you defined(!) (yes, really!) the probabilities, but you *cannot* compute the entropy of "the green die on my desk I rolled 100 times, writing down the number of pips". The latter makes no sense. A model is not reality. It is a model *of* reality.

Of course, the better the model describes the properties of your *real* source, the more you can expect a compression algorithm to work on your source, but the Shannon theorems make absolutely no statement on how to model sources. The theorems *apply* only to models. Not to real sequences.

So long,
	Thomas
Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/13/12 11:56 AM

On Mar 13, 6:25 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> You can compute the entropy of a dice model for which you defined
> the probabilities

yes ... it needs only now to be precisely restated in terms of a Model (not a source) .... the comments then hold true:

a Model for which the probabilities of producing N symbols and then immediately STOPping were defined:

SUM [ entropy of N bits * probability of the Model producing N bits ]

where N = 1 to an arbitrarily large number of bits produced by the Model
Re: Entropy Definition : Request for Comments  Thomas Richter  3/13/12 12:38 PM

Once again: a source does not have a length. A source creates an *infinite* stream of symbols.
Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/14/12 1:43 AM

yes ... it is now corrected to say Model

Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/14/12 1:46 AM

A useful entropy definition for variable-length N bits produced by a Model:

SUM [ entropy of N bits * probability of the Model producing N bits ]

where N = 1 to an arbitrarily large number of bits produced by the Model
Re: Entropy Definition : Request for Comments  Thomas Richter  3/14/12 4:09 AM

And once again, the answer is "no", because it depends on your model. Just saying "variable length N bits" does not specify precisely enough what you are looking for, because "variable-length string" is nothing that is already well-defined. Just to give you an example of how one can model "variable-length" strings:

Model 1) A variable-length string on an alphabet \Omega is a string on the alphabet \Omega' = \Omega \cup {EOF} (EOF = end-of-file symbol), together with an equivalence relation ~ such that strings s_1 and s_2 over \Omega' are equivalent under the condition that s_1(i) = s_2(i) for all i <= i_n, where i_n is the first position such that s_1(i_n) = s_2(i_n) = EOF. In other words, you consider strings up to an additional EOF symbol and do not care about the strings beyond this. This is certainly one model of how to think about variable-length strings.

Model 2) A variable-length string is an element of \Omega^N, where N is a random variable and \Omega is an alphabet. A probability distribution on random-length strings is given by probability distributions p_1, p_2, ..., p_n, ..., where p_i is defined on \Omega^i. The probability of getting a string s of length n consisting of symbols a_1, ..., a_n is given by q(n) p_n(a_1, ..., a_n), where q is the probability distribution on the string lengths. Here, in this model, you first roll a die to determine the length of the string, and then roll dice to fill in the symbols.

It is not ad hoc clear which model you are talking about (or possibly a third model, or whether these models are perhaps identical), so it is not clear what the probability distributions are. Hence, it is completely unclear what "entropy" should be in this case unless you are more specific about how you arrive at the length and at the symbols.

IOW, to repeat this again: your question doesn't make sense. Please be more specific about which type of model you have in mind to generate "variable-length strings", and depending on that, an entropy definition might be given. Without further information, all is up to speculation.
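[Editor's note: Model 2 above is concrete enough to compute with. A minimal Python sketch follows; the length distribution `q` and the per-length symbol probabilities are illustrative assumptions of mine (symbols within each length are drawn iid for simplicity), not from the thread.]

```python
import itertools
import math

# Model 2: first draw a length n with probability q(n), then draw the
# n symbols.  Here each p_n is taken to be iid with its own symbol
# distribution (an assumption for the sketch).
q = {1: 0.5, 2: 0.5}                        # distribution over lengths
symbol_p = {1: [0.6, 0.4], 2: [0.7, 0.3]}   # symbol probs used at each length

def string_probs(q, symbol_p):
    """Probability of every variable-length string: P(s) = q(n) * p_n(s)."""
    probs = {}
    for n, qn in q.items():
        for block in itertools.product(range(len(symbol_p[n])), repeat=n):
            probs[block] = qn * math.prod(symbol_p[n][a] for a in block)
    return probs

probs = string_probs(q, symbol_p)
total = sum(probs.values())     # sums to 1: this *is* a probability space
H = -sum(p * math.log2(p) for p in probs.values())
print(total, H)
```

Once the probability space is pinned down like this, the entropy is just the usual sum; it decomposes as H(length) plus the expected entropy of the symbols given the length.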
Re: Entropy Definition : Request for Comments  LawCo...@aol.com  3/14/12 8:52 AM

On Mar 14, 11:09 am, Thomas Richter <t...@math.tu-berlin.de> wrote:

In the first instance the formula should be generally applicable to any model without restriction whatsoever, including ALL those presently amply described by Shannon's fixed N-block ... also the Model 1) and Model 2) described, OR any other model anybody could conceivably come up with (hopefully ones which are also 'economically' interesting).

Otherwise the formula is not a valid general formula in the first place.
Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/14/12 11:34 AM

An interesting related phenomenon would be to ask: if a completely random binary file of length N bits were to be REVERSIBLY 'transformed' / 'randomised' / 'jumbled' by any method, will the resultant 'jumbled' file of exactly the same N bits EXHIBIT any particular patterns / restrictions in any portion of the resultant file BEYOND those normally associated with a 'random' file (like the 1 Million Random Digits file)?
Re: Entropy Definition : Request for Comments  Thomas Richter  3/14/12 2:10 PM

On 14.03.2012 16:52, LawCounsels wrote:

IOW, your question does not make sense. To apply Shannon's lossless or lossy coding theorem, you must have a random source. And this means infinite strings. Entropy can be defined on a finite collection of symbols, and is defined on blocks of size N or, to put this differently, on probability spaces that are the N-fold tensor product of a base probability space.

> also the Model 1) & Model 2) described OR any other model anybody
> could conceivably come up with

Ok, here is the formula: let \Omega be a probability space and p a probability measure on \Omega; then

H = -\sum_{a \in \Omega} p(a) \log_2 p(a)

This is the general definition. If your probability space is \Omega^N = \Omega \otimes \Omega \otimes ... \otimes \Omega, then you get the N-block entropy back by substitution, in the sense I gave above. Now define what you understand as a "probability space of variable-length strings" and we are back in business.

If this answer does not help, please think about what your question means. Once again, without having a probability space, there is no entropy. There is no more specific answer *unless* you make your spaces more concrete. If you don't want to do this, then the above *is* the correct answer, but probably unhelpful for your purposes. Did this help? Probably not, but there is nothing else to say then.

So long,
	Thomas
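[Editor's note: the general definition above is a one-liner in code. A minimal sketch, using an illustrative loaded-die model whose probabilities are defined, not observed, exactly as insisted on earlier in the thread.]

```python
import math

def entropy(p):
    """H = -sum_a p(a) log2 p(a) over a finite probability space.
    Terms with p(a) = 0 contribute 0 by convention."""
    return -sum(pa * math.log2(pa) for pa in p.values() if pa > 0)

# a loaded-die model: the probabilities are *defined* into the model
die = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}
print(entropy(die))

# the degenerate model p(6) = 1 mentioned earlier: its entropy is zero
print(entropy({6: 1.0}))
```

The N-block entropy comes back by applying `entropy` to the product space, i.e. to the dictionary of all length-N blocks with their product probabilities.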
Re: Entropy Definition : Request for Comments  Thomas Richter  3/14/12 2:16 PM

On 14.03.2012 19:34, lawco...@gmail.com wrote:

This is not exactly related. What is "exactly the same N bits"? If the bits are exactly the same, the file is identical, so there is nothing to say. If the transformation is reversible, then any reversible transform on a finite set is just a permutation of the base space, and as the entropy is a sum over the full base space, any permutation doesn't change this sum at all. IOW, the entropy is invariant under reversible transformations of a base space into itself. [As intuition also suggests, of course.] So this is not exactly "interesting". The answer is immediate, and exactly what is expected.

More interesting things happen if you map this to a smaller set; say, you consider for example sums of random variables. Then, under mild constraints on the source, you get something called the "central limit theorem", i.e. your distribution approaches a Gaussian (or a stable distribution under less constrained conditions).

Greetings,
	Thomas
Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/14/12 11:29 PM

we are both on the right wavelength here

myself would 100 times out of 100 immediately say exactly the same as you did here, without hesitation ... EXCEPT the INVARIABLY REVERSIBLE resultant transformed file, of the same N bits length, ALWAYS GIVES sizeable portions with unmistakable, particularly strong patterns / restrictions (UNMISTAKABLY UNAMBIGUOUS LOW-ENTROPY bit sequences at unmistakably unambiguously ascertainable 'start' / 'end' bit positions .... mathematically proven, instantiated with the described entropy definition formula on ALL real transformed resultant files)

LawCounsels
Re: Entropy Definition : Request for Comments  Thomas Richter  3/15/12 4:10 AM

On 15.03.2012 07:29, lawco...@gmail.com wrote:

Look, what people often confuse is "has many obvious patterns" = "has low entropy". This is simply wrong. A specific outcome of a random source may very well have "obvious patterns", but might still have a low probability. For example, let Y be a binary random iid source with p(0) = 7/8 and p(1) = 1/8. Let z_k be the k-th digit in the binary expansion of pi, or to be precise, let z_k = floor(\pi * 2^k) mod 2. Then define x_k = y_k xor z_k. Then, the outcome (realization) of X being 0000...000000 is surely very unlikely, yet has an obvious pattern, and the entropy of Y and the entropy of X are identical. (And the entropy of Z is zero because it is not a random process...)

This "paradox" stems from the fact that the sources humans like you or me care about are redundant sources, i.e. the random sources we look at (newspapers, photos, audio recordings) are highly redundant *and* contain many patterns. That is, for the sources we care about, we have "low entropy" = "many patterns". But this is simply false in general. Or to be even more concrete, "having a pattern" is not a well-defined term, nor does it have any relation to entropy. It *only* (or should I write "only"?) has a relation to entropy for those sources we as human observers care about.

Sources with patterns allow "simple models", but just because the sources with patterns are the ones simple enough for you to model as low-entropy random sources does not mean that there are no low-entropy sources without obvious patterns. In fact, in a precise sense, *most* low-order-entropy sources are like that (i.e. do not show obvious patterns). Unfortunately, even these "most low-order-entropy sources" (with non-obvious patterns) are still a very rare species in the set of all sources, rare enough to disallow any escape from the counting argument.

So long,
	Thomas
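[Editor's note: the construction above can be checked numerically. A minimal sketch: the all-zero realization of X forces Y to reproduce the binary digits of pi exactly, which is extremely unlikely, while the per-symbol entropy of X stays equal to that of Y (xor with a fixed sequence is reversible). The variable names are mine.]

```python
import math

n = 32
# z_k = floor(pi * 2^k) mod 2, the k-th binary digit of pi
pi_bits = [math.floor(math.pi * 2**k) % 2 for k in range(1, n + 1)]

p0, p1 = 7 / 8, 1 / 8   # the iid source Y

# P(X = 00...0) = P(y_k = z_k for all k): Y must match pi digit by digit
p_all_zero = math.prod(p0 if z == 0 else p1 for z in pi_bits)
print(p_all_zero)   # vanishingly small despite the "obvious pattern"

# per-symbol entropy of Y, which equals the per-symbol entropy of X
h = -(p0 * math.log2(p0) + p1 * math.log2(p1))
print(h)
```

So a highly patterned realization (all zeros) is among the least likely outcomes of X, while the entropy of the model is untouched; the pattern lives in the realization, not in the entropy.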
Re: Entropy Definition : Request for Comments  lawco...@gmail.com  3/15/12 6:23 AM

these are different; the strong patterns here are mathematically proven, instantiated with the described entropy definition formula on ALL real transformed resultant files
Re: Entropy Definition : Request for Comments  Thomas Richter  3/15/12 8:45 AM  This not English is sentence, not understand do I. 
Re: Entropy Definition : Request for Comments  jacko  3/15/12 4:27 PM

I suppose some of the other thread is trying to teach the difference between an infinite central-limit distribution and the distribution formed by finite sampling from it. In this sense the expectations of different quantities could be different. The entropy (or expected self-information) E(I) may vary depending on the distribution statistics.
Re: Entropy Definition : Request for Comments  jacko  3/15/12 4:40 PM

Entropy is the expected self-information of a source or probabilistic event producer.

"When it comes to science, there is only one way to do it. Or that was what the experiment demonstrated!" (postmodern 21st-century dogma, developed by the church of the one true budget and light)
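[Editor's note: "entropy is the expected self-information" can be verified directly. A minimal sketch, with an illustrative three-symbol model of my choosing: the exact expectation E[-log2 p(X)] against a sampled estimate of it.]

```python
import math
import random

random.seed(0)   # deterministic sample for reproducibility
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
symbols, weights = zip(*probs.items())

# entropy as the exact expectation of the self-information -log2 p(X)
exact = -sum(p * math.log2(p) for p in probs.values())

# the same expectation estimated by sampling from the model
sample = random.choices(symbols, weights=weights, k=100_000)
estimate = sum(-math.log2(probs[s]) for s in sample) / len(sample)

print(exact, estimate)   # estimate converges to the exact value
```

For this distribution the exact value is 1.5 bits, and the sample mean of the self-information approaches it as the sample grows, illustrating the "expected self-information" reading of entropy.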
Re: Entropy Definition : Request for Comments  jacko  3/15/12 5:14 PM

On Thursday, March 15, 2012 11:40:38 PM UTC, jacko wrote:

Another part of the sci-fi script is the English agent holding a passport made of the American agent while sitting in front of the criminal entry database, while saying "it depends if you're wanted in this country." And the American agent phones to ask for permission to chat about "the censorship of data versus the faux product", just as the French-maid French-Italian-made "being made" hiding agent walks in with a beer.
Re: Entropy Definition : Request for Comments  jacko  3/15/12 5:24 PM

a nice title would be Wright brothers, cut to current video of a brief history of flight, and then move in for a close flash "apparent crash" landing of UFO irrevocable event 1, as it becomes known in scene 1.
Re: Entropy Definition : Request for Comments  LawCo...@aol.com  3/20/12 6:08 PM

On Mar 15, 1:23 pm, lawcouns...@gmail.com wrote:

each of these many 'start'/'end' sequences consists of ternary symbols a, b, c. The only condition here is that the total # of c's is ALWAYS exactly = the total # of b's + 2, and this occurs only at the End-Of-Sequence; i.e., at any stage, IF the running total # of c's becomes = the running # of b's + 2, then the sequence ends. [ The total # of a's is unrestricted, but usually around the same as the total # of b's, BUT this is irrelevant. ]

The entropy of all these many 'start'/'end' sequences, together totalling N symbols (total # a + total # b + total # c = N, which is N * 1.5 binary bits), can OBVIOUSLY be practically encoded smaller than N * 1.5 binary bits.

see:

http://groups.google.com/group/comp.compression/browse_thread/thread/43cbd7738ca13fa9/f92dfa3abf1d7edd?hl=en&lnk=raot#f92dfa3abf1d7edd

LawCounsels
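[Editor's note: the stopping rule described above is mechanical, and a small checker makes it unambiguous. A minimal Python sketch; the function name `sequence_end` is mine.]

```python
def sequence_end(symbols):
    """Scan an a/b/c stream and return the 0-based index at which the
    sequence ends, i.e. the first position where the running count of
    'c' equals the running count of 'b' plus 2; None if never reached.
    ('a' symbols are counted by neither rule, matching the description.)"""
    b = c = 0
    for i, s in enumerate(symbols):
        if s == "b":
            b += 1
        elif s == "c":
            c += 1
        if c == b + 2:
            return i
    return None

print(sequence_end("acc"))      # index 2: c-count 2 = b-count 0 + 2
print(sequence_end("abcacc"))   # index 5: c-count 3 = b-count 1 + 2
print(sequence_end("abab"))     # None: the condition is never met
```

Because the condition fires at the first position where it holds, the rule itself makes each sequence self-delimiting at the symbol level; the open question in the post is whether a *bit-level* encoding can stay self-delimiting while beating N * 1.5 bits.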
Re: Entropy Definition : Request for Comments  LawCo...@aol.com  3/31/12 10:30 AM

On Friday, March 16, 2012 12:24:38 AM UTC, jacko wrote:

Mankind's centuries-old Kraft's Inequality / pigeonhole hurdles now come to this: can each of these many 'start'/'end' sequences (each consisting of ternary symbols a, b, c; the only condition here is that the total # of c's is ALWAYS exactly = the total # of b's + 2, and if the running # of c's becomes = the running # of b's + 2 then the sequence ends .... the total # of a's is unrestricted, but usually around the same as the total # of b's, BUT this is irrelevant) be better represented, encoded 'shorter'?

Take care: the encoded bitstring MUST be self-delimiting, meaning unambiguous as to where the encoded bitstring ends (if subsequently followed by / merged with other bits, one needs to be able to distinguish this boundary from within the encoded bitstring itself). Total N symbols (total # a + total # b + total # c = N, which is N * 1.5 binary bits), already OBVIOUSLY encodable smaller than N * 1.5 binary bits.

REWARDS for the 1st person to put forth a practical solution, and REWARDS also for the best practical solution put forth.

Look Forward,
LawCounsels