
entropy of a binary file with conditions


pierre

Mar 3, 2012, 4:04:29 AM

hi,

let's suppose we have a binary file (runs of zeroes and ones) and its
length is 1024 bits.

any of these runs is not bigger than 8 (00000000 or 11111111)

how can one compute the entropy of such a file ?

Thank you

glen herrmannsfeldt

Mar 3, 2012, 4:41:14 AM

Not enough information. If all the runs are of length 8, then
the entropy is one. Statistically in 1024 bits, one would expect
runs a little longer than 8, but not much. Qualitatively,
the entropy is slightly less than a completely random bit stream,
but not much less.

-- glen

pierre

Mar 3, 2012, 4:47:20 AM

I said "not bigger", so it can go from 1 to 8

I know the entropy of such a file is smaller than that of a random
file, but I would like to know how to compute it: is there a well-known
formula or a study about this?

thanks

pierre

glen herrmannsfeldt

Mar 3, 2012, 5:00:52 AM

pierre <pie...@invalid.invalid> wrote:
> On 03/03/2012 10:41, glen herrmannsfeldt wrote:
>> pierre<pie...@invalid.invalid> wrote:

>>> let's suppose we have a binary file (runs of zeroes and ones)
>>> and its length is 1024 bits.

(snip)
>>> how can one compute the entropy of such a file ?

(snip)
> I said "not bigger" , so it can go from 1 to 8

So, equally distributed between 1 and 8?

> I know the entropy of such a file is smaller than the one of a random
> file, but I would like to know how to compute it: is there a well-known
> formula or a study about this ?

http://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Rationale

H(X) is the entropy. Usually b is 2, and the entropy is in bits.

If you know at each point the probability of a bit being 0 or 1,
then you can compute it from the sum. The most it can be is the
number of bits, 1024, the least 0.
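For concreteness, that formula is a few lines of Python. The probabilities
are the reader's to supply; the fair-coin case below is just an
illustration of the 1024-bit maximum:

```python
import math

def shannon_entropy(probs):
    # H(X) = -sum(p * log_b(p)) with b = 2, so the result is in bits.
    # Terms with p == 0 contribute nothing (0 * log 0 -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair bit carries exactly 1 bit of entropy, so 1024 independent
# fair bits carry 1024 bits -- the maximum mentioned above.
per_bit = shannon_entropy([0.5, 0.5])
print(per_bit)         # 1.0
print(1024 * per_bit)  # 1024.0
```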

-- glen

pierre

Mar 3, 2012, 5:34:20 AM

> http://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Rationale
>
> H(X) is the entropy. Usually b is 2, and the entropy is in bits.
>
> If you know at each point the probability of a bit being 0 or 1,
> then you can compute it from the sum. The most it can be is the
> number of bits, 1024, the least 0.
>


what don't you understand?

I asked for the formula of the entropy for a sequence of 1024 bits where
runs of bits have a length from 1 to 8.

do you know it or not?

if you don't, just don't answer

If something can help you, just imagine you have runs with lengths from 1
to 5.

in this present situation, given the condition, the entropy cannot be 0
and it cannot be 1024.

what is the entropy ?

biject

Mar 3, 2012, 3:34:58 PM

Easy: look for all runs of 8. Except don't count a run if it's
exactly at the end of the file.

The entropy will be 1024 - n, where n is the number of runs of exactly 8
zeroes or ones.

You could be quick to point out that if the file is all zeroes the
entropy should be much less. However, entropy is a function of the model,
not just the data.

Another example: suppose we have a long list of bits that tests random,
but then someone points out that it is exactly every third bit in the
expansion of pi. Well, at that point you could use a program to create
it, so its entropy would be vastly smaller; one could consider it
almost zero.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

glen herrmannsfeldt

Mar 3, 2012, 4:11:53 PM

biject <bijec...@gmail.com> wrote:
> On Mar 3, 2:04 am, pierre <pie...@invalid.invalid> wrote:

(after someone wrote)
>> let's suppose we have a binary file (runs of zeroes and ones)
>> and its length is 1024 bits.
(snip)
>> how can one compute the entropy of such a file ?

> Easy: look for all runs of 8. Except don't count the run if it's
> exactly at the end of the file.

> The entropy will be 1024 - n where n is the number of runs exactly 8
> zeroes or ones.

> You could be quick to point out that if the file is all zeroes the
> entropy should be much less. However entropy is a function of the
> model not just the data.

This is what I tried to hint at with my answer, but maybe the OP
didn't figure that out. Consider generating the file by flipping
a coin, and that even a fair coin could give heads 1024 times
in a row. (Very unlikely, though.)

> Another example: suppose we have a long list of bits that tests
> random, but then someone points out that it is exactly every
> third bit in the expansion of pi.

For the OP question to make any sense, you have to assume that
there is no such underlying system. The distribution of run
lengths will give you some hint, though. If runs of 1 to 8 are
(approximately) equally likely, then one possible bit source
is a generator of numbers between 1 and 8. Given that, it is
simple to add up the entropy, with the assumption that the
base-8 number is random. If it is instead digits from the base-8
representation of pi, then the answer is different.

> Well at that point you could use a program to create it so its
> entropy would be vastly smaller one could consider it almost zero.

Actually, maybe not all that much less than 1024 bits.

-- glen

Alex Mizrahi

Mar 3, 2012, 5:08:59 PM

> what don't you understand?
>
> I asked for the formula of the entropy for a sequence of 1024 bits where
> runs of bits have a length from 1 to 8.
>
> do you know it or not?
>
> if you don't, just don't answer

Information entropy is a property of a random variable. To calculate
entropy one needs to know probability distribution for that variable.

Like, uniform, normal or something.

You haven't specified distribution, so entropy is undefined here.

Formula won't help you if you don't know basics: you won't be able to
use it correctly anyway.

You should learn what entropy is (starting with probability theory
basics) and only then revisit this question.


> If something can help you, just imagine you have runs with length from 1
> to 5.

All having the same frequency? I.e., runs of length 1 happen just as
frequently as ones of length 5?

> in this present situation, given the condition, the entropy cannot be 0
> and it cannot be 1024.

> what is the entropy ?

If all kinds of runs are equiprobable, you have an average length of 3 bits
per run. Drawing 1024/3 = 341.3(3) such runs will give you a bit length of
1024 on average.

Entropy per run is log2(5) ~= 2.321928 bits, so we get about 792.55 bits
total.

This is a crude estimate, however, since using the average number of runs
per file isn't right. Calculating the exact number would be rather complex,
I guess.
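That back-of-envelope arithmetic can be checked in a few lines of Python,
under the same assumptions (runs of length 1 to 5, all equally likely,
drawn independently):

```python
import math

run_lengths = range(1, 6)                      # runs of 1..5 bits
avg_len = sum(run_lengths) / len(run_lengths)  # 3.0 bits per run on average
runs = 1024 / avg_len                          # ~341.33 runs per file
entropy_per_run = math.log2(len(run_lengths))  # log2(5) ~= 2.321928 bits
print(runs * entropy_per_run)                  # ~792.55 bits total
```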

But having equiprobable run lengths is kinda strange. It's way more
likely that they will have a geometric distribution:

http://en.wikipedia.org/wiki/Geometric_distribution

Or some modification thereof.

It would help if you explained where you get this data from.

If it is "random" data, the entropy is 1024. Even if you've observed that
the length of runs is limited in this particular instance, communicating
that information would take about as much space as you would save from
that knowledge. The "savings" you'll get on some kind of "truncated"
geometric distribution just won't be great.

George Johnson

Mar 3, 2012, 10:30:39 PM

"pierre" <pie...@invalid.invalid> wrote in message
news:4f51deb4$0$21489$ba4a...@reader.news.orange.fr...
Well if each 8 bit chunk is constant then the easiest thing to say is
that the file can be compressed to 1/8 of the original size (1024 bits / 8 =
128 bits) every time. I'd just use an existing data compression program to
compress the file and tell me how well it compressed. It is a tad foolish
to not use existing quick tools to achieve a simple goal that math alone
would not give sufficient answers to. The practical term is "OVER-THINKING
THE PROBLEM".

I could ramble on about many different techniques, but if you're not
bothering with them yourself (choosing to be bothering us with them
instead), then I'd think it a tad silly to school you in them. Again, you
are either lacking the mathematical skills, tools, self-discipline, etc to
know how to do this yourself already, so be lazy and use an existing data
compressor (7-Zip is free, though a tad buggy, amongst many tools out there)
and just get the answer quickly - cleanly - without sounding like a begging
puppy wanting the slice of warm pizza on the plate.
http://en.wikipedia.org/wiki/Chi-squared_distribution
http://en.wikipedia.org/wiki/Information_entropy
http://en.wikipedia.org/wiki/Shannon_index#Shannon_index

The particular organization of those reduced chunks determines the
entropy.
If I have a deck of 52 cards with 42 duplicate 8-of-spades and 10
duplicate 3-of-diamonds, then the entropy level is more of a probability
function than any other prediction. However you randomize the card stack
there will be 42 copies of one card and 10 copies of the other card. For a
random chosen run section of that set I can give you a pretty decent
statistical prediction of how many 8-of-spades copies & how many
3-of-diamonds copies will be in a section, but not their particular
organizational pattern.

If I were given a deck of 52 cards with the knowledge of that there were
only duplicates of a Jack-of-Hearts & Queen-of-Clubs with no allowance to
examine the deck prior to know how many duplicates of each populated the
deck, then I could make zero statistical predictions on card organization
probabilities or predictions which card would be most likely to be drawn
from any random point in the deck. Therein, I could make no predictions on
the random entropy levels of the deck itself per shuffling.

In this case, it is simplest to use the handiest tools available
(existing data compression programs) and have them run a compressibility
test, by which I could then make estimations of the deck's contents (being
not allowed to examine the deck for the duplicated amount of each card) and
potential card distributions.

And what is all this math to the newbie? Painful.
Well, thems the breaks. I likey my warm pizza very much.
I'm not grading you. In fact I just might point you off in horribly
wrong directions if it amuses me. I have no financial or moral obligation
to assist you in the portions of your life that you have neglected up to
now. Nobody else here is obligated to assist you in any manner. Does this
mean that we are all selfish assholes? Nope. We might be the most
kind-hearted snuggly kittens of compassion you would ever encounter in real
life or we could just be sneering jerks behind a computer in some remote
location (like in the house next to you) or we could just be average Joes
with busy lives plus expensive bills plus family matters distracting us.
Like a randomized deck of cards we have predictable limited-range outputs
with statistically predictable distributions, but the probable organization
for any deck of 52 unique units is quite large.

I give you a thought exercise. Imagine that you only got one important
email a day. 365 important emails per year.
Now how about 10 of these important emails per day, 3650 important
emails per year?
Now how about 10 of these important emails per hour, 87600 important
emails per year?
Now imagine you were running a business that required a thoughtful
math-intensive response per hour.
An Internet business with 600 important emails per hour? Perhaps 60000
per hour?
How much free time would you have to play unpaid teacher's nursemaid to
someone who goofed off during math class and is looking for a quickie answer
so they can instantly forget you to go enjoy goofing off some more?
Are all the posters here likely to be in the identical situation? Nope,
not me even. However, consider that people's time is worth much more than
money some days and an anonymous "Thank You" can be justifiable payment for
labors exerted, but usually never comes anywhere close to paying the
electric bill, the heat bill, or the food bills.

Me, I tend to be a helpful jackass (if it amuses me). Most people won't
admit that I am a jackass or even helpful. The ones that do tend to enjoy
that admission greatly (an inexpensive joy for them, like a piece of candy,
yum). You know what, I also like to take a mirror out with me sometimes and
show folks when they are being jackasses too (I'm strangely helpful that
way) when they are realizing just how big their jackassery is showing. Oh,
at first they are offended (we have a jolly good laugh at that later). The
forced introspection can be quite the epiphany for the less clever folk out
there though.

I have a gift for you.
It's a mirror.


George Johnson

Mar 3, 2012, 10:45:53 PM

"pierre" <pie...@invalid.invalid> wrote in message
news:4f51f3c3$0$12492$ba4a...@reader.news.orange.fr...
What is the smell of your finger?
http://xkcd.com/356/

The clever man knows when to abort computation and begin punching face
into concrete.
The less clever man never thinks past the joke until their blood begins
to stain the concrete.
My punching hand is feeling rather restless right now.

Sentenced to Crossword Prison
http://www.waylay.com/Store/OrigPages/391.html

Pedro Santana MS 391 Principal A Phony Baloney
http://www.southbronxschool.com/2010/06/pedro-santana-ms-391-principal-phony.html
Fifty nine percent passed, but 41% didn't. And in 2007-2008, only forty
percent passed.




George Johnson

Mar 4, 2012, 1:08:20 AM

"George Johnson" <matr...@charter.net> wrote in message
news:QyB4r.7653$wf....@newsfe09.iad...
> "pierre" <pie...@invalid.invalid> wrote in message
> news:4f51f3c3$0$12492$ba4a...@reader.news.orange.fr...

[ blah blah blah - deliberate time-wasting nonsense clipped ]

>>
>> what is the entropy ?
>
> What is the smell of your finger?
> http://xkcd.com/356/
>
> The clever man knows when to abort computation and begin punching face
> into concrete.
> The less clever man never thinks past the joke until their blood begins
> to stain the concrete.
> My punching hand is feeling rather restless right now.
>
> Sentenced to Crossword Prison
> http://www.waylay.com/Store/OrigPages/391.html

This one you can read. Carol Lay is a brilliant ironic cartoonist.
Sentenced to Crossword Prison
http://www.waylay.com/Store/RoughsPages/391r.html

If you've grown to love her work by now too, you can buy a Kindle E-Book
of that at Amazon.com

http://www.amazon.com/LOVE-AND-LUNACY-Reformatted-ebook/dp/B005LI34UE/ref=sr_1_8?s=books&ie=UTF8&qid=1330841143&sr=1-8
Kindle Purchase Price: $4.49

And no, I am not her. I am significantly more bothersome if you get on
my bad side.



James Dow Allen

Mar 4, 2012, 1:07:56 AM

I think you may be asking
What is the number, N, of distinct bit-strings
with the stated property?
Or equivalently, what is log(N)?

Is this correct? If so, let me offer a nit-pick :-)

"Entropy" is a useful word when it clarifies or simplifies.
It can only obfuscate when your real question is a simple
"How many?"

Moving on ...
There are two distinct bits possible (i.e. 1 bit
of associated information) after a run of 7 or less,
but only one possibility (i.e. zero information)
after a run of 8.

The chance that a bit begins a run is 1/2 (actually
slightly more, given the constraint), so the chance that
it begins a run of 8 (or more) would seem to be 1/256
(or somewhat less, I think, given the constraint).

Only p = 1017/1024 of the bits can possibly begin such a run.
The answer therefore would appear to be
approx. 1024 * (1 - (1017/1024)/256) = ~1020 bits

Because of the shortcut taken in this derivation, the
correct answer is, I think, closer to 1021 bits than
to 1020 bits.
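The arithmetic of that shortcut, spelled out (the 1/256 run probability and
the ~1 bit saved per run are the approximations stated above):

```python
# 1017/1024 of the bit positions can begin a run; each begins a run of 8
# with chance roughly 1/256, and each such run saves about 1 bit.
estimate = 1024 * (1 - (1017 / 1024) / 256)
print(estimate)  # ~1020.03 bits
```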

Does this help?

James


Willem

Mar 4, 2012, 10:41:12 AM

pierre wrote:
) hi,
)
) let's suppose we have a binary file (runs of zeroes and ones) and its
) length is 1024 bits.
)
) any of these runs is not bigger than 8 (00000000 or 11111111)
)
) how can one compute the entropy of such a file ?

I assume you want to know the entropy of a model where all files are
allowed, as long as they do not have a run of longer than 8, and that
all other files are equally likely.

For this, you need to calculate the probability that a 1024-bit file
contains a run of more than 8. Or alternatively, the probability that you
get 9 heads or tails in a row when you flip a fair coin 1024 times.

This can probably be found in most probability textbooks.

Then simply add the base-2 log of one minus this number to 1024 and
there's your 'entropy'.
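As a rough cross-check of that probability (a simulation sketch, not the
textbook derivation; the trial count is arbitrary), one can simply flip
coins:

```python
import math
import random

def max_run(bits):
    # Length of the longest run of equal consecutive values.
    best = run = 1
    for a, b in zip(bits, bits[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

random.seed(1)
trials = 5000
ok = sum(max_run([random.getrandbits(1) for _ in range(1024)]) <= 8
         for _ in range(trials))
p = ok / trials             # chance that no run exceeds 8
print(p)                    # roughly 0.13
print(1024 + math.log2(p))  # roughly 1021 -- the 'entropy' in question
```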


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Alex Mizrahi

Mar 4, 2012, 2:39:03 PM

> For this, you need to calculate the probability that a 1024-bit contains
> a run of more than 8. Or alternatively, the probability that you get 9
> heads or tails in a row when you flip a fair coin 1024 times.

> This can probably be found in most probability textbooks.

Really?
I thought about this a bit, and the only way to calculate it exactly is
fairly complex. Maybe I suck at combinatorics...

So, let's say we look at sequences of n bits with at most k equal bits
in a row. We need to calculate how many such sequences there are.

I don't see any direct way to do this, so instead we can calculate the
number of sequences which have at least one subsequence of (k+1) equal
bits in a row.

Let's start with the case where such a (k+1) subsequence is at the
beginning. There are 2 * 2^(n - (k+1)) = 2^(n-k) such sequences: the
(n - (k+1)) bits in the 'postfix' can be chosen at random, and the prefix
can be either all zeros or all ones.

Now we need to take into account cases where the "run" is in the middle
of the sequence. Naively one could try inserting it at all different
positions, but then the same sequences would be counted many times.

So we need to cut this into disjoint sets. One way to do this is to
break the sequence into three parts: A B C.
A is a sequence of m bits with at most k equal bits in a row.
B is (k+1) ones or zeros.
C is an arbitrary sequence of size (n - (k+1) - m).

Then we can just sum the results for all m = 0..(n - (k+1)).
Well, almost. We must also demand that A ends with a bit different from
B, otherwise we'd be counting the same sequences many times. Equivalently,
the bits in B are chosen opposite to the last bit of A.

So, let's define Q(n, k) to be the number of n-bit sequences which have
no (k+1) same-bit "runs", and R(n, k) its complement -- the number of
sequences which have at least one such "run". We can compute them
recursively:

R(n, k) = 2^(n-k) +                  // a special case for m = 0
          sum for m = 1..(n - (k+1)):
              Q(m, k) * 2^(n - (k+1) - m)

Q(n, k) = 2^n - R(n, k)

I've made a program which does the computation, and the result is:

log2(Q(1024, 8)) = 1021.08575...

That matches James Dow Allen's estimate.

I kinda doubt that you can find this problem in a textbook. It might be
similar to some other problem, like binomial coefficients, but the
derivation of a concrete formula is still challenging.
Also it would be hard, if not impossible, to attack this using probability
theory methods rather than combinatorics.

Am I missing something?
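For what it's worth, the same number can be reached without the R/Q
recursion: an n-bit string whose runs all have length <= k is just a
choice of first bit plus a composition of n into parts of size 1..k. That
gives a short dynamic program (a sketch written only to check the figure
above):

```python
import math

def count_limited(n, k):
    # Number of n-bit strings in which every run of equal bits has
    # length <= k: 2 choices of first bit, times the number of
    # compositions of n into parts of size 1..k (the run lengths).
    comp = [0] * (n + 1)
    comp[0] = 1
    for i in range(1, n + 1):
        comp[i] = sum(comp[i - j] for j in range(1, min(i, k) + 1))
    return 2 * comp[n]

print(count_limited(3, 2))                # 6: all 3-bit strings except 000, 111
print(math.log2(count_limited(1024, 8)))  # ~1021.086
```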

pierre

Mar 4, 2012, 2:57:42 PM

> The entropy will be 1024 - n where n is the number of runs of exactly 8
> zeroes or ones.


that's right, thank you very much !

pierre

Mar 4, 2012, 2:59:45 PM

> The answer therefore would appear to be
> approx. 1024 * (1 - (1017/1024)/256) = ~1020 bits
>
> Because of the shortcut taken in this derivation, the
> correct answer is, I think, closer to 1021 bits than
> to 1020 bits.


thank you for this answer, it's OK for me!
that's what I was waiting for!
many thanks to you and David Scott



glen herrmannsfeldt

Mar 4, 2012, 5:45:49 PM

Willem <wil...@toad.stack.nl> wrote:

(snip)
> I assume you want to know the entropy of a model where all files are
> allowed, as long as they do not have a run of longer than 8, and that
> all other files are equally likely.

> For this, you need to calculate the probability that a 1024-bit contains
> a run of more than 8. Or alternatively, the probability that you get 9
> heads or tails in a row when you flip a fair coin 1024 times.

Well, you need to know, as best you can, how the data was generated.
One possibility is that whenever a run of eight is generated, the
opposite bit is stuffed in after it. 1024 bits might not be quite
enough to get the statistics needed to detect such bit stuffing.

Many data transmission systems aren't reliable when too many
of the same value (zero or one) are sent in succession.
One popular solution is bit scramblers. That helps in the case
where the source data has low entropy, yet needs to be sent.

After the scrambler, bit stuffing is used if necessary.

-- glen

Alex Mizrahi

Mar 5, 2012, 3:03:14 AM

> Well, you need to know, as best you can, how the data was generated.
> One possibility is that whenever a run of eight is generated, the
> opposite bit is stuffed in after it. 1024 bits might not be quite
> enough to get the statistics needed to detect such bit stuffing.

I think OP meant maximum possible entropy for a set of such sequences.
It is achieved when all sequences in set are equiprobable, then you only
need to count them.

I really doubt that this problem would arise in any practical setting,
as it makes more sense to look at how data is generated rather than at
maximum possible entropy within certain limitations.

glen herrmannsfeldt

Mar 5, 2012, 4:52:11 AM

Alex Mizrahi <alex.m...@gmail.com> wrote:
>> Well, you need to know, as best you can, how the data was generated.
>> One possibility is that whenever a run of eight is generated, the
>> opposite bit is stuffed in after it. 1024 bits might not be quite
>> enough to get the statistics needed to detect such bit stuffing.

> I think OP meant maximum possible entropy for a set of such sequences.
> It is achieved when all sequences in set are equiprobable, then you only
> need to count them.

Well, all 2**1024 bit sequences are equally probable.

But some sequences are more likely to be generated in other ways.

The sequence of 1024 zeros, for example, is more likely to be
generated as 1024 zeros than 1024 consecutive tails on a coin flip.

It seems a reasonable question, given a sequence, to ask
about the likely generators of that sequence, and the entropies
for them. (Plural to allow for multiple possibilities.)

So, what generators generate sequences of runs of lengths
from 1 to 8? One possibility generates equal probability runs,
and one can compute that entropy.

A random independent stream of 1024 bits will tend to have runs
longer than 8, but also should be exponential in run length.
Many artificial (non-independent) bit streams should be easily
detected from the run length distribution.

> I really doubt that this problem would arise in any practical setting,
> as it makes more sense to look at how data is generated rather than at
> maximum possible entropy within certain limitations.

You don't always have that option, but if you do, yes, use it.

-- glen

James Dow Allen

Mar 5, 2012, 9:03:41 AM

On Mar 5, 2:39 am, Alex Mizrahi <alex.mizr...@gmail.com> wrote:
> I thought about this a bit, and the only way to calculate it exactly is
> fairly complex. Maybe I suck at combinatorics...
> ... [detailed combinatoric discussion] ...
>
> log2(Q(1024, 8)) = 1021.08575...

The evidence here suggests that you do *not* suck
at combinatorics! :-)

Since 1021.1 is so close to 1024, this problem is not
over-interesting in compression. You might want to
tackle the similar problem, looking for a shortcut,
when maximum run length is just 2. Such restricted bit
strings are related to Fibonacci codes, which, IIRC,
David Scott has mentioned here in the past.

In analyzing "RLL2" (run-length-limited to 2), I
think you'll end up writing at some point
r^2 = r + 1
and find the Fibonacci code, or RLL2, is described with
a parameter, r, which happens to be the Golden Ratio.

RLL8 may be described via r ~= 2 where
r^8 = r^7 + r^6 + r^5 + r^4 + r^3 + r^2 + r + 1

Details left as exercise!! :-)

On Mar 5, 3:03 pm, Alex Mizrahi <alex.mizr...@gmail.com> wrote:
> I really doubt that this problem would arise in any practical setting,
> as it makes more sense to look at how data is generated rather than at
> maximum possible entropy within certain limitations.

OP's conditioned binary file reminds us of run-length-limited
codes, e.g. as used on disk drives or communication lines.
With such coding, optimality may require equiprobability.
I don't recall RLL codes allowing runs as long as 8, but perhaps
that's not farfetched today.

But I do agree the topic would be more interesting if OP
told us where his data comes from. :-)

James

Alex Mizrahi

Mar 5, 2012, 11:05:02 AM

> Well, all 2**1024 bit sequences are equally probable.
> But some sequences are more likely to be generated in other ways.

I see what you mean, but it is a fairly confusing language.

Elements of a set have different probabilities assigned to them
depending on context.

Particularly, if we believe that sequences are generated, the way they
are generated affects probabilities, yes.

But assuming elements are equally probable without knowing the context
makes no sense.

And "are more likely" is the same thing as "are more probable".

> So, what generators generate sequences of runs of lengths
> from 1 to 8? One possibility generates equal probability runs,
> and one can compute that entropy.

My guess is that the OP got an idea for a universal compressor which would
analyze the input sequence for maximum possible runs, write this
information into the output bit stream and then use it during encoding.

James Dow Allen

Mar 5, 2012, 7:16:03 PM

On Mar 5, 9:03 pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> RLL8 may be described via r ~= 2 where
>   r^8 = r^7 + r^6 + r^5 + r^4 + r^3 + r^2 + r + 1
>
> Details left as exercise!!  :-)

The number of RLL2 strings of length n
is twice the n'th Fibonacci number.

I think the RLL8 problem has similar answer,
using the RLL8 equivalent of the Fibonacci
series. This is too difficult for me to
work, but for large N, the number of strings
will increase, for each added bit, by the factor
r ~= 1.996031179735
where r is the solution of
r^8 = r^7 + r^6 + r^5 + r^4 + r^3 + r^2 + r + 1
The bit entropy is asymptotically
E = log_2(r) ~= .997134257
E * 1024 ~= 1021.06548

which differs from Andrew's exact result due to
start-up. Indeed, adding 8*(1-log_2(r)) since
the first 8 bits are unrestricted, gives
1021.088
in close agreement with Andrew's exact result.
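A few lines of Python reproduce these figures from the characteristic
equation, using simple bisection on [1.9, 2.0], where the dominant root
lies:

```python
import math

def f(r):
    # r^8 - (r^7 + r^6 + ... + r + 1); the dominant root is just below 2.
    return r**8 - sum(r**i for i in range(8))

lo, hi = 1.9, 2.0  # f(1.9) < 0, f(2.0) = 256 - 255 = 1 > 0
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
r = (lo + hi) / 2
print(r)                                             # ~1.996031179735
print(1024 * math.log2(r))                           # ~1021.06548
print(1024 * math.log2(r) + 8 * (1 - math.log2(r)))  # ~1021.088
```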

0.132 is, I think, the approx. chance that the longest
run in a random 1024-long bit string is 8 or less,
so it takes
-log_2(0.132) = 2.92 bits
to encode the fact that that is the case.

1021.086 + 2.92 is almost uncannily close to 1024.00

Shannon and his Pigeons win again!

James Dow Allen

James Dow Allen

Mar 5, 2012, 7:27:40 PM

On Mar 6, 7:16 am, James Dow Allen <jdallen2...@yahoo.com> wrote:
> Indeed, adding 8*(1-log_2(r)) since
> the first 8 bits are unrestricted, gives
>    1021.088
> in close agreement with Andrew's exact result.

::face turns red:: ... ::whacks his head::

I mean *Alex's* exact result.

Phil Carmody

Mar 20, 2012, 12:49:56 PM

"George Johnson" <matr...@charter.net> writes:
> "pierre" <pie...@invalid.invalid> wrote in message
> news:4f51deb4$0$21489$ba4a...@reader.news.orange.fr...
> > hi,
> >
> > let's suppose we have a binary file (runs of zeroes and ones) and its
> > length is 1024 bits.
> >
> > any of these runs is not bigger than 8 (00000000 or 11111111)
> >
> > how can one compute the entropy of such a file ?
> >
> > Thank you
>
> Well if each 8 bit chunk is constant then the easiest thing to say is
> that the file can be compressed to 1/8 of the original size (1024 bits / 8 =
> 128 bits) every time.

But each 8 bits must be followed with an absolutely predictable next 8 bits.
So you can compress this situation down to almost 1 bit - the first one.

Phil
--
> I'd argue that there is much evidence for the existence of a God.
Pics or it didn't happen.
-- Tom (/. uid 822)