"pierre" <pie...@invalid.invalid> wrote in message
news:4f51deb4$0$21489$ba4a...@reader.news.orange.fr...
Well, if each 8-bit chunk is constant, then the easiest thing to say is
that the file can be compressed to 1/8 of the original size (1024 bits / 8 =
128 bits) every time. I'd just use an existing data compression program to
compress the file and tell me how well it compressed. It is a tad foolish
to not use existing quick tools to achieve a simple goal that math alone
would not give sufficient answers to. The practical term is "OVER-THINKING
THE PROBLEM".
I could ramble on about many different techniques, but if you're not
bothering with them yourself (choosing to bother us with them
instead), then I'd think it a tad silly to school you in them. Again, you
are lacking either the mathematical skills, the tools, or the
self-discipline to do this yourself already, so be lazy and use an existing
data compressor (7-Zip is free, though a tad buggy, amongst many tools out
there) and just get the answer quickly - cleanly - without sounding like a
begging puppy wanting the slice of warm pizza on the plate.
http://en.wikipedia.org/wiki/Chi-squared_distribution
http://en.wikipedia.org/wiki/Information_entropy
http://en.wikipedia.org/wiki/Shannon_index#Shannon_index
The particular organization of those reduced chunks determines the
entropy.
If I have a deck of 52 cards with 42 duplicate 8-of-spades and 10
duplicate 3-of-diamonds, then the entropy level is a function of those
probabilities more than of any other prediction. However you randomize the
card stack, there will be 42 copies of one card and 10 copies of the other.
For a randomly chosen run from that set I can give you a pretty decent
statistical prediction of how many 8-of-spades copies & how many
3-of-diamonds copies it will contain, but not their particular
organizational pattern.
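
To put rough numbers on that 42/10 deck, here is a minimal Python sketch.
The function name and the card labels are my own, purely for illustration.
It computes the Shannon entropy of the card frequencies (which no amount of
shuffling changes) and, separately, how many bits it takes to name one
particular ordering of the deck.

```python
import math
from collections import Counter

def shannon_entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the observed frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# The deck from the example: 42 copies of one card, 10 copies of the other.
# "8S" and "3D" are just illustrative labels.
deck = ["8S"] * 42 + ["3D"] * 10

bits_per_card = shannon_entropy_bits(deck)
total_bits = bits_per_card * len(deck)

# Distinguishable orderings: choose which 10 of the 52 positions
# hold the 3-of-diamonds.
orderings = math.comb(52, 10)
bits_for_one_ordering = math.log2(orderings)

print(f"entropy per card:         {bits_per_card:.3f} bits")
print(f"entropy for all 52 cards: {total_bits:.1f} bits")
print(f"distinct orderings:       {orderings}")
print(f"bits to name an ordering: {bits_for_one_ordering:.1f}")
```

The per-card figure only depends on the 42/10 split, which is the "pretty
decent statistical prediction" part. The ordering count is the part you
cannot predict from the counts alone.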
If I were given a deck of 52 cards knowing only that it held duplicates
of a Jack-of-Hearts & Queen-of-Clubs, with no allowance to examine the deck
beforehand to learn how many duplicates of each populated it, then I could
make zero statistical predictions about card organization or about which
card would be most likely to be drawn from any random point in the deck.
Therefore, I could make no predictions about the entropy levels of the deck
itself per shuffling.
In this case, it is simplest to use the handiest tools available
(existing data compression programs) and have them run a compressibility
test, from which I could then estimate the deck's contents (not being
allowed to examine the deck for the number of duplicates of each card) and
potential card distributions.
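
A compressibility test of that kind might look like the sketch below. It
rests on assumptions of mine: Python's zlib module stands in for "an
existing data compression program", each card is one byte, and the "deck"
is made far longer than 52 cards, because on an input that short the
compressor's fixed overhead swamps the result. The helper names are
hypothetical.

```python
import random
import zlib

def compressed_bits_per_symbol(data: bytes) -> float:
    """Compress with zlib (level 9); return output bits per input byte."""
    return 8 * len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

def two_card_deck(n_cards: int, p_jack: float) -> bytes:
    """A long random 'deck' of two card types, one byte per card.

    b"J" and b"Q" stand for Jack-of-Hearts and Queen-of-Clubs; p_jack is
    the share of Jacks, which the person running the test does not know.
    """
    return bytes(rng.choices(b"JQ", weights=[p_jack, 1 - p_jack], k=n_cards))

# Assumption: 100,000 cards, so compressor overhead is negligible.
results = {}
for p_jack in (0.5, 0.8, 0.99):
    deck = two_card_deck(100_000, p_jack)
    results[p_jack] = compressed_bits_per_symbol(deck)
    print(f"share of Jacks {p_jack:.2f}: "
          f"{results[p_jack]:.3f} compressed bits per card")
```

The more lopsided the split, the fewer bits per card the compressor needs,
so the compressed size gives a crude hint at how unevenly the two cards are
distributed. A general-purpose compressor will not hit the theoretical
entropy, so treat the figure as an upper estimate, not a measurement.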
And what is all this math to the newbie? Painful.
Well, thems the breaks. I likey my warm pizza very much.
I'm not grading you. In fact I just might point you off in horribly
wrong directions if it amuses me. I have no financial or moral obligation
to assist you in the portions of your life that you have neglected up to
now. Nobody else here is obligated to assist you in any manner. Does this
mean that we are all selfish assholes? Nope. We might be the most
kind-hearted snuggly kittens of compassion you would ever encounter in real
life or we could just be sneering jerks behind a computer in some remote
location (like in the house next to you) or we could just be average Joes
with busy lives plus expensive bills plus family matters distracting us.
Like a randomized deck of cards we have predictable limited-range outputs
with statistically predictable distributions, but the number of possible
orderings for any deck of 52 unique units is enormous (52!, roughly
8 x 10^67).
I give you a thought exercise. Imagine that you only got one important
email a day. 365 important emails per year.
Now how about 10 of these important emails per day, 3650 important
emails per year?
Now how about 10 of these important emails per hour, 87600 important
emails per year?
Now imagine you were running a business that required a thoughtful
math-intensive response per hour.
An Internet business with 600 important emails per hour? Perhaps 60000
per hour?
How much free time would you have to play unpaid teacher's nursemaid to
someone who goofed off during math class and is looking for a quickie answer
so they can instantly forget you to go enjoy goofing off some more?
Are all the posters here likely to be in the identical situation? Nope,
not even me. However, consider that people's time is worth much more than
money some days, and an anonymous "Thank You" can be justifiable payment for
labors exerted, but it rarely comes anywhere close to paying the
electric bill, the heat bill, or the food bills.
Me, I tend to be a helpful jackass (if it amuses me). Most people won't
admit that I am a jackass or even helpful. The ones that do tend to enjoy
that admission greatly (an inexpensive joy for them, like a piece of candy,
yum). You know what, I also like to take a mirror out with me sometimes and
show folks when they are being jackasses too (I'm strangely helpful that
way), so they can see just how much of their jackassery is showing. Oh,
at first they are offended (we have a jolly good laugh at that later). The
forced introspection can be quite the epiphany for the less clever folk out
there though.
I have a gift for you.
It's a mirror.