Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

# Mathematical Combinations and Compression.

6 views

### Bill Tuttle

Jun 21, 2000, 3:00:00 AM6/21/00
to
It isn't a matter of finding the right algorithm.

Compression boils down to two factors, understanding the uniqueness of the
data and representing it efficiently.

This is generally considered as modeling the data and encoding it. This is
taking advantage of unique knowledge of the distribution of the data.
Whether this is done statically or dynamically.

Replace the term 'any' with 'random' and this can be proven mathematically.

If you are referring to 'any' as being a specific and determinalistic
dataset, then yes you can create a unique 'algorithym' or model to compress
it efficiently.

There will be many others who can describe these issues in far greater
detail and correctness

Bill

"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress
data
> to less than X bits since the number of mathematical combinations of bits
in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations has many times
> NOTHING to do with the compression algorithm.Compression usually uses
> dictionarries, sorting, and other types of not-mathematical ways to handle
> the data.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algoritm.
>
> Itai.
>
>

### Harvey Rook

Jun 21, 2000, 3:00:00 AM6/21/00
to
Oh no! It's another infinite compressions is possible thread.. Any bets on
how long this will go on?

"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress
data
> to less than X bits since the number of mathematical combinations of bits
in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations has many times
> NOTHING to do with the compression algorithm.Compression usually uses
> dictionarries, sorting, and other types of not-mathematical ways to handle
> the data.

What is your definiation of a mathematical combination? From what I see,
dictionaries, huffman trees, sorted lists, etc are all types of
mathematical combinations.

> That's why there must be a way to compress any data whatsoever, by finding
> the right algoritm.
>

But, then you'd also have to include the algorithm in the compressed
results. This
sucks up space. Weither you like it or not, the Counting Argument still
holds true.

> Itai.

Harvey Rook

### D.A.Kopf

Jun 21, 2000, 3:00:00 AM6/21/00
to
Harvey Rook wrote:
>
> Oh no! It's another infinite compressions is possible thread.. Any bets on
> how long this will go on?

Of course, it must be realized that *all possible files* could take much
longer than the universe will exist to ever generate.

\$1 says at least another twelve posts?

### Dale King

Jun 21, 2000, 3:00:00 AM6/21/00
to
Itai Bar-Haim wrote:
>
> Hi.
> I have read in several posts already that it is impossible to compress data
> to less than X bits since the number of mathematical combinations of bits in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations has many times
> NOTHING to do with the compression algorithm.Compression usually uses
> dictionarries, sorting, and other types of not-mathematical ways to handle
> the data.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algoritm.

The process of compression and decompression can be considered as this
process:

input data ----> compressed data ----> original data

The mathematical combinations part tells you how many possible input
files you can possibly see. In your case that was shown to be around
2^817 possible input files. Let's imagine that we had a compression
algorithm that could compress all 2^817 input files by at least one bit.
That means that the compressed data would contain no more than 816 bits.
The problem is that would require multiple sets of input data to map to
the same compressed data file. This is because the number of data files
containing at most 816 bits is 2^817 - 1 (I am including a zero bit
file). That seems to say that we can compress all the files but 1, but
then there's a little problem of needing to know how many bits are in
the output. That has to somehow be encoded into the compressed data as
well and must be added to the size of the data. The naive way would be
to simply include 10 extra bits giving the size in bits. But if you add
these 10 bits to all of the compressed data files you now have a net
loss if you average over all possible input files. Certainly, some of
the input files will indeed compress down to less than 817 bits, but
they are in the vast minority. Most will actually grow in size. You can
cut down how much they grow by adding a bit that says whether the data
compressed or not. But this still means that over half of the input
files will grow by 1 bit. Still a net loss on average.

Note that this analysis did not look at any particular algorithm. It
merely looked at the input and output. Therefore the algorithm used does
not matter to the compressability of random data. You cannot escape the
fact that there is a theoretical lower limit for compression. That
depends on the characteristics of the input data source. In information
theory that is quanitified by entropy.

What compression software does is try to map the most frequently
occurring input files onto those compressed files that are smaller than
the input file. By doing so, there must exist input files that are
actually expanded. It is a mathematical law. But what we try to do is
make sure that the ones that are expanded by the compressor are very
rare.

You will see various compressors that do wonders on certain input files.
For instance compressing English text will often yield better than 50%
compression. But that is because the input source does not have high
entropy. The dictionary compressors you mention are simply exploiting
the fact that text is not random data. Dictionary compressors rely on
the fact that phrases are often repeated in the text. If that is not
true of your text it may actuall expand it. Other compressors may
exploit the facts like, if you see the the letters "elephan", the odds
of the next letter being t are nearly 100%. The odds of the next
character being Z is pretty much zero.

You have said that your data is random. That means it has high entropy.
There is way to predict what the next bit will be outside of the already
established rules for how many bits are set and how many total bits
there are. That information gets us down to 817 bits. Without any more
knowledge of the data you cannot do better than that for all possible
inputs. You can pick some that will be compressed, but at the cost of
expanding others.

--
--- Dale King

Recruiters: I am not willing to relocate outside of Indianapolis!

### Dale King

Jun 21, 2000, 3:00:00 AM6/21/00
to
But do you include this message in the count, since it is kind of a

Of course, how long these threads go on can only be governed by how long
it takes to sink into the person claiming the impossibile is possible.
Hopefully it will sink into Itai quickly.

On this topic though, is there a good web site that plainly explains the
counting principle? I have never been able to find a good one. Note, I
understand the counting principle, I just want a nice site to point
people to to cut threads like this short without having to take 1/2 an
hour to explain it myself.

"D.A.Kopf" wrote:
>
> Harvey Rook wrote:
> >
> > Oh no! It's another infinite compressions is possible thread.. Any bets on
> > how long this will go on?
>
> Of course, it must be realized that *all possible files* could take much
> longer than the universe will exist to ever generate.
>
> \$1 says at least another twelve posts?

### D.A.Kopf

Jun 21, 2000, 3:00:00 AM6/21/00
to
Dale King wrote:
>
> But do you include this message in the count, since it is kind of a

Certainly not. At least from now on. (That's four and counting)

>
> Of course, how long these threads go on can only be governed by how long
> it takes to sink into the person claiming the impossibile is possible.
> Hopefully it will sink into Itai quickly.
>
> On this topic though, is there a good web site that plainly explains the
> counting principle? I have never been able to find a good one. Note, I
> understand the counting principle, I just want a nice site to point
> people to to cut threads like this short without having to take 1/2 an
> hour to explain it myself.

Sesame Street has a quite understandable introduction to the topic. As I
recall, hosted by an animated character named "Count Dracula". They present
him with animated objects that occupy his animated space, and he counts them.
Don't know if he exists in a virtual-world web site. Even more unsure about
any relevance of this to the world *we* live in!

### Itai Bar-Haim

Jun 22, 2000, 3:00:00 AM6/22/00
to
Hi.
I have read in several posts already that it is impossible to compress data
to less than X bits since the number of mathematical combinations of bits in
a certain form has to be represented by X bits.
That is not true.
When dealing with compression, mathematical combinations has many times
NOTHING to do with the compression algorithm.Compression usually uses
dictionarries, sorting, and other types of not-mathematical ways to handle
the data.
That's why there must be a way to compress any data whatsoever, by finding
the right algoritm.

Itai.

### Juan de Dios Santander Vela

Jun 22, 2000, 3:00:00 AM6/22/00
to
en artículo 8ird1m\$bnn\$1...@pyrite.mv.net, Bill Tuttle
<bill....@imagauto.com> escribió:

> If you are referring to 'any' as being a specific and determinalistic
> dataset, then yes you can create a unique 'algorithym' or model to compress
> it efficiently.

An example: you can compress a zillion bytes file into a single bit...
as long those are exactly the zillion bytes in a row that bit is supposed
to represent... but if you knew it in the first time, why bother to trans-
mit even that bit? Then we have compressed a zillion bytes to nothing...
but then we have no information, for we knew what was going to be transmi-
tted...

So, if we want a general method to be useful to every possible stream,
there's no way around the counting principle.

--
Juan de Dios Santander Vela
Ingeniero en Electrónica (Electronics Engineer)
España (Spain)
e-mail: juand...@airtel.net ICQ: 22903952

### Phil Norman

Jun 22, 2000, 3:00:00 AM6/22/00
to
On Wed, 21 Jun 2000 17:46:10 -0500, Dale King <Ki...@TCE.com> wrote:
>But do you include this message in the count, since it is kind of a

Well, it is caused by the original counting question, so it
probably counts as part of the discussion. Of course, the real
fun starts when sub-threads start getting renamed. Hopefully this
one won't be so enormous.

>On this topic though, is there a good web site that plainly explains the
>counting principle? I have never been able to find a good one. Note, I
>understand the counting principle, I just want a nice site to point
>people to to cut threads like this short without having to take 1/2 an
>hour to explain it myself.

Has anyone mentioned the FAQ yet? I don't think they have.
There's a wonderful explanation in section 9.2 of the FAQ which
provides a particularly clear example anyone should be able to
follow. URLs for the FAQ are:

There's also a section slightly later on about software patents
which is rather amusing.

Cheers,
Phil

### Paul McCanny

Jun 22, 2000, 3:00:00 AM6/22/00
to
> An example: you can compress a zillion bytes file into a single bit...
>
> as long those are exactly the zillion bytes in a row that bit is
> supposed
> to represent... but if you knew it in the first time, why bother to
> trans-
> mit even that bit? Then we have compressed a zillion bytes to
> nothing...
> but then we have no information, for we knew what was going to be
> transmi-
> tted...

The bit is transmitted to let the decoder know that the zillion bit
message is included in the original message. If no information was
contained (unless this is explicity defined in the decoder syntax).

paul

### D.A.Kopf

Jun 22, 2000, 3:00:00 AM6/22/00
to
Paul McCanny wrote:
>
> > An example: you can compress a zillion bytes file into a single bit...
> >
> > as long those are exactly the zillion bytes in a row that bit is supposed
> > to represent... but if you knew it in the first time, why bother to
> > transmit even that bit? Then we have compressed a zillion bytes to
> > nothing...but then we have no information, for we knew what was going to
> > be transmitted...

>
> The bit is transmitted to let the decoder know that the zillion bit
> message is included in the original message. If no information was
> transmitted then no assumption could be made about what the message
> contained (unless this is explicity defined in the decoder syntax).
>
And also to tell the decoder precisely when the message occurs, so that it can
make the right selection from among its vast store of bit-to-file mappings!-)

### Dale King

Jun 22, 2000, 3:00:00 AM6/22/00
to
I don't think the explanation is quite rigorous enough in the FAQ. What
it says is certainly true but for someone not familiar with the
arguments, it can easily be seen as "handwaving". I previously have gone
through the rigorous calculations. One are it doesn't plainly address is
where you consider a fixed size input buffer. It shows all files >= N,
but not all files exactly equal to N. And it doesn't address cases like
this where the size of the input file is large, but not all possible
input files are allowed.

Phil Norman wrote:
>
> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to
> follow. URLs for the FAQ are:
>
> http://www.landfield.com/faqs/compression-faq/part1/preamble.html
> http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
>
> There's also a section slightly later on about software patents
> which is rather amusing.

### mikael.l...@spray.se

Jun 22, 2000, 3:00:00 AM6/22/00
to
Hi!
How true it is what you are saying ...
I almost suspect that when people come with a lot of difficult theory to
explain why it's impossible to solve this or that problem, they actually
really don't have a clue. The fact remains, the most powerful tool you
have is your mind. If your attitude is that nothing is impossible and you
try and try again, NEVER EVER giving up, then the problem is destined to
be solved no matter what the NO-sayers tries to tell you!
Where would we be if not Edison, Marconi or Einstein, with this attitude,
had not done what they did?

Best regards and good luck,
Mikael

In article <3951...@news.bezeqint.net>,

Sent via Deja.com http://www.deja.com/

### Juan de Dios Santander Vela

Jun 22, 2000, 3:00:00 AM6/22/00
to
en artículo 8itti2\$m7t\$1...@nnrp1.deja.com, mikael.l...@spray.se
<mikael.l...@spray.se> escribió:

> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers tries to tell you!
> Where would we be if not Edison, Marconi or Einstein, with this attitude,
> had not done what they did?

It's not only a matter of attitude: they realized which were the difficul-
ties... and went other way. If you want to compress EVERY possible stream
losslessly, you're under the counting argument. So if you want to get rid
of it, you'll have to be out of its range.

And, about the NO-sayers, anyone with a perpetual movement machine? Every-
body says you can't, but if you try and try...

### mikael.l...@spray.se

Jun 22, 2000, 3:00:00 AM6/22/00
to
Hello!
I'm a software engineer who have been developing compression software a
couple of years. In the 70's scientists said that it was impossible to
compress to less than what the entropy of the file described. Today it's
possible to compress far far better than to this limit using bwt, ppm or
ctw algorithms as preprocessors.
When speculations about sending people to the moon came in the 50's, the
common man laughed at it and stated categorically that IT-IS-IMPOSSIBLE
and laughed some more. We all know the outcome of that too ...
From Apple:

While some see them as the crazy ones,
we see genius.

Because the people who are crazy enough to think
they can change the world, are the ones who do.

You're right, there should always be some realism included when making
decisions what to do with problems through life. But we don't know
everything now and will most certainly never do. So, that's why we need
these crazy people that are crazy enough to think they can change the
world. We should not inhibit them with ill-educated "advice" but
encourage them!

Best regards,
Mikael Lundqvist
Software engineer
Sweden

In article <B5784A58.7FD9%juand...@airtel.net>,

### Dale King

Jun 22, 2000, 3:00:00 AM6/22/00
to
And I don't care how much effort you put forth you cannot get 10 gallons
of water into a 5 gallon container (perhaps with a black hole, but then
we would have to be more rigorous on our definition of a gallon). That
in a nutshell is the counting argument! Or alternatively as the
pigeon-hole principle you can't put n pigeons into n-1 holes without
putting 2 pigeons in the same hole. Those aren't complicated theories,
they are simple logic.

Feel free to try all you want and don't give up. When you find a way
around these problems be sure to let us know :-)

And funny you should mention Einstein. He seemed to put some limits on
the universe like not being able to go beyond the speed of light. He
must be wrong. If we put forth enough effort we should be able to break
the speed of light barrier.

(Have we hit 12 messages on this topic yet?)

mikael.l...@spray.se wrote:
>
> How true it is what you are saying ...

> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers tries to tell you!
> Where would we be if not Edison, Marconi or Einstein, with this attitude,
> had not done what they did?

--

### Antaeus Feldspar

Jun 22, 2000, 3:00:00 AM6/22/00
to
Phil Norman <Phil....@genedata.com> wrote:

> On Wed, 21 Jun 2000 17:46:10 -0500, Dale King <Ki...@TCE.com> wrote:
> >But do you include this message in the count, since it is kind of a
>
> Well, it is caused by the original counting question, so it
> probably counts as part of the discussion. Of course, the real
> fun starts when sub-threads start getting renamed. Hopefully this
> one won't be so enormous.
>
>

> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to

> follow. URLs for the FAQ are:

Actually, it's not all that clear. IMHO, anyways. I've been
trying for quite a while to come up with a way to express it that will
make the principle involved even clearer. I *think* I have something,
but I haven't had a lot of time to work on it.

> Cheers,
> Phil

-jc

--
* -jc IS *NOW* feld...@cryogen.com
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby

### D.A.Kopf

Jun 22, 2000, 3:00:00 AM6/22/00
to
Dale King wrote:
>
> And I don't care how much effort you put forth you cannot get 10 gallons
> of water into a 5 gallon container (perhaps with a black hole, but then
> we would have to be more rigorous on our definition of a gallon). That
> in a nutshell is the counting argument! Or alternatively as the
> pigeon-hole principle you can't put n pigeons into n-1 holes without
> putting 2 pigeons in the same hole. Those aren't complicated theories,
> they are simple logic.

Pigeons? Why pigeons? I thought we were talking about information. But if you
insist on pigeons, it appears that unlimited geographical information can be
crammed into their tiny finite brains.

>
> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.

The current thinking is that tachyons actually have to gain energy to slow down.

>
> (Have we hit 12 messages on this topic yet?)

One more, by my count.

### mikael.l...@spray.se

Jun 23, 2000, 3:00:00 AM6/23/00
to
It's the attitude and spirit I'm trying to explain and encourage.
If we let "the common man", mr Smith, decide what is possible and what we
should do, then you can count on that innovations, inventions or great
break throughs will never happen because he's saying "It's impossible".
Those great men and women through history have always been laughed at by
people who did not even understand what they did. We all still benefit a
lot from what those great men and women did.

Mikael

In article <3952941D...@TCE.com>,

Dale King <Ki...@TCE.com> wrote:
> And I don't care how much effort you put forth you cannot get 10 gallons
> of water into a 5 gallon container (perhaps with a black hole, but then
> we would have to be more rigorous on our definition of a gallon). That
> in a nutshell is the counting argument! Or alternatively as the
> pigeon-hole principle you can't put n pigeons into n-1 holes without
> putting 2 pigeons in the same hole. Those aren't complicated theories,
> they are simple logic.
>

> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.
>

> (Have we hit 12 messages on this topic yet?)
>

> mikael.l...@spray.se wrote:
> >
> > How true it is what you are saying ...
> > I almost suspect that when people come with a lot of difficult theory to
> > explain why it's impossible to solve this or that problem, they actually
> > really don't have a clue. The fact remains, the most powerful tool you
> > have is your mind. If your attitude is that nothing is impossible and you
> > try and try again, NEVER EVER giving up, then the problem is destined to
> > be solved no matter what the NO-sayers tries to tell you!
> > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > had not done what they did?
>
> --
> --- Dale King
>
> Recruiters: I am not willing to relocate outside of Indianapolis!
>

### kaptke...@my-deja.com

Jun 23, 2000, 3:00:00 AM6/23/00
to

> One more, by my count.
>

Well, let me be that one more but adding that you can fit more pigeons
per hole if you first apply a blender transform. But then we are

-Michael

--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm

### s...@nospam.unt.edu

Jun 23, 2000, 3:00:00 AM6/23/00
to
mikael.l...@spray.se wrote:

> It's the attitude and spirit I'm trying to explain and encourage.
> If we let "the common man", mr Smith, decide what is possible and what we
> should do, then you can count on that innovations, inventions or great
> break throughs will never happen because he's saying "It's impossible".

But the attitude and spirit should be directed in a fruitful
direction. When something is easily and clearly shown to be
impossible, encouraging someone to try the impossible is just plain
stupid.

You mentioned Einstein in another posting. Here's a quote from
Einstein:

"One reason why mathematics enjoys special esteem, above all other
sciences, is that its laws are absolutely certain and
indisputable, while those of all other sciences are to some extent
debatable and in constant danger of being overthrown by newly
discovered facts." -- Albert Einstein, 1921

That sums up this situation perfectly. While the physical sciences
(where these other breakthroughs happened that you talked about) might
say something is impossible it is always with the caveat that it's
impossible if our understanding of the basic physical laws is correct.
On the other hand, compression is a purely mathematical process, and
we understand the laws completely and unequivocally. That's why we
can say such compression is impossible, and it really does mean
impossible.

Finally, in a later posting you made some bizarre claims about entropy
and compression. First you demonstrated that you didn't know what
entropy was (by referring to entropy of a file rather than entropy of
a source), and then you claimed that people "overcame the impossible"
by compressing better than entropy. Well, here's some news for you:
You *CAN'T* compress better than entropy. The compressors you
mentioned (BWT, PPM, etc) in fact compress slightly worse than the
entropy of a source, and there will never be a compressor that can
compress (on average) to less than the entropy of the source. Shannon
proved this back in the 40's, and it's one of those absolutely true
facts that will never, ever be contradicted... (now there are some
arguments about whether "real world data" is accurately modeled with
the probabilistic models that you have to use for entropy, but that's
a different question).

--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |

### Tim

Jun 23, 2000, 3:00:00 AM6/23/00
to

s...@nospam.unt.edu wrote in article <8ium9f\$477\$1...@hermes.acs.unt.edu>...

> mikael.l...@spray.se wrote:
>
>(now there are some
> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>

No it is not a different question. It goes right to the heart of the matter
of the thread. If these arguments exist and have merit (to which I have no
idea), then we have an 'alternate universe' of models that we need to deal
with. I doubt that it is the case, given the maturity of the science (info
theory), but I suppose its possible.

Tim

### mikael.l...@spray.se

Jun 23, 2000, 3:00:00 AM6/23/00
to
In article <8ium9f\$477\$1...@hermes.acs.unt.edu>,

s...@nospam.unt.edu wrote:
> mikael.l...@spray.se wrote:
>
> > It's the attitude and spirit I'm trying to explain and encourage.
> > If we let "the common man", mr Smith, decide what is possible and what we
> > should do, then you can count on that innovations, inventions or great
> > break throughs will never happen because he's saying "It's impossible".
>
> But the attitude and spirit should be directed in a fruitful
> direction. When something is easily and clearly shown to be
> impossible, encouraging someone to try the impossible is just plain
> stupid.
>
Of course! Eagerness should not replace good judgement!
In Sweden we have a saying though, very roughly:
"It's better to try and fail than not to try at all."

> You mentioned Einstein in another posting. Here's a quote from
> Einstein:
>
> "One reason why mathematics enjoys special esteem, above all other
> sciences, is that its laws are absolutely certain and
> indisputable, while those of all other sciences are to some extent
> debatable and in constant danger of being overthrown by newly
> discovered facts." -- Albert Einstein, 1921
>
> That sums up this situation perfectly. While the physical sciences
> (where these other breakthroughs happened that you talked about) might
> say something is impossible it is always with the caveat that it's
> impossible if our understanding of the basic physical laws is correct.
> On the other hand, compression is a purely mathematical process, and
> we understand the laws completely and unequivocally. That's why we
> can say such compression is impossible, and it really does mean
> impossible.
>

It's a fact though that all of today's universal compression algorithms
and implementations are far from the "magical limit" or what the entropy
allows, if I dare to use that word. We can also say that in the practical
world we don't want to pack a file that's been packed already. That would
be plain stupid too! (So what the heck are we debating about?)

> Finally, in a later posting you made some bizarre claims about entropy
> and compression. First you demonstrated that you didn't know what
> entropy was (by referring to entropy of a file rather than entropy of
> a source), and then you claimed that people "overcame the impossible"
> by compressing better than entropy. Well, here's some news for you:
> You *CAN'T* compress better than entropy. The compressors you
> mentioned (BWT, PPM, etc) in fact compress slightly worse than the
> entropy of a source, and there will never be a compressor that can
> compress (on average) to less than the entropy of the source. Shannon
> proved this back in the 40's, and it's one of those absolutely true

> facts that will never, ever be contradicted... (now there are some

> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>

I know what entropy means. "Disorder" or "randomness". I should have used
And it's easy to pack a file better than with huffman only, if we use
"bwt, ppm or ctw as preprocessor" isn't it?
I think this also very clearly demonstrates the difference between
scientists like you and engineers like me. For me it's not a lot of
theory, math or carefully laid up proofs of something. For me it's much
more interesting to see what I can do with it, practically. And I don't
always trust what other people tell me, even if it's told by a scientist!
Critical thinking is a good and necessary tool too!
Are you really sure that YOU know how to measure the "source's" entropy?
If you don't, then what is all this theory worth?

Thanks for the feedback! I suppose one should be careful with the words
one choose. The whole world could be reading it ... I'm not good with
words. I suppose that's why I became a programmer instead!

Mikael

### Phil Norman

Jun 23, 2000, 3:00:00 AM6/23/00
to
On Thu, 22 Jun 2000 20:39:32 GMT,
mikael.l...@spray.se <mikael.l...@spray.se> wrote:
>How true it is what you are saying ...
>I almost suspect that when people come with a lot of difficult theory to
>explain why it's impossible to solve this or that problem, they actually
>really don't have a clue. The fact remains, the most powerful tool you
>have is your mind. If your attitude is that nothing is impossible and you
>try and try again, NEVER EVER giving up, then the problem is destined to
>be solved no matter what the NO-sayers tries to tell you!
>Where would we be if not Edison, Marconi or Einstein, with this attitude,
>had not done what they did?

Absolutely. I, personally, have been trying to get 1+1 to equal 3
for a good number of years now. I've not managed it yet (when I
do, I'll post the results on http://inkvine.fluff.org/~forrey/),
but *I'm* certainly not giving up just because mathematicians say
it's impossible. What do they know? They're just constrained by
all the prejudices they've been taught during their studies.

What the world needs is new, original thought. Good luck breaking
the counting argument! Einstein, Newton and all their friends
will be proud.

Cheers,
Phil

### Phil Norman

Jun 23, 2000, 3:00:00 AM6/23/00
to
On Fri, 23 Jun 2000 02:35:32 GMT,
kaptke...@my-deja.com <kaptke...@my-deja.com> wrote:
>
>> One more, by my count.
>
>Well, let me be that one more but adding that you can fit more pigeons
>per hole if you first apply a blender transform. But then we are

However, one of those rare forms of lossy compression where the
end result is better than the uncompressed form.

Cheers,
Phil

--
:wq

### Phil Norman

Jun 23, 2000, 3:00:00 AM6/23/00
to
On 22 Jun 2000 20:17:25 EDT, D.A.Kopf <d...@dakx.com> wrote:
>
>Pigeons? Why pigeons? I thought we were talking about information. But if you
>insist on pigeons, it appears that unlimited geographical information can be
>crammed into their tiny finite brains.

Pigeon navigation might be impressive, but let's not blow the
whole thing out of proportion, eh? Unless you can come up with
some evidence that there's no limit to the amount of geographical
information which can be stored in a pigeon's brain, I'm not going
to take the above paragraph at all seriously.

Cheers,
Phil

--
:wq

### Phil Norman

Jun 23, 2000, 3:00:00 AM6/23/00
to
On Thu, 22 Jun 2000 19:59:03 -0400,
Antaeus Feldspar <feld...@cryogen.com> wrote:

>Phil Norman <Phil....@genedata.com> wrote:
>
>> Has anyone mentioned the FAQ yet? I don't think they have.
>> There's a wonderful explanation in section 9.2 of the FAQ which
>> provides a particularly clear example anyone should be able to
>> follow. URLs for the FAQ are:
>
> Actually, it's not all that clear. IMHO, anyways. I've been
>trying for quite a while to come up with a way to express it that will
>make the principle involved even clearer. I *think* I have something,
>but I haven't had a lot of time to work on it.

Well, when I read it (I refer particularly to the bit which
states that since there are 2^n possible n-bit files, you can't
represent all those values in 2^(n-1) bits) I found that
explanation really quite beautiful in its simplicity. However,
I've watched that argument fail to be absorbed by enough people to
realise that it's probably not clear enough.

Good luck in finding something better though - I've not thought a
great deal about how one would explain it, but it strikes me as a
difficult concept to get across in unquestionably simple terms.

Cheers,
Phil

--
:wq

### D.A.Kopf

Jun 23, 2000, 3:00:00 AM6/23/00
to

No, I have no data on their maximum capacity. This was to subtly convey the
point of view that observation of what actually happens in the real world
should take precedence over mathematical theories about what ought to happen.

### Alan Morgan

Jun 23, 2000, 3:00:00 AM6/23/00
to
In article <slrn8l613a.t6...@jalua.ch.genedata.com>,

Phil Norman <Phil....@genedata.com> wrote:
>On Thu, 22 Jun 2000 20:39:32 GMT,
> mikael.l...@spray.se <mikael.l...@spray.se> wrote:
>>How true it is what you are saying ...
>>I almost suspect that when people come with a lot of difficult theory to
>>explain why it's impossible to solve this or that problem, they actually
>>really don't have a clue. The fact remains, the most powerful tool you
>>have is your mind. If your attitude is that nothing is impossible and you
>>try and try again, NEVER EVER giving up, then the problem is destined to
>>be solved no matter what the NO-sayers tries to tell you!
>>Where would we be if not Edison, Marconi or Einstein, with this attitude,
>>had not done what they did?
>
>Absolutely. I, personally, have been trying to get 1+1 to equal 3
>for a good number of years now. I've not managed it yet (when I
>do, I'll post the results on http://inkvine.fluff.org/~forrey/),
>but *I'm* certainly not giving up just because mathematicians say
>it's impossible. What do they know? They're just constrained by
>all the prejudices they've been taught during their studies.

Good work! Might I suggest that, when you have finished this, you
start to look for the even prime numbers greater than 2. Mathematicians
claim they don't exist, but they haven't checked all of them so they
can't possibly know. I'm pretty sure that given the number of even
numbers greater than two that *one* of them has to be prime.

Alan

### s...@nospam.unt.edu

Jun 23, 2000, 3:00:00 AM6/23/00
to

> s...@nospam.unt.edu wrote in article <8ium9f\$477\$1...@hermes.acs.unt.edu>...
>> mikael.l...@spray.se wrote:
>>

>>(now there are some
>> arguments about whether "real world data" is accurately modeled with
>> the probabilistic models that you have to use for entropy, but that's
>> a different question).
>>

> No it is not a different question. It goes right to the heart of the matter

> of the thread. If these arguments exist and have merit (to which I have no
> idea), then we have an 'alternate universe' of models that we need to deal
> with. I doubt that it is the case, given the maturity of the science (info
> theory), but I suppose its possible.

Of course it's a different question. I can define a source, compute
the entropy, and guarantee you that no compressor you can come up with
compress data from that source any smaller than the entropy allows. I
can prove that you can't do better than entropy. Ever.

But the question of whether it makes sense to talk about entropy and
probabilistic sources of real world data is different. But I was
responding to someone talking about entropy, not real world data.

### s...@nospam.unt.edu

Jun 23, 2000, 3:00:00 AM6/23/00
to
mikael.l...@spray.se wrote:

> I know what entropy means. "Disorder" or "randomness". I should have used
> the word "statistics" instead, I was thinking about e.g. huffman coding.
> And it's easy to pack a file better than with huffman only, if we use
> "bwt, ppm or ctw as preprocessor" isn't it?

But an order-0 Huffman coder is not a universal compressor. The fact
that it can be beat isn't a surprise to anyone (in fact, Shannon's

> I think this also very clearly demonstrates the difference between
> scientists like you and engineers like me. For me it's not a lot of
> theory, math or carefully laid up proofs of something.

But don't you think you'd be better off if you understood the theory?
I'm not saying everyone has to be a scientist and work on new
theories, but you need to understand what's there. An engineer
shouldn't just try random stuff in the hopes that something will
work. You need to be strong in the foundations if you expect to make
progress.

> Are you really sure that YOU know how to measure the "source's" entropy?
> If you don't, then what is all this theory worth?

Yes, I'm absolutely sure I know how to measure a source's entropy.
It's a fairly simple theory and formula. Here's a simple example: my
source generates numbers 0-255 (bytes) all random, uniform, and
independent. The entropy of that source is exactly 8 bits per symbol,
and you will never be able to compress it any better than that.

### Dale King

Jun 23, 2000, 3:00:00 AM6/23/00
to
mikael.l...@spray.se wrote:
>
> It's the attitude and spirit I'm trying to explain and encourage.
> If we let "the common man", mr Smith, decide what is possible and what we
> should do, then you can count on that innovations, inventions or great
> break throughs will never happen because he's saying "It's impossible".
> Those great men and women through history have always been laughed at by
> people who did not even understand what they did. We all still benefit a
> lot from what those great men and women did.

I think you've got it backwards in this case. The common man (you)
thinks it can be done because he doesn't understand the mathematics.
Those who do understand know that it is impossible and are laughing at
those who claim it is possible.

### Dale King

Jun 23, 2000, 3:00:00 AM6/23/00
to
mikael.l...@spray.se wrote:
>
> Hello!
> I'm a software engineer who have been developing compression software a
> couple of years. In the 70's scientists said that it was impossible to
> compress to less than what the entropy of the file described. Today it's
> possible to compress far far better than to this limit using bwt, ppm or
> ctw algorithms as preprocessors.

To reiterate what someone else said, a file does not have an entropy
limit. A source has an entropy limit. And I can certainly compress A
file below the source's entropy limit, but any such lossless compression
algorithm MUST necessarily compress most of the other files from that
source to a size greater than the entropy limit. If you average the
compression over all possible messages from that source you will be at
or above the entropy limit. That is an inescapable mathematical law.

All those algorithms that you mentioned do is try to be better about
selecting which of the possible messages from a source get mapped to the
smaller files. They look for specific types of non-randomness to the
data. Some algorithms are better than others on detecting the non random
patterns. None will let you compress all files from a random source
below the source's entropy limit.

> When speculations about sending people to the moon came in the 50's, the
> common man laughed at it and stated categorically that IT-IS-IMPOSSIBLE
> and laughed some more. We all know the outcome of that too ...

Once again you are confusing impossible with difficult or impractical.
The impossible in this might be better stated as illogical. Putting a
man on the moon in the 50's would definitely appear impossible, but that
is simply because it would be very difficult and impractical. It was not
however illogical. Logically there was no reason why someone couldn't be
on the moon. Compression below a source's entropy limit, like the even
prime number > 2, are logical impossibilities. The existance of such a
prime number or such a compressor is not logical.

> From Apple:
>
> While some see them as the crazy ones,
> we see genius.
>
> Because the people who are crazy enough to think
> they can change the world, are the ones who do.
>
> You're right, there should always be some realism included when making
> decisions what to do with problems through life. But we don't know
> everything now and will most certainly never do. So, that's why we need
> these crazy people that are crazy enough to think they can change the
> world. We should not inhibit them with ill-educated "advice" but
> encourage them!
>
> Best regards,
> Mikael Lundqvist
> Software engineer
> Sweden
>
> In article <B5784A58.7FD9%juand...@airtel.net>,
> juand...@airtel.net wrote:
> > en artículo 8itti2\$m7t\$1...@nnrp1.deja.com, mikael.l...@spray.se
> > <mikael.l...@spray.se> escribió:
> >

> > > I almost suspect that when people come with a lot of difficult theory to
> > > explain why it's impossible to solve this or that problem, they actually
> > > really don't have a clue. The fact remains, the most powerful tool you
> > > have is your mind. If your attitude is that nothing is impossible and you
> > > try and try again, NEVER EVER giving up, then the problem is destined to
> > > be solved no matter what the NO-sayers tries to tell you!
> > > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > > had not done what they did?
> >

> > It's not only a matter of attitude: they realized which were the difficul-
> > ties... and went other way. If you want to compress EVERY possible stream
> > losslessly, you're under the counting argument. So if you want to get rid
> > of it, you'll have to be out of its range.
> >
> > And, about the NO-sayers, anyone with a perpetual movement machine? Every-
> > body says you can't, but if you try and try...
> >
> > --
> > Juan de Dios Santander Vela
> > Ingeniero en Electrónica (Electronics Engineer)
> > España (Spain)
> > e-mail: juand...@airtel.net ICQ: 22903952
> >
> >
>

> Sent via Deja.com http://www.deja.com/

--

### Dale King

Jun 23, 2000, 3:00:00 AM6/23/00
to
Note, a blender transform is not necessary. The pigeon-hole principle
places no restrictions on the size of the holes. It doesn't matter if
your holes can hold 1000 pigeons. If you have n pigeons and n - 1 holes,
you cannot put the pigeons in the holes without having more than one
pigeon per whole.

All a blender transform will let you do is spread that extra pigeon
among several holes, but you will still end up with a quantity of
pigeons > 1 in at least one hole. If you fully apply the blender
transform you will get 1 + 1/n pigeons per hole. ;-). As n approaches
infinity the average number of pigeons per hole approaches 1 but does
not equal 1.

kaptke...@my-deja.com wrote:
>
> > One more, by my count.
> >
>
> Well, let me be that one more but adding that you can fit more pigeons
> per hole if you first apply a blender transform. But then we are

--

### Dale King

Jun 23, 2000, 3:00:00 AM6/23/00
to
"D.A.Kopf" wrote:
>
> Pigeons? Why pigeons? I thought we were talking about information. But if you
> insist on pigeons, it appears that unlimited geographical information can be
> crammed into their tiny finite brains.

Form the pigeons I've seen I would think the information capacity of
their brains was one or two bits at most ;-)

### D.A.Kopf

Jun 23, 2000, 3:00:00 AM6/23/00
to
s...@nospam.unt.edu wrote:
<snip>

>
> Yes, I'm absolutely sure I know how to measure a source's entropy.
> It's a fairly simple theory and formula. Here's a simple example: my
> source generates numbers 0-255 (bytes) all random, uniform, and
> independent. The entropy of that source is exactly 8 bits per symbol,
> and you will never be able to compress it any better than that.
>
Gotcha, Hah! Please describe the implementation of your source, along with the
necessary theoretical or empirical evidence that its output is impossible for
others to predict.

PS - What do you think about the possibility of googleflop quantum computers?

### s...@nospam.unt.edu

Jun 23, 2000, 3:00:00 AM6/23/00
to
D.A.Kopf <d...@dakx.com> wrote:
> s...@nospam.unt.edu wrote:

>> Yes, I'm absolutely sure I know how to measure a source's entropy.
>> It's a fairly simple theory and formula. Here's a simple example: my
>> source generates numbers 0-255 (bytes) all random, uniform, and
>> independent. The entropy of that source is exactly 8 bits per symbol,
>> and you will never be able to compress it any better than that.

> along with the necessary theoretical or empirical evidence that its
> output is impossible for others to predict.

Huh? I just completely described the source above. You can plug it
into the entropy formula to see that the entropy is indeed 8
bits/symbol. And you can read Shannon's proof that no compressor can
ever beat that. What's possibly left?

> PS - What do you think about the possibility of googleflop quantum
> computers?

Irrelevant. Quantum computers don't change this one iota...

### D.A.Kopf

Jun 23, 2000, 3:00:00 AM6/23/00
to
s...@nospam.unt.edu wrote:
>
> D.A.Kopf <d...@dakx.com> wrote:
> > s...@nospam.unt.edu wrote:
>
> >> Yes, I'm absolutely sure I know how to measure a source's entropy.
> >> It's a fairly simple theory and formula. Here's a simple example: my
> >> source generates numbers 0-255 (bytes) all random, uniform, and
> >> independent. The entropy of that source is exactly 8 bits per symbol,
> >> and you will never be able to compress it any better than that.
>
> > along with the necessary theoretical or empirical evidence that its
> > output is impossible for others to predict.
>
> Huh? I just completely described the source above. You can plug it
> into the entropy formula to see that the entropy is indeed 8
> bits/symbol. And you can read Shannon's proof that no compressor can
> ever beat that. What's possibly left?

substantial penalties may apply if it doesn't perform as advertised!

>
> > PS - What do you think about the possibility of googleflop quantum
> > computers?
>
> Irrelevant. Quantum computers don't change this one iota...

Interesting origin of this expression. Has (I think) to do with the added iota
going from the greek "homoousios" (made of the same substance) to homoiousios
(made of a similar substance). When applied to the relationship between Jesus
and God, the early Christian church came to regard the latter as a dangerous
heresy. So however insignificant one iota may seem today, it meant a great
deal to those who were executed because their theoretical constructs
conflicted with the contemporary reality.

### kaptke...@my-deja.com

Jun 23, 2000, 3:00:00 AM6/23/00
to
Ah, but if we didn't scrape the blender clean after transform (leaving
just over one pigeon's worth on the interior of the blender) then we
could market such a transform as able to compress losslessly and
recursively on all data (plus some marginal bookkeeping (the remainder
of the pigeon in the blender). We could sell it to anyone who does not
comprehend math. (ofcorse we would never actually release any working
product).

-michael

--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm

In article <3953C26C...@TCE.com>,

> --
> --- Dale King
>
> Recruiters: I am not willing to relocate outside of Indianapolis!
>

Sent via Deja.com http://www.deja.com/

### Nasir Rajpoot

Jun 23, 2000, 3:00:00 AM6/23/00
to
Phil Norman wrote:

> What the world needs is new, original thought. Good luck breaking
> the counting argument! Einstein, Newton and all their friends
> will be proud.

Oh, but I am sure Shannon will not be so proud :)-

Cheers,
Nasir

### Antaeus Feldspar

Jun 23, 2000, 3:00:00 AM6/23/00
to
Phil Norman <Phil....@genedata.com> wrote:

Well, here's the basic approach I'm working on: telling people
to visualize all the possible files *numbered*... numbering the two
one-bit files[1] 0 and 1, the four two-bit files 2, 3, 4 and 5, the
eight three-bit files... well, you get the idea.
Then I'm examining what it means to 'compress' a file: a file
with one number actually takes the place of a file with a lower number
(not all files with lower numbers are shorter files, but all shorter
files must have lower numbers.) You are now telling the decompressor:
"When you see the file whose number is 4,506; actually restore the file
whose number is 789,054."
But, what happens to the file whose number *started out* as
4,506? That file has lost its number, and needs another one... and if
our claim that any file can be compressed is true, it must be an even
lower number... and hopefully anyone really *can* see from here that
sooner or later you wind up having to try and find numbers lower than 0.
Hopefully, it's easier from there to handle claims like
"compresses everything, as long as it's above x number of bytes" (that
identifies a *fixed* number of files that can be evicted from their
numbers; therefore only a *fixed* number of files can be awarded their
places) and the issue of bytestrings rather than bitstrings (bytestrings
are actually worse; if you reduce a bytestring to a bitstring 15 bits
long, that actually evicts *two* bytestrings, not just one...)

-jc

[1] To force people to confront the principle first, I'm starting with
bitstrings rather than bytestrings.

### Tom Gutman

Jun 23, 2000, 3:00:00 AM6/23/00
to
"Dale King" <Ki...@TCE.com> wrote in message
news:395145B2...@TCE.com...

> But do you include this message in the count, since it is kind
of a
>
> Of course, how long these threads go on can only be governed by
how long
> it takes to sink into the person claiming the impossibile is
possible.
> Hopefully it will sink into Itai quickly.

>
> On this topic though, is there a good web site that plainly
explains the
> counting principle? I have never been able to find a good one.
Note, I
> understand the counting principle, I just want a nice site to
point
> people to to cut threads like this short without having to take
1/2 an
> hour to explain it myself.
>
> "D.A.Kopf" wrote:
> >
> > Harvey Rook wrote:
> > >
> > > Oh no! It's another infinite compressions is possible
> > > how long this will go on?
> >
> > Of course, it must be realized that *all possible files* could
take much
> > longer than the universe will exist to ever generate.
> >
> > \$1 says at least another twelve posts?

> --
> --- Dale King
>
> Recruiters: I am not willing to relocate outside of
Indianapolis!

This could take a while, because so much of what is said is
absolutely true, and absolutely meaningless. Itai's statement
that any file can be compressed -- you just need to find the right
algorithm -- is quite true (for any given file the simplest best
algorithm is one that exchanges that file with the zero length
file and leaves other files unchanged) and quite meaningless (what
good is compressing just one possible file?)

The counting argument also has a major flaw. It is again quite
true, but in a sense quite meaningless. The overwhelming majority
of possible files do not exist, and will never exist. A real
compression program need only deal with files that actually do or
will exist -- its behavior on files that will never exist is
irrelevant.

One can argue that almost any sufficiently large actual file is
highly compressible, regardless of how random it might appear to
be. Any such file must have been generated by some process, and
that process is likely to be limited in size. For instance, a
common way to get "random" file for testing is to use a
pseudo-random number generator. But any such file is obviously
compressible down to the size required to describe the algorithm,
and its starting state. A compressed version of the C source for
generating the file is not likely to be very big.

Interesting thought on the counting argument. If you modify it to
count only files that do, or will, (two separate cases) exist,
what sort of limits do you get on compressibility?

--
Tom Gutman