Compression boils down to two factors: understanding the uniqueness of the
data and representing it efficiently.
This is generally described as modeling the data and then encoding it. It is
a matter of taking advantage of unique knowledge of the distribution of the
data, whether that is done statically or dynamically.
Replace the term 'any' with 'random' and this can be proven mathematically.
If you are referring to 'any' as meaning a specific and deterministic
dataset, then yes, you can create a unique 'algorithm' or model to compress
it efficiently.
There will be many others who can describe these issues in far greater
detail and with greater correctness.
Bill
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress data
> to less than X bits, since the number of mathematical combinations of bits in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations often have
> NOTHING to do with the compression algorithm. Compression usually uses
> dictionaries, sorting, and other types of non-mathematical ways to handle
> the data.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algorithm.
>
> Itai.
>
>
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress data
> to less than X bits, since the number of mathematical combinations of bits in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations often have
> NOTHING to do with the compression algorithm. Compression usually uses
> dictionaries, sorting, and other types of non-mathematical ways to handle
> the data.
What is your definition of a mathematical combination? From what I see,
dictionaries, Huffman trees, sorted lists, etc. are all types of
mathematical combinations.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algorithm.
>
But then you'd also have to include the algorithm in the compressed
results. This sucks up space. Whether you like it or not, the Counting
Argument still holds true.
> Itai.
Harvey Rook
Of course, it must be realized that *all possible files* could take much
longer than the universe will exist to ever generate.
$1 says at least another twelve posts?
The process of compression and decompression can be considered as this
process:
input data ----> compressed data ----> original data
The mathematical combinations part tells you how many possible input
files you can possibly see. In your case that was shown to be around
2^817 possible input files. Let's imagine that we had a compression
algorithm that could compress all 2^817 input files by at least one bit.
That means that the compressed data would contain no more than 816 bits.
The problem is that would require multiple sets of input data to map to
the same compressed data file. This is because the number of data files
containing at most 816 bits is 2^817 - 1 (I am including a zero bit
file). That seems to say that we can compress all the files but 1, but
then there's a little problem of needing to know how many bits are in
the output. That has to somehow be encoded into the compressed data as
well and must be added to the size of the data. The naive way would be
to simply include 10 extra bits giving the size in bits. But if you add
these 10 bits to all of the compressed data files you now have a net
loss if you average over all possible input files. Certainly, some of
the input files will indeed compress down to less than 817 bits, but
they are in the vast minority. Most will actually grow in size. You can
cut down how much they grow by adding a bit that says whether the data
compressed or not. But this still means that over half of the input
files will grow by 1 bit. Still a net loss on average.
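Here is a quick way to check those figures (a sketch in Python; the 4096-bit
block with exactly 128 set bits is the case discussed earlier in the thread):

    from math import comb, log2

    # number of distinct 4096-bit blocks with exactly 128 bits set
    distinct_inputs = comb(4096, 128)
    print(f"log2(C(4096, 128)) = {log2(distinct_inputs):.1f} bits")  # roughly 817

    # number of bit strings of length 0..816, i.e. every possible
    # "compressed" output that is shorter than 817 bits
    shorter_outputs = 2**817 - 1
    print(f"outputs shorter than 817 bits: about 2^{log2(shorter_outputs):.0f}")

The two counts are essentially the same size, which is why, as noted above,
the number of bits in the output has to be encoded somewhere and eats up the
apparent savings.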
Note that this analysis did not look at any particular algorithm. It
merely looked at the input and output. Therefore the algorithm used does
not matter to the compressibility of random data. You cannot escape the
fact that there is a theoretical lower limit for compression. That limit
depends on the characteristics of the input data source. In information
theory it is quantified by entropy.
What compression software does is try to map the most frequently
occurring input files onto those compressed files that are smaller than
the input file. By doing so, there must exist input files that are
actually expanded. It is a mathematical law. But what we try to do is
make sure that the ones that are expanded by the compressor are very
rare.
You will see various compressors that do wonders on certain input files.
For instance compressing English text will often yield better than 50%
compression. But that is because the input source does not have high
entropy. The dictionary compressors you mention are simply exploiting
the fact that text is not random data. Dictionary compressors rely on
the fact that phrases are often repeated in the text. If that is not
true of your text it may actually expand it. Other compressors may
exploit facts like: if you see the letters "elephan", the odds
of the next letter being t are nearly 100%, while the odds of the next
character being Z are pretty much zero.
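A minimal sketch of that kind of context-based prediction (illustrative only;
the training sentence and context length are made up here, and a real PPM-style
coder would feed such counts into an arithmetic coder and handle unseen
contexts with escape symbols):

    from collections import Counter, defaultdict

    def build_model(text, order=7):
        # for every `order`-character context, count which character follows it
        counts = defaultdict(Counter)
        for i in range(len(text) - order):
            counts[text[i:i + order]][text[i + order]] += 1
        return counts

    model = build_model("the elephant and the elephant seal are not the same elephant")
    print(model["elephan"].most_common())   # [('t', 3)] -- 't' is the only letter ever seen here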
You have said that your data is random. That means it has high entropy.
There is no way to predict what the next bit will be outside of the already
established rules for how many bits are set and how many total bits
there are. That information gets us down to 817 bits. Without any more
knowledge of the data you cannot do better than that for all possible
inputs. You can pick some that will be compressed, but at the cost of
expanding others.
--
--- Dale King
Recruiters: I am not willing to relocate outside of Indianapolis!
Of course, how long these threads go on can only be governed by how long
it takes to sink into the person claiming the impossible is possible.
Hopefully it will sink into Itai quickly.
On this topic though, is there a good web site that plainly explains the
counting principle? I have never been able to find a good one. Note, I
understand the counting principle, I just want a nice site to point
people to to cut threads like this short without having to take 1/2 an
hour to explain it myself.
"D.A.Kopf" wrote:
>
> Harvey Rook wrote:
> >
> > Oh no! It's another "infinite compression is possible" thread. Any bets on
> > how long this will go on?
>
> Of course, it must be realized that *all possible files* could take much
> longer than the universe will exist to ever generate.
>
> $1 says at least another twelve posts?
Certainly not. At least from now on. (That's four and counting)
>
> Of course, how long these threads go on can only be governed by how long
> it takes to sink into the person claiming the impossible is possible.
> Hopefully it will sink into Itai quickly.
>
> On this topic though, is there a good web site that plainly explains the
> counting principle? I have never been able to find a good one. Note, I
> understand the counting principle, I just want a nice site to point
> people to to cut threads like this short without having to take 1/2 an
> hour to explain it myself.
Sesame Street has a quite understandable introduction to the topic. As I
recall, hosted by an animated character named "Count Dracula". They present
him with animated objects that occupy his animated space, and he counts them.
Don't know if he exists in a virtual-world web site. Even more unsure about
any relevance of this to the world *we* live in!
Itai.
> If you are referring to 'any' as being a specific and deterministic
> dataset, then yes you can create a unique 'algorithm' or model to compress
> it efficiently.
An example: you can compress a zillion-byte file into a single bit...
as long as those are exactly the zillion bytes in a row that bit is supposed
to represent... but if you knew that in the first place, why bother to
transmit even that bit? Then we have compressed a zillion bytes to
nothing... but then we have no information, for we knew what was going to
be transmitted...
So, if we want a general method to be useful to every possible stream,
there's no way around the counting principle.
--
Juan de Dios Santander Vela
Ingeniero en Electrónica (Electronics Engineer)
España (Spain)
e-mail: juand...@airtel.net ICQ: 22903952
Well, it is caused by the original counting question, so it
probably counts as part of the discussion. Of course, the real
fun starts when sub-threads start getting renamed. Hopefully this
one won't be so enormous.
>On this topic though, is there a good web site that plainly explains the
>counting principle? I have never been able to find a good one. Note, I
>understand the counting principle, I just want a nice site to point
>people to to cut threads like this short without having to take 1/2 an
>hour to explain it myself.
Has anyone mentioned the FAQ yet? I don't think they have.
There's a wonderful explanation in section 9.2 of the FAQ which
provides a particularly clear example anyone should be able to
follow. URLs for the FAQ are:
ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
http://www.landfield.com/faqs/compression-faq/part1/preamble.html
http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
There's also a section slightly later on about software patents
which is rather amusing.
Cheers,
Phil
The bit is transmitted to let the decoder know that the zillion bit
message is included in the original message. If no information was
transmitted then no assumption could be made about what the message
contained (unless this is explicitly defined in the decoder syntax).
paul
Phil Norman wrote:
>
> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to
> follow. URLs for the FAQ are:
>
> ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
> http://www.landfield.com/faqs/compression-faq/part1/preamble.html
> http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
>
> There's also a section slightly later on about software patents
> which is rather amusing.
Best regards and good luck,
Mikael
In article <3951...@news.bezeqint.net>,
> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers try to tell you!
> Where would we be if Edison, Marconi or Einstein, with this attitude,
> had not done what they did?
It's not only a matter of attitude: they realized which were the difficul-
ties... and went other way. If you want to compress EVERY possible stream
losslessly, you're under the counting argument. So if you want to get rid
of it, you'll have to be out of its range.
And, about the NO-sayers, anyone with a perpetual movement machine? Every-
body says you can't, but if you try and try...
While some see them as the crazy ones,
we see genius.
Because the people who are crazy enough to think
they can change the world, are the ones who do.
You're right, there should always be some realism included when making
decisions what to do with problems through life. But we don't know
everything now and will most certainly never do. So, that's why we need
these crazy people that are crazy enough to think they can change the
world. We should not inhibit them with ill-educated "advice" but
encourage them!
Best regards,
Mikael Lundqvist
Software engineer
Sweden
In article <B5784A58.7FD9%juand...@airtel.net>,
Feel free to try all you want and don't give up. When you find a way
around these problems be sure to let us know :-)
And funny you should mention Einstein. He seemed to put some limits on
the universe like not being able to go beyond the speed of light. He
must be wrong. If we put forth enough effort we should be able to break
the speed of light barrier.
(Have we hit 12 messages on this topic yet?)
mikael.l...@spray.se wrote:
>
> How true it is what you are saying ...
> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers tries to tell you!
> Where would we be if not Edison, Marconi or Einstein, with this attitude,
> had not done what they did?
--
> On Wed, 21 Jun 2000 17:46:10 -0500, Dale King <Ki...@TCE.com> wrote:
> >But do you include this message in the count, since it is kind of a
> >meta-thread (a thread about the thread)?
>
> Well, it is caused by the original counting question, so it
> probably counts as part of the discussion. Of course, the real
> fun starts when sub-threads start getting renamed. Hopefully this
> one won't be so enormous.
>
>
> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to
> follow. URLs for the FAQ are:
Actually, it's not all that clear. IMHO, anyways. I've been
trying for quite a while to come up with a way to express it that will
make the principle involved even clearer. I *think* I have something,
but I haven't had a lot of time to work on it.
> ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
> http://www.landfield.com/faqs/compression-faq/part1/preamble.html
> http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
>
> There's also a section slightly later on about software patents
> which is rather amusing.
>
> Cheers,
> Phil
-jc
--
* -jc IS *NOW* feld...@cryogen.com
* Home page: http://members.tripod.com/~afeldspar/index.html
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby
Pigeons? Why pigeons? I thought we were talking about information. But if you
insist on pigeons, it appears that unlimited geographical information can be
crammed into their tiny finite brains.
>
> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.
The current thinking is that tachyons actually have to gain energy to slow down.
>
> (Have we hit 12 messages on this topic yet?)
One more, by my count.
Mikael
In article <3952941D...@TCE.com>,
Dale King <Ki...@TCE.com> wrote:
> And I don't care how much effort you put forth you cannot get 10 gallons
> of water into a 5 gallon container (perhaps with a black hole, but then
> we would have to be more rigorous on our definition of a gallon). That
> in a nutshell is the counting argument! Or alternatively as the
> pigeon-hole principle you can't put n pigeons into n-1 holes without
> putting 2 pigeons in the same hole. Those aren't complicated theories,
> they are simple logic.
>
> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.
>
> (Have we hit 12 messages on this topic yet?)
>
> mikael.l...@spray.se wrote:
> >
> > How true it is what you are saying ...
> > I almost suspect that when people come with a lot of difficult theory to
> > explain why it's impossible to solve this or that problem, they actually
> > really don't have a clue. The fact remains, the most powerful tool you
> > have is your mind. If your attitude is that nothing is impossible and you
> > try and try again, NEVER EVER giving up, then the problem is destined to
> > be solved no matter what the NO-sayers tries to tell you!
> > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > had not done what they did?
>
> --
> --- Dale King
>
> Recruiters: I am not willing to relocate outside of Indianapolis!
>
Well, let me be that one more but adding that you can fit more pigeons
per hole if you first apply a blender transform. But then we are
talking about lossy compression!
-Michael
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
> It's the attitude and spirit I'm trying to explain and encourage.
> If we let "the common man", Mr. Smith, decide what is possible and what we
> should do, then you can count on it that innovations, inventions or great
> breakthroughs will never happen, because he's saying "It's impossible".
But the attitude and spirit should be directed in a fruitful
direction. When something is easily and clearly shown to be
impossible, encouraging someone to try the impossible is just plain
stupid.
You mentioned Einstein in another posting. Here's a quote from
Einstein:
"One reason why mathematics enjoys special esteem, above all other
sciences, is that its laws are absolutely certain and
indisputable, while those of all other sciences are to some extent
debatable and in constant danger of being overthrown by newly
discovered facts." -- Albert Einstein, 1921
That sums up this situation perfectly. While the physical sciences
(where these other breakthroughs happened that you talked about) might
say something is impossible it is always with the caveat that it's
impossible if our understanding of the basic physical laws is correct.
On the other hand, compression is a purely mathematical process, and
we understand the laws completely and unequivocally. That's why we
can say such compression is impossible, and it really does mean
impossible.
Finally, in a later posting you made some bizarre claims about entropy
and compression. First you demonstrated that you didn't know what
entropy was (by referring to entropy of a file rather than entropy of
a source), and then you claimed that people "overcame the impossible"
by compressing better than entropy. Well, here's some news for you:
You *CAN'T* compress better than entropy. The compressors you
mentioned (BWT, PPM, etc) in fact compress slightly worse than the
entropy of a source, and there will never be a compressor that can
compress (on average) to less than the entropy of the source. Shannon
proved this back in the 40's, and it's one of those absolutely true
facts that will never, ever be contradicted... (now there are some
arguments about whether "real world data" is accurately modeled with
the probabilistic models that you have to use for entropy, but that's
a different question).
--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |
s...@nospam.unt.edu wrote in article <8ium9f$477$1...@hermes.acs.unt.edu>...
> mikael.l...@spray.se wrote:
>
>(now there are some
> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>
No, it is not a different question. It goes right to the heart of the matter
of the thread. If these arguments exist and have merit (about which I have no
idea), then we have an 'alternate universe' of models that we need to deal
with. I doubt that is the case, given the maturity of the science (info
theory), but I suppose it's possible.
Tim
> You mentioned Einstein in another posting. Here's a quote from
> Einstein:
>
> "One reason why mathematics enjoys special esteem, above all other
> sciences, is that its laws are absolutely certain and
> indisputable, while those of all other sciences are to some extent
> debatable and in constant danger of being overthrown by newly
> discovered facts." -- Albert Einstein, 1921
>
> That sums up this situation perfectly. While the physical sciences
> (where these other breakthroughs happened that you talked about) might
> say something is impossible it is always with the caveat that it's
> impossible if our understanding of the basic physical laws is correct.
> On the other hand, compression is a purely mathematical process, and
> we understand the laws completely and unequivocally. That's why we
> can say such compression is impossible, and it really does mean
> impossible.
>
It's a fact though that all of today's universal compression algorithms
and implementations are far from the "magical limit" or what the entropy
allows, if I dare to use that word. We can also say that in the practical
world we don't want to pack a file that's been packed already. That would
be plain stupid too! (So what the heck are we debating about?)
> Finally, in a later posting you made some bizarre claims about entropy
> and compression. First you demonstrated that you didn't know what
> entropy was (by referring to entropy of a file rather than entropy of
> a source), and then you claimed that people "overcame the impossible"
> by compressing better than entropy. Well, here's some news for you:
> You *CAN'T* compress better than entropy. The compressors you
> mentioned (BWT, PPM, etc) in fact compress slightly worse than the
> entropy of a source, and there will never be a compressor that can
> compress (on average) to less than the entropy of the source. Shannon
> proved this back in the 40's, and it's one of those absolutely true
> facts that will never, ever be contradicted... (now there are some
> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>
I know what entropy means: "disorder" or "randomness". I should have used
the word "statistics" instead; I was thinking about e.g. Huffman coding.
And it's easy to pack a file better than with Huffman only, if we use
BWT, PPM or CTW as a preprocessor, isn't it?
I think this also very clearly demonstrates the difference between
scientists like you and engineers like me. For me it's not about a lot of
theory, math or carefully laid-out proofs of something. For me it's much
more interesting to see what I can do with it, practically. And I don't
always trust what other people tell me, even if it's told by a scientist!
Critical thinking is a good and necessary tool too!
Are you really sure that YOU know how to measure the "source's" entropy?
If you don't, then what is all this theory worth?
Thanks for the feedback! I suppose one should be careful with the words
one chooses. The whole world could be reading it ... I'm not good with
words. I suppose that's why I became a programmer instead!
Mikael
Absolutely. I, personally, have been trying to get 1+1 to equal 3
for a good number of years now. I've not managed it yet (when I
do, I'll post the results on http://inkvine.fluff.org/~forrey/),
but *I'm* certainly not giving up just because mathematicians say
it's impossible. What do they know? They're just constrained by
all the prejudices they've been taught during their studies.
What the world needs is new, original thought. Good luck breaking
the counting argument! Einstein, Newton and all their friends
will be proud.
Cheers,
Phil
However, one of those rare forms of lossy compression where the
end result is better than the uncompressed form.
Cheers,
Phil
--
:wq
Pigeon navigation might be impressive, but let's not blow the
whole thing out of proportion, eh? Unless you can come up with
some evidence that there's no limit to the amount of geographical
information which can be stored in a pigeon's brain, I'm not going
to take the above paragraph at all seriously.
Cheers,
Phil
--
:wq
Well, when I read it (I refer particularly to the bit which
states that since there are 2^n possible n-bit files, you can't
represent all those values in 2^(n-1) bits) I found that
explanation really quite beautiful in its simplicity. However,
I've watched that argument fail to be absorbed by enough people to
realise that it's probably not clear enough.
Good luck in finding something better though - I've not thought a
great deal about how one would explain it, but it strikes me as a
difficult concept to get across in unquestionably simple terms.
Cheers,
Phil
--
:wq
No, I have no data on their maximum capacity. This was to subtly convey the
point of view that observation of what actually happens in the real world
should take precedence over mathematical theories about what ought to happen.
Good work! Might I suggest that, when you have finished this, you
start to look for the even prime numbers greater than 2. Mathematicians
claim they don't exist, but they haven't checked all of them so they
can't possibly know. I'm pretty sure that given the number of even
numbers greater than two that *one* of them has to be prime.
Alan
> s...@nospam.unt.edu wrote in article <8ium9f$477$1...@hermes.acs.unt.edu>...
>> mikael.l...@spray.se wrote:
>>
>>(now there are some
>> arguments about whether "real world data" is accurately modeled with
>> the probabilistic models that you have to use for entropy, but that's
>> a different question).
>>
> No it is not a different question. It goes right to the heart of the matter
> of the thread. If these arguments exist and have merit (to which I have no
> idea), then we have an 'alternate universe' of models that we need to deal
> with. I doubt that it is the case, given the maturity of the science (info
> theory), but I suppose its possible.
Of course it's a different question. I can define a source, compute
the entropy, and guarantee you that no compressor you can come up with
can compress data from that source any smaller than the entropy allows. I
can prove that you can't do better than entropy. Ever.
But the question of whether it makes sense to talk about entropy and
probabilistic sources of real world data is different. But I was
responding to someone talking about entropy, not real world data.
> I know what entropy means. "Disorder" or "randomness". I should have used
> the word "statistics" instead, I was thinking about e.g. huffman coding.
> And it's easy to pack a file better than with huffman only, if we use
> "bwt, ppm or ctw as preprocessor" isn't it?
But an order-0 Huffman coder is not a universal compressor. The fact
that it can be beat isn't a surprise to anyone (in fact, Shannon's
original paper defining entropy talks about this).
> I think this also very clearly demonstrates the difference between
> scientists like you and engineers like me. For me it's not a lot of
> theory, math or carefully laid up proofs of something.
But don't you think you'd be better off if you understood the theory?
I'm not saying everyone has to be a scientist and work on new
theories, but you need to understand what's there. An engineer
shouldn't just try random stuff in the hopes that something will
work. You need to be strong in the foundations if you expect to make
progress.
> Are you really sure that YOU know how to measure the "source's" entropy?
> If you don't, then what is all this theory worth?
Yes, I'm absolutely sure I know how to measure a source's entropy.
It's a fairly simple theory and formula. Here's a simple example: my
source generates numbers 0-255 (bytes) all random, uniform, and
independent. The entropy of that source is exactly 8 bits per symbol,
and you will never be able to compress it any better than that.
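As a quick check of that claim, here is the Shannon entropy formula applied to
the source just described (a sketch; H = -sum of p*log2(p) over the 256 equally
likely byte values):

    from math import log2

    probs = [1 / 256] * 256                      # uniform, independent byte source
    H = -sum(p * log2(p) for p in probs)
    print(H)   # 8.0 bits per symbol -- no lossless coder can average below this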
I think you've got it backwards in this case. The common man (you)
thinks it can be done because he doesn't understand the mathematics.
Those who do understand know that it is impossible and are laughing at
those who claim it is possible.
To reiterate what someone else said, a file does not have an entropy
limit. A source has an entropy limit. And I can certainly compress A
file below the source's entropy limit, but any such lossless compression
algorithm MUST necessarily compress most of the other files from that
source to a size greater than the entropy limit. If you average the
compression over all possible messages from that source you will be at
or above the entropy limit. That is an inescapable mathematical law.
All that the algorithms you mentioned do is try to be better about
selecting which of the possible messages from a source get mapped to the
smaller files. They look for specific types of non-randomness in the
data. Some algorithms are better than others at detecting the non-random
patterns. None will let you compress all files from a random source
below the source's entropy limit.
> When speculations about sending people to the moon came in the 50's, the
> common man laughed at it and stated categorically that IT-IS-IMPOSSIBLE
> and laughed some more. We all know the outcome of that too ...
Once again you are confusing impossible with difficult or impractical.
The impossible in this might be better stated as illogical. Putting a
man on the moon in the 50's would definitely appear impossible, but that
is simply because it would be very difficult and impractical. It was not
however illogical. Logically there was no reason why someone couldn't be
on the moon. Compression below a source's entropy limit, like the even
prime number > 2, is a logical impossibility. The existence of such a
prime number or such a compressor is not logical.
> From Apple:
>
> While some see them as the crazy ones,
> we see genius.
>
> Because the people who are crazy enough to think
> they can change the world, are the ones who do.
>
> You're right, there should always be some realism included when making
> decisions what to do with problems through life. But we don't know
> everything now and will most certainly never do. So, that's why we need
> these crazy people that are crazy enough to think they can change the
> world. We should not inhibit them with ill-educated "advice" but
> encourage them!
>
> Best regards,
> Mikael Lundqvist
> Software engineer
> Sweden
>
> In article <B5784A58.7FD9%juand...@airtel.net>,
> juand...@airtel.net wrote:
> > In article 8itti2$m7t$1...@nnrp1.deja.com, mikael.l...@spray.se
> > <mikael.l...@spray.se> wrote:
> >
> > > I almost suspect that when people come with a lot of difficult theory to
> > > explain why it's impossible to solve this or that problem, they actually
> > > really don't have a clue. The fact remains, the most powerful tool you
> > > have is your mind. If your attitude is that nothing is impossible and you
> > > try and try again, NEVER EVER giving up, then the problem is destined to
> > > be solved no matter what the NO-sayers tries to tell you!
> > > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > > had not done what they did?
> >
> > It's not only a matter of attitude: they realized which were the difficul-
> > ties... and went other way. If you want to compress EVERY possible stream
> > losslessly, you're under the counting argument. So if you want to get rid
> > of it, you'll have to be out of its range.
> >
> > And, about the NO-sayers, anyone with a perpetual movement machine? Every-
> > body says you can't, but if you try and try...
> >
> > --
> > Juan de Dios Santander Vela
> > Ingeniero en Electrónica (Electronics Engineer)
> > España (Spain)
> > e-mail: juand...@airtel.net ICQ: 22903952
> >
> >
>
--
All a blender transform will let you do is spread that extra pigeon
among several holes, but you will still end up with a quantity of
pigeons > 1 in at least one hole. If you fully apply the blender
transform you will get 1 + 1/n pigeons per hole. ;-). As n approaches
infinity the average number of pigeons per hole approaches 1 but does
not equal 1.
kaptke...@my-deja.com wrote:
>
> > One more, by my count.
> >
>
> Well, let me be that one more but adding that you can fit more pigeons
> per hole if you first apply a blender transform. But then we are
> talking about lossy compression!
--
From the pigeons I've seen I would think the information capacity of
their brains was one or two bits at most ;-)
PS - What do you think about the possibility of googleflop quantum computers?
>> Yes, I'm absolutely sure I know how to measure a source's entropy.
>> It's a fairly simple theory and formula. Here's a simple example: my
>> source generates numbers 0-255 (bytes) all random, uniform, and
>> independent. The entropy of that source is exactly 8 bits per symbol,
>> and you will never be able to compress it any better than that.
> Gotcha, Hah! Please describe the implementation of your source,
> along with the necessary theoretical or empirical evidence that its
> output is impossible for others to predict.
Huh? I just completely described the source above. You can plug it
into the entropy formula to see that the entropy is indeed 8
bits/symbol. And you can read Shannon's proof that no compressor can
ever beat that. What's possibly left?
> PS - What do you think about the possibility of googleflop quantum
> computers?
Irrelevant. Quantum computers don't change this one iota...
OK, I'll buy your guaranteed random source, please send it COD. Warning,
substantial penalties may apply if it doesn't perform as advertised!
>
> > PS - What do you think about the possibility of googleflop quantum
> > computers?
>
> Irrelevant. Quantum computers don't change this one iota...
Interesting origin of this expression. It has (I think) to do with the added iota
going from the Greek "homoousios" (made of the same substance) to "homoiousios"
(made of a similar substance). When applied to the relationship between Jesus
and God, the early Christian church came to regard the latter as a dangerous
heresy. So however insignificant one iota may seem today, it meant a great
deal to those who were executed because their theoretical constructs
conflicted with the contemporary reality.
-michael
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
> What the world needs is new, original thought. Good luck breaking
> the counting argument! Einstein, Newton and all their friends
> will be proud.
Oh, but I am sure Shannon will not be so proud :)-
Cheers,
Nasir
Well, here's the basic approach I'm working on: telling people
to visualize all the possible files *numbered*... numbering the two
one-bit files[1] 0 and 1, the four two-bit files 2, 3, 4 and 5, the
eight three-bit files... well, you get the idea.
Then I'm examining what it means to 'compress' a file: a file
with one number actually takes the place of a file with a lower number
(not all files with lower numbers are shorter files, but all shorter
files must have lower numbers.) You are now telling the decompressor:
"When you see the file whose number is 4,506; actually restore the file
whose number is 789,054."
But, what happens to the file whose number *started out* as
4,506? That file has lost its number, and needs another one... and if
our claim that any file can be compressed is true, it must be an even
lower number... and hopefully anyone really *can* see from here that
sooner or later you wind up having to try and find numbers lower than 0.
Hopefully, it's easier from there to handle claims like
"compresses everything, as long as it's above x number of bytes" (that
identifies a *fixed* number of files that can be evicted from their
numbers; therefore only a *fixed* number of files can be awarded their
places) and the issue of bytestrings rather than bitstrings (bytestrings
are actually worse; if you reduce a bytestring to a bitstring 15 bits
long, that actually evicts *two* bytestrings, not just one...)
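Here is a tiny sketch of that numbering idea, for illustration (bit strings of
length 1 to 3 only, listed in the same order as above):

    from itertools import product

    strings = []
    for length in range(1, 4):
        for bits in product("01", repeat=length):
            strings.append("".join(bits))

    for number, s in enumerate(strings):
        print(number, s)

    # Strings shorter than 3 bits occupy numbers 0..5; the eight 3-bit strings
    # occupy numbers 6..13, so at most six of the eight could ever be given a
    # lower number, and only by evicting the strings that already hold them.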
-jc
[1] To force people to confront the principle first, I'm starting with
bitstrings rather than bytestrings.
This could take a while, because so much of what is said is
absolutely true, and absolutely meaningless. Itai's statement
that any file can be compressed -- you just need to find the right
algorithm -- is quite true (for any given file the simplest best
algorithm is one that exchanges that file with the zero length
file and leaves other files unchanged) and quite meaningless (what
good is compressing just one possible file?)
The counting argument also has a major flaw. It is again quite
true, but in a sense quite meaningless. The overwhelming majority
of possible files do not exist, and will never exist. A real
compression program need only deal with files that actually do or
will exist -- its behavior on files that will never exist is
irrelevant.
One can argue that almost any sufficiently large actual file is
highly compressible, regardless of how random it might appear to
be. Any such file must have been generated by some process, and
that process is likely to be limited in size. For instance, a
common way to get "random" file for testing is to use a
pseudo-random number generator. But any such file is obviously
compressible down to the size required to describe the algorithm,
and its starting state. A compressed version of the C source for
generating the file is not likely to be very big.
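As a sketch of that point (the seed and length here are arbitrary choices, just
for illustration): a megabyte of pseudo-random output is completely described by
the generator plus its starting state, so this tiny program is, in effect, the
compressed form of the whole file.

    import random

    def regenerate(seed=12345, n_bytes=1_000_000):
        # the "decompressor": rebuild the file from a few bytes of description
        rng = random.Random(seed)
        return bytes(rng.randrange(256) for _ in range(n_bytes))

    data = regenerate()
    print(len(data))   # 1000000 bytes reproduced from a handful of bytes of state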
Interesting thought on the counting argument. If you modify it to
count only files that do, or will, (two separate cases) exist,
what sort of limits do you get on compressibility?
--
Tom Gutman
He didn't claim that he actually *had* one, he was just describing
one. Fortunately, I can supply you with one. Take a look at
http://www.fourmilab.to/hotbits.
If you believe that radioactive decay isn't actually random then feel
free to try to compress this data. Let us know when you give up.
Alan
I've tried compressing 1024 of their bytes, no luck so far. Will keep trying
as the spirit motivates. Be patient, it may take a while. But there *is* a
theoretical bias, as they do discard duplicate decay-time differences. Wonder
why? Mathematical foresight? [evil grin] I think not!
>
> If you believe that radioactive decay isn't actually random then feel
> free to try to compress this data. Let us know when you give up.
Well, of course if it was totally random there would be little point in the
concept of a radioactive half-life. The authors realize this, and even the
layperson reading their explanation of how it works
http://www.fourmilab.to/hotbits/how.html
will wonder about the need for so many experimental corrections (inverting the
computer clock every event!) to what is putatively a random measurement.
[Adopt avatar of modesty] As I understand it, Fermi's theory of beta decay
involved the nuclear electron "trying" to overcome the potential barrier many
times per second, and ultimately succeeding. This gave good fits to existing
data and good quantitative estimates for the half-lives of then-yet
undiscovered nuclei. But IMHO it did beg the question of why all nuclei that
had been created at the same time just didn't simultaneously go poof after a
fixed time interval.
Discarding equal time intervals is to ensure a bias doesn't occur.
--
Samuel S. Paik | http://www.webnexus.com/users/paik/
3D and multimedia, architecture and implementation
In order to violate the counting argument, you have to abandon digital
computation, so perhaps an analog, or a quantum computer can do better.
But, we'll have to wait until analog or quantum computers have the real
power that digital computers do.
Harvey Rook
harve...@hotmail.com
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3954...@news.bezeqint.net...
> Hi.
> Since math is actually based on definitions, proven to work in many cases
> (well, all of them, actually :-), I wouldn't try to break that theory. If some
> smart Greek guys many years ago decided that 1+1=2, then you can't tell
> them that you found a way to break their own definition and that from now on
> 1+1=3. That's why it's a definition. What is possible, though, is to create a
> "newer version of math". You will define your own rules of the new language
> of numbers, and then everything will work the way you want it.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinite compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
> If anyone thinks the same and wants to try to help me, he/she may contact me,
> and I'll be glad to discuss this issue.
>
> Itai.
>
>
>
> Alan Morgan <amo...@Xenon.Stanford.EDU> wrote in message
> news:8j03u9$d2k$1...@nntp.Stanford.EDU...
> > In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
> > Phil Norman <Phil....@genedata.com> wrote:
> > >On Thu, 22 Jun 2000 20:39:32 GMT,
> > > mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> > >>How true it is what you are saying ...
> > >>I almost suspect that when people come with a lot of difficult theory to
> > >>explain why it's impossible to solve this or that problem, they actually
> > >>really don't have a clue. The fact remains, the most powerful tool you
> > >>have is your mind. If your attitude is that nothing is impossible and you
> > >>try and try again, NEVER EVER giving up, then the problem is destined to
> > >>be solved no matter what the NO-sayers tries to tell you!
> > >>Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > >>had not done what they did?
> > >
Mikael
In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
Phil....@genedata.com (Phil Norman) wrote:
> On Thu, 22 Jun 2000 20:39:32 GMT,
> mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> >How true it is what you are saying ...
> >How true it is what you are saying ...
> >I almost suspect that when people come with a lot of difficult theory to
> >explain why it's impossible to solve this or that problem, they actually
> >really don't have a clue. The fact remains, the most powerful tool you
> >have is your mind. If your attitude is that nothing is impossible and you
> >try and try again, NEVER EVER giving up, then the problem is destined to
> >be solved no matter what the NO-sayers tries to tell you!
> >Where would we be if not Edison, Marconi or Einstein, with this attitude,
> >had not done what they did?
>
> Absolutely. I, personally, have been trying to get 1+1 to equal 3
> for a good number of years now. I've not managed it yet (when I
> do, I'll post the results on http://inkvine.fluff.org/~forrey/),
> but *I'm* certainly not giving up just because mathematicians say
> it's impossible. What do they know? They're just constrained by
> all the prejudices they've been taught during their studies.
>
> What the world needs is new, original thought. Good luck breaking
> the counting argument! Einstein, Newton and all their friends
> will be proud.
>
> Cheers,
> Phil
I'd actually be surprised if you could.
Mikael
In article <8j08t9$g6p$2...@hermes.acs.unt.edu>,
s...@nospam.unt.edu wrote:
> mikael.l...@spray.se wrote:
>
> > I know what entropy means. "Disorder" or "randomness". I should have used
> > the word "statistics" instead, I was thinking about e.g. huffman coding.
> > And it's easy to pack a file better than with huffman only, if we use
> > "bwt, ppm or ctw as preprocessor" isn't it?
>
> But an order-0 Huffman coder is not a universal compressor. The fact
> that it can be beat isn't a surprise to anyone (in fact, Shannon's
> original paper defining entropy talks about this).
>
> > I think this also very clearly demonstrates the difference between
> > scientists like you and engineers like me. For me it's not a lot of
> > theory, math or carefully laid up proofs of something.
>
> But don't you think you'd be better off if you understood the theory?
> I'm not saying everyone has to be a scientist and work on new
> theories, but you need to understand what's there. An engineer
> shouldn't just try random stuff in the hopes that something will
> work. You need to be strong in the foundations if you expect to make
> progress.
>
> > Are you really sure that YOU know how to measure the "source's" entropy?
> > If you don't, then what is all this theory worth?
>
> Yes, I'm absolutely sure I know how to measure a source's entropy.
> It's a fairly simple theory and formula. Here's a simple example: my
> source generates numbers 0-255 (bytes) all random, uniform, and
> independent. The entropy of that source is exactly 8 bits per symbol,
> and you will never be able to compress it any better than that.
>
Compression can't get a better result than the theoretical information
content. Rather worse, in fact, because practical compression algorithms
don't reach the theoretical efficiency.
You are partially right. Any (any!) given arrangement of the data (your 4096
bits) can end up with a compressed size under 817 bits.
E.g. the first 128 bits set and the next (4096-128) bits clear: that is a
very easy arrangement to compress.
But we must determine the maximum post-compression size for the most
incompressible arrangements of the data!
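For instance, that particular arrangement collapses under even a trivial
run-length description (a sketch, using run-length counting purely for
illustration), while a typical arrangement of 128 set bits among 4096 still
needs on the order of 817 bits:

    from itertools import groupby

    block = "1" * 128 + "0" * 3968            # the easy arrangement above
    runs = [(bit, sum(1 for _ in group)) for bit, group in groupby(block)]
    print(runs)   # [('1', 128), ('0', 3968)] -- two runs describe all 4096 bits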
Harry
There is a body of mathematical theory called "information theory"
which assigns definitions to the terms "source", "message", "model",
"entropy", and derives a number of interesting and fundamental results,
including Shannon's noiseless source coding theorem. In this theory, the
elements of the Calgary corpus are not sources, but are specific messages.
So you are asking a meaningless question--however, this is somewhat
understandable since people are often a bit loose and talk about the
entropy of messages without reference to a model.
You seem to have had a smattering of information theory without really
grasping some of its consequences; Itai Bar-Haim obviously does not have
any understanding.
A good example is this statement of yours:
And it's easy to pack a file better than with huffman only, if we use
"bwt, ppm or ctw as preprocessor" isn't it?
Mikael
In article <395561F0...@webnexus.com>,
> You didn't answer the question!
> Do you have an exact method to measure the entropy of a given source
> that actually CAN be compressed? Like Calgary corpus geo or news?
The Calgary Corpus is not a source, it's a set of files.
Here's a source that I can measure the entropy of and compress. I'm
going to generate bytes, with a geometric distribution: probability
of value 0 is 1/2, probability of value 1 is 1/4, value 2 is 1/8,
and so on, until the probability of value 254 is 1/2^255 and the
probability of value 255 is also 1/2^255 (so everything adds up to 1).
That's a source, and the entropy is almost exactly 2 bits per symbol.
I can also encode (or compress) this to exactly entropy -- in this
case it's very easy: order-0 Huffman coding works perfectly here,
since the probabilities are all powers of 1/2.
So now I've given you two different sources that I've calculated the
entropy for. The first was incompressible, and the second was
compressible with a 4:1 ratio.
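A quick numerical check of that source, as described above (a sketch; the
probabilities are 1/2, 1/4, ..., 1/2^255, with the last value also getting
1/2^255 so that everything sums to 1):

    from math import log2

    probs = [2.0 ** -(k + 1) for k in range(255)] + [2.0 ** -255]
    print(sum(probs))                                   # 1.0
    H = -sum(p * log2(p) for p in probs)
    print(H)                                            # ~2.0 bits per symbol

And since every probability is a power of 1/2, the Huffman code lengths land
exactly on -log2(p), which is why order-0 Huffman coding hits the entropy here.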
Mikael
In article <8j5ipd$m9c$1...@hermes.acs.unt.edu>,
Just wondering, but why do you need to ensure against bias if your
source of data is a random nuclear decaying source?
Incidentally, Mr Kopf, yes there will be a shift of the
distribution of time-differences with time due to the half-life,
but I imagine in this case a source has been chosen which has a
half-life so huge that it won't really affect the time differences
in any noticable way when measured in the order of a few days.
Cheers,
Phil
--
:wq
Well, unless you count ready blended pigeon. I'm sure there are a
number of supermarkets who'd be interested.
Cheers,
Phil
This is something that's been puzzling me. Is 'prediction' really
viewed as a separate type of compression? It would seem to me
that prediction can take place only as a result of knowledge of
what is likely to occur in the future of the input stream.
However, this knowledge is going to be either knowing the
distribution of occurrences of symbols, or knowing what kind of
strings of symbols are likely to occur (so either statistical or
dictionary). Or both, of course.
In fact, I'd be inclined to think that prediction isn't really a
method of compression of itself at all, but is simply something
one does while trying to compress. In order to compress some
data in the future, one must construct a mapping from the possible
set of inputs to the possible set of outputs which favours what we
currently view as predictably more likely. This is what all the
compressors I've come across effectively do.
Or is there some specific definition of what predictive
compression means which I've completely missed?
Cheers,
Phil
--
:wq
That wasn't irony, that was sarcasm.
>It IS just plain stupid to try what is so easily proven impossible.
Thank you. Now, go read the counting argument and notice how
easily proven it is.
And yes, computing (and compression) is based on mathematics.
Cheers,
Phil
--
:wq
As far as your new math goes, you can certainly come up with new symbology
that reuses the symbols in other ways. Just like you can translate words into
other languages. But this does not change the physical universe, merely
the words or symbols you use to describe it. 1+1=3 is valid if any of
the symbols changes meaning from the standard mathematical definitions.
No, everything will still work the same way, you just describe it
differently.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
You are confusing two similar sounding words:
theory - general principles drawn from any body of facts; a plausible or
scientifically acceptable general principle offered to explain observed
facts; hypothesis, guess.
theorem - a statement in mathematics that has been or is to be proved; an
idea accepted or proposed as a demonstrable truth.
Theories often are proven wrong. But the counting argument is a theorem,
not a theory. It is easily proven true.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinite compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
Once again, don't confuse information theory with the theorems that it
is made up of.
If you succeed you could easily become rich enough to buy the planet.
And once you succeed, then you can set to work on creating that
perpetual motion machine.
On a more serious note, back to your original question which was how to
compress blocks of 4096 bits which have exactly 128 bits set to less
than 500 bits. It has already been shown that there are approximately
2^817 such files. There are approximately 2^500 files whose size is less
than 500 bits. If it were possible to compress all those input files to
less than 500 bits, it is obvious that you must compress more than one
of your input files to the same compressed file (in fact on average you
would compress 2^317 files to each of the possible compressed files).
Now look at the decompression side. How can you correctly map from 2^500
input files to all 2^817 output files?
That is the pigeon-hole principle: If A and B are finite sets and |A| >
|B|, then there is no one-to-one function from A to B. Here A is the set of all
files with 4096 bits of which 128 are set. |A| = 2^817. B is set of all
files whose size is <= 500 bits. |B| = 2^500. In doing compression you
are trying to come up with a function f(a) = b, where a is in A and b is
in B. For lossless compression f must be invertible as f' in that f'( f(
a ) ) = a. But because |A| > |B| you cannot escape that there must be
some case where f(a1) = f(a2) when a1 != a2. This leads you to:
a1 = f'( f(a1) )
a1 = f'( f(a2) )
a1 = a2
But this contradicts the statement above that a1 != a2. Therefore any
such function you come up with is not invertible. You basically have a
lossy compressor.
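Here is a toy, brute-force version of that statement, small enough to check
exhaustively (a sketch for illustration, not part of the argument above): there
are four 2-bit strings but only three strings of length <= 1, and no assignment
of outputs to inputs avoids a collision.

    from itertools import product

    inputs = ["00", "01", "10", "11"]        # |A| = 4
    outputs = ["", "0", "1"]                 # |B| = 3

    injective_found = any(
        len(set(mapping)) == len(inputs)     # collision-free needs 4 distinct outputs
        for mapping in product(outputs, repeat=len(inputs))
    )
    print(injective_found)                   # False: every mapping reuses some output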
> But you have only considered the statistics of the symbols (I think)!
> If we leave the word "entropy" and use "possible compression" instead,
> then statistics is easily shown not to be enough; we know several other
> methods to compress! I conclude that science has no method to show beyond
> any doubt where the theoretical limit is for a given file (at
> least I have not seen any), without preinstalled dictionaries.
Well, you're the one who brought up the word "entropy", so that's what
I was talking about.
If you want to use "possible compression" instead, particularly for a
particular file instead of a probabilistic source, then you're looking
at something entirely different. And there is a perfect answer to
questions about maximum possible compression, and that answer is
Kolmogorov complexity (there are several books on the topic of
Kolmogorov complexity if you're interested). Of course, there's a
downside: while we can define what we mean by the "best possible
compression", it turns out that it's uncomputable -- there's no single
algorithm that can compute the best possible compression for all files
(or even a significant fraction of files), even if you had all the
time in the world (in other words, it's a computability problem, not a
complexity problem). So one step forward in theory, and an infinite
leap backwards in practicality.
No, the counting argument has no flaw. Let's look at it from the
pigeon-hole principle, which says if you have 2 sets A and B and the
number of elements in set A is larger than the number of elements in set
B, you cannot have a function f(a) = b to map elements from set A to set
B such that f(a1) != f(a2) for all a1 and a2 in A. Or alternatively
there must be an a1 and an a2 in A such that f(a1) == f(a2).
No, you're saying that some of the files don't exist. All that says is that
the size of A is not as large as it would appear.
But this is all handled very well in information theory. You don't even
have to go so far as to say those files won't exist. You can just say
that they aren't as likely. Entropy of a source looks at the
probabilities of the source and even if all 2^n are possible but some
are more likely than others you can still compress ON AVERAGE better
than n bits. Of course, we haven't escaped the counting argument, those
rare messages will require more than n bits, but more likely messages
will hopefully require less.
> One can argue that almost any sufficiently large actual file is
> highly compressible, regardless of how random it might appear to
> be. Any such file must have been generated by some process, and
> that process is likely to be limited in size. For instance, a
> common way to get "random" file for testing is to use a
> pseudo-random number generator. But any such file is obviously
> compressible down to the size required to describe the algorithm,
> and its starting state. A compressed version of the C source for
> generating the file is not likely to be very big.
But the question is can you necessarily specify the process in fewer
bytes than the process outputs. Not in all cases.
And the biggest problem for compression is that we want a general purpose
compressor to be able to try and guess what the process is without any
prior knowledge of it. If you know the process ahead of time then it is
easier to devise a compression scheme. It is not too difficult to devise
simple sources that follow easy to express patterns that are nearly
impossible to detect.
> Interesting thought on the counting argument. If you modify it to
> count only files that do, or will, (two separate cases) exist,
> what sort of limits do you get on compressibility?
You get a lower entropy limit, because you are reducing the entropy of
your source.
You can theoretically map the first 2^500 files to any of 2^817 output files.
Then, if anybody is still alive to care, you'd best get outta town.
You need to look again at my second sentence. I agree that the
counting argument is true. Perhaps I need to qualify my original
statement a bit more -- the counting argument is flawed as applied
to the compression issues.
>
> No, you're saying that some of the files don't exist. All that says is
> that the size of A is not as large as it would appear.
And that is exactly the flaw. The counting argument says that you
have 2^N files of length N, but only (2^N)-1 files of length <N,
and so you cannot compress all files of length N. But if there
are actually only 2^n files of length N, with n<<N, then the
counting argument allows you to compress all actual files of
length N down to n or less bits.
That is not to say that it is possible to do such a
compression -- only that the counting argument does not exclude
it. Perhaps a more complex argument, based on the decidability of
the existence of a file, would demonstrate the impossibility. But
such arguments are considerably more difficult to develop, and,
being difficult to understand, likely to be less convincing.
>
> But this is all handled very well in information theory. You don't even
> have to go so far as to say those files won't exist. You can just say
> that they aren't as likely. Entropy of a source looks at the
> probabilities of the source and even if all 2^n are possible but some
> are more likely than others you can still compress ON AVERAGE better
> than n bits. Of course, we haven't escaped the counting argument, those
> rare messages will require more than n bits, but more likely messages
> will hopefully require less.
Sure, and that is what makes compression useful for its common
use -- cutting down average storage requirements and transmission
times. But then you run into the guy whose brand new device has
limited (storage space, bandwidth, ...) and he therefore needs an
algorithm that guarantees a particular level of compression. So
somebody trots out the counting argument to show that that is not
possible. But the guy has an intuitive feel, which he cannot
properly state, that there aren't *really* all that many files and
that the counting argument somehow doesn't really apply. And so
the argument meanders on and on and ....
>
> > One can argue that almost any sufficiently large actual file is
> > highly compressible, regardless of how random it might appear to
> > be. Any such file must have been generated by some process, and
> > that process is likely to be limited in size. For instance, a
> > common way to get "random" file for testing is to use a
> > pseudo-random number generator. But any such file is obviously
> > compressible down to the size required to describe the algorithm,
> > and its starting state. A compressed version of the C source for
> > generating the file is not likely to be very big.
>
> But the question is can you necessarily specify the process in fewer
> bytes than the process outputs. Not in all cases.
No -- hence the qualification of being sufficiently large. A 100
byte file is quite likely to have a source that is larger than the
file. A 1,000,000,000 byte file is unlikely to have such a source.
But the issue of source size is one that could use a bit more
attention. There was a message elsewhere in this thread where
somebody was claiming to be adding a high degree of randomness
with just a few simple lines of code. Impossible -- at most what
was added was pseudo-randomness, and the results are still highly
compressible (in principle).
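A sketch of that last point (illustrative only; the seed and sizes are
arbitrary choices of mine): a few lines plus a seed fully determine a
megabyte of "random-looking" data, so in principle the file compresses to
roughly the size of its generator, yet a general-purpose compressor like
zlib cannot find the pattern.

    import random, zlib

    random.seed(12345)                         # all the "randomness" is this seed
    data = bytes(random.getrandbits(8) for _ in range(1000000))

    print("output size          :", len(data))
    print("zlib-compressed size :", len(zlib.compress(data, 9)))   # barely changes
    # The honest description of this file is just the few lines above.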
>
> And the biggest problem for compression is that we want a general purpose
> compressor to be able to try and guess what the process is without any
> prior knowledge of it. If you know the process ahead of time then it is
> easier to devise a compression scheme. It is not too difficult to devise
> simple sources that follow easy to express patterns that are nearly
> impossible to detect.
Quite. The difference between "here is a proof of the existence
of a solution to this problem" and "here is a proof of the
existence of an algorithm to calculate a solution to this problem"
and "here is an algorithm to calculate a solution to this
problem".
>
> > Interesting thought on the counting argument. If you modify it to
> > count only files that do, or will, (two separate cases) exist,
> > what sort of limits do you get on compressibility?
>
> You get a lower entropy limit, because you are reducing the entropy of
> your source.
--
Tom Gutman
More or less agreed, for practical coding, the counting argument is
irrelevant (but not FLAWED). However, we do get kooks from time
to time in this newsgroup who are looking for or claim to be able
to reduce the size of *all* files. [Itai and Mikael are this month's
examples...]
I apologize, you irked me by belittling me in the "Very Specific
Compression Problem" thread after I had spent more time than
I should have in analyzing the poster's problem and your proposed
solution.
> I did certainly NOT say anything about everything being compressible!
> I really don't think Itai did either. It's just too stupid.
Itai most certainly did in <3954...@news.bezeqint.net>:
That's why I believe that the compression theory that says that
infinate compression is impossible is wrong. It is logical, but yet,
I think it's wrong. That's why I'm going to try to prove that it is
wrong, by developing an algorithm that will actually compress any
data whatsoever (probably under minimum block size restrictions,
so what?).
Mikael
I understood your second sentence perfectly well. There is no flaw in
the counting argument. Some people misapply it, that is where the flaw
is.
> > No, you're saying that some of the files don't exist. All that says is
> > that the size of A is not as large as it would appear.
>
> And that is exactly the flaw. The counting argument says that you
> have 2^N files of length N, but only (2^N)-1 files of length <N,
> and so you cannot compress all files of length N. But if there
> are actually only 2^n files of length N, with n<<N, then the
> counting argument allows you to compress all actual files of
> length N down to n or less bits.
The counting argument works just fine here. That is because it is concerned
with the number of input files, not the size of the files. And the
counting argument says you have 2^n input files and you cannot compress
all of them below n bits, because that is the entropy limit of
that source, provided that all input files are equally likely.
> > But this is all handled very well in information theory. You don't even
> > have to go so far as to say those files won't exist. You can just say
> > that they aren't as likely. Entropy of a source looks at the
> > probabilities of the source and even if all 2^n are possible but some
> > are more likely than others you can still compress ON AVERAGE better
> > than n bits. Of course, we haven't escaped the counting argument, those
> > rare messages will require more than n bits, but more likely messages
> > will hopefully require less.
>
> Sure, and that is what makes compression useful for its common
> use -- cutting down average storage requirements and transmission
> times. But then you run into the guy whose brand new device has
> limited (storage space, bandwidth, ...) and he therefore needs an
> algorithm that guarantees a particular level of compression. So
> somebody trots out the counting argument to show that that is not
> possible. But the guy has an intuitive feel, which he cannot
> properly state, that there aren't *really* all that many files and
> that the counting argument somehow doesn't really apply. And so
> the argument meanders on and on and ....
Once again, that is misapplying the counting argument. The counting
argument doesn't take into account variations in probability of
messages. Perhaps that is what you are calling the flaw. But information
theory has entropy to handle the probabilities.
> > But the question is can you necessarily specify the process in fewer
> > bytes than the process outputs. Not in all cases.
>
> No -- hence the qualification of being sufficiently large. A 100
> byte file is quite likely to have a source that is larger than the
> file. A 1,000,000,000 byte file is unlikely to have such a source.
How about a 1,000,000,000 byte file that is the compressed version of
a 2,000,000,000 byte file?
> > And the biggest problem for compression is that want a general purpose
> > compressor to be able to try and guess what the process is without any
> > prior knowledge of it. If you know the process ahead of time then it is
> > easier to devise a compression scheme. It is not too difficult to devise
> > simple sources that follow easy to express patterns that are nearly
> > impossible to detect.
>
> Quite. The difference between "here is a proof of the existence
> of a solution to this problem" and "here is a proof of the
> existence of an algorithm to calculate a solution to this problem"
> and "here is an algorithm to calculate a solution to this
> problem".
I think somehow one could use diagonalization to prove that such an
algorithm cannot exist, just as is done with the halting problem. For
those unfamiliar with this, the idea for the halting problem is:
- Assume the existence of an algorithm HALTS( P, X ) that accepts a
program P and input X and returns a boolean result indicating whether the
program P will halt on input X.
- We create a program diagonal( X ) that is:
if HALTS( X, X ) then loop forever else halt
- Call diagonal on itself, i.e. diagonal( diagonal )
You then have a paradox, which leads you to the conclusion that HALTS
cannot exist. I'm sure the same thing can be done to show that you
cannot have an algorithm that can devise another algorithm from its
output.
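For the curious, here is the diagonal construction written out as a Python
sketch (illustrative only -- the whole point is that halts() cannot actually
be implemented):

    def halts(program, x):
        """Hypothetical total oracle: True iff program(x) eventually halts."""
        raise NotImplementedError("no such algorithm exists")

    def diagonal(x):
        # Do the opposite of whatever the oracle predicts about diagonal(x).
        if halts(diagonal, x):
            while True:       # oracle said "halts", so loop forever
                pass
        return "halted"       # oracle said "loops", so halt

    # Asking about diagonal(diagonal): whichever answer halts() gives is
    # wrong, so no such halts() can exist.  The post above suggests the same
    # style of argument against an algorithm that recovers a generating
    # process from its output.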
It's fairly simple. The theoretical limit for compression of a single
file is basically 100% compression. For any given file there is some
compression algorithm that will compress that file to 1 bit. Of course
it will not do so well on all other files and if you average the amount
of compression over all possible input files you will have a net loss.
Once again you have to talk about the source, not about individual
files.
While on the way home last night, I think I came up with a very
good analogy that makes the counting argument much clearer. The analogy
is the telephone system and telephone numbers.
Imagine we have a large city or state and we want to assign telephone
numbers to 20 million homes or businesses, etc. Just assume that we want
20 million telephones to be individually callable. Can we do this using
the typical US 7-digit telephone number? Using only 7-digit phone
numbers gives us 10^7 or 10 million unique telephone numbers. It is
therefore impossible to map all 20 million telephone destinations on to
only 10 million telephone numbers. If you do so, you will be giving the
same phone number to more than one person. This is the counting argument
or the pigeon-hole principle in action.
So the solution used in the US is to use area codes which are three
digits. Therefore each telephone destination in the US is uniquely
identified using a 10 digit number. So theoretically you could have 10
billion phone numbers all of which are 10 digits. But that is not where
it stops. It would be cumbersome to always have to dial all 10 digits,
considering how most of the calls you make are local and will have the
same area code. Therefore they add some "compression". For calls within
the same area code you can simply dial the 7 digit number. At first that
seems like some savings. For 10 million numbers you save 3 digits. That
saves 30 million digits. But what is the consequence? For any non-local
number you have to dial a 1 before the 10 digit number. So for
9,990,000,000 telephone numbers you now have to dial an extra digit.
Averaged over all telephone numbers you have a net loss. And you also
don't get the original 30 million savings because you can't use any of
the 1 million phone numbers that begin with 1. But it is still a good
compression because the characteristics of the source are such that
local numbers are more likely. But what this shows is that while you can
compress some files below the entropy limit of a source, that requires
expanding others.
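Here is the arithmetic of that trade-off as a small sketch (the 80% figure
is an assumption of mine, just to show the effect of a skewed source):

    total_numbers = 10**10        # all 10-digit numbers
    local_numbers = 10**7         # numbers sharing your area code

    def average_digits(p_local):
        """Expected digits dialed: 7 for a local call, 1 + 10 otherwise."""
        return p_local * 7 + (1 - p_local) * 11

    # Averaged uniformly over destinations: about 11 digits, a loss vs a flat 10.
    print("uniform over destinations:", average_digits(local_numbers / total_numbers))
    # With most calls local: 7.8 digits on average, a clear win vs a flat 10.
    print("80% of calls local       :", average_digits(0.80))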
And the analogy works with the concept of prefix codes. Let's look at
just within an area code. Let's say we want to go below 7 digits for
some numbers. Could we have a service where we charge a premium for
users to get shorter phone numbers? We could have 10 people get single
digit phone numbers, 100 people get 2 digit numbers, and so on. In some
sense this is a real scenario with things like dialing 911 for
emergencies. But this doesn't work because telephone numbers must be
prefix codes. We can't have one person with the telephone number 32 and
someone else with a phone number of 321, because if we see 3-2 we don't know
whether to call the first person or wait for more digits. For the case of 911 we
basically can't use any of the numbers that start with 911.
I'm sure someone would likely point out that we could have the person
hit # to signal that those are all the digits and go ahead and dial.
First this cuts down on the compression since once again you are adding
a key. Secondly, that is cheating because you changed the parameters of
the problem by changing the alphabet of the output. If you allow # to be
part of the output you would do much better allowing it in any location
rather than just the last one.
The only difference from compression is that you are working with only 0
and 1 instead of 0 through 9. So think of Itai's problem as trying to
assign 2^500 phone numbers to 2^817 people. It cannot be done without
losing the ability to call 2^817-2^500 of the people.
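A tiny sketch of the prefix-free requirement the analogy leans on (toy
numbers of my own, not a real dialing plan): sort the assigned numbers and
check that none is a prefix of its neighbour.

    def is_prefix_free(codes):
        codes = sorted(codes)
        return all(not codes[i + 1].startswith(codes[i]) for i in range(len(codes) - 1))

    print(is_prefix_free(["911", "32", "451"]))   # True  -- the exchange always knows when to connect
    print(is_prefix_free(["911", "32", "321"]))   # False -- after dialing 3-2, connect or wait?
    print(is_prefix_free(["911", "9112345"]))     # False -- numbers starting with 911 are unusable

Checking only adjacent entries in sorted order is enough, because if any
code is a prefix of another it is also a prefix of its immediate successor
in that ordering.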
Samuel Paik <pa...@webnexus.com> wrote in message
news:3957EC5C...@webnexus.com...
> Tom Gutman wrote:
> > You need to look again at my second sentence. I agree that the
> > counting argument is true. Perhaps I need to qualify my original
> > statement a bit more -- the counting argument is flawed as applied
> > to the compression issues.
>
> More or less agreed, for practical coding, the counting argument is
> irrelevent (but not FLAWED). However, we do get kooks from time
> to time in this newsgroup who are looking for or claim to be able
> to reduce the size of *all* files. [Itai and Mikael are this month's
> examples...]
>
An extreme case would be partitioning into bytes without order information,
a.k.a. the source histogram. Thousands of files (i.e. bytes) could be contained
in 256 bins. Given a bin I could tell which files were in it, and even how
many there were, but would not be able to say *when* the files were sent.
A more complicated example would involve storing, say, 1025 byte files in
1024 byte sectors, where the files are sorted by the discarded first byte,
such that the sector number could be indexed through a table to recover that
byte. This is no cheat, it would allow compression of files from a completely
random source, because the source entropy is reduced by the subsequent file
sorting operation.
Methinks the same consideration applies to pigeonholes; all the pigeons
without beaks could be put in head-first; all the others could be debeaked and
put in facing out (yuk, let's do that the other way around!) so all the
pigeonholes could be downsized by a beak-length allowing more holes per unit
volume. (Well, if nothing else, this example might suggest an essential
difference between pigeons and information)
Itai Bar-Haim wrote:
[compression of *all files*]
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
Well, one could first try:
1 + 1 = 50 billion (dollars)
(you have a very, very slight chance of getting $50 billion by investing $2 -
but there isn't, however, any chance of getting a working algorithm for what
you propose :-)
--
Edgar
Mikael
In article <3959...@news.bezeqint.net>,
"Itai Bar-Haim" <av...@bezeqint.net> wrote:
> Just a question:
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
>
> Samuel Paik <pa...@webnexus.com> wrote in message
> news:3957EC5C...@webnexus.com...
> > Tom Gutman wrote:
> > > You need to look again at my second sentence. I agree that the
> > > counting argument is true. Perhaps I need to qualify my original
> > > statement a bit more -- the counting argument is flawed as applied
> > > to the compression issues.
> >
> > More or less agreed, for practical coding, the counting argument is
> > irrelevent (but not FLAWED). However, we do get kooks from time
> > to time in this newsgroup who are looking for or claim to be able
> > to reduce the size of *all* files. [Itai and Mikael are this month's
> > examples...]
> >
> > --
> > Samuel S. Paik | http://www.webnexus.com/users/paik/
> > 3D and multimedia, architecture and implementation
> >
> >
>
>
Mikael
In article <3958EF04...@TCE.com>,
It is slightly imprecise in terms of mathematics. The original problem was
basically a source that emitted a "message" or "file" of 4096 bits (128 of
which are set). It is convenient to think of the 2^817 possible files to be
compressed.
There is no order to keep track of. We are given one of these messages and
told to compress it.
>Methinks the same consideration applies to pigeonholes; all the pigeons
>without beaks could be put in head-first; all the others could be debeaked and
>put in facing out (yuk, let's do that the other way around!) so all the
>pigeonholes could be downsized by a beak-length allowing more holes per unit
>volume. (Well, if nothing else, this example might suggest an essential
>difference between pigeons and information)
Once again, the pigeon-hole principle does not say that the holes can hold
only one pigeon. Your holes could be the size of Yankee stadium. But if
you've got more pigeons than holes, you have to put more than one pigeon in
a hole. The application to compression is that as part of the decompression
you want to request a certain pigeon by its hole number. If there are 2
pigeons in the hole you don't know which pigeon is the correct one. In other
words if two source files compress to the same compressed file you cannot
decompress them uniquely. You will get one file or the other. It has already
been argued that you could get both and decide which is the correct one, but
if one is disallowed, then it really was not a legal message from your source.
D.A.Kopf wrote in message <3957C79E...@dakx.com>...
Sure, there is still work to be done on compressors, but that work will
necessarily come down to trying to match compression and decompression to
various sources. Consider this: if we knew exactly the entropy of various
sources could you design a general purpose compressor and decompressor such
that you feed that entropy information along with an input file to the
compressor and be able to compress the information from all those sources to
their entropy limit? Note the only input to the decompressor is the output
of compressor. The answer is no, because of the counting argument. It is
easy to prove. Consider a source that can send a series of symbols A and B
with equal probability. Its entropy limit is 1 bit per symbol and we can
easily compress that as 0 means A and 1 means B. But suppose we have another
source that is the same except that it sends C and D. It has the same
entropy limit of 1 bit per symbol. But if we could compress both to their
entropy limit a decompressor has no way to distinguish which source it was
from. Any information about the source in the compressed output means we
can't compress to the entropy limit.
So the bottom line is BWT et al. may be allowing us to get better on certain
sources, but they cannot get better on all sources.
--
-- Dale King
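The A/B versus C/D example in code form (my own toy encoding, just to make
the collision visible):

    code_ab = {'A': '0', 'B': '1'}     # source 1 at its 1 bit/symbol limit
    code_cd = {'C': '0', 'D': '1'}     # source 2 at its 1 bit/symbol limit

    packed1 = ''.join(code_ab[s] for s in "ABBA")
    packed2 = ''.join(code_cd[s] for s in "CDDC")
    print(packed1, packed2, packed1 == packed2)   # 0110 0110 True

    # The decompressor sees identical bits from different sources; any extra
    # bits spent saying which source it was push you back over the limit.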
mikael.l...@spray.se wrote in message <8jb8jv$1tu$1...@nnrp1.deja.com>...
Mikael
In article <3959...@news.tce.com>,
"Dale King" <Ki...@TCE.com> wrote:
> But we know an awful lot about the limits. We know about the entropy of a
> source. Unfortunately, many sources are quite complex.
>
> Sure, there is still work to be done on compressors, but that work will
> necessarily come down to trying to match compression and decompression to
> various sources. Consider this, if we knew exactly the entropy of various
> sources could you design a general purpose compressor and decompressor such
> that you feed that entropy information along with an input file to the
> compressor and be able to compress the information from all those sources to
> their entropy limit? Note the only input to the decompressor is the output
> of compressor. The answer is no, because of the counting argument. It is
> easy to prove. Consider a source that can send a series of symbols A and B
> with equal probability. It's entropy limit is 1 bit per symbol and we can
> easily compress that as 0 means A and 1 means B. But if we have another
> source that is the same except that it sends C and D. It has the same
> entropy limit of 1 bit per symbol. But if we could compress both to their
> entropy limit a decompressor has no way to distinguish which source it was
> from. Any information about the source in the compressed output means we
> can't compress to the entropy limit.
>
> So the bottom line is BWT et al. may be allowing us to get better on certain
> sources, but they cannot get better on all sources.
> --
> -- Dale King
I've noticed this, too. *cough* Very hard.
>Seriously, with A and B in a file by the same amount e.g. ABABABAB ...
>PPM or BWT would produce a long, long sequence of zeros. Use runlength-
>coding on that along with a huffman- or arithmetic coder and I promise
>you a lot less than 1 bit/symbol.
Wonderful. New compression algorithm for any kind of data.
Step 1: Read in each bit of the input, and convert it to the byte
'A', or 'B', depending on whether it's a 0 or a 1.
Step 2: Use PPM or BWT to produce a long stream of zeroes.
Step 3: Use RLE with huffman or arithmetic coding for your
guaranteed compression rate of under 1 bit per symbol.
Do please think before you make such promises - you've just
claimed something which 'breaks' the counting argument ;-).
Cheers,
Phil
--
:wq
Yes, we want to compress "real files". So why are we obsessed with the
counting argument? Mainly because on a 4 week cycle, someone wants to use
recursive/iterative compression. The "idea" goes (stripped down):
I have a very simple compression algorithm that will compress any file
to 99% of its size.
But I can apply it to its own output.
(the commonest refinement of this "hides" information in the count of
how many times this process has been applied)
The actual techniques used are normally "complex" (dare one say
obfuscated?).
The easiest way to show that they are rubbish (without going down the
highways and byways of whatever hand-waving buzzwords the proposal uses)
is the counting argument. Which is why we consider it important, and
relevant.
BugBear
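For anyone who wants to watch the recursion BugBear describes die in
practice, here is a quick sketch using zlib as a stand-in for the magical
99% compressor (exact sizes will vary a little with the zlib version):

    import zlib

    data = b"the counting argument holds " * 4000   # highly redundant input
    for i in range(5):
        data = zlib.compress(data, 9)
        print("after pass", i + 1, ":", len(data), "bytes")
    # Pass 1 shrinks the file dramatically; every later pass makes it
    # slightly *bigger*, because the output of pass 1 already looks random.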
I get it now! These 2^817 possible files are an abstraction that allows a
rigorous assertion of how many bits it must take to represent the one file
that actually exists?
<snip>
>
> Once again, the pigeon-hole principle does not say that the holes can hold
> only one pigeon. Your holes could be the size of Yankee stadium. But if
> you've got more pigeons than holes, you have to put more than one pigeon in
> a hole. The application to compression is that as part of the decompression
> you want to request a certain pigeon by its hole number. If there are 2
> pigeons in the hole you don't know which pigeon is the correct one. In other
> words if two source files compress to the same compressed file you cannot
> decompress them uniquely. You will get one file or the other. It has already
> been arguee that you could get both and decide which is the correct one, but
> if one is disallowed, then it really was not a legal message from your
> source.
But if the number of pigeonholes could be doubled (e.g. by size sorting)
without changing the volume of the dovecote, isn't that compression? You seem
to be retaining an aspect of the "addressability" of a message that is not
explicitly stated.
No fair! You have to first give me the files you want to compress! [As I
recall, the original poster said there were some blocks (s)he wanted to compress...]
paul womack <bug...@NOSPAMmediabridge.net> wrote in article
<3959CCAA...@NOSPAMmediabridge.net>...
> mikael.l...@spray.se wrote:
> >
> > Yes, it's true that random data are very hard to compress ...
> The easiest way to show that they are rubbish (without going down the
> highways and byways of whatever hand-waving buzzwords the proposal uses)
> is the counting argument. Which is why we consider it important, and
> relevant.
>
> BugBear
>
Is it not true that within a sufficiently long sample output from a random
source one could find a sequence that could be compressed by *any* method,
even given the overhead of the offset to that sequence and the length?
While impractical due to the likely size of the sample, this suggests there
may be a theoretical exception to the counting argument. I of course could
be way off base here (have been before).
Just a thought
Tim
Mikael
In article <slrn8lj6ii.io...@jalua.ch.genedata.com>,
Phil....@genedata.com (Phil Norman) wrote:
> On Wed, 28 Jun 2000 05:11:04 GMT,
> mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> >Yes, it's true that random data are very hard to compress ...
>
> I've noticed this, too. *cough* Very hard.
>
> >Seriously, with A and B in a file by the same amount e.g. ABABABAB ...
> >PPM or BWT would produce a long, long sequence of zeros. Use runlength-
> >coding on that along with a huffman- or arithmetic coder and I promise
> >you a lot less than 1 bit/symbol.
>
> Wonderful. New compression algorithm for any kind of data.
>
> Step 1: Read in each bit of the input, and convert it to the byte
> 'A', or 'B', depending on whether it's a 0 or a 1.
>
> Step 2: Use PPM or BWT to produce a long stream of zeroes.
>
> Step 3: Use RLE with huffman or arithmetic coding for your
> guaranteed compression rate of under 1 bit per symbol.
>
> Do please think before you make such promises - you've just
> claimed something which 'breaks' the counting argument ;-).
>
> Cheers,
> Phil
>
> --
> :wq
>
Mikael
> Yes, we want to compress "real files". So why are we obsessed with the
> counting argument? Mainly because on a 4 week cycle, someone wants to use
> recursive/iterative compression. The "idea" goes (stripped down):
>
> I have a very simple compression algorithm that will compress any file
> to 99% of its size.
> But I can apply it to its own output.
>
> (the commonest refinement of this "hides" information in the count of
> how many times this process has been applied)
>
> The actual techniques used are normally "complex" (dare one say
> obfuscated?).
> The easiest way to show that they are rubbish (without going down the
> highways and byways of whatever hand-waving buzzwords the proposal uses)
> is the counting argument. Which is why we consider it important, and
> relevant.
>
> BugBear
>
Mikael
In article <01bfe104$704126e0$4b51143f@tim>,
"Tim" <ther...@earthlink.net> wrote:
>
>
> paul womack <bug...@NOSPAMmediabridge.net> wrote in article
> <3959CCAA...@NOSPAMmediabridge.net>...
> > mikael.l...@spray.se wrote:
> > >
> > > Yes, it's true that random data are very hard to compress ...
>
> > The easiest way to show that they are rubbish (without going down the
> > highways and byways of whatever hand-waving buzzwords the proposal uses)
> > is the counting argument. Which is why we consider it important, and
> > relevant.
> >
> > BugBear
> >
>
> Is it not true that within a sufficiently long sample output from a random
> source one could find a sequence that could be compressed by *any* method?
> even given the overhead of the offset to that sequence and the length?
> While impractical due to the likely size of the sample, this suggests there
> may be a theoretical exception to the counting argument. I of course could
> be way off base here (have been before).
>
> Just a thought
>
> Tim
>
Oh, my apologies. When you said "a stream of As and Bs, eg:
'ABABAB'" (or something very similar), I assumed that was just an
example of a stream of As and Bs, and that you weren't specifying
any particular pattern. According to your words, you did actually
guarantee compression of any stream of As and Bs, not just the
particular one you gave as an example.
>I don't think you have understood the counting argument either. It simply
>says that if a compressed file is smaller than the uncompressed original
>and the compressed file must decompress to one unique original then
>according to (unbreakable according to Tate) mathematical laws there are
>less possible combinations in a smaller file, why it's impossible to
>compress *all* files. Just think about it!
I'm quite aware of what the counting argument is, how it works,
why it works, and that it indeed is defined by unbreakable
mathematical laws.
>It does not say however that it's impossible to compress one file with
>stunning results!
This is patently obvious to anyone who's ever tried compressing
large files of zeroes.
Cheers,
Phil
--
:wq
And when people turn up with ideas about amazing compression
algorithms, they do initially tend to get gently brought down with
news of the counting argument by people who spend a lot of time
and patience explaining it. However, depressingly often the
response to this is "Oh, you're all blinded by your beliefs -
infinite compression must exist!".
>Even if the history of human kind is full of war and violent
>death, we're still good at cooperation, aren't we?
Yes - sometimes we team up in order to impose war and violent
death on a larger scale.
Cheers,
Phil
--
:wq
>No, he's right. It is in the nature of random data to be uncompressible.
>How could you possibly see any repeating patterns there?
>The counting argument (or theorem) just say that it's impossible to
>compress *all* files. It does not however say it's impossible to
>compress *one* file.
>
>Mikael
>
>In article <01bfe104$704126e0$4b51143f@tim>,
> "Tim" <ther...@earthlink.net> wrote:
>>
>>
>> paul womack <bug...@NOSPAMmediabridge.net> wrote in article
>> <3959CCAA...@NOSPAMmediabridge.net>...
>> > mikael.l...@spray.se wrote:
>> > >
>> > > Yes, it's true that random data are very hard to compress ...
>>
>> > The easiest way to show that they are rubbish (without going down
>> > the highways and byways of whatever hand-waving buzzwords the
>> > proposal uses) is the counting argument. Which is why we consider it
>> > important, and relevant.
>> >
>> > BugBear
>> >
>>
>> Is it not true that within a sufficiently long sample output from a
>> random source one could find a sequence that could be compressed by
>> *any* method? even given the overhead of the offset to that sequence
>> and the length? While impractical due to the likely size of the
>> sample, this suggests there may be a theoretical exception to the
>> counting argument. I of course could be way off base here (have been
>> before).
>>
>> Just a thought
>>
>> Tim
>>
>
>
The counting argument only means that not all files can be compressed to
a smaller size by a given compression method. But no file is truly
random in that sense: just because a certain compression routine
does not compress it to a smaller size, some other compression routine
will compress it to a smaller size. No finite file is truly random to
every possible compressor routine.
David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE
http://www.jim.com/jamesd/Kong/scott19u.zip
Scott famous encryption website NOT FOR WIMPS **no JavaScript allowed**
http://members.xoom.com/ecil/index.htm
Scott rejected paper for the ACM
http://members.xoom.com/ecil/dspaper.htm
Scott famous Compression Page WIMPS allowed ** JavaScript OK**
http://members.xoom.com/ecil/compress.htm
**NOTE EMAIL address is for SPAMERS***
I leave you with this final thought from President Bill Clinton:
"The road to tyranny, we must never forget, begins with the destruction
of the truth."
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
Here's a chance to practice your probability theory to calculate the
expected value of your invention. To get the expected value, you take
the set of all possible events, and multiply the probability of each
event by the value of the event. In this case:
Event=you find such an algorithm. Probability=0, Value=50,000,000,000
Event=you don't find it. Probability=1, Value=0
Now your expected value is 0*50000000000 + 1*0 = 0. Doesn't look
like you should waste much time on this, does it? :-)
--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |
It's easy to compress *all* files of a given length. Sort them into ascending
order, take the forward difference, run-length encode, and (if you're really
anal) save the result for the decompressor.
> It does not say however that it's impossible to compress one file with
> stunning results!
Or any given set of files.
No kidding. Give me a file and I'll write a compression program that
will compress it down to one bit. I warn you, the decompression code
will be rather large. Roughly the size of the original file, as a matter
of fact.
If you require that you include the size of the decompression code
in your total (not unreasonable; it helps you avoid stunt programs
like the one I just described), then there are some files that are
incompressible. Quite a lot of files, actually. You will rarely
run into these files in practice, but they do exist.
Alan
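Alan's stunt compressor is easy to write down; here is a sketch of my own
(a byte stands in for the single bit), which also shows why counting the
decompressor's size kills the trick:

    def make_stunt_pair(target):
        def compress(data):
            return b"\x01" if data == target else b"\x00" + data
        def decompress(data):
            return target if data == b"\x01" else data[1:]   # the file lives in here
        return compress, decompress

    original = b"any file you hand me"
    compress, decompress = make_stunt_pair(original)
    packed = compress(original)
    print(len(packed), decompress(packed) == original)   # 1 True
    # Compressed data: 1 byte.  Decompressor: carries the whole original,
    # so compressed data + decompressor is never smaller than the original.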
Binder Edgar <Binde...@T-Online.de> wrote in message
news:39591311...@T-Online.de...
> Hi,
>
> Itai Bar-Haim wrote:
>
> [compression of *all files*]
> > I estimate roughly the value of such algorithm in about 50 billion
dollars
> > US. After all, it does solve the world's data transfer and storage
> > problems...
> > Now: with such great sum of money in mind, wouldn't you try to achive
this
> > goal?
> Just a question:
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
Well, if I could find someone willing to pay me $5 million USD
yearly for watching porn videos eight hours a day, I could retire young
and rich.
Now, with such a great sum of money in mind, *shouldn't* I
assume, without a shred of evidence to support it, that there *is* such
a job opportunity out there waiting for me? And that I should expend
effort trying to find it?
The situation I'm describing is only *unlikely.* And I'd be
silly to pursue it, seeing as how I have no evidence that it exists.
What you are proposing is not unlikely, it is IMPOSSIBLE. And
you are not just seeking it with no evidence that it exists; you are
seeking it *despite* evidence that it *cannot.*
-jc
--
* -jc IS *NOW* feld...@cryogen.com
* Home page: http://members.tripod.com/~afeldspar/index.html
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby
But good luck just the same.
-mike
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
In article <3954...@news.bezeqint.net>,
"Itai Bar-Haim" <av...@bezeqint.net> wrote:
> Hi.
> Since Math actually based on definitions, proven to work in many cases (well
> all of them, actually :-), I wouldn't try to break that theory. If some
> smart greek guys many years ago have decided that 1+1=2, than you can't tell
> them that you found a way to break their own definition and that from now on
> 1+1=3. That's why it's definition. What is possible, though, is to create a
> "newer version of math". You will define your own rules of the new language
> of numbers, and than everything will work the way you want it.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinate compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
> If anyone think the same and want to try to help me, he/she may contact me,
> and I'll be glad to discuss this issue.
>
> Itai.
>
> Alan Morgan <amo...@Xenon.Stanford.EDU> wrote in message
> news:8j03u9$d2k$1...@nntp.Stanford.EDU...
> > In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
> > Phil Norman <Phil....@genedata.com> wrote:
> > >On Thu, 22 Jun 2000 20:39:32 GMT,
> > > mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> > >>How true it is what you are saying ...
> > >>I almost suspect that when people come with a lot of difficult theory to
> > >>explain why it's impossible to solve this or that problem, they actually
> > >>really don't have a clue. The fact remains, the most powerful tool you
> > >>have is your mind. If your attitude is that nothing is impossible and you
> > >>try and try again, NEVER EVER giving up, then the problem is destined to
> > >>be solved no matter what the NO-sayers tries to tell you!
> > >>Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > >>had not done what they did?
> > >
> > >Absolutely. I, personally, have been trying to get 1+1 to equal 3
> > >for a good number of years now. I've not managed it yet (when I
> > >do, I'll post the results on http://inkvine.fluff.org/~forrey/),
> > >but *I'm* certainly not giving up just because mathematicians say
> > >it's impossible. What do they know? They're just constrained by
> > >all the prejudices they've been taught during their studies.
> >
> > Good work! Might I suggest that, when you have finished this, you
> > start to look for the even prime numbers greater than 2. Mathematicians
> > claim they don't exist, but they haven't checked all of them so they
> > can't possibly know. I'm pretty sure that given the number of even
> > numbers greater than two that *one* of them has to be prime.
> >
> > Alan
Interesting approach.
Some of Itai's comments have been so ridiculous that
I wonder whether he actually knows what 'math' and
'logic' and 'proof' mean.
Dror
At least Itai knows his current situation.
Dror
Mikael
In article <Pine.GSO.4.21.000629...@chopin.ifp.uiuc.edu>,
mikael.l...@spray.se wrote:
> How about a compressor with compression similar to or better than Malcolm
> Taylor's rk and its fantastic compression combined with memory use and
> speed similar to or faster than some of the lz based compressors?
> Isn't this realistic?
> Isn't this of high commercial value?
Hmm, I don't think that has much commercial potential, because Zip has become
too much of a standard - but that's only my opinion...
--
Edgar
I agree completely that sometimes I lack politeness.
However, I was quiet for several weeks, but I've really
had it. Itai began his series of postings asking how to
compress his interesting source. And people explained
to him in detail why theoretically his requirement was
not realistic (actually, it was impossible).
But apparently Itai was not satisfied with these answers
and decided that all of information theory was useless.
So I'm sorry to be blunt, but Itai's approach of 'you are
all wrong, I'll show you' and the '50 billion' claims
all got me pissed off.
Perhaps Itai is a talented programmer/engineer, but he
needs to brush up on his theory.
Dror
> You and Tate are completely right of course. However not so polite.
> Why not create something positive out of this and make a compromise?
> How about a compressor with compression similar to or better than Malcolm
> Taylor's rk and its fantastic compression combined with memory use and
> speed similar to or faster than some of the lz based compressors?
> Isn't this realistic?
> Isn't this of high commercial value?
>
"US" telephone exchanges must actually begin with a digit 2-9
so that a "long-distance" prefix of 1 or 0 is unambiguous
(since, as you note later, tel. nos. are required to be a prefix code),
so you can have at most 8e6 subscriber numbers per area code.
For historical reasons area codes also must begin with 2-9,
so there are at most 800 areas, and some of them are reserved
for things like 911, 800 (now also 877 and 866), 411, etc.
(Originally, exchanges were constrained to be (2-9)(2-9)(0-9)
and area codes to be (2-9)(0-1)(0-9), but these restrictions
became technologically unnecessary and politically unacceptable
about 10 years ago. And in practice because of people starting
and stopping phone service in ways that are not finely predictable,
you can never actually have all possible numbers assigned.)
In big metropolitan areas like New York and Los Angeles,
nowadays shortcut 7-digit dialing is almost useless because
many calls even to nearby locations are in a different area code,
and if and where new area codes are added by overlay rather than
(as traditional) geographic split, the FCC *requires* that 10-digit
dialing be mandated for "fairness", to prevent the incumbent carrier(s)
from being more convenient to use initially than newer competitors.
The "US" telephone numbering scheme actually covers Canada
and most of the Caribbean as well and is formally called
the North American Numbering Plan (NANP).
--
- David.Thompson 1 now at worldnet.att.net