Compression boils down to two factors: understanding the uniqueness of the
data and representing it efficiently.
This is generally described as modeling the data and then encoding it. It is
a matter of taking advantage of unique knowledge of the distribution of the
data, whether that is done statically or dynamically.
Replace the term 'any' with 'random' and this can be proven mathematically.
If you are referring to 'any' as meaning a specific and deterministic
dataset, then yes, you can create a unique 'algorithm' or model to compress
it efficiently.
There will be many others who can describe these issues in far greater
detail and with greater correctness.
Bill
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress data
> to less than X bits, since the number of mathematical combinations of bits in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations often have
> NOTHING to do with the compression algorithm. Compression usually uses
> dictionaries, sorting, and other types of non-mathematical ways to handle
> the data.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algorithm.
>
> Itai.
>
>
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3951...@news.bezeqint.net...
> Hi.
> I have read in several posts already that it is impossible to compress data
> to less than X bits, since the number of mathematical combinations of bits in
> a certain form has to be represented by X bits.
> That is not true.
> When dealing with compression, mathematical combinations often have
> NOTHING to do with the compression algorithm. Compression usually uses
> dictionaries, sorting, and other types of non-mathematical ways to handle
> the data.
What is your definition of a mathematical combination? From what I see,
dictionaries, Huffman trees, sorted lists, etc. are all types of
mathematical combinations.
> That's why there must be a way to compress any data whatsoever, by finding
> the right algorithm.
>
But then you'd also have to include the algorithm in the compressed
results. This sucks up space. Whether you like it or not, the Counting
Argument still holds true.
> Itai.
Harvey Rook
Of course, it must be realized that *all possible files* could take much
longer than the universe will exist to ever generate.
$1 says at least another twelve posts?
The process of compression and decompression can be considered as this
process:
input data ----> compressed data ----> original data
The mathematical combinations part tells you how many possible input
files you can possibly see. In your case that was shown to be around
2^817 possible input files. Let's imagine that we had a compression
algorithm that could compress all 2^817 input files by at least one bit.
That means that the compressed data would contain no more than 816 bits.
The problem is that would require multiple sets of input data to map to
the same compressed data file. This is because the number of data files
containing at most 816 bits is 2^817 - 1 (I am including a zero bit
file). That seems to say that we can compress all the files but 1, but
then there's a little problem of needing to know how many bits are in
the output. That has to somehow be encoded into the compressed data as
well and must be added to the size of the data. The naive way would be
to simply include 10 extra bits giving the size in bits. But if you add
these 10 bits to all of the compressed data files you now have a net
loss if you average over all possible input files. Certainly, some of
the input files will indeed compress down to less than 817 bits, but
they are in the vast minority. Most will actually grow in size. You can
cut down how much they grow by adding a bit that says whether the data
compressed or not. But this still means that over half of the input
files will grow by 1 bit. Still a net loss on average.
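Here is a quick way to check those figures (a sketch in Python; the 4096-bit
block with exactly 128 set bits is the case discussed earlier in the thread):

    from math import comb, log2

    # number of distinct 4096-bit blocks with exactly 128 bits set
    distinct_inputs = comb(4096, 128)
    print(f"log2(C(4096, 128)) = {log2(distinct_inputs):.1f} bits")  # roughly 817

    # number of bit strings of length 0..816, i.e. every possible
    # "compressed" output that is shorter than 817 bits
    shorter_outputs = 2**817 - 1
    print(f"outputs shorter than 817 bits: about 2^{log2(shorter_outputs):.0f}")

The two counts are essentially the same size, which is why, as noted above,
the number of bits in the output has to be encoded somewhere and eats up the
apparent savings.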
Note that this analysis did not look at any particular algorithm. It
merely looked at the input and output. Therefore the algorithm used does
not matter to the compressibility of random data. You cannot escape the
fact that there is a theoretical lower limit for compression. That limit
depends on the characteristics of the input data source. In information
theory it is quantified by entropy.
What compression software does is try to map the most frequently
occurring input files onto those compressed files that are smaller than
the input file. By doing so, there must exist input files that are
actually expanded. It is a mathematical law. But what we try to do is
make sure that the ones that are expanded by the compressor are very
rare.
You will see various compressors that do wonders on certain input files.
For instance compressing English text will often yield better than 50%
compression. But that is because the input source does not have high
entropy. The dictionary compressors you mention are simply exploiting
the fact that text is not random data. Dictionary compressors rely on
the fact that phrases are often repeated in the text. If that is not
true of your text it may actually expand it. Other compressors may
exploit facts like: if you see the letters "elephan", the odds
of the next letter being t are nearly 100%, while the odds of the next
character being Z are pretty much zero.
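A minimal sketch of that kind of context-based prediction (illustrative only;
the training sentence and context length are made up here, and a real PPM-style
coder would feed such counts into an arithmetic coder and handle unseen
contexts with escape symbols):

    from collections import Counter, defaultdict

    def build_model(text, order=7):
        # for every `order`-character context, count which character follows it
        counts = defaultdict(Counter)
        for i in range(len(text) - order):
            counts[text[i:i + order]][text[i + order]] += 1
        return counts

    model = build_model("the elephant and the elephant seal are not the same elephant")
    print(model["elephan"].most_common())   # [('t', 3)] -- 't' is the only letter ever seen here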
You have said that your data is random. That means it has high entropy.
There is no way to predict what the next bit will be outside of the already
established rules for how many bits are set and how many total bits
there are. That information gets us down to 817 bits. Without any more
knowledge of the data you cannot do better than that for all possible
inputs. You can pick some that will be compressed, but at the cost of
expanding others.
--
--- Dale King
Recruiters: I am not willing to relocate outside of Indianapolis!
Of course, how long these threads go on can only be governed by how long
it takes to sink into the person claiming the impossible is possible.
Hopefully it will sink into Itai quickly.
On this topic though, is there a good web site that plainly explains the
counting principle? I have never been able to find a good one. Note, I
understand the counting principle, I just want a nice site to point
people to to cut threads like this short without having to take 1/2 an
hour to explain it myself.
"D.A.Kopf" wrote:
>
> Harvey Rook wrote:
> >
> > Oh no! It's another "infinite compression is possible" thread. Any bets on
> > how long this will go on?
>
> Of course, it must be realized that *all possible files* could take much
> longer than the universe will exist to ever generate.
>
> $1 says at least another twelve posts?
Certainly not. At least from now on. (That's four and counting)
>
> Of course, how long these threads go on can only be governed by how long
> it takes to sink into the person claiming the impossible is possible.
> Hopefully it will sink into Itai quickly.
>
> On this topic though, is there a good web site that plainly explains the
> counting principle? I have never been able to find a good one. Note, I
> understand the counting principle, I just want a nice site to point
> people to to cut threads like this short without having to take 1/2 an
> hour to explain it myself.
Sesame Street has a quite understandable introduction to the topic. As I
recall, hosted by an animated character named "Count Dracula". They present
him with animated objects that occupy his animated space, and he counts them.
Don't know if he exists in a virtual-world web site. Even more unsure about
any relevance of this to the world *we* live in!
Itai.
> If you are referring to 'any' as being a specific and deterministic
> dataset, then yes you can create a unique 'algorithm' or model to compress
> it efficiently.
An example: you can compress a zillion-byte file into a single bit...
as long as those are exactly the zillion bytes in a row that bit is supposed
to represent... but if you knew that in the first place, why bother to
transmit even that bit? Then we have compressed a zillion bytes to
nothing... but then we have no information, for we knew what was going to
be transmitted...
So, if we want a general method to be useful to every possible stream,
there's no way around the counting principle.
--
Juan de Dios Santander Vela
Ingeniero en Electrónica (Electronics Engineer)
España (Spain)
e-mail: juand...@airtel.net ICQ: 22903952
Well, it is caused by the original counting question, so it
probably counts as part of the discussion. Of course, the real
fun starts when sub-threads start getting renamed. Hopefully this
one won't be so enormous.
>On this topic though, is there a good web site that plainly explains the
>counting principle? I have never been able to find a good one. Note, I
>understand the counting principle, I just want a nice site to point
>people to to cut threads like this short without having to take 1/2 an
>hour to explain it myself.
Has anyone mentioned the FAQ yet? I don't think they have.
There's a wonderful explanation in section 9.2 of the FAQ which
provides a particularly clear example anyone should be able to
follow. URLs for the FAQ are:
ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
http://www.landfield.com/faqs/compression-faq/part1/preamble.html
http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
There's also a section slightly later on about software patents
which is rather amusing.
Cheers,
Phil
The bit is transmitted to let the decoder know that the zillion bit
message is included in the original message. If no information was
transmitted then no assumption could be made about what the message
contained (unless this is explicitly defined in the decoder syntax).
paul
Phil Norman wrote:
>
> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to
> follow. URLs for the FAQ are:
>
> ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
> http://www.landfield.com/faqs/compression-faq/part1/preamble.html
> http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
>
> There's also a section slightly later on about software patents
> which is rather amusing.
Best regards and good luck,
Mikael
In article <3951...@news.bezeqint.net>,
> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers try to tell you!
> Where would we be if Edison, Marconi or Einstein, with this attitude,
> had not done what they did?
It's not only a matter of attitude: they realized which were the difficul-
ties... and went other way. If you want to compress EVERY possible stream
losslessly, you're under the counting argument. So if you want to get rid
of it, you'll have to be out of its range.
And, about the NO-sayers, anyone with a perpetual movement machine? Every-
body says you can't, but if you try and try...
While some see them as the crazy ones,
we see genius.
Because the people who are crazy enough to think
they can change the world, are the ones who do.
You're right, there should always be some realism included when making
decisions what to do with problems through life. But we don't know
everything now and will most certainly never do. So, that's why we need
these crazy people that are crazy enough to think they can change the
world. We should not inhibit them with ill-educated "advice" but
encourage them!
Best regards,
Mikael Lundqvist
Software engineer
Sweden
In article <B5784A58.7FD9%juand...@airtel.net>,
Feel free to try all you want and don't give up. When you find a way
around these problems be sure to let us know :-)
And funny you should mention Einstein. He seemed to put some limits on
the universe like not being able to go beyond the speed of light. He
must be wrong. If we put forth enough effort we should be able to break
the speed of light barrier.
(Have we hit 12 messages on this topic yet?)
mikael.l...@spray.se wrote:
>
> How true it is what you are saying ...
> I almost suspect that when people come with a lot of difficult theory to
> explain why it's impossible to solve this or that problem, they actually
> really don't have a clue. The fact remains, the most powerful tool you
> have is your mind. If your attitude is that nothing is impossible and you
> try and try again, NEVER EVER giving up, then the problem is destined to
> be solved no matter what the NO-sayers tries to tell you!
> Where would we be if not Edison, Marconi or Einstein, with this attitude,
> had not done what they did?
--
> On Wed, 21 Jun 2000 17:46:10 -0500, Dale King <Ki...@TCE.com> wrote:
> >But do you include this message in the count, since it is kind of a
> >meta-thread (a thread about the thread)?
>
> Well, it is caused by the original counting question, so it
> probably counts as part of the discussion. Of course, the real
> fun starts when sub-threads start getting renamed. Hopefully this
> one won't be so enormous.
>
>
> >On this topic though, is there a good web site that plainly explains the
> >counting principle? I have never been able to find a good one. Note, I
> >understand the counting principle, I just want a nice site to point
> >people to to cut threads like this short without having to take 1/2 an
> >hour to explain it myself.
>
> Has anyone mentioned the FAQ yet? I don't think they have.
> There's a wonderful explanation in section 9.2 of the FAQ which
> provides a particularly clear example anyone should be able to
> follow. URLs for the FAQ are:
Actually, it's not all that clear. IMHO, anyways. I've been
trying for quite a while to come up with a way to express it that will
make the principle involved even clearer. I *think* I have something,
but I haven't had a lot of time to work on it.
> ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/
> http://www.landfield.com/faqs/compression-faq/part1/preamble.html
> http://www.cis.ohio-state.edu/hypertext/faq/usenet/compression-faq/top.html
>
> There's also a section slightly later on about software patents
> which is rather amusing.
>
> Cheers,
> Phil
-jc
--
* -jc IS *NOW* feld...@cryogen.com
* Home page: http://members.tripod.com/~afeldspar/index.html
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby
Pigeons? Why pigeons? I thought we were talking about information. But if you
insist on pigeons, it appears that unlimited geographical information can be
crammed into their tiny finite brains.
>
> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.
The current thinking is that tachyons actually have to gain energy to slow down.
>
> (Have we hit 12 messages on this topic yet?)
One more, by my count.
Mikael
In article <3952941D...@TCE.com>,
Dale King <Ki...@TCE.com> wrote:
> And I don't care how much effort you put forth you cannot get 10 gallons
> of water into a 5 gallon container (perhaps with a black hole, but then
> we would have to be more rigorous on our definition of a gallon). That
> in a nutshell is the counting argument! Or alternatively as the
> pigeon-hole principle you can't put n pigeons into n-1 holes without
> putting 2 pigeons in the same hole. Those aren't complicated theories,
> they are simple logic.
>
> Feel free to try all you want and don't give up. When you find a way
> around these problems be sure to let us know :-)
>
> And funny you should mention Einstein. He seemed to put some limits on
> the universe like not being able to go beyond the speed of light. He
> must be wrong. If we put forth enough effort we should be able to break
> the speed of light barrier.
>
> (Have we hit 12 messages on this topic yet?)
>
> mikael.l...@spray.se wrote:
> >
> > How true it is what you are saying ...
> > I almost suspect that when people come with a lot of difficult theory to
> > explain why it's impossible to solve this or that problem, they actually
> > really don't have a clue. The fact remains, the most powerful tool you
> > have is your mind. If your attitude is that nothing is impossible and you
> > try and try again, NEVER EVER giving up, then the problem is destined to
> > be solved no matter what the NO-sayers tries to tell you!
> > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > had not done what they did?
>
> --
> --- Dale King
>
> Recruiters: I am not willing to relocate outside of Indianapolis!
>
Well, let me be that one more but adding that you can fit more pigeons
per hole if you first apply a blender transform. But then we are
talking about lossy compression!
-Michael
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
> It's the attitude and spirit I'm trying to explain and encourage.
> If we let "the common man", Mr. Smith, decide what is possible and what we
> should do, then you can count on it that innovations, inventions or great
> breakthroughs will never happen, because he's saying "It's impossible".
But the attitude and spirit should be directed in a fruitful
direction. When something is easily and clearly shown to be
impossible, encouraging someone to try the impossible is just plain
stupid.
You mentioned Einstein in another posting. Here's a quote from
Einstein:
"One reason why mathematics enjoys special esteem, above all other
sciences, is that its laws are absolutely certain and
indisputable, while those of all other sciences are to some extent
debatable and in constant danger of being overthrown by newly
discovered facts." -- Albert Einstein, 1921
That sums up this situation perfectly. While the physical sciences
(where these other breakthroughs happened that you talked about) might
say something is impossible it is always with the caveat that it's
impossible if our understanding of the basic physical laws is correct.
On the other hand, compression is a purely mathematical process, and
we understand the laws completely and unequivocally. That's why we
can say such compression is impossible, and it really does mean
impossible.
Finally, in a later posting you made some bizarre claims about entropy
and compression. First you demonstrated that you didn't know what
entropy was (by referring to entropy of a file rather than entropy of
a source), and then you claimed that people "overcame the impossible"
by compressing better than entropy. Well, here's some news for you:
You *CAN'T* compress better than entropy. The compressors you
mentioned (BWT, PPM, etc) in fact compress slightly worse than the
entropy of a source, and there will never be a compressor that can
compress (on average) to less than the entropy of the source. Shannon
proved this back in the 40's, and it's one of those absolutely true
facts that will never, ever be contradicted... (now there are some
arguments about whether "real world data" is accurately modeled with
the probabilistic models that you have to use for entropy, but that's
a different question).
--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |
s...@nospam.unt.edu wrote in article <8ium9f$477$1...@hermes.acs.unt.edu>...
> mikael.l...@spray.se wrote:
>
>(now there are some
> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>
No, it is not a different question. It goes right to the heart of the matter
of the thread. If these arguments exist and have merit (about which I have no
idea), then we have an 'alternate universe' of models that we need to deal
with. I doubt that is the case, given the maturity of the science (info
theory), but I suppose it's possible.
Tim
> You mentioned Einstein in another posting. Here's a quote from
> Einstein:
>
> "One reason why mathematics enjoys special esteem, above all other
> sciences, is that its laws are absolutely certain and
> indisputable, while those of all other sciences are to some extent
> debatable and in constant danger of being overthrown by newly
> discovered facts." -- Albert Einstein, 1921
>
> That sums up this situation perfectly. While the physical sciences
> (where these other breakthroughs happened that you talked about) might
> say something is impossible it is always with the caveat that it's
> impossible if our understanding of the basic physical laws is correct.
> On the other hand, compression is a purely mathematical process, and
> we understand the laws completely and unequivocally. That's why we
> can say such compression is impossible, and it really does mean
> impossible.
>
It's a fact though that all of today's universal compression algorithms
and implementations are far from the "magical limit" or what the entropy
allows, if I dare to use that word. We can also say that in the practical
world we don't want to pack a file that's been packed already. That would
be plain stupid too! (So what the heck are we debating about?)
> Finally, in a later posting you made some bizarre claims about entropy
> and compression. First you demonstrated that you didn't know what
> entropy was (by referring to entropy of a file rather than entropy of
> a source), and then you claimed that people "overcame the impossible"
> by compressing better than entropy. Well, here's some news for you:
> You *CAN'T* compress better than entropy. The compressors you
> mentioned (BWT, PPM, etc) in fact compress slightly worse than the
> entropy of a source, and there will never be a compressor that can
> compress (on average) to less than the entropy of the source. Shannon
> proved this back in the 40's, and it's one of those absolutely true
> facts that will never, ever be contradicted... (now there are some
> arguments about whether "real world data" is accurately modeled with
> the probabilistic models that you have to use for entropy, but that's
> a different question).
>
I know what entropy means: "disorder" or "randomness". I should have used
the word "statistics" instead; I was thinking about e.g. Huffman coding.
And it's easy to pack a file better than with Huffman only, if we use
BWT, PPM or CTW as a preprocessor, isn't it?
I think this also very clearly demonstrates the difference between
scientists like you and engineers like me. For me it's not about a lot of
theory, math or carefully laid-out proofs of something. For me it's much
more interesting to see what I can do with it, practically. And I don't
always trust what other people tell me, even if it's told by a scientist!
Critical thinking is a good and necessary tool too!
Are you really sure that YOU know how to measure the "source's" entropy?
If you don't, then what is all this theory worth?
Thanks for the feedback! I suppose one should be careful with the words
one chooses. The whole world could be reading it ... I'm not good with
words. I suppose that's why I became a programmer instead!
Mikael
Absolutely. I, personally, have been trying to get 1+1 to equal 3
for a good number of years now. I've not managed it yet (when I
do, I'll post the results on http://inkvine.fluff.org/~forrey/),
but *I'm* certainly not giving up just because mathematicians say
it's impossible. What do they know? They're just constrained by
all the prejudices they've been taught during their studies.
What the world needs is new, original thought. Good luck breaking
the counting argument! Einstein, Newton and all their friends
will be proud.
Cheers,
Phil
However, one of those rare forms of lossy compression where the
end result is better than the uncompressed form.
Cheers,
Phil
--
:wq
Pigeon navigation might be impressive, but let's not blow the
whole thing out of proportion, eh? Unless you can come up with
some evidence that there's no limit to the amount of geographical
information which can be stored in a pigeon's brain, I'm not going
to take the above paragraph at all seriously.
Cheers,
Phil
--
:wq
Well, when I read it (I refer particularly to the bit which
states that since there are 2^n possible n-bit files, you can't
represent all those values in 2^(n-1) bits) I found that
explanation really quite beautiful in its simplicity. However,
I've watched that argument fail to be absorbed by enough people to
realise that it's probably not clear enough.
Good luck in finding something better though - I've not thought a
great deal about how one would explain it, but it strikes me as a
difficult concept to get across in unquestionably simple terms.
Cheers,
Phil
--
:wq
No, I have no data on their maximum capacity. This was to subtly convey the
point of view that observation of what actually happens in the real world
should take precedence over mathematical theories about what ought to happen.
Good work! Might I suggest that, when you have finished this, you
start to look for the even prime numbers greater than 2. Mathematicians
claim they don't exist, but they haven't checked all of them so they
can't possibly know. I'm pretty sure that given the number of even
numbers greater than two that *one* of them has to be prime.
Alan
> s...@nospam.unt.edu wrote in article <8ium9f$477$1...@hermes.acs.unt.edu>...
>> mikael.l...@spray.se wrote:
>>
>>(now there are some
>> arguments about whether "real world data" is accurately modeled with
>> the probabilistic models that you have to use for entropy, but that's
>> a different question).
>>
> No it is not a different question. It goes right to the heart of the matter
> of the thread. If these arguments exist and have merit (to which I have no
> idea), then we have an 'alternate universe' of models that we need to deal
> with. I doubt that it is the case, given the maturity of the science (info
> theory), but I suppose its possible.
Of course it's a different question. I can define a source, compute
the entropy, and guarantee you that no compressor you can come up with
can compress data from that source any smaller than the entropy allows. I
can prove that you can't do better than entropy. Ever.
But the question of whether it makes sense to talk about entropy and
probabilistic sources of real world data is different. But I was
responding to someone talking about entropy, not real world data.
> I know what entropy means. "Disorder" or "randomness". I should have used
> the word "statistics" instead, I was thinking about e.g. huffman coding.
> And it's easy to pack a file better than with huffman only, if we use
> "bwt, ppm or ctw as preprocessor" isn't it?
But an order-0 Huffman coder is not a universal compressor. The fact
that it can be beat isn't a surprise to anyone (in fact, Shannon's
original paper defining entropy talks about this).
> I think this also very clearly demonstrates the difference between
> scientists like you and engineers like me. For me it's not a lot of
> theory, math or carefully laid up proofs of something.
But don't you think you'd be better off if you understood the theory?
I'm not saying everyone has to be a scientist and work on new
theories, but you need to understand what's there. An engineer
shouldn't just try random stuff in the hopes that something will
work. You need to be strong in the foundations if you expect to make
progress.
> Are you really sure that YOU know how to measure the "source's" entropy?
> If you don't, then what is all this theory worth?
Yes, I'm absolutely sure I know how to measure a source's entropy.
It's a fairly simple theory and formula. Here's a simple example: my
source generates numbers 0-255 (bytes) all random, uniform, and
independent. The entropy of that source is exactly 8 bits per symbol,
and you will never be able to compress it any better than that.
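As a quick check of that claim, here is the Shannon entropy formula applied to
the source just described (a sketch; H = -sum of p*log2(p) over the 256 equally
likely byte values):

    from math import log2

    probs = [1 / 256] * 256                      # uniform, independent byte source
    H = -sum(p * log2(p) for p in probs)
    print(H)   # 8.0 bits per symbol -- no lossless coder can average below this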
I think you've got it backwards in this case. The common man (you)
thinks it can be done because he doesn't understand the mathematics.
Those who do understand know that it is impossible and are laughing at
those who claim it is possible.
To reiterate what someone else said, a file does not have an entropy
limit. A source has an entropy limit. And I can certainly compress A
file below the source's entropy limit, but any such lossless compression
algorithm MUST necessarily compress most of the other files from that
source to a size greater than the entropy limit. If you average the
compression over all possible messages from that source you will be at
or above the entropy limit. That is an inescapable mathematical law.
All that the algorithms you mentioned do is try to be better about
selecting which of the possible messages from a source get mapped to the
smaller files. They look for specific types of non-randomness in the
data. Some algorithms are better than others at detecting the non-random
patterns. None will let you compress all files from a random source
below the source's entropy limit.
> When speculations about sending people to the moon came in the 50's, the
> common man laughed at it and stated categorically that IT-IS-IMPOSSIBLE
> and laughed some more. We all know the outcome of that too ...
Once again you are confusing impossible with difficult or impractical.
The impossible in this might be better stated as illogical. Putting a
man on the moon in the 50's would definitely appear impossible, but that
is simply because it would be very difficult and impractical. It was not
however illogical. Logically there was no reason why someone couldn't be
on the moon. Compression below a source's entropy limit, like the even
prime number > 2, is a logical impossibility. The existence of such a
prime number or such a compressor is not logical.
> From Apple:
>
> While some see them as the crazy ones,
> we see genius.
>
> Because the people who are crazy enough to think
> they can change the world, are the ones who do.
>
> You're right, there should always be some realism included when making
> decisions what to do with problems through life. But we don't know
> everything now and will most certainly never do. So, that's why we need
> these crazy people that are crazy enough to think they can change the
> world. We should not inhibit them with ill-educated "advice" but
> encourage them!
>
> Best regards,
> Mikael Lundqvist
> Software engineer
> Sweden
>
> In article <B5784A58.7FD9%juand...@airtel.net>,
> juand...@airtel.net wrote:
> > In article 8itti2$m7t$1...@nnrp1.deja.com, mikael.l...@spray.se
> > <mikael.l...@spray.se> wrote:
> >
> > > I almost suspect that when people come with a lot of difficult theory to
> > > explain why it's impossible to solve this or that problem, they actually
> > > really don't have a clue. The fact remains, the most powerful tool you
> > > have is your mind. If your attitude is that nothing is impossible and you
> > > try and try again, NEVER EVER giving up, then the problem is destined to
> > > be solved no matter what the NO-sayers tries to tell you!
> > > Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > > had not done what they did?
> >
> > It's not only a matter of attitude: they realized which were the difficul-
> > ties... and went other way. If you want to compress EVERY possible stream
> > losslessly, you're under the counting argument. So if you want to get rid
> > of it, you'll have to be out of its range.
> >
> > And, about the NO-sayers, anyone with a perpetual movement machine? Every-
> > body says you can't, but if you try and try...
> >
> > --
> > Juan de Dios Santander Vela
> > Ingeniero en Electrónica (Electronics Engineer)
> > España (Spain)
> > e-mail: juand...@airtel.net ICQ: 22903952
> >
> >
>
--
All a blender transform will let you do is spread that extra pigeon
among several holes, but you will still end up with a quantity of
pigeons > 1 in at least one hole. If you fully apply the blender
transform you will get 1 + 1/n pigeons per hole. ;-). As n approaches
infinity the average number of pigeons per hole approaches 1 but does
not equal 1.
kaptke...@my-deja.com wrote:
>
> > One more, by my count.
> >
>
> Well, let me be that one more but adding that you can fit more pigeons
> per hole if you first apply a blender transform. But then we are
> talking about lossy compression!
--
From the pigeons I've seen I would think the information capacity of
their brains was one or two bits at most ;-)
PS - What do you think about the possibility of googleflop quantum computers?
>> Yes, I'm absolutely sure I know how to measure a source's entropy.
>> It's a fairly simple theory and formula. Here's a simple example: my
>> source generates numbers 0-255 (bytes) all random, uniform, and
>> independent. The entropy of that source is exactly 8 bits per symbol,
>> and you will never be able to compress it any better than that.
> Gotcha, Hah! Please describe the implementation of your source,
> along with the necessary theoretical or empirical evidence that its
> output is impossible for others to predict.
Huh? I just completely described the source above. You can plug it
into the entropy formula to see that the entropy is indeed 8
bits/symbol. And you can read Shannon's proof that no compressor can
ever beat that. What's possibly left?
> PS - What do you think about the possibility of googleflop quantum
> computers?
Irrelevant. Quantum computers don't change this one iota...
OK, I'll buy your guaranteed random source, please send it COD. Warning,
substantial penalties may apply if it doesn't perform as advertised!
>
> > PS - What do you think about the possibility of googleflop quantum
> > computers?
>
> Irrelevant. Quantum computers don't change this one iota...
Interesting origin of this expression. It has (I think) to do with the added iota
going from the Greek "homoousios" (made of the same substance) to "homoiousios"
(made of a similar substance). When applied to the relationship between Jesus
and God, the early Christian church came to regard the latter as a dangerous
heresy. So however insignificant one iota may seem today, it meant a great
deal to those who were executed because their theoretical constructs
conflicted with the contemporary reality.
-michael
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
> What the world needs is new, original thought. Good luck breaking
> the counting argument! Einstein, Newton and all their friends
> will be proud.
Oh, but I am sure Shannon will not be so proud :)-
Cheers,
Nasir
Well, here's the basic approach I'm working on: telling people
to visualize all the possible files *numbered*... numbering the two
one-bit files[1] 0 and 1, the four two-bit files 2, 3, 4 and 5, the
eight three-bit files... well, you get the idea.
Then I'm examining what it means to 'compress' a file: a file
with one number actually takes the place of a file with a lower number
(not all files with lower numbers are shorter files, but all shorter
files must have lower numbers.) You are now telling the decompressor:
"When you see the file whose number is 4,506; actually restore the file
whose number is 789,054."
But, what happens to the file whose number *started out* as
4,506? That file has lost its number, and needs another one... and if
our claim that any file can be compressed is true, it must be an even
lower number... and hopefully anyone really *can* see from here that
sooner or later you wind up having to try and find numbers lower than 0.
Hopefully, it's easier from there to handle claims like
"compresses everything, as long as it's above x number of bytes" (that
identifies a *fixed* number of files that can be evicted from their
numbers; therefore only a *fixed* number of files can be awarded their
places) and the issue of bytestrings rather than bitstrings (bytestrings
are actually worse; if you reduce a bytestring to a bitstring 15 bits
long, that actually evicts *two* bytestrings, not just one...)
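Here is a tiny sketch of that numbering idea, for illustration (bit strings of
length 1 to 3 only, listed in the same order as above):

    from itertools import product

    strings = []
    for length in range(1, 4):
        for bits in product("01", repeat=length):
            strings.append("".join(bits))

    for number, s in enumerate(strings):
        print(number, s)

    # Strings shorter than 3 bits occupy numbers 0..5; the eight 3-bit strings
    # occupy numbers 6..13, so at most six of the eight could ever be given a
    # lower number, and only by evicting the strings that already hold them.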
-jc
[1] To force people to confront the principle first, I'm starting with
bitstrings rather than bytestrings.
This could take a while, because so much of what is said is
absolutely true, and absolutely meaningless. Itai's statement
that any file can be compressed -- you just need to find the right
algorithm -- is quite true (for any given file the simplest best
algorithm is one that exchanges that file with the zero length
file and leaves other files unchanged) and quite meaningless (what
good is compressing just one possible file?)
The counting argument also has a major flaw. It is again quite
true, but in a sense quite meaningless. The overwhelming majority
of possible files do not exist, and will never exist. A real
compression program need only deal with files that actually do or
will exist -- its behavior on files that will never exist is
irrelevant.
One can argue that almost any sufficiently large actual file is
highly compressible, regardless of how random it might appear to
be. Any such file must have been generated by some process, and
that process is likely to be limited in size. For instance, a
common way to get "random" file for testing is to use a
pseudo-random number generator. But any such file is obviously
compressible down to the size required to describe the algorithm,
and its starting state. A compressed version of the C source for
generating the file is not likely to be very big.
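As a sketch of that point (the seed and length here are arbitrary choices, just
for illustration): a megabyte of pseudo-random output is completely described by
the generator plus its starting state, so this tiny program is, in effect, the
compressed form of the whole file.

    import random

    def regenerate(seed=12345, n_bytes=1_000_000):
        # the "decompressor": rebuild the file from a few bytes of description
        rng = random.Random(seed)
        return bytes(rng.randrange(256) for _ in range(n_bytes))

    data = regenerate()
    print(len(data))   # 1000000 bytes reproduced from a handful of bytes of state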
Interesting thought on the counting argument. If you modify it to
count only files that do, or will, (two separate cases) exist,
what sort of limits do you get on compressibility?
--
Tom Gutman
He didn't claim that he actually *had* one, he was just describing
one. Fortunately, I can supply you with one. Take a look at
http://www.fourmilab.to/hotbits.
If you believe that radioactive decay isn't actually random then feel
free to try to compress this data. Let us know when you give up.
Alan
I've tried compressing 1024 of their bytes, no luck so far. Will keep trying
as the spirit motivates. Be patient, it may take a while. But there *is* a
theoretical bias, as they do discard duplicate decay-time differences. Wonder
why? Mathematical foresight? [evil grin] I think not!
>
> If you believe that radioactive decay isn't actually random then feel
> free to try to compress this data. Let us know when you give up.
Well, of course if it was totally random there would be little point in the
concept of a radioactive half-life. The authors realize this, and even the
layperson reading their explanation of how it works
http://www.fourmilab.to/hotbits/how.html
will wonder about the need for so many experimental corrections (inverting the
computer clock every event!) to what is putatively a random measurement.
[Adopt avatar of modesty] As I understand it, Fermi's theory of beta decay
involved the nuclear electron "trying" to overcome the potential barrier many
times per second, and ultimately succeeding. This gave good fits to existing
data and good quantitative estimates for the half-lives of then-yet
undiscovered nuclei. But IMHO it did beg the question of why all nuclei that
had been created at the same time just didn't simultaneously go poof after a
fixed time interval.
Discarding equal time intervals is to ensure a bias doesn't occur.
--
Samuel S. Paik | http://www.webnexus.com/users/paik/
3D and multimedia, architecture and implementation
In order to violate the counting argument, you have to abandon digital
computation, so perhaps an analog, or a quantum computer can do better.
But, we'll have to wait until analog or quantum computers have the real
power that digital computers do.
Harvey Rook
harve...@hotmail.com
"Itai Bar-Haim" <av...@bezeqint.net> wrote in message
news:3954...@news.bezeqint.net...
> Hi.
> Since math is actually based on definitions, proven to work in many cases
> (well, all of them, actually :-), I wouldn't try to break that theory. If some
> smart Greek guys many years ago decided that 1+1=2, then you can't tell
> them that you found a way to break their own definition and that from now on
> 1+1=3. That's why it's a definition. What is possible, though, is to create a
> "newer version of math". You will define your own rules of the new language
> of numbers, and then everything will work the way you want it.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinite compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
> If anyone thinks the same and wants to try to help me, he/she may contact me,
> and I'll be glad to discuss this issue.
>
> Itai.
>
>
>
> Alan Morgan <amo...@Xenon.Stanford.EDU> wrote in message
> news:8j03u9$d2k$1...@nntp.Stanford.EDU...
> > In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
> > Phil Norman <Phil....@genedata.com> wrote:
> > >On Thu, 22 Jun 2000 20:39:32 GMT,
> > > mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> > >>How true it is what you are saying ...
> > >>I almost suspect that when people come with a lot of difficult theory to
> > >>explain why it's impossible to solve this or that problem, they actually
> > >>really don't have a clue. The fact remains, the most powerful tool you
> > >>have is your mind. If your attitude is that nothing is impossible and you
> > >>try and try again, NEVER EVER giving up, then the problem is destined to
> > >>be solved no matter what the NO-sayers tries to tell you!
> > >>Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > >>had not done what they did?
> > >
Mikael
In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
Phil....@genedata.com (Phil Norman) wrote:
> On Thu, 22 Jun 2000 20:39:32 GMT,
> mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> >How true it is what you are saying ...
> >How true it is what you are saying ...
> >I almost suspect that when people come with a lot of difficult theory to
> >explain why it's impossible to solve this or that problem, they actually
> >really don't have a clue. The fact remains, the most powerful tool you
> >have is your mind. If your attitude is that nothing is impossible and you
> >try and try again, NEVER EVER giving up, then the problem is destined to
> >be solved no matter what the NO-sayers tries to tell you!
> >Where would we be if not Edison, Marconi or Einstein, with this attitude,
> >had not done what they did?
>
> Absolutely. I, personally, have been trying to get 1+1 to equal 3
> for a good number of years now. I've not managed it yet (when I
> do, I'll post the results on http://inkvine.fluff.org/~forrey/),
> but *I'm* certainly not giving up just because mathematicians say
> it's impossible. What do they know? They're just constrained by
> all the prejudices they've been taught during their studies.
>
> What the world needs is new, original thought. Good luck breaking
> the counting argument! Einstein, Newton and all their friends
> will be proud.
>
> Cheers,
> Phil
I'd actually be surprised if you could.
Mikael
In article <8j08t9$g6p$2...@hermes.acs.unt.edu>,
s...@nospam.unt.edu wrote:
> mikael.l...@spray.se wrote:
>
> > I know what entropy means. "Disorder" or "randomness". I should have used
> > the word "statistics" instead, I was thinking about e.g. huffman coding.
> > And it's easy to pack a file better than with huffman only, if we use
> > "bwt, ppm or ctw as preprocessor" isn't it?
>
> But an order-0 Huffman coder is not a universal compressor. The fact
> that it can be beat isn't a surprise to anyone (in fact, Shannon's
> original paper defining entropy talks about this).
>
> > I think this also very clearly demonstrates the difference between
> > scientists like you and engineers like me. For me it's not a lot of
> > theory, math or carefully laid up proofs of something.
>
> But don't you think you'd be better off if you understood the theory?
> I'm not saying everyone has to be a scientist and work on new
> theories, but you need to understand what's there. An engineer
> shouldn't just try random stuff in the hopes that something will
> work. You need to be strong in the foundations if you expect to make
> progress.
>
> > Are you really sure that YOU know how to measure the "source's" entropy?
> > If you don't, then what is all this theory worth?
>
> Yes, I'm absolutely sure I know how to measure a source's entropy.
> It's a fairly simple theory and formula. Here's a simple example: my
> source generates numbers 0-255 (bytes) all random, uniform, and
> independent. The entropy of that source is exactly 8 bits per symbol,
> and you will never be able to compress it any better than that.
>
Compression can't get a better result than the theoretical information
content. Rather worse, in fact, because practical compression algorithms
don't reach the theoretical efficiency.
You are partially right. Any (any!) given arrangement of the data (your 4096
bits) can end up with a compressed size under 817 bits.
E.g. the first 128 bits set and the next (4096-128) bits clear: that is a
very easy arrangement to compress.
But we must determine the maximum post-compression size for the most
incompressible arrangements of the data!
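For instance, that particular arrangement collapses under even a trivial
run-length description (a sketch, using run-length counting purely for
illustration), while a typical arrangement of 128 set bits among 4096 still
needs on the order of 817 bits:

    from itertools import groupby

    block = "1" * 128 + "0" * 3968            # the easy arrangement above
    runs = [(bit, sum(1 for _ in group)) for bit, group in groupby(block)]
    print(runs)   # [('1', 128), ('0', 3968)] -- two runs describe all 4096 bits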
Harry
There is a body of mathematical theory called "information theory"
which assigns definitions to the terms "source", "message", "model",
"entropy", and derives a number of interesting and fundamental results,
including Shannon's noiseless source coding theorem. In this theory, the
elements of the Calgary corpus are not sources, but are specific messages.
So you are asking a meaningless question--however, this is somewhat
understandable since people are often a bit loose and talk about the
entropy of messages without reference to a model.
You seem to have had a smattering of information theory without really
grasping some of its consequences; Itai Bar-Haim obviously does not have
any understanding.
A good example is this statement of yours:
And it's easy to pack a file better than with huffman only, if we use
"bwt, ppm or ctw as preprocessor" isn't it?
Mikael
In article <395561F0...@webnexus.com>,
> You didn't answer the question!
> Do you have an exact method to measure the entropy of a given source
> that actually CAN be compressed? Like Calgary corpus geo or news?
The Calgary Corpus is not a source, it's a set of files.
Here's a source that I can measure the entropy of and compress. I'm
going to generate bytes, with a geometric distribution: probability
of value 0 is 1/2, probability of value 1 is 1/4, value 2 is 1/8,
and so on, until the probability of value 254 is 1/2^255 and the
probability of value 255 is also 1/2^255 (so everything adds up to 1).
That's a source, and the entropy is almost exactly 2 bits per symbol.
I can also encode (or compress) this to exactly entropy -- in this
case it's very easy: order-0 Huffman coding works perfectly here,
since the probabilities are all powers of 1/2.
So now I've given you two different sources that I've calculated the
entropy for. The first was incompressible, and the second was
compressible with a 4:1 ratio.
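A quick numerical check of that source, as described above (a sketch; the
probabilities are 1/2, 1/4, ..., 1/2^255, with the last value also getting
1/2^255 so that everything sums to 1):

    from math import log2

    probs = [2.0 ** -(k + 1) for k in range(255)] + [2.0 ** -255]
    print(sum(probs))                                   # 1.0
    H = -sum(p * log2(p) for p in probs)
    print(H)                                            # ~2.0 bits per symbol

And since every probability is a power of 1/2, the Huffman code lengths land
exactly on -log2(p), which is why order-0 Huffman coding hits the entropy here.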
Mikael
In article <8j5ipd$m9c$1...@hermes.acs.unt.edu>,
Just wondering, but why do you need to ensure against bias if your
source of data is a random nuclear decaying source?
Incidentally, Mr Kopf, yes there will be a shift of the
distribution of time-differences with time due to the half-life,
but I imagine in this case a source has been chosen which has a
half-life so huge that it won't really affect the time differences
in any noticable way when measured in the order of a few days.
Cheers,
Phil
--
:wq
Well, unless you count ready blended pigeon. I'm sure there are a
number of supermarkets who'd be interested.
Cheers,
Phil
This is something that's been puzzling me. Is 'prediction' really
viewed as a separate type of compression? It would seem to me
that prediction can take place only as a result of knowledge of
what is likely to occur in the future of the input stream.
However, this knowledge is going to be either knowing the
distribution of occurrences of symbols, or knowing what kind of
strings of symbols are likely to occur (so either statistical or
dictionary). Or both, of course.
In fact, I'd be inclined to think that prediction isn't really a
method of compression of itself at all, but is simply something
one does while trying to compress. In order to compress some
data in the future, one must construct a mapping from the possible
set of inputs to the possible set of outputs which favours what we
currently view as predictably more likely. This is what all the
compressors I've come across effectively do.
Or is there some specific definition of what predictive
compression means which I've completely missed?
Cheers,
Phil
--
:wq
That wasn't irony, that was sarcasm.
>It IS just plain stupid to try what is so easily proven impossible.
Thank you. Now, go read the counting argument and notice how
easily proven it is.
And yes, computing (and compression) is based on mathematics.
Cheers,
Phil
--
:wq
As far as your new math goes, you can certainly come up with new symbology
that reuses the symbols in other ways. Just like you can translate words into
other languages. But this does not change the physical universe, merely
the words or symbols you use to describe it. 1+1=3 is valid if any of
the symbols changes meaning from the standard mathematical definitions.
No, everything will still work the same way, you just describe it
differently.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
You are confusing two similar sounding words:
theory - general principles drawn from any body of facts; a plausible or
scientifically acceptable general principle offered to explain observed
facts; hypothesis, guess.
theorem - a statement in mathematics that has been or is to be proved; an
idea accepted or proposed as a demonstrable truth.
Theories often are proven wrong. But the counting argument is a theorem,
not a theory. It is easily proven true.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinite compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
Once again, don't confuse information theory with the theorems that it
is made up of.
If you succeed you could easily become rich enough to buy the planet.
And once you succeed, then you can set to work on creating that
perpetual motion machine.
On a more serious note, back to your original question which was how to
compress blocks of 4096 bits which have exactly 128 bits set to less
than 500 bits. It has already been shown that there are approximately
2^817 such files. There are approximately 2^500 files whose size is less
than 500 bits. If it were possible to compress all those input files to
less than 500 bits, it is obvious that you must compress more than one
of your input files to the same compressed file (in fact on average you
would compress 2^317 files to each of the possible compressed files).
Now look at the decompression side. How can you correctly map from 2^500
input files to all 2^817 output files?
That is the pigeon-hole principle: If A and B are finite sets and |A| >
|B|, then there is no one-to-one function from A to B. Here A is the set of all
files with 4096 bits of which 128 are set. |A| = 2^817. B is set of all
files whose size is <= 500 bits. |B| = 2^500. In doing compression you
are trying to come up with a function f(a) = b, where a is in A and b is
in B. For lossless compression f must be invertible as f' in that f'( f(
a ) ) = a. But because |A| > |B| you cannot escape that there must be
some case where f(a1) = f(a2) when a1 != a2. This leads you to:
a1 = f'( f(a1) )
a1 = f'( f(a2) )
a1 = a2
But this contradicts the statement above that a1 != a2. Therefore any
such function you come up with is not invertible. You basically have a
lossy compressor.
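Here is a toy, brute-force version of that statement, small enough to check
exhaustively (a sketch for illustration, not part of the argument above): there
are four 2-bit strings but only three strings of length <= 1, and no assignment
of outputs to inputs avoids a collision.

    from itertools import product

    inputs = ["00", "01", "10", "11"]        # |A| = 4
    outputs = ["", "0", "1"]                 # |B| = 3

    injective_found = any(
        len(set(mapping)) == len(inputs)     # collision-free needs 4 distinct outputs
        for mapping in product(outputs, repeat=len(inputs))
    )
    print(injective_found)                   # False: every mapping reuses some output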
> But you have only considered the statistics of the symbols (I think)!
> If we leave the word "entropy" and use "possible compression" instead,
> then statistics is easily shown not to be enough; we know several other
> methods to compress! I conclude that science has no method to show beyond
> any doubt where the theoretical limit is for a given file (at
> least I have not seen any), without preinstalled dictionaries.
Well, you're the one who brought up the word "entropy", so that's what
I was talking about.
If you want to use "possible compression" instead, particularly for a
particular file instead of a probabilistic source, then you're looking
at something entirely different. And there is a perfect answer to
questions about maximum possible compression, and that answer is
Kolmogorov complexity (there are several books on the topic of
Kolmogorov complexity if you're interested). Of course, there's a
downside: while we can define what we mean by the "best possible
compression", it turns out that it's uncomputable -- there's no single
algorithm that can compute the best possible compression for all files
(or even a significant fraction of files), even if you had all the
time in the world (in other words, it's a computability problem, not a
complexity problem). So one step forward in theory, and an infinite
leap backwards in practicality.
No, the counting argument has no flaw. Let's look at it from the
pigeon-hole principle, which says if you have 2 sets A and B and the
number of elements in set A is larger than the number of elements in set
B, you cannot have a function f(a) = b to map elements from set A to set
B such that f(a1) != f(a2) for all a1 and a2 in A. Or alternatively
there must be an a1 and an a2 in A such that f(a1) == f(a2).
No, you're saying that some of the files don't exist. All that says is that
the size of A is not as large as it would appear.
But this is all handled very well in information theory. You don't even
have to go so far as to say those files won't exist. You can just say
that they aren't as likely. Entropy of a source looks at the
probabilities of the source and even if all 2^n are possible but some
are more likely than others you can still compress ON AVERAGE better
than n bits. Of course, we haven't escaped the counting argument, those
rare messages will require more than n bits, but more likely messages
will hopefully require less.
> One can argue that almost any sufficiently large actual file is
> highly compressible, regardless of how random it might appear to
> be. Any such file must have been generated by some process, and
> that process is likely to be limited in size. For instance, a
> common way to get "random" file for testing is to use a
> pseudo-random number generator. But any such file is obviously
> compressible down to the size required to describe the algorithm,
> and its starting state. A compressed version of the C source for
> generating the file is not likely to be very big.
But the question is can you necessarily specify the process in fewer
bytes than the process outputs. Not in all cases.
And the biggest problem for compression is that we want a general purpose
compressor to be able to try and guess what the process is without any
prior knowledge of it. If you know the process ahead of time then it is
easier to devise a compression scheme. It is not too difficult to devise
simple sources that follow easy to express patterns that are nearly
impossible to detect.
> Interesting thought on the counting argument. If you modify it to
> count only files that do, or will, (two separate cases) exist,
> what sort of limits do you get on compressibility?
You get a lower entropy limit, because you are reducing the entropy of
your source.
You can theoretically map the first 2^500 files to any of 2^817 output files.
Then, if anybody is still alive to care, you'd best get outta town.
You need to look again at my second sentence. I agree that the
counting argument is true. Perhaps I need to qualify my original
statement a bit more -- the counting argument is flawed as applied
to the compression issues.
>
> No, you're saying that some of the files don't exist. All that says is
> that the size of A is not as large as it would appear.
And that is exactly the flaw. The counting argument says that you
have 2^N files of length N, but only (2^N)-1 files of length <N,
and so you cannot compress all files of length N. But if there
are actually only 2^n files of length N, with n<<N, then the
counting argument allows you to compress all actual files of
length N down to n or less bits.
That is not to say that it is possible to do such a
compression -- only that the counting argument does not exclude
it. Perhaps a more complex argument, based on the decidability of
the existence of a file, would demonstrate the impossibility. But
such arguments are considerably more difficult to develop, and,
being difficult to understand, likely to be less convincing.
>
> But this is all handled very well in information theory. You don't even
> have to go so far as to say those files won't exist. You can just say
> that they aren't as likely. Entropy of a source looks at the
> probabilities of the source and even if all 2^n are possible but some
> are more likely than others you can still compress ON AVERAGE better
> than n bits. Of course, we haven't escaped the counting argument, those
> rare messages will require more than n bits, but more likely messages
> will hopefully require less.
Sure, and that is what makes compression useful for its common
use -- cutting down average storage requirements and transmission
times. But then you run into the guy whose brand new device has
limited (storage space, bandwidth, ...) and he therefore needs an
algorithm that guarantees a particular level of compression. So
somebody trots out the counting argument to show that that is not
possible. But the guy has an intuitive feel, which he cannot
properly state, that there aren't *really* all that many files and
that the counting argument somehow doesn't really apply. And so
the argument meanders on and on and ....
>
> > One can argue that almost any sufficiently large actual file is
> > highly compressible, regardless of how random it might appear to
> > be. Any such file must have been generated by some process, and
> > that process is likely to be limited in size. For instance, a
> > common way to get "random" file for testing is to use a
> > pseudo-random number generator. But any such file is obviously
> > compressible down to the size required to describe the algorithm,
> > and its starting state. A compressed version of the C source for
> > generating the file is not likely to be very big.
>
> But the question is can you necessarily specify the process in fewer
> bytes than the process outputs. Not in all cases.
No -- hence the qualification of being sufficiently large. A 100
byte file is quite likely to have a source that is larger than the
file. A 1,000,000,000 byte file is unlikely to have such a source.
But the issue of source size is one that could use a bit more
attention. There was a message elsewhere in this thread where
somebody was claiming to be adding a high degree of randomness
with just a few simple lines of code. Impossible -- at most what
was added was pseudo-randomness, and the results are still highly
compressible (in principle).
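A sketch of that last point (illustrative only; the seed and sizes are
arbitrary choices of mine): a few lines plus a seed fully determine a
megabyte of "random-looking" data, so in principle the file compresses to
roughly the size of its generator, yet a general-purpose compressor like
zlib cannot find the pattern.

    import random, zlib

    random.seed(12345)                         # all the "randomness" is this seed
    data = bytes(random.getrandbits(8) for _ in range(1000000))

    print("output size          :", len(data))
    print("zlib-compressed size :", len(zlib.compress(data, 9)))   # barely changes
    # The honest description of this file is just the few lines above.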
>
> And the biggest problem for compression is that we want a general purpose
> compressor to be able to try and guess what the process is without any
> prior knowledge of it. If you know the process ahead of time then it is
> easier to devise a compression scheme. It is not too difficult to devise
> simple sources that follow easy to express patterns that are nearly
> impossible to detect.
Quite. The difference between "here is a proof of the existence
of a solution to this problem" and "here is a proof of the
existence of an algorithm to calculate a solution to this problem"
and "here is an algorithm to calculate a solution to this
problem".
>
> > Interesting thought on the counting argument. If you modify it to
> > count only files that do, or will, (two separate cases) exist,
> > what sort of limits do you get on compressibility?
>
> You get a lower entropy limit, because you are reducing the entropy of
> your source.
--
Tom Gutman
More or less agreed, for practical coding, the counting argument is
irrelevant (but not FLAWED). However, we do get kooks from time
to time in this newsgroup who are looking for or claim to be able
to reduce the size of *all* files. [Itai and Mikael are this month's
examples...]
I apologize, you irked me by belittling me in the "Very Specific
Compression Problem" thread after I had spent more time than
I should have in analyzing the poster's problem and your proposed
solution.
> I did certainly NOT say anything about everything being compressible!
> I really don't think Itai did either. It's just too stupid.
Itai most certainly did in <3954...@news.bezeqint.net>:
That's why I believe that the compression theory that says that
infinate compression is impossible is wrong. It is logical, but yet,
I think it's wrong. That's why I'm going to try to prove that it is
wrong, by developing an algorithm that will actually compress any
data whatsoever (probably under minimum block size restrictions,
so what?).
Mikael
I understood your second sentence perfectly well. There is no flaw in
the counting argument. Some people misapply it, that is where the flaw
is.
> > No, you're saying that some of the files don't exist. All that says is
> > that the size of A is not as large as it would appear.
>
> And that is exactly the flaw. The counting argument says that you
> have 2^N files of length N, but only (2^N)-1 files of length <N,
> and so you cannot compress all files of length N. But if there
> are actually only 2^n files of length N, with n<<N, then the
> counting argument allows you to compress all actual files of
> length N down to n or less bits.
The counting argument works just fine here. That is because it is concerned
with the number of input files, not the size of the files. And the
counting argument says you have 2^n input files and you cannot compress
all of them below n bits, because that is the entropy limit of
that source, provided that all input files are equally likely.
> > But this is all handled very well in information theory. You don't even
> > have to go so far as to say those files won't exist. You can just say
> > that they aren't as likely. Entropy of a source looks at the
> > probabilities of the source and even if all 2^n are possible but some
> > are more likely than others you can still compress ON AVERAGE better
> > than n bits. Of course, we haven't escaped the counting argument, those
> > rare messages will require more than n bits, but more likely messages
> > will hopefully require less.
>
> Sure, and that is what makes compression useful for its common
> use -- cutting down average storage requirements and transmission
> times. But then you run into the guy whose brand new device has
> limited (storage space, bandwidth, ...) and he therefore needs an
> algorithm that guarantees a particular level of compression. So
> somebody trots out the counting argument to show that that is not
> possible. But the guy has an intuitive feel, which he cannot
> properly state, that there aren't *really* all that many files and
> that the counting argument somehow doesn't really apply. And so
> the argument meanders on and on and ....
Once again, that is misapplying the counting argument. The counting
argument doesn't take into account variations in probability of
messages. Perhaps that is what you are calling the flaw. But information
theory has entropy to handle the probabilities.
> > But the question is can you necessarily specify the process in fewer
> > bytes than the process outputs. Not in all cases.
>
> No -- hence the qualification of being sufficiently large. A 100
> byte file is quite likely to have a source that is larger than the
> file. A 1,000,000,000 byte file is unlikely to have such a source.
How about a 1,000,000,000 byte file that is the compressed version of
a 2,000,000,000 byte file?
> > And the biggest problem for compression is that want a general purpose
> > compressor to be able to try and guess what the process is without any
> > prior knowledge of it. If you know the process ahead of time then it is
> > easier to devise a compression scheme. It is not too difficult to devise
> > simple sources that follow easy to express patterns that are nearly
> > impossible to detect.
>
> Quite. The difference between "here is a proof of the existence
> of a solution to this problem" and "here is a proof of the
> existence of an algorithm to calculate a solution to this problem"
> and "here is an algorithm to calculate a solution to this
> problem".
I think somehow one could use diagonalization to prove that such an
algorithm cannot exist, just as is done with the halting problem. For
those unfamiliar with this, the idea for the halting problem is:
- Assume the existence of an algorithm HALTS( P, X ) that accepts a
program P and input X and returns a boolean result indicating whether the
program P will halt on input X.
- We create a program diagonal( X ) that is:
if HALTS( X, X ) then loop forever else halt
- Call diagonal on itself, i.e. diagonal( diagonal )
You then have a paradox, which leads you to the conclusion that HALTS
cannot exist. I'm sure the same thing can be done to show that you
cannot have an algorithm that can devise another algorithm from its
output.
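For the curious, here is the diagonal construction written out as a Python
sketch (illustrative only -- the whole point is that halts() cannot actually
be implemented):

    def halts(program, x):
        """Hypothetical total oracle: True iff program(x) eventually halts."""
        raise NotImplementedError("no such algorithm exists")

    def diagonal(x):
        # Do the opposite of whatever the oracle predicts about diagonal(x).
        if halts(diagonal, x):
            while True:       # oracle said "halts", so loop forever
                pass
        return "halted"       # oracle said "loops", so halt

    # Asking about diagonal(diagonal): whichever answer halts() gives is
    # wrong, so no such halts() can exist.  The post above suggests the same
    # style of argument against an algorithm that recovers a generating
    # process from its output.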
It's fairly simple. The theoretical limit for compression of a single
file is basically 100% compression. For any given file there is some
compression algorithm that will compress that file to 1 bit. Of course
it will not do so well on all other files and if you average the amount
of compression over all possible input files you will have a net loss.
Once again you have to talk about the source, not about individual
files.
While on the way home last night, I think I came up with a very
good analogy that makes the counting argument much clearer. The analogy
is the telephone system and telephone numbers.
Imagine we have a large city or state and we want to assign telephone
numbers to 20 million homes or businesses, etc. Just assume that we want
20 million telephones to be individually callable. Can we do this using
the typical US 7-digit telephone number? Using only 7-digit phone
numbers gives us 10^7 or 10 million unique telephone numbers. It is
therefore impossible to map all 20 million telephone destinations on to
only 10 million telephone numbers. If you do so, you will be giving the
same phone number to more than one person. This is the counting argument
or the pigeon-hole principle in action.
So the solution used in the US is to use area codes which are three
digits. Therefore each telephone destination in the US is uniquely
identified using a 10 digit number. So theoretically you could have 10
billion phone numbers all of which are 10 digits. But that is not where
it stops. It would be cumbersome to always have to dial all 10 digits,
considering how most of the calls you make are local and will have the
same area code. Therefore they add some "compression". For calls within
the same area code you can simply dial the 7 digit number. At first that
seems like some savings. For 10 million numbers you save 3 digits. That
saves 30 million digits. But what is the consequence? For any non-local
number you have to dial a 1 before the 10 digit number. So for
9,990,000,000 telephone numbers you now have to dial an extra digit.
Averaged over all telephone numbers you have a net loss. And you also
don't get the original 30 million savings because you can't use any of
the 1 million phone numbers that begin with 1. But it is still a good
compression because the characteristics of the source are such that
local numbers are more likely. But what this shows is that while you can
compress some files below the entropy limit of a source, that requires
expanding others.
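Here is the arithmetic of that trade-off as a small sketch (the 80% figure
is an assumption of mine, just to show the effect of a skewed source):

    total_numbers = 10**10        # all 10-digit numbers
    local_numbers = 10**7         # numbers sharing your area code

    def average_digits(p_local):
        """Expected digits dialed: 7 for a local call, 1 + 10 otherwise."""
        return p_local * 7 + (1 - p_local) * 11

    # Averaged uniformly over destinations: about 11 digits, a loss vs a flat 10.
    print("uniform over destinations:", average_digits(local_numbers / total_numbers))
    # With most calls local: 7.8 digits on average, a clear win vs a flat 10.
    print("80% of calls local       :", average_digits(0.80))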
And the analogy works with the concept of prefix codes. Let's look at
just within an area code. Let's say we want to go below 7 digits for
some numbers. Could we have a service where we charge a premium for
users to get shorter phone numbers? We could have 10 people get single
digit phone numbers, 100 people get 2 digit numbers, and so on. In some
sense this is a real scenario with things like dialing 911 for
emergencies. But this doesn't work because telephone numbers must be
prefix codes. We can't have one person with the telephone number 32 and
someone else with a phone number of 321, because if we see 3-2 we don't know
whether to call the first person or wait for more digits. For the case of 911 we
basically can't use any of the numbers that start with 911.
I'm sure someone would likely point out that we could have the person
hit # to signal that those are all the digits and go ahead and dial.
First this cuts down on the compression since once again you are adding
a key. Secondly, that is cheating because you changed the parameters of
the problem by changing the alphabet of the output. If you allow # to be
part of the output you would do much better allowing it in any location
rather than just the last one.
The only difference from compression is that you are working with only 0
and 1 instead of 0 through 9. So think of Itai's problem as trying to
assign 2^500 phone numbers to 2^817 people. It cannot be done without
losing the ability to call 2^817-2^500 of the people.
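A tiny sketch of the prefix-free requirement the analogy leans on (toy
numbers of my own, not a real dialing plan): sort the assigned numbers and
check that none is a prefix of its neighbour.

    def is_prefix_free(codes):
        codes = sorted(codes)
        return all(not codes[i + 1].startswith(codes[i]) for i in range(len(codes) - 1))

    print(is_prefix_free(["911", "32", "451"]))   # True  -- the exchange always knows when to connect
    print(is_prefix_free(["911", "32", "321"]))   # False -- after dialing 3-2, connect or wait?
    print(is_prefix_free(["911", "9112345"]))     # False -- numbers starting with 911 are unusable

Checking only adjacent entries in sorted order is enough, because if any
code is a prefix of another it is also a prefix of its immediate successor
in that ordering.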
Samuel Paik <pa...@webnexus.com> wrote in message
news:3957EC5C...@webnexus.com...
> Tom Gutman wrote:
> > You need to look again at my second sentence. I agree that the
> > counting argument is true. Perhaps I need to qualify my original
> > statement a bit more -- the counting argument is flawed as applied
> > to the compression issues.
>
> More or less agreed, for practical coding, the counting argument is
> irrelevent (but not FLAWED). However, we do get kooks from time
> to time in this newsgroup who are looking for or claim to be able
> to reduce the size of *all* files. [Itai and Mikael are this month's
> examples...]
>
An extreme case would be partitioning into bytes without order information,
a.k.a. the source histogram. Thousands of files (i.e. bytes) could be contained
in 256 bins. Given a bin I could tell which files were in it, and even how
many there were, but would not be able to say *when* the files were sent.
A more complicated example would involve storing, say, 1025 byte files in
1024 byte sectors, where the files are sorted by the discarded first byte,
such that the sector number could be indexed through a table to recover that
byte. This is no cheat, it would allow compression of files from a completely
random source, because the source entropy is reduced by the subsequent file
sorting operation.
Methinks the same consideration applies to pigeonholes; all the pigeons
without beaks could be put in head-first; all the others could be debeaked and
put in facing out (yuk, let's do that the other way around!) so all the
pigeonholes could be downsized by a beak-length allowing more holes per unit
volume. (Well, if nothing else, this example might suggest an essential
difference between pigeons and information)
Itai Bar-Haim wrote:
[compression of *all files*]
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
Well, one could first try:
1 + 1 = 50 billion (dollars)
(you have a very, very slight chance of getting $50 billion by investing $2 -
but there isn't, however, any chance of getting a working algorithm for what
you propose :-)
--
Edgar
Mikael
In article <3959...@news.bezeqint.net>,
"Itai Bar-Haim" <av...@bezeqint.net> wrote:
> Just a question:
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
>
> Samuel Paik <pa...@webnexus.com> wrote in message
> news:3957EC5C...@webnexus.com...
> > Tom Gutman wrote:
> > > You need to look again at my second sentence. I agree that the
> > > counting argument is true. Perhaps I need to qualify my original
> > > statement a bit more -- the counting argument is flawed as applied
> > > to the compression issues.
> >
> > More or less agreed, for practical coding, the counting argument is
> > irrelevent (but not FLAWED). However, we do get kooks from time
> > to time in this newsgroup who are looking for or claim to be able
> > to reduce the size of *all* files. [Itai and Mikael are this month's
> > examples...]
> >
> > --
> > Samuel S. Paik | http://www.webnexus.com/users/paik/
> > 3D and multimedia, architecture and implementation
> >
> >
>
>
Mikael
In article <3958EF04...@TCE.com>,
It is slightly imprecise in terms of mathematics. The original problem was
basically a source that emitted a "message" or "file" of 4096 bits (128 of
which are set). It is convenient to think of the 2^817 possible files to be
compressed.
There is no order to keep track of. We are given one of these messages and
told to compress it.
>Methinks the same consideration applies to pigeonholes; all the pigeons
>without beaks could be put in head-first; all the others could be debeaked and
>put in facing out (yuk, let's do that the other way around!) so all the
>pigeonholes could be downsized by a beak-length allowing more holes per unit
>volume. (Well, if nothing else, this example might suggest an essential
>difference between pigeons and information)
Once again, the pigeon-hole principle does not say that the holes can hold
only one pigeon. Your holes could be the size of Yankee stadium. But if
you've got more pigeons than holes, you have to put more than one pigeon in
a hole. The application to compression is that as part of the decompression
you want to request a certain pigeon by its hole number. If there are 2
pigeons in the hole you don't know which pigeon is the correct one. In other
words if two source files compress to the same compressed file you cannot
decompress them uniquely. You will get one file or the other. It has already
been argued that you could get both and decide which is the correct one, but
if one is disallowed, then it really was not a legal message from your source.
D.A.Kopf wrote in message <3957C79E...@dakx.com>...
Sure, there is still work to be done on compressors, but that work will
necessarily come down to trying to match compression and decompression to
various sources. Consider this: if we knew exactly the entropy of various
sources could you design a general purpose compressor and decompressor such
that you feed that entropy information along with an input file to the
compressor and be able to compress the information from all those sources to
their entropy limit? Note the only input to the decompressor is the output
of compressor. The answer is no, because of the counting argument. It is
easy to prove. Consider a source that can send a series of symbols A and B
with equal probability. Its entropy limit is 1 bit per symbol and we can
easily compress that as 0 means A and 1 means B. But suppose we have another
source that is the same except that it sends C and D. It has the same
entropy limit of 1 bit per symbol. But if we could compress both to their
entropy limit a decompressor has no way to distinguish which source it was
from. Any information about the source in the compressed output means we
can't compress to the entropy limit.
So the bottom line is BWT et al. may be allowing us to get better on certain
sources, but they cannot get better on all sources.
--
-- Dale King
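The A/B versus C/D example in code form (my own toy encoding, just to make
the collision visible):

    code_ab = {'A': '0', 'B': '1'}     # source 1 at its 1 bit/symbol limit
    code_cd = {'C': '0', 'D': '1'}     # source 2 at its 1 bit/symbol limit

    packed1 = ''.join(code_ab[s] for s in "ABBA")
    packed2 = ''.join(code_cd[s] for s in "CDDC")
    print(packed1, packed2, packed1 == packed2)   # 0110 0110 True

    # The decompressor sees identical bits from different sources; any extra
    # bits spent saying which source it was push you back over the limit.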
mikael.l...@spray.se wrote in message <8jb8jv$1tu$1...@nnrp1.deja.com>...
Mikael
In article <3959...@news.tce.com>,
"Dale King" <Ki...@TCE.com> wrote:
> But we know an awful lot about the limits. We know about the entropy of a
> source. Unfortunately, many sources are quite complex.
>
> Sure, there is still work to be done on compressors, but that work will
> necessarily come down to trying to match compression and decompression to
> various sources. Consider this, if we knew exactly the entropy of various
> sources could you design a general purpose compressor and decompressor such
> that you feed that entropy information along with an input file to the
> compressor and be able to compress the information from all those sources to
> their entropy limit? Note the only input to the decompressor is the output
> of compressor. The answer is no, because of the counting argument. It is
> easy to prove. Consider a source that can send a series of symbols A and B
> with equal probability. It's entropy limit is 1 bit per symbol and we can
> easily compress that as 0 means A and 1 means B. But if we have another
> source that is the same except that it sends C and D. It has the same
> entropy limit of 1 bit per symbol. But if we could compress both to their
> entropy limit a decompressor has no way to distinguish which source it was
> from. Any information about the source in the compressed output means we
> can't compress to the entropy limit.
>
> So the bottom line is BWT et al. may be allowing us to get better on certain
> sources, but they cannot get better on all sources.
> --
> -- Dale King
I've noticed this, too. *cough* Very hard.
>Seriously, with A and B in a file by the same amount e.g. ABABABAB ...
>PPM or BWT would produce a long, long sequence of zeros. Use runlength-
>coding on that along with a huffman- or arithmetic coder and I promise
>you a lot less than 1 bit/symbol.
Wonderful. New compression algorithm for any kind of data.
Step 1: Read in each bit of the input, and convert it to the byte
'A', or 'B', depending on whether it's a 0 or a 1.
Step 2: Use PPM or BWT to produce a long stream of zeroes.
Step 3: Use RLE with huffman or arithmetic coding for your
guaranteed compression rate of under 1 bit per symbol.
Do please think before you make such promises - you've just
claimed something which 'breaks' the counting argument ;-).
Cheers,
Phil
--
:wq
Yes, we want to compress "real files". So why are we obsessed with the
counting argument? Mainly because on a 4 week cycle, someone wants to use
recursive/iterative compression. The "idea" goes (stripped down):
I have a very simple compression algorithm that will compress any file
to 99% of its size.
But I can apply it to its own output.
(the commonest refinement of this "hides" information in the count of
how many times this process has been applied)
The actual techniques used are normally "complex" (dare one say
obfuscated?).
The easiest way to show that they are rubbish (without going down the
highways and byways of whatever hand-waving buzzwords the proposal uses)
is the counting argument. Which is why we consider it important, and
relevant.
BugBear
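For anyone who wants to watch the recursion BugBear describes die in
practice, here is a quick sketch using zlib as a stand-in for the magical
99% compressor (exact sizes will vary a little with the zlib version):

    import zlib

    data = b"the counting argument holds " * 4000   # highly redundant input
    for i in range(5):
        data = zlib.compress(data, 9)
        print("after pass", i + 1, ":", len(data), "bytes")
    # Pass 1 shrinks the file dramatically; every later pass makes it
    # slightly *bigger*, because the output of pass 1 already looks random.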
I get it now! These 2^817 possible files are an abstraction that allows a
rigorous assertion of how many bits it must take to represent the one file
that actually exists?
<snip>
>
> Once again, the pigeon-hole principle does not say that the holes can hold
> only one pigeon. Your holes could be the size of Yankee stadium. But if
> you've got more pigeons than holes, you have to put more than one pigeon in
> a hole. The application to compression is that as part of the decompression
> you want to request a certain pigeon by its hole number. If there are 2
> pigeons in the hole you don't know which pigeon is the correct one. In other
> words if two source files compress to the same compressed file you cannot
> decompress them uniquely. You will get one file or the other. It has already
> been arguee that you could get both and decide which is the correct one, but
> if one is disallowed, then it really was not a legal message from your
> source.
But if the number of pigeonholes could be doubled (e.g. by size sorting)
without changing the volume of the dovecote, isn't that compression? You seem
to be retaining an aspect of the "addressability" of a message that is not
explicitly stated.
No fair! You have to first give me the files you want to compress! [As I
recall, the original poster said there were some blocks (s)he wanted to compress...]
paul womack <bug...@NOSPAMmediabridge.net> wrote in article
<3959CCAA...@NOSPAMmediabridge.net>...
> mikael.l...@spray.se wrote:
> >
> > Yes, it's true that random data are very hard to compress ...
> The easiest way to show that they are rubbish (without going down the
> highways and byways of whatever hand-waving buzzwords the proposal uses)
> is the counting argument. Which is why we consider it important, and
> relevant.
>
> BugBear
>
Is it not true that within a sufficiently long sample output from a random
source one could find a sequence that could be compressed by *any* method,
even given the overhead of the offset to that sequence and the length?
While impractical due to the likely size of the sample, this suggests there
may be a theoretical exception to the counting argument. I of course could
be way off base here (have been before).
Just a thought
Tim
Mikael
In article <slrn8lj6ii.io...@jalua.ch.genedata.com>,
Phil....@genedata.com (Phil Norman) wrote:
> On Wed, 28 Jun 2000 05:11:04 GMT,
> mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> >Yes, it's true that random data are very hard to compress ...
>
> I've noticed this, too. *cough* Very hard.
>
> >Seriously, with A and B in a file by the same amount e.g. ABABABAB ...
> >PPM or BWT would produce a long, long sequence of zeros. Use runlength-
> >coding on that along with a huffman- or arithmetic coder and I promise
> >you a lot less than 1 bit/symbol.
>
> Wonderful. New compression algorithm for any kind of data.
>
> Step 1: Read in each bit of the input, and convert it to the byte
> 'A', or 'B', depending on whether it's a 0 or a 1.
>
> Step 2: Use PPM or BWT to produce a long stream of zeroes.
>
> Step 3: Use RLE with huffman or arithmetic coding for your
> guaranteed compression rate of under 1 bit per symbol.
>
> Do please think before you make such promises - you've just
> claimed something which 'breaks' the counting argument ;-).
>
> Cheers,
> Phil
>
> --
> :wq
>
Mikael
> Yes, we want to compress "real files". So why are we obsessed with the
> counting argument? Mainly because on a 4 week cycle, someone wants to use
> recursive/iterative compression. The "idea" goes (stripped down):
>
> I have a very simple compression algorithm that will compress any file
> to 99% of its size.
> But I can apply it to its own output.
>
> (the commonest refinement of this "hides" information in the count of
> how many times this process has been applied)
>
> The actual techniques used are normally "complex" (dare one say
> obfuscated?).
> The easiest way to show that they are rubbish (without going down the
> highways and byways of whatever hand-waving buzzwords the proposal uses)
> is the counting argument. Which is why we consider it important, and
> relevant.
>
> BugBear
>
Mikael
In article <01bfe104$704126e0$4b51143f@tim>,
"Tim" <ther...@earthlink.net> wrote:
>
>
> paul womack <bug...@NOSPAMmediabridge.net> wrote in article
> <3959CCAA...@NOSPAMmediabridge.net>...
> > mikael.l...@spray.se wrote:
> > >
> > > Yes, it's true that random data are very hard to compress ...
>
> > The easiest way to show that they are rubbish (without going down the
> > highways and byways of whatever hand-waving buzzwords the proposal uses)
> > is the counting argument. Which is why we consider it important, and
> > relevant.
> >
> > BugBear
> >
>
> Is it not true that within a sufficiently long sample output from a random
> source one could find a sequence that could be compressed by *any* method?
> even given the overhead of the offset to that sequence and the length?
> While impractical due to the likely size of the sample, this suggests there
> may be a theoretical exception to the counting argument. I of course could
> be way off base here (have been before).
>
> Just a thought
>
> Tim
>
Oh, my apologies. When you said "a stream of As and Bs, eg:
'ABABAB'" (or something very similar), I assumed that was just an
example of a stream of As and Bs, and that you weren't specifying
any particular pattern. According to your words, you did actually
guarantee compression of any stream of As and Bs, not just the
particular one you gave as an example.
>I don't think you have understood the counting argument either. It simply
>says that if a compressed file is smaller than the uncompressed original
>and the compressed file must decompress to one unique original then
>according to (unbreakable according to Tate) mathematical laws there are
>less possible combinations in a smaller file, why it's impossible to
>compress *all* files. Just think about it!
I'm quite aware of what the counting argument is, how it works,
why it works, and that it indeed is defined by unbreakable
mathematical laws.
>It does not say however that it's impossible to compress one file with
>stunning results!
This is patently obvious to anyone who's ever tried compressing
large files of zeroes.
Cheers,
Phil
--
:wq
And when people turn up with ideas about amazing compression
algorithms, they do initially tend to get gently brought down with
news of the counting argument by people who spend a lot of time
and patience explaining it. However, depressingly often the
response to this is "Oh, you're all blinded by your beliefs -
infinite compression must exist!".
>Even if the history of human kind is full of war and violent
>death, we're still good at cooperation, aren't we?
Yes - sometimes we team up in order to impose war and violent
death on a larger scale.
Cheers,
Phil
--
:wq
>No, he's right. It is in the nature of random data to be uncompressible.
>How could you possibly see any repeating patterns there?
>The counting argument (or theorem) just say that it's impossible to
>compress *all* files. It does not however say it's impossible to
>compress *one* file.
>
>Mikael
>
>In article <01bfe104$704126e0$4b51143f@tim>,
> "Tim" <ther...@earthlink.net> wrote:
>>
>>
>> paul womack <bug...@NOSPAMmediabridge.net> wrote in article
>> <3959CCAA...@NOSPAMmediabridge.net>...
>> > mikael.l...@spray.se wrote:
>> > >
>> > > Yes, it's true that random data are very hard to compress ...
>>
>> > The easiest way to show that they are rubbish (without going down
>> > the highways and byways of whatever hand-waving buzzwords the
>> > proposal uses) is the counting argument. Which is why we consider it
>> > important, and relevant.
>> >
>> > BugBear
>> >
>>
>> Is it not true that within a sufficiently long sample output from a
>> random source one could find a sequence that could be compressed by
>> *any* method? even given the overhead of the offset to that sequence
>> and the length? While impractical due to the likely size of the
>> sample, this suggests there may be a theoretical exception to the
>> counting argument. I of course could be way off base here (have been
>> before).
>>
>> Just a thought
>>
>> Tim
>>
>
>
The counting argument only means that not all files can be compressed to
a smaller size by a given compression method. But no file is truly
random in that sense: just because a certain compression routine
does not compress it to a smaller size, some other compression routine
will compress it to a smaller size. No finite file is truly random to
every possible compressor routine.
David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE
http://www.jim.com/jamesd/Kong/scott19u.zip
Scott famous encryption website NOT FOR WIMPS **no JavaScript allowed**
http://members.xoom.com/ecil/index.htm
Scott rejected paper for the ACM
http://members.xoom.com/ecil/dspaper.htm
Scott famous Compression Page WIMPS allowed ** JavaScript OK**
http://members.xoom.com/ecil/compress.htm
**NOTE EMAIL address is for SPAMERS***
I leave you with this final thought from President Bill Clinton:
"The road to tyranny, we must never forget, begins with the destruction
of the truth."
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
Here's a chance to practice your probability theory to calculate the
expected value of your invention. To get the expected value, you take
the set of all possible events, and multiply the probability of each
event by the value of the event. In this case:
Event=you find such an algorithm. Probability=0, Value=50,000,000,000
Event=you don't find it. Probability=1, Value=0
Now your expected value is 0*50000000000 + 1*0 = 0. Doesn't look
like you should waste much time on this, does it? :-)
--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |
It's easy to compress *all* files of a given length. Sort them into ascending
order, take the forward difference, run-length encode, and (if you're really
anal) save the result for the decompressor.
> It does not say however that it's impossible to compress one file with
> stunning results!
Or any given set of files.
No kidding. Give me a file and I'll write a compression program that
will compress it down to one bit. I warn you, the decompression code
will be rather large. Roughly the size of the original file, as a matter
of fact.
If you require that you include the size of the decompression code
in your total (not unreasonable; it helps you avoid stunt programs
like the one I just described), then there are some files that are
incompressible. Quite a lot of files, actually. You will rarely
run into these files in practice, but they do exist.
Alan
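Alan's stunt compressor is easy to write down; here is a sketch of my own
(a byte stands in for the single bit), which also shows why counting the
decompressor's size kills the trick:

    def make_stunt_pair(target):
        def compress(data):
            return b"\x01" if data == target else b"\x00" + data
        def decompress(data):
            return target if data == b"\x01" else data[1:]   # the file lives in here
        return compress, decompress

    original = b"any file you hand me"
    compress, decompress = make_stunt_pair(original)
    packed = compress(original)
    print(len(packed), decompress(packed) == original)   # 1 True
    # Compressed data: 1 byte.  Decompressor: carries the whole original,
    # so compressed data + decompressor is never smaller than the original.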
Binder Edgar <Binde...@T-Online.de> wrote in message
news:39591311...@T-Online.de...
> Hi,
>
> Itai Bar-Haim wrote:
>
> [compression of *all files*]
> > I estimate roughly the value of such algorithm in about 50 billion
dollars
> > US. After all, it does solve the world's data transfer and storage
> > problems...
> > Now: with such great sum of money in mind, wouldn't you try to achive
this
> > goal?
> Just a question:
> Suppose I do have (and I don't, and probably won't have in the near time :-)
> a compression algorithm that can compress ALL files. How much would you pay
> for such algorithm?
> I estimate roughly the value of such algorithm in about 50 billion dollars
> US. After all, it does solve the world's data transfer and storage
> problems...
> Now: with such great sum of money in mind, wouldn't you try to achive this
> goal?
> :-)
Well, if I could find someone willing to pay me $5 million USD
yearly for watching porn videos eight hours a day, I could retire young
and rich.
Now, with such a great sum of money in mind, *shouldn't* I
assume, without a shred of evidence to support it, that there *is* such
a job opportunity out there waiting for me? And that I should expend
effort trying to find it?
The situation I'm describing is only *unlikely.* And I'd be
silly to pursue it, seeing as how I have no evidence that it exists.
What you are proposing is not unlikely, it is IMPOSSIBLE. And
you are not just seeking it with no evidence that it exists; you are
seeking it *despite* evidence that it *cannot.*
-jc
--
* -jc IS *NOW* feld...@cryogen.com
* Home page: http://members.tripod.com/~afeldspar/index.html
* The home of >>Failed Pilots Playhouse<<
* "Better you hold me close than understand..." Thomas Dolby
But good luck just the same.
-mike
--
Michael A Maniscalco
Visit my homepage on the new M99 data compression scheme at
http://michaelmaniscalco.homepage.com/M99index.htm
In article <3954...@news.bezeqint.net>,
"Itai Bar-Haim" <av...@bezeqint.net> wrote:
> Hi.
> Since Math actually based on definitions, proven to work in many cases (well
> all of them, actually :-), I wouldn't try to break that theory. If some
> smart greek guys many years ago have decided that 1+1=2, than you can't tell
> them that you found a way to break their own definition and that from now on
> 1+1=3. That's why it's definition. What is possible, though, is to create a
> "newer version of math". You will define your own rules of the new language
> of numbers, and than everything will work the way you want it.
> I think that theories that are based on definitions are not supposed to be
> broken. Actually, I think they cannot be broken. But pure theories based on
> logical thinking only can be broken.
> The theory of data compression is based on logical thinking only, and many
> of the compression algorithms are not based on math at all. That's why I
> believe that the compression theory that says that infinate compression is
> impossible is wrong. It is logical, but yet, I think it's wrong. That's why
> I'm going to try to prove that it is wrong, by developing an algorithm that
> will actually compress any data whatsoever (probably under minimum block
> size restrictions, so what?).
> If anyone think the same and want to try to help me, he/she may contact me,
> and I'll be glad to discuss this issue.
>
> Itai.
>
> Alan Morgan <amo...@Xenon.Stanford.EDU> wrote in message
> news:8j03u9$d2k$1...@nntp.Stanford.EDU...
> > In article <slrn8l613a.t6...@jalua.ch.genedata.com>,
> > Phil Norman <Phil....@genedata.com> wrote:
> > >On Thu, 22 Jun 2000 20:39:32 GMT,
> > > mikael.l...@spray.se <mikael.l...@spray.se> wrote:
> > >>How true it is what you are saying ...
> > >>I almost suspect that when people come with a lot of difficult theory to
> > >>explain why it's impossible to solve this or that problem, they actually
> > >>really don't have a clue. The fact remains, the most powerful tool you
> > >>have is your mind. If your attitude is that nothing is impossible and you
> > >>try and try again, NEVER EVER giving up, then the problem is destined to
> > >>be solved no matter what the NO-sayers tries to tell you!
> > >>Where would we be if not Edison, Marconi or Einstein, with this attitude,
> > >>had not done what they did?
> > >
> > >Absolutely. I, personally, have been trying to get 1+1 to equal 3
> > >for a good number of years now. I've not managed it yet (when I
> > >do, I'll post the results on http://inkvine.fluff.org/~forrey/),
> > >but *I'm* certainly not giving up just because mathematicians say
> > >it's impossible. What do they know? They're just constrained by
> > >all the prejudices they've been taught during their studies.
> >
> > Good work! Might I suggest that, when you have finished this, you
> > start to look for the even prime numbers greater than 2. Mathematicians
> > claim they don't exist, but they haven't checked all of them so they
> > can't possibly know. I'm pretty sure that given the number of even
> > numbers greater than two that *one* of them has to be prime.
> >
> > Alan
Interesting approach.
Some of Itai's comments have been so ridiculous that
I wonder whether he actually knows what 'math' and
'logic' and 'proof' mean.
Dror
At least Itai knows his current situation.
Dror
Mikael
In article <Pine.GSO.4.21.000629...@chopin.ifp.uiuc.edu>,
mikael.l...@spray.se wrote:
> How about a compressor with compression similar to or better than Malcolm
> Taylor's rk and its fantastic compression combined with memory use and
> speed similar to or faster than some of the lz based compressors?
> Isn't this realistic?
> Isn't this of high commercial value?
Hmm, I don't think that has much commercial potential, because Zip has become
too much of a standard - but that's only my opinion...
--
Edgar
I agree completely that sometimes I lack politeness.
However, I was quiet for several weeks, but I've really
had it. Itai began his series of postings asking how to
compress his interesting source. And people explained
to him in detail why theoretically his requirement was
not realistic (actually, it was impossible).
But apparently Itai was not satisfied with these answers
and decided that all of information theory was useless.
So I'm sorry to be blunt, but Itai's approach of 'you are
all wrong, I'll show you' and the '50 billion' claims
all got me pissed off.
Perhaps Itai is a talented programmer/engineer, but he
needs to brush up on his theory.
Dror
> You and Tate are completely right of course. However not so polite.
> Why not create something positive out of this and make a compromise?
> How about a compressor with compression similar to or better than Malcolm
> Taylor's rk and its fantastic compression combined with memory use and
> speed similar to or faster than some of the lz based compressors?
> Isn't this realistic?
> Isn't this of high commercial value?
>
"US" telephone exchanges must actually begin with a digit 2-9
so that a "long-distance" prefix of 1 or 0 is unambiguous
(since, as you note later, tel. nos. are required to be a prefix code),
so you can have at most 8e6 subscriber numbers per area code.
For historical reasons area codes also must begin with 2-9,
so there are at most 800 areas, and some of them are reserved
for things like 911, 800 (now also 877 and 866), 411, etc.
(Originally, exchanges were constrained to be (2-9)(2-9)(0-9)
and area codes to be (2-9)(0-1)(0-9), but these restrictions
became technologically unnecessary and politically unacceptable
about 10 years ago. And in practice because of people starting
and stopping phone service in ways that are not finely predictable,
you can never actually have all possible numbers assigned.)
In big metropolitan areas like New York and Los Angeles,
nowadays shortcut 7-digit dialing is almost useless because
many calls even to nearby locations are in a different area code,
and if and where new area codes are added by overlay rather than
(as traditional) geographic split, the FCC *requires* that 10-digit
dialing be mandated for "fairness", to prevent the incumbent carrier(s)
from being more convenient to use initially than newer competitors.
The "US" telephone numbering scheme actually covers Canada
and most of the Caribbean as well and is formally called
the North American Numbering Plan (NANP).
--
- David.Thompson 1 now at worldnet.att.net