
Mathematical approach


TekCat

May 22, 2003, 1:58:03 AM
Most compression software uses some sort of statistical analysis of
the data to exploit redundancies. Has anyone thought of treating the data
set (let's say a sequence of bytes) as one large number and using
mathematical formulas (exponents, factorials, and such) to break that big
number into a smaller number or set of numbers that together need fewer
bits than the source number?
For instance, X = a 128-bit number, and X = a! + b, where the total number
of bits in a and b would be < 128. Of course not all numbers could use the
same formula; some would use something else, like X = c^d + e, and maybe
some numbers can't be decomposed at all, so they would just be copied as is.
My guess is that if speed is not very important (because the math takes a lot
of processing power on any processor), the compression could be applied
recursively (possibly with some sort of data renormalization).
Any thoughts?
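
As a rough illustration of the idea (a sketch in Python, not from the post; the decomposition routine is hypothetical), one can try X = a! + b on a random 128-bit number and compare bit budgets. For a typical X, the pair (a, b) needs about as many bits as X itself, since b = X - a! stays on the order of a!:

    import math
    import random

    def factorial_decompose(x):
        # largest a with a! <= x, then the remainder b = x - a!
        a = 1
        while math.factorial(a + 1) <= x:
            a += 1
        return a, x - math.factorial(a)

    random.seed(1)
    x = random.getrandbits(128)
    a, b = factorial_decompose(x)
    # for a random 128-bit x, a is around 33-34 (about 6 bits) but b keeps
    # roughly 125+ bits, so the "compressed" pair is usually no smaller than x
    print(x.bit_length(), "vs", a.bit_length() + b.bit_length())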


Thomas Richter

May 22, 2003, 6:37:17 AM
Hi,

It is the very point of the Shannon theorem that it does *not* matter how
you represent the data (for example, encoded in a binary system or
encoded as prime factors, to pick an example straight out of the box).
Provided that you assume the same pre-knowledge of the data at the
decompressor, the achievable compression is limited by the very same amount,
namely its entropy. It's a mathematical fact as much as the second law
of thermodynamics is a fact. You just cannot build a perpetuum mobile, no
matter what kind of tricks you'd like to try. (-;

So long,
Thomas
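
A quick way to see the bound Thomas describes (my own hedged sketch, assuming nothing beyond Python's standard os, zlib and math modules): measure the zeroth-order empirical entropy of incompressible data and compare it with what a real compressor manages.

    import math
    import os
    import zlib
    from collections import Counter

    data = os.urandom(100_000)                 # ~8 bits/byte of entropy by construction
    n = len(data)
    counts = Counter(data)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    print("empirical entropy:", round(h, 3), "bits/byte")
    # whatever representation tricks the compressor uses internally, it cannot
    # do better than the entropy; on random bytes zlib even expands slightly
    print("zlib output:", len(zlib.compress(data, 9)), "bytes for", n, "bytes in")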


Marco Al

May 22, 2003, 7:51:25 AM
"Thomas Richter"

> You just cannot build a perpetuum mobile, no
> matter what kind of tricks you'd like to try. (-;

A perpetuum mobile is merely unlikely.

Compression of random sources is impossible.

Marco


Wojciech K.

May 22, 2003, 8:06:46 AM

> Provided that you assume the same pre-knowledge of the data at the
> decompressor, the achievable compression is limited by the very same amount,
> namely its entropy. It's a mathematical fact as much as the second law...

According to Bruce Schneier's "Applied Cryptography", the informational rate r
is equal to the entropy H(M) of the message divided by the *length* N of the
message:
r = H(M) / N

So you're right in saying that the maximal achievable compression is determined by
the entropy. BUT Shannon also stated that the entropy is dependent on the length
of the message (which itself is dependent on the underlying representation of
the data, of course). And thinking practically, it is possible to do
compression the way proposed by TekCat (and if the description provided by Ray
in the thread "Information about new algorithm" is right, then ZeoSync's patent
seems to do pretty much the same).

Cheers,
Wojciech K.


Steven Pigeon

May 22, 2003, 9:10:40 AM

"TekCat" <tek_cat...@hotmail.com> wrote in message
news:bahpec$t36ui$1...@ID-21580.news.dfncis.de...


> Most compression software uses some sort of statistical analysis of
> the data to exploit redundancies. Has anyone thought of treating the data
> set (let's say a sequence of bytes) as one large number and using
> mathematical formulas (exponents, factorials, and such) to break that big
> number into a smaller number or set of numbers that together need fewer
> bits than the source number?

This approach is related to "Kolmogorov complexity". Kolmogorov's measure of
sequence complexity is the length of the shortest possible program (actual
instructions and data) that, once run, produces the given sequence and then stops.

Further readings:

An Introduction to Kolmogorov Complexity and Its Applications, by Ming Li and
Paul Vitányi, Springer, 1997, ISBN 0-387-94868-6 (Library of Congress: QA267.7.L5 1997)

> For instance, X = a 128-bit number, and X = a! + b, where the total number
> of bits in a and b would be < 128. Of course not all numbers could use the
> same formula; some would use something else, like X = c^d + e, and maybe
> some numbers can't be decomposed at all, so they would just be copied as is.

> My guess is that if speed is not very important (because the math takes a lot [...]

In fact, finding the shortest formula (program) is an undecidable problem.
Without going into details, the problem boils down to trying EVERY program
with EVERY input to see which one stops after producing the sequence. This,
of course, runs into the infamous halting problem...


Best,

S.

Steven Pigeon, Ph. D.
pig...@iro.umontreal.ca
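
To get a feel for why even a heavily restricted search is hopeless, here is a hedged toy (Python, my own illustration, not anyone's actual method): a bounded search over the single formula family X = c^d + e. Even with the exponent range capped, the (c, d, e) encoding of a random 128-bit number almost never comes out smaller than 128 bits, and the general problem over all programs is undecidable, as Steven says.

    import random

    def best_power_form(x, max_c_bits=16, max_d=64):
        # try X = c**d + e for candidate bases c near the d-th root of x
        best = None
        for d in range(2, max_d + 1):
            c = round(x ** (1.0 / d))
            for cc in (c - 1, c, c + 1):
                if cc < 2 or cc.bit_length() > max_c_bits:
                    continue
                e = x - cc ** d
                if e < 0:
                    continue
                cost = cc.bit_length() + d.bit_length() + e.bit_length()
                if best is None or cost < best[0]:
                    best = (cost, cc, d)
        return best

    random.seed(2)
    x = random.getrandbits(128)
    cost, c, d = best_power_form(x)
    print(x.bit_length(), "bits raw vs", cost, "bits as (c, d, e), c =", c, "d =", d)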


Andres Paniagua

May 22, 2003, 10:57:26 AM
Hi Cat, as alluring as the idea is, the difficulties are great.
If you have a function in the first place which you happen to know
will produce the string you want to compress, given an input
x, then you're done. But, and this is going to be the general case, if
you don't have prior knowledge of that function, how are you going to
look for it?
Even if you did have a method for finding a "generating" function for
a particular string, you still wouldn't have any guarantee that the
encoding of the function and the "seed" value is shorter than the
string you hope to compress.
The common compression methods try to reduce redundancies, because
those are a lot easier to look for and there are well-studied methods
for encoding strings in a way that reduces redundancy.

As for applying compression recursively, that isn't going to get you
anywhere; you simply cannot compress everything. If you devise a
method which on successive passes increasingly compresses a given
string, there will always be a limit after which, regardless of how often
you reapply it, it will stop compressing. And what's more, it will simply
be an inefficient method; a good one would achieve maximum compression
in a single pass.
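
The "you cannot compress everything" part is just counting; a tiny Python check (my own, not from the post) makes the pigeonhole explicit:

    # there are 2**n distinct n-bit strings but only 2**n - 1 strings of all
    # lengths 0..n-1 combined, so no lossless code shortens every n-bit input
    n = 16
    print(2 ** n, sum(2 ** k for k in range(n)))   # 65536 vs 65535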

Andres Paniagua

May 22, 2003, 11:45:29 AM
Hi Thomas, there is of course no magic method for compressing
everything to a fixed length. That said, I think that your argument is
not quite right. Shannon wrote about encoding strings in a way that
minimizes the length. He gives exact measures of the attainable
reduction for given parameters etc. However, for specific strings, you
can attain compression ratios far beyond his theory if you make use of
something he specifically excludes, namely semantics. Consider this
example:
Let's suppose we have a function pi() which for any natural n computes
the number pi to the n-th place in binary notation. From here on, all
strings referred to shall be binary strings. So if we want to compress
a string which we KNOW to be pi to the n-th position, all we need to
encode is the function pi(), which is constant, and the code for n. The
length of the encoding for our function is constant, let's call it C,
while the length of the encoding for n is log(n). So the length of the
compressed string will be C + log(n) while the length of the
uncompressed string is n. The ratio is (C + log(n))/n, so the larger n
gets, the better the ratio gets. And it can get arbitrarily better than
the methods which assume that the string being compressed is random.
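
A quick numerical sketch of that ratio (my own illustration; the constant C below is a made-up stand-in for the bits needed to encode the pi() routine itself):

    import math

    C = 4000  # assumed size of the encoded pi-computing function, in bits
    for n in (10 ** 4, 10 ** 6, 10 ** 9):
        compressed = C + math.ceil(math.log2(n))
        print(n, compressed, round(compressed / n, 8))   # ratio shrinks as n grows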

Thomas Richter

May 22, 2003, 1:09:13 PM
Hi,

You actually make two points here; one is correct (due to my
oversimplification) and the other is not. First things first:

It is correct that the Shannon theorem holds for "long" strings, that is,
it holds in the limiting case for string lengths going to infinity. It
does not grant strict results for finite strings. Sorry for that, shouldn't
have tried to over-simplify the arguments. This is almost the same as
the second law of thermodynamics which holds "on average for large systems".
Locally, entropy may decrease sometimes, though this case is "unlikely".

A second question is in which cases "on average" statements make any
sense on *finite* sequences. Honestly, I would have problems formulating
an average compression ratio on finite strings as a strict mathematical
property.

On the other hand, we always have the counting argument on finite
strings, granting you that you cannot compress *all* n-bit sequences to
less than n-bits, thus you have to make a model of the data you'd like
to compress. Whether a model consisting of "simple computability" as
proposed is a good model depends on the data you'd like to compress. As
the output of a mathematical plot it would be excellent, clearly. As a
model for audio data --- for example --- I have my doubts.

Your "pi" argument is void, however: Here you imply a pre-knowledge on
the string, namely "it's Pi", or "it's computable by algorithm Pi()"
or "it's a computable number". It is no art to compress a special
given string to a given length. In fact, if you've 4 known strings of
length 10000 bits, and the decoder knows this, you can compress them
to two bits each. (Humorous side remark: In image compression, someone
"came up" with the idea to compress images with a single bit, namely
"it's the Lena image", a well known test case for image
compression). Hence, saying that "we are able to compress special
cases to shorter lengths" doesn't make much sense as a mathematical
statement.

For compression, you'd need to make a statement like: "Given sequences
in a sequence class (for example, given by a first-order
probability distribution or any other model), we are on average able to
compress these to..."

Greetings,
Thomas
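
Thomas's "4 known strings, 2 bits each" remark is easy to make concrete (a toy Python sketch, my own; the shared strings are just random placeholders). The pre-knowledge lives entirely in the codebook both sides hold, not in the transmitted bits:

    import os

    known = [os.urandom(1250) for _ in range(4)]    # four shared 10000-bit strings

    def encode(s):
        return known.index(s)                       # an index 0..3, i.e. 2 bits

    def decode(i):
        return known[i]

    msg = known[2]
    assert decode(encode(msg)) == msg
    print("transmitted:", encode(msg), "(2 bits)")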

Dr Chaos

May 22, 2003, 2:04:19 PM

But similarly to thermodynamics, compressing sample strings from
probabilistic stationary information sources to better than the entropy
rate is exponentially(*) unlikely.

(*) I think. The AEP (asymptotic equipartition property) characterizes
this more precisely.
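
A back-of-the-envelope bound on "exponentially unlikely" (my own sketch, via the counting argument rather than the AEP): for a uniform source, fewer than a 2^(1-k) fraction of n-bit strings can be mapped, by any fixed lossless code, to outputs at least k bits shorter.

    # strings of length <= n-k number 2**(n-k+1) - 1, so the fraction of the
    # 2**n inputs of length n that can shrink by k or more bits is below 2**(1-k)
    n = 1000
    for k in (2, 8, 32, 64):
        print(k, (2 ** (n - k + 1) - 1) / 2 ** n)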


Dr Chaos

May 22, 2003, 2:13:18 PM
On Thu, 22 May 2003 14:06:46 +0200, Wojciech K. <> wrote:
>
>> Provided that you assume the same pre-knowledge of the data at the
>> decompressor, the achievable compression is limited by the very same amount,
>> namely its entropy. It's a mathematical fact as much as the second law...
>
> According to Bruce Schneier's "Applied Cryptography", the informational rate r
> is equal to the entropy H(M) of the message divided by the *length* N of the
> message:
> r = H(M) / N

M is not a concrete finite message here, but the probability *distribution* of
length-N strings taken from the source, and more correctly

r = lim(N->infinity) H(M_N) / N

> BUT Shannon also stated that the entropy is dependent on the length
> of the message

He did not say such a thing.

Entropy is a property of a probabilistic information *source*, a
conceptual "machine" which emits symbols.

Entropy is a functional on probability distributions. What Shannon
was saying is that you can look at the probability distributions of
successively longer words produced by the source and apply the entropy
functional to those. Then, when you take the intensive (*1/N) limit of
the block entropies in the manner above, you get the "entropy rate".

That number is the bound on lossless compression.

To make an analogy, an ordinary differential equation for a particular
system, say dynamics from Newton's laws, is a "dynamical source",
and the particular orbit which is emitted from integrating from
any specific initial condition is the analogy of the finite message.
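
A hedged sketch of that limit in Python (my own toy; the "source" is an assumed i.i.d. biased-bit generator whose true rate is -(0.9*log2(0.9) + 0.1*log2(0.1)) ≈ 0.469 bits/symbol): compute block entropies H(M_N)/N for growing N.

    import math
    import random
    from collections import Counter

    random.seed(0)
    data = ''.join('1' if random.random() < 0.9 else '0' for _ in range(200_000))

    for N in (1, 2, 4, 8):
        blocks = [data[i:i + N] for i in range(0, len(data) - N + 1, N)]
        total = len(blocks)
        counts = Counter(blocks)
        H_N = -sum(c / total * math.log2(c / total) for c in counts.values())
        # stays near 0.469 for every N here because the toy source is memoryless;
        # for a source with memory the ratio would decrease toward the rate
        print(N, round(H_N / N, 4))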

Matt Mahoney

May 22, 2003, 6:41:45 PM
ap...@yahoo.com (Andres Paniagua) wrote in message news:<734ef97.03052...@posting.google.com>...

> As for applying compression recursively, that isn't going to get you
> anywhere; you simply cannot compress everything. If you devise a
> method which on successive passes increasingly compresses a given
> string, there will always be a limit after which, regardless of how often
> you reapply it, it will stop compressing. And what's more, it will simply
> be an inefficient method; a good one would achieve maximum compression
> in a single pass.

Even if you could get it to work (which is impossible), you would
still have to license the patent. See
http://www.teaser.fr/~jlgailly/05488364.html

-- Matt Mahoney, matma...@yahoo.com

Andres Paniagua

May 23, 2003, 10:05:12 AM
I'm afraid that we have some misunderstandings. At the start, I want
to restate that it is perfectly clear to me that there is no limitless
compression, regardless of the method applied.

> You actually make two points here; one is correct (due to my
> oversimplification) and the other is not. First things first:
>
> It is correct that the Shannon theorem holds for "long" strings, that is,
> it holds in the limiting case for string lengths going to infinity. It
> does not grant strict results for finite strings. Sorry for that, shouldn't
> have tried to over-simplify the arguments. This is almost the same as
> the second law of thermodynamics which holds "on average for large systems".
> Locally, entropy may decrease sometimes, though this case is "unlikely".
>
> A second question is in which cases "on average" statements make any
> sense on *finite* sequences. Honestly, I would have problems formulating
> an average compression ratio on finite strings as a strict mathematical
> property.
>
> On the other hand, we always have the counting argument on finite
> strings, granting you that you cannot compress *all* n-bit sequences to
> less than n-bits, thus you have to make a model of the data you'd like
> to compress. Whether a model consisting of "simple computability" as
> proposed is a good model depends on the data you'd like to compress. As
> the output of a mathematical plot it would be excellent, clearly. As a
> model for audio data --- for example --- I have my doubts.

Actually, the point I was trying to make was that other approaches to
the problem of compression are possible besides the ones commonly derived
from information theory. Let us first agree that by (lossless) compression
we mean the following: given a bijective function K on the set of all
strings over an alphabet, compression occurs for a string s when the
length of K(s) is smaller than the length of s.
Now, the methods derived from information theory work under the
assumption that the real-world meaning of the strings to be compressed
is irrelevant; the amount of information is measured as a function of
uncertainty.
However, there is no obligation to use these methods for compression.
A bijection on the set of all strings over an alphabet can compress
a subset of strings (at the cost of expanding another). Nothing forbids us
from using the semantics of our set of strings (provided there is a
semantic element) to construct a bijection which compresses the subset
we care about. The approach suggested by the OP could be regarded in
that light.

>
> Your "pi" argument is void, however: Here you imply a pre-knowledge on
> the string, namely "it's Pi", or "it's computable by algorithm Pi()"
> or "it's a computable number".

Actually, it was meant as an example of what you can achieve by making
use of the real-world meaning of a string. It is not void, and in
terms of compression it holds perfectly.

> It is no art to compress a special given string to a given length.

I hope by "a special given" you don't mean "any specific", please
clarify.

> In fact, if you have 4 known strings of
> length 10000 bits, and the decoder knows this, you can compress them
> to two bits each. (Humorous side remark: in image compression, someone
> "came up" with the idea of compressing images with a single bit, namely
> "it's the Lena image", a well-known test case for image
> compression.)

Note that my example did not assume that the decoder knows anything about
the encoded message. The only thing the decoder needs to know is
how to decode the compressed string into an algorithm and apply it to
the supplied "seed". You will agree with me that for any
encoding/decoding to work, both ends need to know how to handle the
message.

> Hence, saying that "we are able to compress special
> cases to shorter lengths" doesn't make much sense as a mathematical
> statement.

As I pointed out before, a given compression method will only work
for a subset of all possible strings. So actually, for any method, you
are always saying "we are able to compress special cases to shorter
lengths". In my humble opinion, compression research is about finding
those bijections whose "special cases" happen to be the strings
which we want to compress.
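
Andres's point can be made concrete with a toy, bijection-flavored code (my own sketch in Python; the dictionary entries are arbitrary stand-ins for "semantic" knowledge shared by both ends): dictionary hits shrink to one byte, and everything else pays a one-byte escape, which is the unavoidable price of shrinking the special cases.

    DICT = [b"GET / HTTP/1.1", b"HTTP/1.1 200 OK"]   # assumed shared knowledge

    def encode(s):
        if s in DICT:
            return bytes([DICT.index(s)])            # 1-byte code for known strings
        return b"\xff" + s                           # escape prefix for everything else

    def decode(c):
        if len(c) == 1 and c[0] < len(DICT):
            return DICT[c[0]]
        return c[1:]

    for s in DICT + [b"anything else"]:
        assert decode(encode(s)) == s
        print(len(s), "->", len(encode(s)))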

Reivilo

May 23, 2003, 11:58:00 AM

Did anybody ever try a genetic approach to generate the shortest
program? I mean, it won't be THE shortest program, but the compressor
would iteratively reduce the size of the program at each generation.

As it would approach the Kolmogorov complexity it would get harder and
harder to reduce the size of the program, so the number of generations
and/or the rate at which the size decreases should be indicative of how far
we are from the Kolmogorov complexity.

Maybe one could have a formula that gives the probability that a
program has reached the Kolmogorov size, given its generation and the rate
at which the size is decreasing, or something like that.

Steven Pigeon

May 23, 2003, 11:36:40 PM

> Did anybody ever try a genetic approach to generate the shortest
> program? I mean, it won't be THE shortest program, but the compressor
> would iteratively reduce the size of the program at each generation.

Not that I know of.

>
> As it would approach the Kolmogorov complexity it would get harder and
> harder to reduce the size of the program, so the number of generations
> and/or the rate at which the size decreases should be indicative of how far
> we are from the Kolmogorov complexity.

Even restricting the class of programs (as Chaitin did), the number of
programs to examine is incredibly large. The longer the sequence, the
larger the number of programs. Even using genetic-type algorithms,
this may be too complex to get good compression results.


> Maybe one could have a formula that gives the probability that a
> program has reached the Kolmogorov size, given its generation and the rate
> at which the size is decreasing, or something like that.

The notion of probability is excluded from Kolmogorov complexity.
However, if you could manage to get such a formula, it would
be very strongly determined by the class of sequences you want to
compress.

Mikael Lundqvist

May 24, 2003, 4:45:19 AM
Steven Pigeon wrote:
>>Did anybody ever try a genetic approach to generate the shortest
>>program? I mean, it won't be THE shortest program, but the compressor
>>would iteratively reduce the size of the program at each generation.
>
>
> Not that I know of.
>
That's not hard to believe.
A sequence as tiny as 4 bytes means 256*256*256*256 different machines.
This is why we simplify things and make assumptions about the data, such as
statistics and predictability (with different variations within this).
This will never produce perfection but it is at least *usable*.
There may sometimes be light-years of distance between theory and
practical engineering.
--
Mikael Lundqvist
http://www.geocities.com/mikaellq/
Occam's Razor:
"Keep things simple!"

Reivilo

May 24, 2003, 9:06:05 PM

Steven Pigeon wrote:
>>Did anybody ever try a genetic approach to generate the shortest
>>program? I mean, it won't be THE shortest program, but the compressor
>>would iteratively reduce the size of the program at each generation.
>
>
> Not that I know of.
>
>
>>As it would approach the Kolmogorov complexity it would get harder and
>>harder to reduce the size of the program, so the number of generations
>>and/or the rate at which the size decreases should be indicative of how far
>>we are from the Kolmogorov complexity.
>
>
> Even restricting the class of programs (as Chaitin did), the number of
> programs to examine is incredibly large. The longer the sequence, the
> larger the number of programs. Even using genetic-type algorithms,
> this may be too complex to get good compression results.
>

The idea is not to examine a certain number of programs; it's to pick a
certain universal Turing machine, start with a program that
reproduces the observed sequence, and randomly generate programs from it,
using the size of the program as the selection criterion.

It's like, say, you want small people, so what you do is make them breed
and systematically kill the 50% tallest of them, and over time the
people will get smaller and smaller on average across generations.
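
For what it's worth, the selection half of that scheme is a few lines of Python (my own toy; the candidate "programs" here are just random byte strings, and the breeding/mutation step is left out entirely):

    import random

    random.seed(4)
    population = [bytes(random.getrandbits(8) for _ in range(random.randint(10, 60)))
                  for _ in range(20)]
    population.sort(key=len)
    survivors = population[:len(population) // 2]   # "kill" the longest half
    print([len(p) for p in survivors])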

Steven Pigeon

May 24, 2003, 10:03:56 PM

"Reivilo" <rei...@hotmail.com> wrote in message
news:bap4u0$r7c$1...@news-reader14.wanadoo.fr...

> It's like, say, you want small people, so what you do is make them breed
> and systematically kill the 50% tallest of them, and over time the
> people will get smaller and smaller on average across generations.

I know what a genetic algorithm is. What I'm saying is that for a
lengthy sequence, the computational requirements are still VERY
high.

Wojciech K.

May 25, 2003, 10:06:30 AM
> M is not a concrete finite message here, but the probability *distribution* of
> length-N strings taken from the source, and more correctly
>
> r = lim(N->infinity) H(M_N) / N
>
> > BUT Shannon also stated that the entropy is dependent on the length
> > of the message
>
> He did not say such a thing.
>

FYI: C.E. Shannon, "Prediction and Entropy of Printed English", Bell System
Technical Journal, vol. 30, no. 1, 1951, pp. 51-64.

Plus, if this statement were wrong, why would you bother to make the entropy
dependent on N in the formula
r = lim(N->infinity) H(M_N) / N ?

Because that would be a contradiction (as I see it), I believe that r =
lim(N->infinity) H(M_N) / N is true, while "He did not say such a thing." isn't.

Cheers,
Wojciech K.


Thomas Richter

May 26, 2003, 7:51:46 AM

Dale King

unread,
May 26, 2003, 6:50:48 PM5/26/03
to
"Reivilo" <rei...@hotmail.com> wrote in message
news:balgeb$fvp$1...@news-reader12.wanadoo.fr...


It doesn't matter what approach you use to try to find the shortest representation.
The important thing from Kolmogorov complexity is not how to find the
shortest programs but that for some inputs there is no representation shorter than
the original string. The other important result is that switching languages
doesn't buy you any more than a constant (which depends on the languages).
--
Dale King


Andres Paniagua

May 28, 2003, 12:52:24 AM
Hi Thomas, did you mean to respond? The actual message didn't make it
into your posting, look at it...
regards

Andrés
