
The best compression


timo Makinen

Aug 18, 2003, 1:50:00 AM
I am able to compress any data to more then 1/2 of its size as long as the
data is larger then 8k. I can recompress compressed files down so anything
is down to 8k.

ANYTHING DOWN TO 8K

however there is one catch, no its not lossy but its slow !
1200 Duron compresses 800k (random bin file) down to 100k in over a week.
The decompression is somewhat faster only 49 hours or so.

Faster CPU's Please !

Addressing the naysayers.

Yes i understand the traditional line of thought in that u cant squeeze
data in that small space because its not mathematically possible. I am aware
of such arguments. However that argument is only valid when programming
under the environment that places the constraints in the first place with is
traditional math applied compression.
Which is not the case here. I am getting legal protection for the code and
then i can discuss it in detail.

Marco Schmidt

Aug 18, 2003, 2:20:56 AM
fup2 comp.compression

timo Makinen:

> I am able to compress any data to more then 1/2 of its size as long as the
>data is larger then 8k. I can recompress compressed files down so anything
>is down to 8k.

Ah, it's that time of the year again.

[...]

> Yes i understand the traditional line of thought in that u cant squeeze
>data in that small space because its not mathematically possible. I am aware
>of such arguments. However that argument is only valid when programming
>under the environment that places the constraints in the first place with is
>traditional math applied compression.

Please read
<http://www.faqs.org/faqs/compression-faq/part1/section-8.html> in
order to find out that "any" environment mapping bit sequences to bit
sequences disallows random compression. If you map something else to
something else, you are leaving the realms of what is considered
compression.

[...]

Regards,
Marco

fuse

Aug 18, 2003, 3:17:03 AM
impossible!!!


"timo Makinen" <timokm...@hotmail.com> wrote in message
news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com...

Guest

Aug 18, 2003, 4:18:21 AM
1 WEEK for 800K ?

is it written using m$ VB3 ?? :)

"timo Makinen" <timokm...@hotmail.com> wrote in message
news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com...

Guest

Aug 18, 2003, 4:59:31 AM
could you post at least a screenshot ?


"fuse" <wufe...@163.net> wrote in message
news:bhpuhu$601$1...@mail.cn99.com...

Willem

Aug 18, 2003, 7:18:23 AM
timo wrote:
) ...
)
) ANYTHING DOWN TO 8K
)
) ...

People, please don't feed the trolls.


SaSW, Willem (at stack dot nl)
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

David A. Scott

Aug 18, 2003, 7:32:42 AM
"timo Makinen" <timokm...@hotmail.com> wrote in
news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com:

>
> I am able to compress any data to more then 1/2 of its size as long
> as the
> data is larger then 8k. I can recompress compressed files down so
> anything is down to 8k.
>
> ANYTHING DOWN TO 8K
>
>

  One reason why this is foolish is that it's the same as saying
you can compress any large file to 8k. Which is very stupid.
Example: take a 256k file. It's larger than 8k, so compress it to
128k, then assume the 128k is the input and compress it down to 64k,
and so on till you reach 8k. Now don't tell me you can only
compress once, since you stated it works on all files.
Now do you see why you're WRONG??


David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Crypto code http://radiusnet.net/crypto/archive/scott/
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **


Disclaimer:I am in no way responsible for any of the statements

made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

Stuart Caie

Aug 18, 2003, 8:39:12 AM
timo Makinen wrote:
> ANYTHING DOWN TO 8K

Steps to success:
1. Read the FAQ about impossible compression.
2. Deliberately post about impossible compression.
3. ???
4. Profit!

Actually, this reminds me. Did anyone here hear about "hash zip"?

The author really believed he could use a message-digest algorithm to turn
each 256 bytes into just the 12-16 bytes of the digest plus a little
overhead. Obviously, this is possible. However, he then believed that the MD
could be "restored" by iterating over all permutations of the 256 bytes
until he found a matching MD. He thought MD was some kind of compression,
and didn't realise that one 16 byte MD corresponds to 256^(256-16) possible
256 byte blocks on average and even to iterate which one was the correct
expansion would end up taking more than 256 bytes on average.

http://freedaemonconsulting.com/tech/hzip.php

Regards
Stuart

timo Makinen

Aug 18, 2003, 10:35:13 AM
However that argument is only valid when programming
>under the environment that places the constraints in the first place with is
>traditional math applied compression

"Marco Schmidt" <marcos...@geocities.com> wrote in message
news:ajr0kv05fjrjh5o5t...@4ax.com...

timo Makinen

Aug 18, 2003, 10:35:15 AM
However that argument is only valid when programming
>under the environment that places the constraints in the first place with is
>traditional math applied compression

"Willem" <n...@sp.am.invalid> wrote in message
news:slrnbk1d...@toad.stack.nl...

timo Makinen

Aug 18, 2003, 10:35:14 AM
However that argument is only valid when programming
>under the environment that places the constraints in the first place with is
>traditional math applied compression
"fuse" <wufe...@163.net> wrote in message
news:bhpuhu$601$1...@mail.cn99.com...

timo Makinen

Aug 18, 2003, 10:35:15 AM
However that argument is only valid when programming
>under the environment that places the constraints in the first place with is
>traditional math applied compression
"David A. Scott" <daVvid_...@email.com> wrote in message
news:Xns93DB38C738664H1...@130.133.1.4...

Marco Schmidt

Aug 18, 2003, 11:02:14 AM
timo Makinen:

>However that argument is only valid when programming
>>under the environment that places the constraints in the first place with is
>>traditional math applied compression

What exactly are you adding to the discussion by repeating a
grammatically incorrect sentence from your first posting four times,
full-quoting those who replied to you? If you're not interested in
discussion, don't participate. But don't spam the group either.

Regards,
Marco

Dale King

Aug 18, 2003, 11:28:27 AM
At the risk of feeding the trolls, but just in case we can stop another
person falling for such impossibilities.

"timo Makinen" <timokm...@hotmail.com> wrote in message
news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com...


> I am able to compress any data to more then 1/2 of its size as long as the
> data is larger then 8k. I can recompress compressed files down so anything
> is down to 8k.
>
> ANYTHING DOWN TO 8K
>
> however there is one catch, no its not lossy but its slow !

No, by definition, it must be lossy. You have 256^8192 files that are 8K
bytes. You have an infinite number of inputs that are larger than 8K bytes.
A reversible mapping from an infinite number of things to only 256^8192
things is a logical impossibility.

> 1200 Duron compresses 800k (random bin file) down to 100k in over a week.
> The decompression is somewhat faster only 49 hours or so.
>
> Faster CPU's Please !

Which won't help you defy simple logic.

> Yes i understand the traditional line of thought in that u cant squeeze

It isn't a question of tradition, it is a question of simple logic. If you
have n+1 pigeons and n holes you cannot put every pigeon into a hole and
still have only one pigeon per hole. Your claim is that you can put an
infinite number of pigeons in only 256^8192 holes. That's a lot of holes,
but you still have more pigeons than holes.

> data in that small space because its not mathematically possible.

It is important to not focus on squeezing or on the size of the output. The
issue is not so much the size of the output as the number of outputs there
are vs. the number of inputs. A good analogy is the assigning of telephone
numbers. Compression is like the process of assigning telephone numbers and
decompression is the process of making a telephone call and being connected
with the right person.

Consider if we want to assign everyone a 4 digit telephone number. Each
person must have a unique telephone number. If we give two people the same
telephone number, then which of the two do we get when we enter the phone
number? Giving two people the same telephone number corresponds to lossy
compression. There are only 10,000 4-digit telephone numbers. If we have a
million people there is no way to assign a unique 4-digit telephone number
to each of them.

So extending this analogy, you claim that you can assign 256^8192 numbers to
an infinite number of people without assigning the same number to any two
people. That is a logical impossibility.
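
To make the counting concrete, here is a minimal C sketch. It is an
illustration only: the 2-byte-to-1-byte "compressor" in it is made up, and it
stands in for any such mapping. Whatever function you pick, 65536 possible
inputs have to land on only 256 possible outputs, so outputs get reused and a
decompressor cannot tell the colliding inputs apart.

#include <stdio.h>

/* stand-in "compressor": 2 bytes in, 1 byte out (hypothetical) */
static unsigned char compress2to1(unsigned in)
{
    return (unsigned char)((in ^ (in >> 8)) & 0xFF);
}

int main(void)
{
    unsigned long hits[256] = {0};
    unsigned long worst = 0;

    for (unsigned in = 0; in < 65536; in++)   /* every possible 2-byte input */
        hits[compress2to1(in)]++;

    for (int out = 0; out < 256; out++)
        if (hits[out] > worst)
            worst = hits[out];

    printf("65536 inputs, 256 possible outputs\n");
    printf("most-reused output stands for %lu different inputs\n", worst);
    /* pigeonhole: worst >= 65536/256 = 256, so decompression cannot be unique */
    return 0;
}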

> I am aware of such arguments.

But obviously don't understand them (or else you are an idiot troll).

> However that argument is only valid when programming
> under the environment that places the constraints in the first place with is
> traditional math applied compression.

The argument is always valid. If you have more pigeons than holes you can't
put all the pigeons in different holes. There is no way you can uniquely
assign a 4-digit phone number to one million people. Please point out any
environment where that is not valid. Doesn't matter what environment you are
talking about. That is a basic fact.

The same logic applies to your claim that you can uniquely assign 256^8192
compressed files to an infinite number of inputs. It just is not possible.

> Which is not the case here. I am getting legal protection for the code and
> then i can discuss it in detail.

Don't bother.

--
Dale King


Guest

Aug 18, 2003, 11:55:57 AM
Dave , i dont want to feed the troll either , but
using your logic , even an uncompressed .exe file
couldnt be compressed due the fact its a random data file !

"Dale King" <Ki...@tmicha.net> wrote in message
news:3f40...@news.tce.com...

Thomas Richter

Aug 18, 2003, 12:35:49 PM
Hi,

> Dave , i dont want to feed the troll either , but
> using your logic , even an uncompressed .exe file
> couldnt be compressed due the fact its a random data file !

Dave's logic is absolutely correct. You just need to read very
carefully what he's written. He did *not* say that it is impossible
to compress files. He just said that it is impossible to compress
*every possible* file, which is quite some difference.

The difference is that executable files, especially *interesting*
executable files make only a very minor subset(!) (in terms of number)
of all possible files, thus allowing a data compressor to compress them
effectively, at the cost of expanding the majority of byte sequences
that do not make up executables, .doc documents, images, audio files...

The crux is that there is such an enormous number of un-interesting
files nobody cares about, and that, due to this huge number, the average
file need not even be expanded by much to compensate for the very
small (in number, though not negligible and interesting) set of files
we care about.

Choose a good random generator, generate a sequence of 100MB and
run it thru ZIP. You don't get much of a compression here, and the file
is, at best, only a couple of bytes longer than the original (since ZIP
finds that its compressor cannot compress it and stores it uncompressed),
but these couple of additional header bytes 'on average' are enough to
compensate for the 'compressible' files.
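
For anyone who wants to try that experiment, here is a minimal sketch along
those lines, using zlib's deflate (the same family of algorithm as ZIP) and
1 MB of rand() output standing in for the 100 MB file. It is only meant to
reproduce the observation above: the "compressed" output comes out a little
larger than the input (build with -lz).

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void)
{
    const uLong n = 1UL << 20;             /* 1 MB of pseudo-random bytes */
    Bytef *src = malloc(n);
    uLongf dlen = compressBound(n);
    Bytef *dst = malloc(dlen);

    srand(12345);
    for (uLong i = 0; i < n; i++)
        src[i] = (Bytef)((rand() >> 7) & 0xFF);

    if (compress2(dst, &dlen, src, n, Z_BEST_COMPRESSION) != Z_OK) {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    printf("original:   %lu bytes\n", (unsigned long)n);
    printf("compressed: %lu bytes\n", (unsigned long)dlen);  /* a bit larger */

    free(src);
    free(dst);
    return 0;
}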

And yet, I bet I can compress *this specific file* very efficiently, in
a couple of MBs. Namely, by the program that generated the numbers in
first place... (-; Does this mean that your random file generator wasn't
'random enough'? Think about this!

\begin{quote}
A data compression algorithm is a data expander with some interesting
failure cases.
\end{quote}

To the OP: I offer the following business opportunity using "non-standard"
math. Lend me 5 million dollars. Don't worry, I'll pay them back. Within a
week, I'll send the first two million, and in another week, I'll send the
remaining two million.

What, 2+2 is not five for you? I'm so sorry, I'm using non-standard math.

So long,
Thomas

Marco Schmidt

Aug 18, 2003, 12:44:05 PM
fup2 comp.compression

Can we please leave the discussion here?

Guest:

>Dave , i dont want to feed the troll either , but
>using your logic , even an uncompressed .exe file
>couldnt be compressed due the fact its a random data file !

No, executable files are not random data. Even if they were, a given
compressor might reduce _some_ of them in size.

The counting argument talks about not being able to compress _all_
sequences of size N + 1 bits to N bits (or less).

Here again is the description of that argument:
<http://www.faqs.org/faqs/compression-faq/part1/section-8.html> (9.2).

Regards,
Marco

Guest

Aug 18, 2003, 2:53:59 PM
again with this ZIP story...

ZIP seeks byte sequences.
ZIP uses BYTES and dont care about PATTERNS.

of course , using ZIP logic , a random data file cant be recompressed
in the majority of the cases , but using different point of views it COULD
be done even if the gain would be 0.00001%.

or not ?

"Thomas Richter" <th...@cleopatra.math.tu-berlin.de> wrote in message
news:bhqv95$e2n$1...@mamenchi.zrz.TU-Berlin.DE...

Guest

Aug 18, 2003, 2:55:41 PM
thats true.
BUT in the real world , patterns never occur in the same
percentage.

therefore there IS the way to gain a very snall amount of data.
that's exactly the kind of program i'm writing right now , and
hopefully i'll soon show you you're wrong.


"Marco Schmidt" <marcos...@geocities.com> wrote in message

news:i802kvkaif38gf0vg...@4ax.com...

Mark Adler

Aug 18, 2003, 2:56:33 PM
"fuse" <wufe...@163.net> wrote in message news:<bhpuhu$601$1...@mail.cn99.com>...
> impossible!!!

Not at all. Of course he can compress down to 8K. He didn't say
anything about *de*compressing. :-)

mark

Marco Schmidt

Aug 18, 2003, 3:17:54 PM
Guest:

>thats true.
>BUT in the real world , patterns never occur in the same
>percentage.
>
>therefore there IS the way to gain a very snall amount of data.
>that's exactly the kind of program i'm writing right now , and
>hopefully i'll soon show you you're wrong.

I'm wrong about what? I never said anything about gaining small
percentages on some compressed files, I just reiterated what the
counting argument is about.

BTW, existing archivers already are able to shrink some compressed
files by a certain margin.

Regards,
Marco

Willem

Aug 18, 2003, 3:35:48 PM
Guest wrote:
) thats true.
) BUT in the real world , patterns never occur in the same
) percentage.
)
) therefore there IS the way to gain a very snall amount of data.
) that's exactly the kind of program i'm writing right now , and
) hopefully i'll soon show you you're wrong.

I'm pretty sure you're trolling, but I'm happy to play along...

Could you please write precisely what it is you claim you can do ?

For example:

" Given an input of 2^16 bytes, I can create an output of 2^16-1 bytes, and
I can then recreate the given input from the output. This works for all
possible inputs. "

(Which happens to be impossible, but hey..)


SaSW, Willem (at stack dot nl)
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

Falk Hueffner

Aug 18, 2003, 3:56:08 PM
Willem <n...@sp.am.invalid> writes:

> I'm pretty sure you're trolling, but I'm happy to play along...

Please, stop it. Look at the previous posts and guess whether this
will lead to anything useful (hint: no). You're just wasting your and
everybody else's time.

--
Falk

Guest

Aug 18, 2003, 4:08:13 PM
thats exactly what im doing right now.
the issue is that in SOME files i can do that gaining an average of 1 bit
each 128 bits ,
but in most other cases the target comes bigger than the source.

too bad , but im studying how to make it work at least for .avi files
which are the most promising.

Guest

Aug 18, 2003, 4:10:44 PM
gaining a SMALL percentage on SOME compressed random data files
IS in my opinion possible , in fact the algo im working on can do that
on some kinds of files (especially on .avi).

due to the miserable gaining factor i would call it "optimization" instead
of compression , yet im on the right path to make it work fast
and reliably , so when and IF my program works right i'll let you
know the details.

"Marco Schmidt" <marcos...@geocities.com> wrote in message

news:g892kvk8e6nb51835...@4ax.com...

Thomas Richter

Aug 18, 2003, 4:41:47 PM
Guest wrote:

> ZIP seeks byte sequences.
> ZIP uses BYTES and dont care about PATTERNS.

That doesn't touch the argument at all.

> of course , using ZIP logic , a random data file cant be recompressed
> in the majority of the cases , but using different point of views it COULD
> be done even if the gain would be 0.00001%.

You still don't understand what we're all saying:

a) If you have a specific file, it is easy to write a compressor that
compresses *this* file to any size you want, even to a single bit, and
that expands all other files by just a bit. Here's the algorithm that
compresses the famous lena-image (call it Lena-compress):

    if (image is lena) {
        write a one-bit;
    } else {
        write a zero bit;
        write the remaining image;
    }

*THIS IS NOT AN ART*. And not the kind of problem a file compressor is
supposed to solve. I want to compress an "interesting subset of files"
to a given size, not a single file. It also doesn't disprove the
counting argument since the majority of files is *expanded*.

b) If you have a compressor (independent of a file), and I feed this
compressor by all possible sequences, what do I get? The counting
argument *proves* that I cannot compress *all* of them. And that even
holds if you have several compressors and decide to select smartly
between them - the reason is that the decompressor needs to know which
compressor you choose.

Thus, any claim that any file can be compressed to 8K is utter pure
nonsense, making a fool of the OP, or trolling. (And pretty much
entertaining, maybe, which is the only reason why I'm answering this.)

So long,
Thomas

Thomas Richter

Aug 18, 2003, 4:43:58 PM
Guest wrote:

> thats exactly what im doing right now.
> the issue is that in SOME files i can do that gaining an average of 1 bit
> each 128 bits ,
> but in most other cases the target comes bigger than the source.
>
> too bad , but im studying how to make it work at least for .avi files
> which are the most promising.

A good compressor requires a good model of the files it should compress.
A lot of work is going on in video compression that you shouldn't
underestimate. Developing the right models requires a good mathematical
background.

So long,
Thomas


Guest

Aug 18, 2003, 5:11:27 PM
you think with DOGMAs.

i think more freely.

what i wanna achieve is a generic model , not a specific model for .avi
files.
(yet i wouldnt dislike it).

im not interested either in lossy video compression , my target is lossless
compression of random data.
i would accept even a few % over a .zip , .rar , or .ace file , it would be
enough to
make a commercial product.
(i'm sorry Thomas , but im not rich like you , and if someone makes a good
job it's right
that he got big bucks for it).


"Thomas Richter" <th...@math.tu-berlin.de> wrote in message
news:3F413A8E...@math.tu-berlin.de...

Guest

Aug 18, 2003, 5:19:50 PM
first of all , im not the troll who started the thread , im just a reader
who occasionally reads this NG.

why bytes && patterns shouldnt touch the arguments ?
it's the KEY ! bytes are 8 bits , a pattern could be even 2 bits or
a bit-mask , there's no limit on the "dictionary" unlike with zip.
zip is infact limited by many factors , but you guys keep claiming
there's NO way to improve it even by 0.000001%.

i believe zip sucks and im gonna prove it soon if my idea is right.

i know also about Lena , thanks.

counting arg : i said already you're right , but have you ever counted
the patterns in a REAL file ?
lets take bytes (0-255) in a zip file , havent you noticed some
incongruence in their distribution ? :)

what you say is right only in the theoretical form , but due the fact
that a zipped file isnt linked to what's written in your books ,
there IS undoubtely the possibility to achieve some VERY small
gains in SOME file types.

"Thomas Richter" <th...@math.tu-berlin.de> wrote in message

news:3F413A0B...@math.tu-berlin.de...


> Guest wrote:
> > ZIP seeks byte sequences.
> > ZIP uses BYTES and dont care about PATTERNS.
> That doesn't touch the argument at all.

> a) If you have a specific file, it is easy to write a compressor that
> compresses *this* file to any size you want, even to a single bit, and
> that expands all other files by just a bit. Here's the algorithm that
> compresses the famous lena-image (call it Lena-compress):
>

Willem

Aug 18, 2003, 8:19:19 PM
Guest wrote:
) zip is infact limited by many factors , but you guys keep claiming
) there's NO way to improve it even by 0.000001%.

No we don't. As a matter of fact, zip files contain headers which can
quite easily be compressed. And zip isn't one of the best compressors
around when it comes to compression factor.

You may have noticed some posts which compare different compressors.

) i believe zip sucks and im gonna prove it soon if my idea is right.

We all know zip sucks. But you should consider that the best/easiest way
to compress a zip file is to first unzip it and then recompress the files
with a better algorithm.

) counting arg : i said already you're right , but have you ever counted
) the patterns in a REAL file ?
) lets take bytes (0-255) in a zip file , havent you noticed some
) incongruence in their distribution ? :)

Well, in the output of a *good* compressor, the distribution should be
random. As said, zip compression sucks.

) what you say is right only in the theoretical form , but due the fact

) that a zipped file isnt linked to what's written in your books ,
) there IS undoubtely the possibility to achieve some VERY small
) gains in SOME file types.

If you compare how well zip performs in comparison to some other
compressors, you might notice that there is the possibility to get pretty
big gains on some file types.


By the way, I don't expect 'the ultimate compressor' to get much gain on
video streams (such as avi files) so any real compressor won't get much
gain either.

(The 'ultimate' compressor is the theoretical compressor that produces the
best self-extracting file, but it isn't possible to write it in practice,
due to the halting problem.)

Kelsey Bjarnason

Aug 18, 2003, 8:36:52 PM
On Mon, 18 Aug 2003 13:39:12 +0100, Stuart Caie wrote:

> timo Makinen wrote:
>> ANYTHING DOWN TO 8K
>
> Steps to success:
> 1. Read the FAQ about impossible compression.
> 2. Deliberately post about impossible compression.
> 3. ???
> 4. Profit!
>
> Actually, this reminds me. Did anyone here hear about "hash zip"?
>
> The author really believed he could use a message-digest algorithm to turn
> each 256 bytes into just the 12-16 bytes of the digest plus a little
> overhead.

I still like my approach - use CRCs [1]. The rationale is thus:

1) A CRC is used to detect errors in a block of data
2) Therefore, the CRC is specific to that block of data
3) Therefore, the CRC can be used in place of that block of data

Given this, we can use, say, 32-bit CRCs to represent, say, 256 byte (2048
bit) blocks of data.

Decompression is trivial, albeit a bit slow: simply generate sequences of
bytes ( 0000, 0001, 0002...) computing the CRC at each step, until the CRC
matches the CRC stored in the compressed file - then write out the
recovered data.

Since we're using 32 bits to represent 2048, our compressed file is only
about 1.5% of the original - regardless of the contents of the original.

Voila. Instant magic compression.

[1]: Clue for the humor-impaired. Read the last line. Again.
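
A scaled-down version of the same joke makes the catch obvious: use an 8-bit
CRC over 2-byte blocks, so the collision shows up immediately. The sketch
below is only an illustration (CRC-8, polynomial 0x07, chosen just to keep it
self-contained); the same counting applies to 32-bit CRCs over 256-byte
blocks, where each CRC value is shared by roughly 2^2016 different blocks.

#include <stdio.h>

static unsigned char crc8(const unsigned char *buf, int len)
{
    unsigned char crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80) ? (unsigned char)((crc << 1) ^ 0x07)
                               : (unsigned char)(crc << 1);
    }
    return crc;
}

int main(void)
{
    int first[256];                  /* first 2-byte block seen per CRC value */
    for (int i = 0; i < 256; i++)
        first[i] = -1;

    for (int v = 0; v < 65536; v++) {
        unsigned char blk[2] = { (unsigned char)(v >> 8), (unsigned char)v };
        unsigned char c = crc8(blk, 2);
        if (first[c] < 0) {
            first[c] = v;
        } else {
            printf("blocks 0x%04x and 0x%04x share CRC 0x%02x\n",
                   first[c], v, c);
            printf("given only 0x%02x, no decompressor can pick between them\n",
                   c);
            return 0;
        }
    }
    return 0;
}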


timo Makinen

Aug 18, 2003, 10:27:56 PM
Sigh

"Marco Schmidt" <marcos...@geocities.com> wrote in message
news:oeq1kv8hsunhgtfg9...@4ax.com...

timo Makinen

Aug 18, 2003, 10:28:20 PM
har har har
"Mark Adler" <mad...@alumni.caltech.edu> wrote in message
news:7ab39013.03081...@posting.google.com...

timo Makinen

Aug 18, 2003, 10:34:16 PM

"Marco Schmidt" <marcos...@geocities.com> wrote in message
news:oeq1kv8hsunhgtfg9...@4ax.com...

Ray

Aug 19, 2003, 4:26:46 AM
It is indeed possible. Just that it is not available to us yet
this is the technology like optical computer nanotium V. Timo, would u
kindly please keep it to urself this trade secret. we need to release
lousy compression program to fund our previous research eg. earn
money... dun broke our fishing rod !

Please !

Petition Sign
-Winzip( I have 3 wife and 12 children )
-Winrar( My wife is compressed too much )
-Winarj( I dun believe the above 2... )


"Guest" <bou...@localhost.com> wrote in message news:<Tz00b.96725$cl3.3...@news2.tin.it>...
> could you post at least a screenshot ?



Thomas Richter

Aug 19, 2003, 4:40:58 AM
Hi,

> first of all , im not the troll who started the thread , im just a reader
> who occasionally reads this NG.

Hard to tell. Some people use "multiple personalities". (-;

> why bytes && patterns shouldnt touch the arguments ?

Because the argument is about mappings from bit sequences to
bit sequences. Any input file is a bit sequence, and any output is also
a bit sequence. Whether you look for patterns in the bit sequence to
achieve compression remains unsaid, and is irrelevant for the argument.
All the counting argument says is:

A mapping from an N element set to an M element set with N > M is not
reversible.

*What* that mapping looks like is irrelevant. Here, N = 2^(input bits) and
M = 2^(output bits), enumerating all possible input and output files.

In fact, whether you talk about bits is also pretty irrelevant; you
can also talk about symbols if you prefer. Larger files allow still
more possibilities.


> it's the KEY ! bytes are 8 bits , a pattern could be even 2 bits or
> a bit-mask , there's no limit on the "dictionary" unlike with zip.

The counting argument is not about bits. It is about the number of
possible input configurations, and the number of output configurations.
*How* you count them does not matter; there are always 256^n possible files
of byte-length n, or 2^m files of bit-length m. What is important is that
this number is smaller if the file gets smaller, leaving you less freedom
in the possible files. Thus, an "always compress" map with a lossless
expander would be a map that takes a larger set into a smaller one
*reversibly* (the decompressor being the inverse), and that's just not
possible.

> zip is infact limited by many factors , but you guys keep claiming
> there's NO way to improve it even by 0.000001%.

No, exactly *not*. ZIP can be improved by quite a lot of methods.
However, there's one thing ZIP cannot do, and any other compression
mechanism cannot do either:

Compress all files to 8K size, as claimed by the OP.

This is impossible.

> i believe zip sucks and im gonna prove it soon if my idea is right.

There's nothing to prove about. I'd rather call that a fact. (-;


> counting arg : i said already you're right , but have you ever counted
> the patterns in a REAL file ?
> lets take bytes (0-255) in a zip file , havent you noticed some
> incongruence in their distribution ? :)

Sure, so? You can get better compression than ZIP for these files,
no questions asked. In your language, "REAL file" means:

The subset of possible files that we really care about.

And these are just quite a few, actually a minority, of all possible
files. In fact, if you compute the probability of a monkey typing
a Shakespeare novel on a typewriter, you'll find that this probability
is so extremly low that it is unlikely to happen even if we had monkeys
typing since the beginning of the universe. This explains why file
compressors are possible: All *other* (uninteresting) files get expanded
at the cost of compressing the ones we really care about.

> what you say is right only in the theoretical form , but due the fact
> that a zipped file isnt linked to what's written in your books ,
> there IS undoubtely the possibility to achieve some VERY small
> gains in SOME file types.

I never claimed the opposite. I just claimed that you cannot have
a compressor that compresses all files to 8K.

So long,
Thomas

Thomas Richter

Aug 19, 2003, 5:00:37 AM
Hi,

> you think with DOGMAs.

No.

> what i wanna achieve is a generic model , not a specific model for .avi
> files.
> (yet i wouldnt dislike it).

I'm afraid you won't then arrive at a competing algorithm; this is not
my dogma, please read on.

Hint: (Just think about this, no need to comment)

In a still image, it is likely that the neighbouring pixels have similar
colors to the pixel in the center. If we represent the pixel grey-
scales in numbers, it is therefore likely that we find similar numbers
in the neighbourhood of the center.

In english text, it is likely that we find an 'u' in the neighbourhood
of a 'q', and it is more likely that we find it behind rather than before.
If we represent characters by ASCII code, it is less likely to find
'similar' codes near the code of the 'q' since combinations like 'qr' or
'qp' are rather unlikely. This means that, on this level, english text
looks rather different than a still image.

Now, don't you think a "good compressor"(tm) should take some advantage of
this? Wouldn't it be a good idea to take advantage of exactly this?

Clearly, you can write your compressor in a way that it finds this
out itself, but remember that you have to tell the decompressor about
the choice you've made, and thus you need to enlarge your output,
even if only by a single bit.

In other words, a compressor for a specific file class invests some
pre-knowledge of the files it is going to compress, and is able to remove
some bits from the original stream, at the price of making the decompressor
larger. Thus, 'knowledge about the file type enlarges the size of the
compressor by making this knowledge a 'static' part of the algorithm'.

If you think about this picture, one could rather say that a good compressor
removes 'common bits' from the input class it should compress and puts these
bits - in the form of an algorithm - into the decompressor. Thus, by supplying
a decompressor, you supply part of the knowledge how 'typical' streams should
look like, and the more you specialize on what is typical, the more bits can
be removed from the stream, and put into the decompressor/compressor pair.

Back to the Lena compressor: Here, in this extreme example, you put the
knowledge 'we have lena' into the compressor, at the benefit of compressing
the file down to one bit, and put all these image bits into the decompressor,
which, by reading the single bit, writes lena back to disk. It isn't always
that obvious, of course.

That's also the reason why we have so many 'image compression' algorithms,
rather than using ZIP or MySuperCompressionApp on these files. (-; They
just know more about the data they want to compress and can do better for
this reason. Not because there are some computer nerds behind them which
are highly overpaid. (-;

So long,
Thomas

Joe

Aug 19, 2003, 5:16:01 AM

timo Makinen <timokm...@hotmail.com> wrote in message
news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com...
> I am able to compress any data to more then 1/2 of its size as long as the
> data is larger then 8k. I can recompress compressed files down so anything
> is down to 8k.
>
> ANYTHING DOWN TO 8K
>
> however there is one catch, no its not lossy but its slow !
> 1200 Duron compresses 800k (random bin file) down to 100k in over a week.
> The decompression is somewhat faster only 49 hours or so.
>
> Faster CPU's Please !
>
> Addressing the naysayers.
>
> Yes i understand the traditional line of thought in that u cant squeeze
> data in that small space because its not mathematically possible. I am aware
> of such arguments. However that argument is only valid when programming
> under the environment that places the constraints in the first place with is
> traditional math applied compression.
> Which is not the case here. I am getting legal protection for the code and
> then i can discuss it in detail.
>
>
>
>
>
Let me guess: It's floating point math? You remind me of the people who
say that all CPUs must be 'trusted', yet they fail to recognize that that's
not a CPU, but a VCR... Without math, you can't compress, since it's all
_numbers_. Duh!

Joe

Aug 19, 2003, 5:19:40 AM

Willem <n...@sp.am.invalid> wrote in message
news:slrnbk1d...@toad.stack.nl...
> timo wrote:
> ) ...
> )
> ) ANYTHING DOWN TO 8K
> )
> ) ...
>
> People, please don't feed the trolls.

>
>
> SaSW, Willem (at stack dot nl)
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT

But I get bored otherwise.
):

I like to keep some pet dragons too. heh


Joe

Aug 19, 2003, 5:21:27 AM

timo Makinen <timokm...@hotmail.com> wrote in message
news:Du50b.195529$4UE.1...@news01.bloor.is.net.cable.rogers.com...

> However that argument is only valid when programming
> >under the environment that places the constraints in the first place with is
> >traditional math applied compression


Are you related to Smart1234?
Of course it's only valid when programming, since by definition, you need a
program or procedure to make the compression reversible.


Joe

Aug 19, 2003, 5:23:12 AM

Stuart Caie <ky...@4u.net> wrote in message
news:3f40c92d$0$960$cc9e...@news.dial.pipex.com...

> timo Makinen wrote:
> > ANYTHING DOWN TO 8K
>
> Steps to success:
> 1. Read the FAQ about impossible compression.
> 2. Deliberately post about impossible compression.
> 3. ???
> 4. Profit!
>
> Actually, this reminds me. Did anyone here hear about "hash zip"?
>
> The author really believed he could use a message-digest algorithm to turn
> each 256 bytes into just the 12-16 bytes of the digest plus a little
> overhead. Obviously, this is possible. However, he then believed that the MD
> could be "restored" by iterating over all permutations of the 256 bytes
> until he found a matching MD. He thought MD was some kind of compression,
> and didn't realise that one 16 byte MD corresponds to 256^(256-16) possible
> 256 byte blocks on average and even to iterate which one was the correct
> expansion would end up taking more than 256 bytes on average.
>
> http://freedaemonconsulting.com/tech/hzip.php
>
> Regards
> Stuart
>

I wonder how many English text files would be in that 256^240 combination
range, for a 256 byte file. heh


Joe

Aug 19, 2003, 5:41:13 AM

Dale King <Ki...@tmicha.net> wrote in message news:3f40...@news.tce.com...
> At the risk of feeding the trolls, but just in case we can stop another
> person falling for such impossibilities.

>
> "timo Makinen" <timokm...@hotmail.com> wrote in message
> news:cOZ%a.189733$4UE....@news01.bloor.is.net.cable.rogers.com...
> > I am able to compress any data to more then 1/2 of its size as long as the
> > data is larger then 8k. I can recompress compressed files down so anything
> > is down to 8k.
> >
> > ANYTHING DOWN TO 8K
> >
> > however there is one catch, no its not lossy but its slow !
>
> No, by definition, it must be lossy. You have 256^8192 files that are 8K
> bytes. You have an infinite number of inputs that are larger than 8K bytes.
> A reversible mapping from an infinite number of things to only 256^8192
> things is a logical impossibility.
LOGIC?! WHO NEEDS LOGIC?! hehe Not timo, cause he's been eating 'shrooms,
and smoking crack all his life.


>
> > 1200 Duron compresses 800k (random bin file) down to 100k in over a week.
> > The decompression is somewhat faster only 49 hours or so.
> >
> > Faster CPU's Please !
>

> Which won't help you defy simple logic.
Politician: Hey, our <impossible> project needs more $$$$$. Please give us
more tax money!
Techie: <mumbling> Grr, at least I get a pay check out of it.
Techie: <aloud> Yes, help the economy by building new science and
infrastructure.
Actor: It's greaaaaaat!


>
> > Yes i understand the traditional line of thought in that u cant squeeze
>

> It isn't a question of tradition, it is a question of simple logic. If you
> have n+1 pigeons and n holes you cannot put every pigeon into a hole and
> still have only one pigeon per hole. Your claim is that you can put an
> infinite number of pigeons in only 256^8192 holes. That's a lot of holes,
> but you still have more pigeons than holes.


>
> > data in that small space because its not mathematically possible.
>

> It is important to not focus on squeezing or on the size of the output. The
> issue is not so much the size of the output as the number of outputs there
> are vs. the number of inputs. A good analogy is the assigning of telephone
> numbers. Compression is like the process of assigning telephone numbers and
> decompression is the process of making a telephone call and being connected
> with the right person.
>
> Consider if we want to assign everyone a 4 digit telephone number. Each
> person must have a unique telephone number. If we give two people the same
> telephone number, then which of the two do we get when we enter the phone
> number? Giving two people the same telephone number corresponds to lossy
> compression. There are only 10,000 4-digit telephone numbers. If we have a
> million people there is no way to assign a unique 4-digit telephone number
> to each of them.
>
> So extending this analogy, you claim that you can assign 256^8192 numbers to
> an infinite number of people without assigning the same number to any two
> people. That is a logical impossibility.


>
> > I am aware of such arguments.
>

> But obviously don't understand them (or else you are an idiot troll).


>
> > However that argument is only valid when programming
> > under the environment that places the constraints in the first place with is
> > traditional math applied compression.
>
> The argument is always valid. If you have more pigeons than holes you can't
> put all the pigeons in different holes. There is no way you can uniquely
> assign a 4-digit phone number to one million people. Please point out any
> environment where that is not valid. Doesn't matter what environment you are
> talking about. That is a basic fact.
>
> The same logic applies to your claim that you can uniquely assign 256^8192
> compressed files to an infinite number of inputs. It just is not possible.


>
> > Which is not the case here. I am getting legal protection for the code and
> > then i can discuss it in detail.
>

> Don't bother.
>
> --
> Dale King
>
>


FROM: fsdf...@ssff.com
SUBJECT: Achieve compression rates TWO TO ONE!!!!! (notice more then one
'!')
Hey, I found a miracle method to compress files.
Since I'm such a nice fellow, I'll even show how it works.

open #1, "filename.1", readonly, binary
loadfile #1, C
close #1
while len(C) > 1
splitandpad (C,firsthalf,secondhalf)
C=firsthalf && secondhalf
end while
open #1, "filename.2", create, binary
savefile #1
close #1

Nice pseudocode, eh? ^_^

Of course I've patented splitandpad, loadfile, and oddly not savefile. I
think that && can be patented, in the use of a compressor, since I doubt
anyone's done it before.


Joe

Aug 19, 2003, 5:56:23 AM

Guest <bou...@localhost.com> wrote in message
news:Nma0b.103611$cl3.3...@news2.tin.it...

Hint: If you reduced the keyframes, the file would be smaller, but then it
would be unseekable. ):
AVIs are a streamed format (stored in segments), so of course you can always
improve them, but at the expense of CPU time/RAM/quality/etc.


Dale King

Aug 19, 2003, 1:26:35 PM
"Thomas Richter" <th...@cleopatra.math.tu-berlin.de> wrote in message
news:bhqv95$e2n$1...@mamenchi.zrz.TU-Berlin.DE...
> Hi,
>
> > Dave , i dont want to feed the troll either , but
> > using your logic , even an uncompressed .exe file
> > couldnt be compressed due the fact its a random data file !

I mentioned nothing about randomness. I explained it as a simple mapping
from the set of inputs to the set of outputs of the compressor. The set of
outputs being smaller than the set of inputs means the mapping is not
reversible for all inputs and the compressor is therefore lossy.

> Dave's logic is absolutely correct.

It's Dale, not Dave, but I get that a lot and am used to it.

> You just need to read very
> carefully what he's written. He did *not* say that it is impossible
> to compress files. He just said that it is impossible to compress
> *every possible* file, which is quite some difference.

To be more precise, what I said was that it was impossible to compress every
file losslessly. It is possible to compress every file in a lossy manner. I
can compress every file to 8K, but it must be lossy as there are more inputs
than outputs. Therefore more than one input must get mapped to the same
output.

--
Dale King


Dale King

Aug 19, 2003, 1:32:33 PM
"Mark Adler" <mad...@alumni.caltech.edu> wrote in message
news:7ab39013.03081...@posting.google.com...


Actually, he did. He said,

"ANYTHING DOWN TO 8K

however there is one catch, no its not lossy but its slow !"

He said it is not lossy, which means that it can be decompressed to the
original.

That of course is an impossibility as I have already shown.

--
Dale King


Petrut

Aug 20, 2003, 2:18:28 AM

> Are you related to Smart1234?
> Of course it's only valid when programming, since by definition, you need
a
> program or procedure, to make the compression reversable.
>

what is smart1234 ?


Keith Thompson

Aug 20, 2003, 2:17:02 AM
"timo Makinen" <timokm...@hotmail.com> writes:
> Sigh

Thanks, I just did.

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Dirk Stoecker

Aug 20, 2003, 2:10:12 AM
On Mon, 18 Aug 2003, Thomas Richter wrote:

> a) If you have a specific file, it is easy to write a compressor that
> compresses *this* file to any size you want, even to a single bit, and
> that expands all other files by just a bit. Here's the algorithm that
> compresses the famous lena-image (call it Lena-compress):
>
>     if (image is lena) {
>         write a one-bit;
>     } else {
>         write a zero bit;
>         write the remaining image;
>     }

IDEA! We may write a compressor which will be very good in benchmarks, as
it can compress everything relevant to 8 bits. This means we have 256
files, which we can compress (e.g. the Calgary corpus, Canterbury corpus). All
the others are expanded by 8 byte :-)

Advantages:
- really fast
- compression ratio excellent in test-file based benchmarks.
- very reliable (there can't be really that much bugs in the algorithm)

Disadvantage:
- Big executable size (which does not count that much nowadays).

And now the ultimate optimization:
We compress to 0 bytes and move the bits needed for compression into the
file name :-) When appending the extension ._xx, where xx are the letters
a to z and numbers 0 to 9, we have 1295 different files, which have zero
size and 1 marker is for uncompressed files :-)

Should be unbeatable in benchmarks!

Or has this already been done?

Ciao
--
____ _ _ ____ _ _ _ _ ____
| | | | | | \ / | | | the cool Gremlin from Bischofswerda
| __ | ____| | \/ | | | WWW: http://www.dstoecker.de/
| | | | | | | | PGP key available on www page.
|____| _|_ |____| _|_ _|_ |____| I hope AMIGA never stops making fun!

Thomas Richter

Aug 20, 2003, 4:26:31 AM
Hi Dirk,

> IDEA! We may write a compressor which will be very good in benchmarks, as
> it can compress everything relevant to 8 bit. This means we have 256
> files, which we can compress (e.g. calgary corps, canterbury corpus). All
> the others are expanded by 8 byte :-)

/* snip */

> Or has this already be done?

Well, to some degree. (-; Lena-compress was kind of a 'running gag' in
the jpeg2000 standardization. No, it was not really considered to be part
of the standard.

So long,
Thomas

Joe

Aug 20, 2003, 7:04:34 AM

Petrut <pet...@wanadoo.fr> wrote in message
news:bhv2ga$q51$1...@news-reader1.wanadoo.fr...
alt.sci.physics.new-theories ----> The guy who probably started that group,
just to torture people.


Keith Thompson

Aug 20, 2003, 2:11:37 PM
"timo Makinen" <timokm...@hotmail.com> writes:
> I am able to compress any data to more then 1/2 of its size as long as the
> data is larger then 8k.

I can compress any data to more than 1/2 its size. Much more. (A
factor of 1 is pretty easy.)

> I can recompress compressed files down so anything is down to 8k.
>
> ANYTHING DOWN TO 8K

(I presume "8K" means 8192 8-bit bytes.)

Hmm. If it depends on repeated recompression, there's some extra
information in the number of iterations. If you have an 8K compressed
file, and it takes on the order of 2**(8*8K) iterated decompression
operations to recover the original 16K, that's just recovering
16K of output from 16K of input.

Somehow, though, I doubt that that's what timo has in mind.

Sander Vesik

Sep 1, 2003, 5:48:27 PM
In comp.compression timo Makinen <timokm...@hotmail.com> wrote:
>
> Yes i understand the traditional line of thought in that u cant squeeze
> data in that small space because its not mathematically possible. I am aware
> of such arguments. However that argument is only valid when programming

> under the environment that places the constraints in the first place with is
> traditional math applied compression.

Umm... So to use the compression you go outside reality, do it and then ship
the bits back? Thats a rather nice trick 8-)

> Which is not the case here. I am getting legal protection for the code and
> then i can discuss it in detail.
>

--
Sander

+++ Out of cheese error +++

Dale King

Sep 2, 2003, 7:35:47 PM
"Sander Vesik" <san...@haldjas.folklore.ee> wrote in message
news:10624529...@haldjas.folklore.ee...

> In comp.compression timo Makinen <timokm...@hotmail.com> wrote:
> >
> > Yes i understand the traditional line of thought in that u cant squeeze
> > data in that small space because its not mathematically possible. I am aware
> > of such arguments. However that argument is only valid when programming
> > under the environment that places the constraints in the first place with is
> > traditional math applied compression.
>
> Umm... So to use the compression you go outside reality, do it and then ship
> the bits back? Thats a rather nice trick 8-)


But that still doesn't help you as the bits exist in this reality and if you
compress two things and they produce the same compressed output it is lossy.
The only way it will work is if you create extra information in the other
reality and you don't ship it back, but is somehow carried around with the
bits from this reality.

--
Dale King


tal...@nospam.wp.pl

Sep 3, 2003, 11:46:30 AM
> I am able to compress any data to more then 1/2 of its size as long as the
> data is larger then 8k. I can recompress compressed files down so anything
> is down to 8k.
LOL!!!

> ANYTHING DOWN TO 8K
OK. Let's say it really worx. But if this is really such a great application
then make a special version available.
Make a version that can compress and decompress only 1 MB files. Place it on
the internet. Ppl will check how great it is.
But you know what? That's impossible!!!
The important fact is not that this is impossible today, but that it will be
impossible forever!

Be serious!

Cya,
Talthen


Dr Chaos

Sep 3, 2003, 2:52:37 PM

The whole thing comes down to whether it is better to first transmit
"N", the number of symbols and then the data, or to include an EOF
signal in-line.

There are costs to doing both.

Imagine there is a probability model for the data itself
p(s|past) available at any point. The data

First case: transmitting N ahead of time.

The cost for the data is L_data = sum(i=1..N) -log p(s_i | past) plus
the cost to transmit N. The Elias delta code can provide a code for
natural integers in less than L(N) <= log N + 2 log log (2 N ).

L = L_data + log N + 2 log log (2N)


Second case: transmitting an EOF signal in-band.

Here, we will have to model the probability of an EOF signal.
The common choice is p(EOF) = 1/(i+1) with i the number of symbols
processed. Thus at any step the codelength for the actual signal
is -log [ (1-p(EOF))*p(s_i | past) ] (since an EOF didn't occur), and then
at the end, one EOF, -log [ P(EOF) ]


this is L = L_data + \sum_(i=1..N) (- log( i/i+1) ) + log (N+1)
= L_data + 2*log(N+1)


Conclusion: the length information costs

Lelias = log N + 2 log log (2N) for pre-transmitting the length.

Leof = 2 log(N+1) for an inband EOF.

(all logarithms are base 2)

For N >= 37, the Elias code wins, providing a shorter codelength.

So, at least with this EOF model, there is an advantage to
pre-transmitting the length for files longer than 37 bits.
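
(For anyone who wants to check the crossover: here is a quick sketch that
evaluates the same two continuous approximations numerically. It is only an
illustration of the comparison above -- the exact integer code lengths move
the boundary around a little.)

#include <stdio.h>
#include <math.h>

int main(void)
{
    int last_eof_win = 0;

    for (int n = 2; n <= 1000; n++) {
        double lelias = log2((double)n) + 2.0 * log2(log2(2.0 * n));
        double leof   = 2.0 * log2((double)n + 1.0);
        if (leof < lelias)
            last_eof_win = n;       /* in-band EOF still cheaper here */
    }
    printf("in-band EOF is cheaper up to N = %d\n", last_eof_win);
    printf("from N = %d on, pre-transmitting the length wins\n",
           last_eof_win + 1);
    return 0;
}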

Note that you can make an EOF marker cost of

1 + min( Lelias, Leof )

by pre-transmitting one bit saying which scheme you will be using.


The difference between Lelias and Leof is small, being less than 6
bits for N=10000.


PS: That Elias codelength is an upper bound, more specifically

L(elias) = 1 + floor(log(N)) + 2*floor(log(1+floor(log(N))))
counts the integral concrete bits.

Dale King

Sep 4, 2003, 11:27:37 AM
"Dr Chaos" <mbkennelS...@NOSPAMyahoo.com> wrote in message
news:slrnblce3l.9re.m...@lyapunov.ucsd.edu...

>
> The whole thing comes down to whether it is better to first transmit
> "N", the number of symbols and then the data, or to include an EOF
> signal in-line.
>
> There are costs to doing both.
>
> Imagine there is a probability model for the data itself
> p(s|past) available at any point. The data
>
> First case: transmitting N ahead of time.
>
> The cost for the data is L_data = sum(i=1..N) -log p(s_i | past) plus
> the cost to transmit N. The Elias delta code can provide a code for
> natural integers in less than L(N) <= log N + 2 log log (2 N ).

I'm not sure where you got this limit as I can't find any references with
this limit and it doesn't quite seem to agree with reality. The actual
length of an Elias delta code (which you noted later) is:

floor[lg x] + 2 floor[lg (1 + floor[lg x] ) ] + 1

And your formula is sometimes less than this. I think you need to add some
floor or ceilings to make it correct.

> L = L_data + log N + 2 log log (2N)
>
>
> Second case: transmitting an EOF signal in-band.
>
> Here, we will have to model the probability of an EOF signal.
> The common choice is p(EOF) = 1/(i+1) with i the number of symbols
> processed. Thus at any step the codelength for the actual signal
> is -log [ (1-p(EOF))*p(s_i | past) ] (since an EOF didn't occur), and then
> at the end, one EOF, -log [ P(EOF) ]
>
>
> this is L = L_data + \sum_(i=1..N) (- log( i/i+1) ) + log (N+1)
> = L_data + 2*log(N+1)
>
>
> Conclusion: the length information costs
>
> Lelias = log N + 2 log log (2N) for pre-transmitting the length.
>
> Leof = 2 log(N+1) for an inband EOF.
>
> (all logarithms are base 2)
>
> For N >= 37, the Elias code wins, providing a shorter codelength.
>
> So, at least with this EOF model, there is an advantage to
> pre-transmitting the length for files longer than 37 bits.
>
> Note that you can make an EOF marker cost of
>
> 1 + min( Lelias, Leof )
>
> by pre-transmitting one bit saying which scheme you will be using.
>
>
> The difference between Lelias and Leof is small, being less than 6
> bits for N=10000.

I certainly agree that for larger N the Elias code takes less bits than
encoding the EOF as a symbol. But one cannot say that this means it is
"better". You cannot say one is better than another without making some
assumption of what the probability distribution is for the inputs. If the
probability distribution function is heavily weighted to small values of N
then the EOF symbol wins.

The key thing is that you can only say that one code is better for some set
of probability distribution functions. You cannot say one code is
universally better than the other.

For example, let's compare the Elias delta code with a unary coding of an
integer that encodes an integer N as N-1 zeroes followed by a 1, thus
L(unary) = N. L(unary) < L(elias) only for N <= 4. Is Elias better than
Unary? It depends on what the probability distribution is for the value of
N. If all values of N are equally likely, then yes it is.

But let's consider if the value of N we were encoding was the number of coin
flips we made until it comes up heads. There p( N = x ) = 2^-x. Then p( N <=
x ) = 1 - 2^-x. Here the unary code is the optimal code.
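
(A quick numeric check of that last point -- only a sketch, using the exact
Elias delta length formula quoted earlier and truncating the sum at x = 64,
where the remaining tail weight is negligible:)

#include <stdio.h>
#include <math.h>

static int elias_delta_len(int x)
{
    int lg = 0, llg = 0, t;
    for (t = x; t > 1; t >>= 1) lg++;          /* floor(lg x)        */
    for (t = 1 + lg; t > 1; t >>= 1) llg++;    /* floor(lg (1+lg x)) */
    return lg + 2 * llg + 1;
}

int main(void)
{
    double e_unary = 0.0, e_elias = 0.0;

    for (int x = 1; x <= 64; x++) {
        double p = pow(2.0, -(double)x);       /* p(N = x) = 2^-x */
        e_unary += p * x;
        e_elias += p * elias_delta_len(x);
    }
    printf("expected unary length:       %.4f bits\n", e_unary);  /* ~2.0 */
    printf("expected Elias delta length: %.4f bits\n", e_elias);  /* ~2.6 */
    return 0;
}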
--
Dale King


Dr Chaos

Sep 4, 2003, 9:44:11 PM
On Thu, 4 Sep 2003 10:27:37 -0500, Dale King <Ki...@tmicha.net> wrote:
> "Dr Chaos" <mbkennelS...@NOSPAMyahoo.com> wrote in message
> news:slrnblce3l.9re.m...@lyapunov.ucsd.edu...
>>
>> The whole thing comes down to whether it is better to first transmit
>> "N", the number of symbols and then the data, or to include an EOF
>> signal in-line.
>>
>> There are costs to doing both.
>>
>> Imagine there is a probability model for the data itself
>> p(s|past) available at any point. The data
>>
>> First case: transmitting N ahead of time.
>>
>> The cost for the data is L_data = sum(i=1..N) -log p(s_i | past) plus
>> the cost to transmit N. The Elias delta code can provide a code for
>> natural integers in less than L(N) <= log N + 2 log log (2 N ).
>
> I'm not sure where you got this limit as I can't find any references with
> this limit and it doesn't quite seem to agree with reality. The actual
> length of an Elias delta code (which you noted later) is:
>
> floor[lg x] + 2 floor[lg (1 + floor[lg x] ) ] + 1
>
> And your formula is sometimes less than this. I think you need to add some
> floor or ceilings to make it correct.

Add one bit to the formula.

L(N) <= 1+ log N + 2 log log (2 N ).

for certain needs I wanted a formula continuous in N.


> I certainly agree that for larger N the Elias code takes less bits than
> encoding the EOF as a symbol. But one cannot say that this means it is
> "better". You cannot say one is better than another without making some
> assumption of what the probability distribution is for the inputs. If the
> probability distribution function is heavily weighted to small values of N
> then the EOF symbol wins.

Right. Less than 37.

> The key thing is that you can only say that one code is better for some set
> of probability distribution functions. You cannot say one code is
> universally better than the other.


> For example, let's compare the Elias delta code with a unary coding of an
> integer that encodes an integer N as N-1 zeroes followed by a 1, thus
> L(unary) = N. L(unary) < L(elias) only for N <= 4. Is Elias better than
> Unary? It depends on what the probability distribution is for the value of
> N. If all values of N are equally likely, then yes it is.
>
> But let's consider if the value of N we were encoding was the number of coin
> flips we made until it comes up heads. There p( N = x ) = 2^-x. Then p( N <=
> x ) = 1 - 2^-x. Here the unary code is the optimal code.

Of course.

I would guess that in practice, the sizes of streams which need to be
compressed follow a mysterious power law distribution, like so many in
Nature.

In fact if it were P(N) ~ 1/N then you can't even normalize that to
make it a probability density function and thus the optimal prior for
coding of integers.

Peter Bozz

Sep 5, 2003, 3:55:12 AM
Dude, I love your comebacks.

Anyway, you said IBM already had a patent for this ingenious lossless
compression algorithm. How are you gonna patent it without being sued
into oblivion for stealing their trade secrets?
Darn, this way we'll never be able to see the code!

kelseyb

Oct 22, 2003, 1:35:16 AM
In article <cOZ%a.189733$4UE.69652
@news01.bloor.is.net.cable.rogers.com>, timokm...@hotmail.com says...

> I am able to compress any data to more then 1/2 of its size as long as the
> data is larger then 8k. I can recompress compressed files down so anything
> is down to 8k.
>
> ANYTHING DOWN TO 8K
>
> however there is one catch, no its not lossy but its slow !
> 1200 Duron compresses 800k (random bin file) down to 100k in over a week.
> The decompression is somewhat faster only 49 hours or so.
>
> Faster CPU's Please !
>
> Addressing the naysayers.

>
> Yes i understand the traditional line of thought in that u cant squeeze
> data in that small space because its not mathematically possible.

Wrong. It's actually _trivial_ to compress data to 8K. What isn't
quite so trivial is to do it losslessly, for more than a handful of
(likely very useless) files.


> I am aware
> of such arguments. However that argument is only valid when programming
> under the environment that places the constraints in the first place with is
> traditional math applied compression.

I.e. anything other than magic. Yes, and?

compre$ion

Dec 7, 2003, 4:03:10 PM
Awesome explanation =)
I knew this to be true, but I've never seen such a great example.
