
I took the $5000 Goldman Challenge


Patrick Craig

unread,
Apr 16, 2001, 6:47:12 AM4/16/01
to
Read all about it:

http://www.geocities.com/patchnpuki/other/compression.htm

Patrick

--
Posted from IDENT:ro...@ns.nag-j.co.jp [211.0.18.242]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Willem-Jan Monsuwe

unread,
Apr 16, 2001, 9:04:37 AM4/16/01
to
) Read all about it:
)
) http://www.geocities.com/patchnpuki/other/compression.htm
)
) Patrick

We warned Mike that his challenge wasn't worded carefully enough and that
something like this could happen. In fact your approach was rather timid.
There've been numerous ideas floating about the last week about how to meet
the challenge by using the loopholes in there.

I personally feel Mike shouldn't have put up such a loosely worded challenge
with a $5000 prize. As stated, the challenge is _not_ mathematically
impossible, as there are more (decompressor, data) tuples of total length N
than there are data files of length N. And allowing multiple data files
only increases that.
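
A quick sanity check of that count (a throwaway Python sketch of mine, not
anything from the challenge itself):

  # Pairs (decompressor, data) whose bit lengths sum to N outnumber the
  # single files of N bits, because the position of the split itself
  # carries information -- which is the loophole in a nutshell.
  N = 20
  files_of_length_N = 2 ** N
  pairs_of_total_N = sum(2 ** k * 2 ** (N - k) for k in range(N + 1))
  assert pairs_of_total_N == (N + 1) * 2 ** N > files_of_length_N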

Mike, you fouled up. Be a man and admit it. Good form, Patrick!


SaSW,
--
Willem (at stack dot nl)
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Joshua Schmier

unread,
Apr 16, 2001, 10:32:58 AM4/16/01
to
I actually disagree, as I'm sure many will on this subject. Mike was going
by the guidelines within the FAQ. The FAQ clearly outlines that filesystem
exploits are not true compression. I think it may have been short-sighted of
Goldman to allow multiple 'compressed' files; however, having the challenge
worded so loosely makes things interesting. :)

And yes, very good show Patrick.


"Willem-Jan Monsuwe" <wil...@toad.stack.nl> wrote in message
news:slrn9dlrge...@toad.stack.nl...

s...@nospam.unt.edu

unread,
Apr 16, 2001, 12:28:15 PM4/16/01
to
Joshua Schmier <jsch...@bellsouth.net> wrote:

> I actually disagree, as I'm sure many will on this subject. Mike was going
> by the guidelines within the FAQ. The FAQ clearly outlines that filesystem
> exploits are not true compression. I think it may have been short-sighted of
> Goldman to allow multiple 'compressed' files, however, having the challenge
> worded so loosely makes things interesting. :)

I agree with this. To show compression, you need to show space
savings, which has not been done here. In fact, the disk space
required for the "compressed" data is substantially larger than the
disk space for the original data, since many more inodes/directory
entries are required. It all goes back to "hiding information" in
the file length, which has been discussed here over and over.

That said, as soon as I saw the request that multiple "compressed
files" be allowed, it was pretty obvious what game was going to be
played. Unfortunately, Mike didn't see it coming, and didn't object
to the "multiple files" requirement (in fact in an earlier posting
here, I pointed out that you should require the decompressor and data
to be bundled together in one file, and the size of that piece should
determine the "compressed size").

--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |

Dale King

unread,
Apr 16, 2001, 12:46:54 PM4/16/01
to
"Willem-Jan Monsuwe" <wil...@toad.stack.nl> wrote in message
news:slrn9dlrge...@toad.stack.nl...
> ) Read all about it:
> )
> ) http://www.geocities.com/patchnpuki/other/compression.htm
> )
> ) Patrick
>
> We warned Mike that his challenge wasn't worded carefully enough and that
> something like this could happen. In fact your approach was rather timid.
> There've been numerous ideas floating about the last week about how to meet
> the challenge by using the loopholes in there.
>
> I personally feel Mike shouldn't have put up such a loosely worded challenge
> with a $5000 prize. As stated, the challenge is _not_ mathematically
> impossible, as there are more (decompressor, data) tuples of total length N
> than there are data files of length N. And allowing multiple data files
> only increases that.
>
> Mike, you fouled up. Be a man and admit it. Good form, Patrick!

I'm afraid I have to agree with Mike here. The wording from the challenge
was, "Last, the contestant will send me a decompressor and a compressed
file". He said *a* compressed file. 218 files is not the same as a file. I
agree that Mike's challenge is not impossible, but in this case the
contestant did not meet the challenge as specified.

I see where Patrick said, "The restated challenge is to reconstruct the
original file using a decompressor and several compressed files whose total
file size is less than the original file size." The problem is that Mike
never restated the challenge in such a way. Nowhere in the thread of emails
he posted did Mike allow for anything but *a* compressed file and *a* file
for the decompressor. That is even being generous, as he allows you to rely
on the file length information in the header.

This is one of the reasons that I object so vigorously to David Scott's
claims of increasing compression ratio by relying on the file header. It is
the same fallacy that Patrick tried to exploit. Following Patrick's logic I
can easily meet this challenge for just about any file. Let's say I have an n-bit
file. All I do is create n files that have no data in any of them. I store
one data bit for each file in the archive bit (assuming Windoze) and the
last file is marked by setting its read-only bit. According to Patrick's
logic the files consume zero space because they have no data in them. So the
only space I am taking up is for the program.
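
A minimal sketch of that trick, simulated in Python so it runs anywhere (no
real Windows attribute bits are touched; the file names and the dict standing
in for the file system are just my illustration, not anyone's actual program):

  def encode(bits):
      """Turn a bit string into 'empty files': name -> (archive, readonly)."""
      fs = {}
      for i, b in enumerate(bits):
          # each zero-length file stores one data bit in its archive flag;
          # the read-only flag marks the final file so the decoder knows
          # where to stop
          fs["part%06d" % i] = (b == "1", i == len(bits) - 1)
      return fs

  def decode(fs):
      """Recover the bit string from the attribute flags alone."""
      bits = []
      for name in sorted(fs):
          archive, readonly = fs[name]
          bits.append("1" if archive else "0")
          if readonly:
              break
      return "".join(bits)

  data = "1011001110001"
  assert decode(encode(data)) == data   # yet no file holds a single data byte
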
--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 12:46:03 PM4/16/01
to
s...@nospam.unt.edu wrote in <9bf6iv$5p5$1...@hermes.acs.unt.edu>:

>That said, as soon as I saw the request that multiple "compressed
>files" be allowed, it was pretty obvious what game was going to be
>played. Unfortunately, Mike didn't see it coming, and didn't object
>to the "multiple files" requirement (in fact in an earlier posting
>here, I pointed out that you should require the decompressor and data
>to be bundled together in one file, and the size of that piece should
>determine the "compressed size").
>

So what is the bottom line in your view? Besides the fact that Mike was
stupid and so eager for the 100 dollars that he allowed the multiple
files. I agree that he should have required one program which internally
must contain all info so that it expands to one file. But he did not.
So in your view should he pay the 5g's or not??
The rules that both men agreed to were a series of files, the total
length being less than the random file. I think that means Mike lost.
Are people here going to vote? I would like to know what Gutman and
Matt have to say after they read the rules the two men agreed to in
their correspondence, which explicitly allowed for many files.

I already expect King to say Mike won, since it is obvious to him that the
length of a file is not the length but some number added to the length.


David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip
My website http://members.nbci.com/ecil/index.htm
My crypto code http://radiusnet.net/crypto/archive/scott/
MY Compression Page http://members.nbci.com/ecil/compress.htm
**NOTE FOR EMAIL drop the roman "five" ***

s...@nospam.unt.edu

unread,
Apr 16, 2001, 2:42:10 PM4/16/01
to
SCOTT19U.ZIP_GUY <david_...@emailv.com> wrote:
> s...@nospam.unt.edu wrote in <9bf6iv$5p5$1...@hermes.acs.unt.edu>:

>>That said, as soon as I saw the request that multiple "compressed
>>files" be allowed, it was pretty obvious what game was going to be
>>played. Unfortunately, Mike didn't see it coming, and didn't object
>>to the "multiple files" requirement (in fact in an earlier posting
>>here, I pointed out that you should require the decompressor and data
>>to be bundled together in one file, and the size of that piece should
>>determine the "compressed size").

> So what is the bottom line in your view. Besides the fact Mike was
> stupid and so eager for the 100 dollars that he allowed the multiple
> files. I agree that he should have required one program which internally
> must contain all info so that it expands to one file. But he did not.
> So in your view should he pay the 5g's or not??

That's not such an easy question -- the letter of Mike's challenge was
met (since he agreed to the multiple file modification), but clearly
the spirit was violated. Of course, the challenger knew full well
that he was violating the spirit of the contest and purposely modified
the rules of the contest to allow his specific "cheat". I'm just
sorry that Mike didn't see that when he should have....

Frankly, I'd like to see it be called a "draw" -- have Mike send back
the $100 and acknowledge that the challenger (Patrick?) found a way to
bend the rules in such a way as to technically meet the modified
challenge...

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 12:58:55 PM4/16/01
to
jsch...@bellsouth.net (Joshua Schmier) wrote in
<QoDC6.107$xZ3....@news1.mco>:

>I actually disagree, as I'm sure many will on this subject. Mike was
>going by the guidelines within the FAQ. The FAQ clearly outlines that
>filesystem exploits are not true compression. I think it may have been
>short-sighted of Goldman to allow multiple 'compressed' files, however,
>having the challenge worded so loosely makes things interesting. :)
>

Mike thought he was going by the FAQ because he thought he had a lock
and a free 100 dollars. But his allowing multiple files, and not
clarifying that for any file length of N he would automatically add some
magic number for the "true length", shows Mike did not fully understand
the problem. The challenger and Mike are not really held to the FAQ,
since they both agreed to go on the sum of the file lengths. The bet
was on the wording of their letters. And it should be clear to anyone
that MIKE lost.


>And yes, very good show Patrick.


Bottom line: your reply is not clear. Should Mike pay the money or not?
Is the other guy out the 100 dollars or not?

David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip
My website http://members.nbci.com/ecil/index.htm
My crypto code http://radiusnet.net/crypto/archive/scott/
MY Compression Page http://members.nbci.com/ecil/compress.htm
**NOTE FOR EMAIL drop the roman "five" ***

Dale King

unread,
Apr 16, 2001, 4:01:52 PM4/16/01
to
<s...@nospam.unt.edu> wrote in message
news:9bfee2$5rn$1...@hermes.acs.unt.edu...

> SCOTT19U.ZIP_GUY <david_...@emailv.com> wrote:
> > s...@nospam.unt.edu wrote in <9bf6iv$5p5$1...@hermes.acs.unt.edu>:
>
> >>That said, as soon as I saw the request that multiple "compressed
> >>files" be allowed, it was pretty obvious what game was going to be
> >>played. Unfortunately, Mike didn't see it coming, and didn't object
> >>to the "multiple files" requirement (in fact in an earlier posting
> >>here, I pointed out that you should require the decompressor and data
> >>to be bundled together in one file, and the size of that piece should
> >>determine the "compressed size").
>
> > So what is the bottom line in your view. Besides the fact Mike was
> > stupid and so eager for the 100 dollars that he allowed the multiple
> > files. I agree that he should have required one program which internally
> > must contain all info so that it expands to one file. But he did not.
> > So in your view should he pay the 5g's or not??
>
> That's not such an easy question -- the letter of Mike's challenge was
> met (since he agreed to the multiple file modification), but clearly
> the spirit was violated. Of course, the challenger knew full well
> that he was violating the spirit of the contest and purposely modified
> the rules of the contest to allow his specific "cheat". I'm just
> sorry that Mike didn't see that when he should have....
>
> Frankly, I'd like to see it be called a "draw" -- have Mike send back
> the $100 and acknowledge that the challenger (Patrick?) found a way to
> bend the rules in such a way as to technically meet the modified
> challenge...

Looking back over the page I see where the miscommunication occurred:

-------------
> I meant can I send you a compressor and several compressed files whose
> total file size is less than the original uncompressed file and from
> which I can regenerate the original uncompressed file
>
> Patrick

Sure -- but you send me a *decompressor*, I don't need the compressor.
-------------

Patrick was trying to relax the rules to allow the compressed file to be
broken into many files and to rely on information in the file header (the
lengths) for storing some information. I am certain that Mike did not feel
the rules had changed, as nowhere does he talk about compressed files plural;
he talks repeatedly about a compressed file or the compressed file,
including saying, right after the original was sent, "In order to meet the
challenge you must provide to me a compressed file and a decompressor."

Whether the challenge changed by the above statements is a legal question.

The challenge as originally stated was not met. Patrick thinks that he got
the original terms of the challenge changed, and I don't believe Mike thinks
the terms changed.

If the supposed new definition were allowed, I have already shown how it can
be done by using the archive and readonly flags from a large number of
files.

--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 1:13:05 PM4/16/01
to
Ki...@TCE.com (Dale King) wrote in <3adb...@news.tce.com>:

I knew you would not be able to read the email and be able to figure
out what each man said. The challenge as agreed to was for several
files. Did you even bother to read it? How can one be so blind.

True, Mike referred to sending a compressed file later in the series of
messages, but only because of his narrow view of things, like your narrow view
of things. He did not think that more than one would be sent. But it is
clear that he allowed multiple files. It was only after he lost that
he decided to change the rules and not allow for multiple files.

>
>I see where Patrick said, "The restated challenge is to reconstruct the
>original file using a decompressor and several compressed files whose
>total file size is less than the original file size." The problem is
>that Mike never restated the challenge in such a way. Nowhere in the
>thread of emails he posted did Mike allow for anything but *a*
>compressed file and a *a* file for the decompressor. That is even being
>generous as he allows you to rely on the file length information in the
>header.

Actually he did. When he stated the rules were OK, except that
he wanted the "decompressor" and not the "compressor". He stated the rest of the
rules were OK. He even left that part of the messages in the previous quoted
copy, so it was clear to both men what the rules were. However, I can
see from other posts that this simple logic is too hard for you to
follow.

>
>This is one of the reasons that I object so vigorously to David Scott's
>claims of increasing compression ratio by relying on the file header. It
>is the same fallacy that Patrick tried to exploit. Following Patrick's


Actually the fallacy is in your mind. Patrick exploited the loophole
given to him. Your problem is understanding what is and is not
available to Patrick. If the agreement had been more carefully worded,
then such a loophole would not exist. The problem is that the so-called
loophole was agreed to by both men. Just because in your mind you
don't think the loophole should have been allowed does not change the
simple fact that it was allowed.


David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip
My website http://members.nbci.com/ecil/index.htm
My crypto code http://radiusnet.net/crypto/archive/scott/
MY Compression Page http://members.nbci.com/ecil/compress.htm
**NOTE FOR EMAIL drop the roman "five" ***

Dale King

unread,
Apr 16, 2001, 4:55:10 PM4/16/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:9085767C9H110W...@207.36.190.226...

> Ki...@TCE.com (Dale King) wrote in <3adb...@news.tce.com>:
>
> >I'm afraid I have to agree with Mike here the wording from the challenge
> >was, "Last, the contestant will send me a decompressor and a compressed
> >file". He said *A* compressed file. 218 files is not the same as a file.
> >I agree that Mike's challenge is not impossible, but in this case the
> >contestant did not meet the challenge as specified.
>
> I knew you would not be able to read the email and be able to figure
> out what each man said. The the challene as agreed to was for several
> files. Did you even bother to read it. How can one be so blind.
>
> True Mark referred to sending a compressed file later in the series of
> messages. Only because his narrow view of things like your narrow view
> of things. He did not think that more than one would be sent. But it is
> clear that he allowed multiply files. It was only after he lost that
> he decied to change the rules and not allow for muliply files.

Note, I did later point out where the idea of allowing multiple files
slipped in. I'm not sure if Mike actually thought he had changed the rules
to allow multiple files given the fact that he always said a compressed
file.

The original challenge was not met. Patrick wanted the challenge changed to
allow him to take advantage of the storage of the file length in the header. If
one allows an arbitrary number of files and the file header is not counted
against it, it is trivial to compress an n-bit file into n files of zero
length using the archive and read-only bits for storage. But of course, no
compression has actually taken place.

> >I see where Patrick said, "The restated challenge is to reconstruct the
> >original file using a decompressor and several compressed files whose
> >total file size is less than the original file size." The problem is
> >that Mike never restated the challenge in such a way. Nowhere in the
> >thread of emails he posted did Mike allow for anything but *a*
> >compressed file and a *a* file for the decompressor. That is even being
> >generous as he allows you to rely on the file length information in the
> >header.
>
> Actually he did. When he stated the ruless where OK excepet that
> the wanted the "decompressor" and not the "compressor". He stated rest of
> rules ok. He even left that part of messages in the previous copy
> part so it was clear to both men what the rules were.

I acknowledged that later as I said. I think it was a miscommunication, but
that is beside the point.

> >This is one of the reasons that I object so vigorously to David Scott's
> >claims of increasing compression ratio by relying on the file header. It
> >is the same fallacy that Patrick tried to exploit. Following Patrick's
>
> Actaully the fallacy is in your mind. Patrick exploited the loop
> hole given him. You probelm is understand what is and is not
> availage to Patrick. If the agreement was more carefully worded
> then such a loop hole would not exist. Problem is that the so called
> loop hole was agreed to by both men. Just becasue in your mind you
> don't think the loop hole should have been allowed does not change the
> simple facts that it was allowed.

I agree that Mike should have been more careful wording it.

I'm not talking about the contest in that last paragraph. I am relating it
to your claims where you essentially rely on the fallacy that the length
information does not count and actually claim that is an increase in
compression ratio. The question is: did Patrick actually compress anything?
The total of the data portions of all his files is less than the original,
so doesn't that count as compression?

The answer is no for the same reason that I have argued against your claim
that relying on the file length in the header is not increasing the
compression ratio. He is relying on the extra storage provided for the file
length in the file header. And he is doing that over enough files to get
enough extra storage to offset the size of the program.

--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 4:10:29 PM4/16/01
to
s...@nospam.unt.edu wrote in <9bfee2$5rn$1...@hermes.acs.unt.edu>:

>SCOTT19U.ZIP_GUY <david_...@emailv.com> wrote:

>> So what is the bottom line in your view. Besides the fact Mike was
>> stupid and so eager for the 100 dollars that he allowed the multiple
>> files. I agree that he should have required one program which internally
>> must contain all info so that it expands to one file. But he did not.
>> So in your view should he pay the 5g's or not??
>
>That's not such an easy question -- the letter of Mike's challenge was
>met (since he agreed to the multiple file modification), but clearly
>the spirit was violated. Of course, the challenger knew full well
>that he was violating the spirit of the contest and purposely modified
>the rules of the contest to allow his specific "cheat". I'm just
>sorry that Mike didn't see that when he should have....
>

Yes, the letter of Mike's challenge was met. But don't forget the
guy paid the CASH before he even saw the file that was to be compressed,
so it was still a challenge. I don't agree that the challenger thought
he was violating the rules. He asked questions and got answers. I
doubt you would have called it a draw if the challenger had failed to actually
do what he requested. It is only after the fact, now that Mike was wrong and
lost, that people are screaming foul.

>Frankly, I'd like to see it be called a "draw" -- have Mike send back
>the $100 and acknowledge that the challenger (Patrick?) found a way to
>bend the rules in such a way as to technically meet the modified
>challenge...
>

If Mike is a man of honor (and I get the impression there are not
many men of honor who reply here), he should pay the money and be a
little bit wiser for it.

Dror Baron

unread,
Apr 16, 2001, 8:58:16 PM4/16/01
to
It's quite obvious that Mike made a huge blunder.
He agreed to the "multiple file" issue.

Technically, this serves as a nice legal lesson.

Theoretically, note as pointed by others that the
total memory consumed by the file system did not
go down. I believe even David will agree ;-)

Mike and Patrick - have a good laugh over a beer
sometime...

Dror

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 7:59:24 PM4/16/01
to
Ki...@TCE.com (Dale King) wrote in <3adb...@news.tce.com>:

>
>


>Looking back over the page I see where the miscommunication occurred:
>
>-------------
>> I meant can I send you a compressor and several compressed files whose
>> total file size is less than the original uncompressed file and from
>> which I can regenerate the original uncompressed file
>>
>> Patrick
>
>Sure -- but you send me a *decompressor*, I don't need the compressor.
>-------------
>
>Patrick was trying to relax the rules to allow the compressed file to be
>broken into many files and rely on information in the file header (the
>lengths) for storing some information. I am certain that Mike did not
>feel the rules has changed as nowhere does he talk about compressed
>files plural and talks repeatedly about a compressed file or the
>compressed file. Including saying right after the original was sent,"In
>order to meet the challenge you must provide to me a compressed file and
>a decompressor."

I am certain, having dealt with many barroom bets, that Mike didn't
give a damn, due to being smug about his winning. If anything he thought
Patrick was a bigger fool than he first thought. A common thing
in barroom bets. Mike knew he lost when Patrick sent him the files.
From that point on the emails were Mike trying to weasel out of the bet.
He knew he lost, then pretended not to understand by saying he checked
the 2 files and that Patrick failed. He knew he lost but tried to
change the rules after the action was over. Much like the Democrats
in Florida.

>
>Whether the challenge changed by the above statements is a legal
>question.
>

No question about it. In most states betting like this is illegal.
Mike has a perfect right to be dishonorable and not pay. I am sure
Clinton would do the same thing and not pay up. Mike could just
laugh and keep the 100 dollars. However, I would never like to
do business with Mike, since I think honesty is still a valuable
thing. The facts are black and white: Mike lost. But I can see
you being unable to comprehend this. You have not seen many
barroom fights.


>The challenge as originally stated was not met. Patrick thinks that he
>got the original terms of the challeng changed, and I don't believe Mike
>things the terms changed.

The rules changed and there is a written trail to prove it.

>
>If the supposed new definition were allowed, I have already shown how it
>can be done by using the archive and readonly flags from a large number
>of files.


Actually it can be done with other than those unfair tactics that
you seem so fond of. Maybe you're just pissed you didn't get the 5k
yourself, so you wish no one to get it.

>
>--
> Dale King

Joshua Schmier

unread,
Apr 16, 2001, 11:44:59 PM4/16/01
to
Scott, is it not clear to you what the definition of compression really is?
You completely ignore the definition that the FAQ describes for the sake of
your argument. Please reconsider these parts of the FAQ before making
insults:

Part 9.2
Paragraph 9

This assumes of course that the information available to the decompressor is
only the bit sequence of the compressed data. If external information such
as a file name, a number of iterations, or a bit length is necessary to
decompress the data, the bits necessary to provide the extra information
must be included in the bit count of the compressed data. Otherwise, it
would be sufficient to consider any input data as a number, use this as the
file name, iteration count or bit length, and pretend that the compressed
size is zero. For an example of storing information in the file name, see
the program lmfjyh in the 1993 International Obfuscated C Code Contest,
available on all comp.sources.misc archives (Volume 39, Issue 104).
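
A toy illustration of that paragraph's point, in Python of my own (nothing to
do with the lmfjyh entry): all of the "data" ends up in the file name, the
file itself is empty, and the scheme only "works" while the name fits the
filesystem's limits -- which is rather the point.

  import binascii, os

  def store(data, directory="."):
      # "compress" by hiding the bytes in the file name; the file is empty
      name = binascii.hexlify(data).decode("ascii") + ".compressed"
      open(os.path.join(directory, name), "wb").close()
      return name

  def recover(name):
      return binascii.unhexlify(name[: -len(".compressed")])

  n = store(b"any input at all")
  assert recover(n) == b"any input at all"   # yet the file's size is zero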

Part 10
Paragraph 1

Some programs claimed to achieve incredible compression ratios are
completely fake: they do not compress at all but just stored the
uncompressed data in hidden files on the hard disk or keep it in unused
clusters. Needless to say, such programs are dangerous and should never be
used because there is a significant risk of losing all the data.

I'm sorry for having to post actual excerpts from the FAQ itself, but to me
(and everyone else, I imagine) Patrick clearly did not compress any
information whatsoever. Only a fool would concede when the challenge was
so obviously not met. I'm actually sorry for Mike, as he truly
believed someone was actually going to try to compress the data. Patrick, in
this case, did not even try. Keep the hundred dollars, Goldman, you deserve
it for all your trouble.

"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message

news:9085B58E8H110W...@207.36.190.226...

Matt Timmermans

unread,
Apr 17, 2001, 12:10:15 AM4/17/01
to
Yes, this is essentially a bar bet.

I think it's pretty clear that Mike thought he could make a few bucks and
get a laugh, both at the expense of some sucker who thought he could
compress random data.

Mike was then out-suckered.

Well, turnabout is fair play. Patrick gets the laugh. He couldn't have
gotten the $5000 without a lawyer, which would cost him more than that
anyway (You're lucky he didn't have a lawyer uncle, Mike!), so he took the
moral high ground and forgave the bet. Even if he had a lawyer in the
family, he probably would have taken pity on Mike for his inexperience
(never give 50:1 odds on a sucker bet, Mike!), and forgiven the debt anyway.

Either way, Patrick wins this one. If this was an even-money bet, I would
say Mike should pay. At 50:1, I'm glad Patrick didn't force the issue -- In
a bar, you usually do this sort of thing for 20s, and you wouldn't try to
bleed your buddy for $1000 just because he tried to put one over on you.
It's best just to make him sweat a little before letting him off the hook,
which is exactly what Patrick did.

As others have said... very good form.


Joshua Schmier

unread,
Apr 17, 2001, 12:33:20 AM4/17/01
to
No, in this case Patrick bluffed and wasted all of our time in the process.
He never intended to actually COMPRESS the data, and lost a hundred dollars
over a rather well-executed con. It may be true that he actually did provide
an example of file system exploitation which vaguely met the challenge
requirements, but no COMPRESSION really took place.

Bah. :)

"Matt Timmermans" <ma...@timmermans.nospam-remove.org> wrote in message
news:HmPC6.595981$f36.17...@news20.bellglobal.com...

Matt Timmermans

unread,
Apr 17, 2001, 1:41:51 AM4/17/01
to
It may have been called a "compression" challenge, but, as Patrick
demonstrated, the rules did not require compression. Mike defined an
acceptance test, which Patrick met. It's not his fault that the test was
defined poorly.

And what do you mean he wasted all of our time? What were you reading the
thread for? We know that you can't compress random data, Mike knows it, and
Patrick knows it. Nobody was expecting a miracle, so what were you hoping
for that you didn't get here?


"Joshua Schmier" <jsch...@bellsouth.net> wrote in message
news:zIPC6.203$xZ3....@news1.mco...

SCOTT19U.ZIP_GUY

unread,
Apr 16, 2001, 9:17:27 PM4/16/01
to
dba...@chopin.ifp.uiuc.edu (Dror Baron) wrote in
<Pine.GSO.4.21.01041...@chopin.ifp.uiuc.edu>:

>It's quite obvious that Mike made a huge blunder.
>He agreed to the "multiple file" issue.
>
>Technically, this serves a nice legal lesson.
>
>Theoretically, note as pointed by others that the
>total memory consumed by the file system did not
>go down. I believe even David will agree ;-)

Yes, I agree Mike blundered and should pay up.


>
>Mike and Patrick - have a good laugh over a beer
>sometime...
>

German or Japanese Sapporo draft was good when I
was in Japan

>Dror

Tim Tyler

unread,
Apr 17, 2001, 6:21:56 AM4/17/01
to
Patrick Craig <pat...@nag-j.co.jp> wrote:

: http://www.geocities.com/patchnpuki/other/compression.htm

Pat: ``can I send you a compressor and several compressed files whose
      total file size is less than the original uncompressed file and
      from which I can regenerate the original uncompressed file''

Mike: ``Sure''

It's almost a shame things have settled down and none of the mentioned
legal action for fraud is taking place ;-)
--
__________
|im |yler Try my latest game - it rockz - http://rockz.co.uk/

Joshua Schmier

unread,
Apr 17, 2001, 7:02:34 AM4/17/01
to
Main Entry: 1sev·er·al
Pronunciation: 'sev-r&l, 'se-v&-
Function: adjective
Etymology: Middle English, from Anglo-French, from Medieval Latin separalis,
from Latin separ separate, back-formation from separare to separate
Date: 15th century
1 a : separate or distinct from one another <federal union of the several
states> b (1) : individually owned or controlled : EXCLUSIVE <a several
fishery> -- compare COMMON (2) : of or relating separately to each
individual involved <a several judgment> c : being separate and distinctive
: RESPECTIVE <specialists in their several fields>
2 a : more than one <several pleas> b : more than two but fewer than many
<moved several inches> c chiefly dialect : being a great many

Would you consider 218 files many? I would.

"Tim Tyler" <t...@cryogen.com> wrote in message news:GBxM4...@bath.ac.uk...

Joshua Schmier

unread,
Apr 17, 2001, 7:05:27 AM4/17/01
to
I was expecting someone to TRY to compress a file. I was expecting a bunch
of e-mails talking about schemes never before thought of (I don't care if
they worked or not). I was not, however, expecting someone to exploit
something that anyone in this newsgroup could have done.

Do you truly believe Patrick's intent was to COMPRESS the data? No, he
intended to bluff his way to a five thousand dollar prize. It's very vague
in their correspondence whether or not Goldman agreed to the multiple file
scheme. And I believe he did not agree to such.

I honestly don't know how you think the rules didn't require compression.
Patrick didn't meet the rules. However, he did trick his way around a bit
until he got to the last step. So because of that Goldman should pay up? I
seriously don't think so.

It's Patrick's fault for entering the challenge knowing full well his
intent was to exploit a file system, NOT compress data.

What was I hoping for that I didn't get here? I was hoping the challenge was
met with sincerity. I was hoping the challenger was truly attempting to
hack out the random data until he gave up.

"Matt Timmermans" <ma...@timmermans.nospam-remove.org> wrote in message

news:zIQC6.596667$f36.17...@news20.bellglobal.com...

Phil Norman

unread,
Apr 17, 2001, 7:30:49 AM4/17/01
to
On Tue, 17 Apr 2001 04:05:27 -0700,
Joshua Schmier <jsch...@bellsouth.net> wrote:
>
>Do you truely believe Patrick's intent was to COMPRESS the data? No, he
>intended to bluff his way to a five thousand dollar prize.

I don't believe he had a purely financial intent in mind. Note
the two quotes from the discussion after the decompressor had been
sent:

"Obviously I would like you to send me the $5000 as promised,
but I never really expected you to."

... and a little later...

"I will withdraw my claim to the $5000 and allow you to...."

The way I read it, it's fairly clear that Patrick's main reason
for accepting the challenge was to show Mike how dangerous it can
be to make carelessly-worded challenges. If this really is his
reason for his acceptance of the challenge, I consider all those
comments which have appeared in this thread that his action is
immoral or otherwise distasteful to be quite ridiculous. In my
opinion, it used a very clever information hiding algorithm to
quite clearly make a very valid point.

Mathematics states that the counting theorem is truth.
Mathematics, however, requires careful wording in order that its
meaning cannot be confused.

Cheers,
Phil

Joshua Schmier

unread,
Apr 17, 2001, 7:44:24 AM4/17/01
to
He did use some bit of trickery to get there, though. Goldman should have
had some rest; he probably wouldn't have slipped up if he had. Trickery is
distasteful regardless of the context. :)

You must admit that this is the first time anyone has accepted the
challenge. I, for one, don't expect the challenger to execute everything
perfectly. The wording is absolutely fine. No one can send him _a_ file and
_a_ decompressor and still compress the file. It's that plain and simple.
The wording is absolutely 100% fine.

So, Patrick essentially proved that someone can be tricked. Nice.

"Phil Norman" <Phil....@genedata.com> wrote in message
news:slrn9doab9.6g...@jalua.ch.genedata.com...

Remzi Seker

unread,
Apr 17, 2001, 10:44:31 AM4/17/01
to
You cannot compress purely random data. However, even random number
generators have some underlying statistical dependence, so one may achieve
some compression, depending on how bad the random number generator is.
Remzi

"Patrick Craig" <pat...@nag-j.co.jp> wrote in message
news:3ADACD0D...@nag-j.co.jp...

s...@nospam.unt.edu

unread,
Apr 17, 2001, 11:13:09 AM4/17/01
to
Joshua Schmier <jsch...@bellsouth.net> wrote:

> He did use some bit of trickery to get there, though. Goldman should have
> had some rest, he probably wouldn't have slipped up if he had. Trickery is
> distasteful regardless of the context. :)

Oh come on. Who was trying to trick who? Wasn't Mike trying to trick
some naive person into accepting a challenge that couldn't be met? So
Patrick out-tricked the tricker.... that's the thing that great tales
are made of.

Personally, I think it's a great story, and both people (Mike and
Patrick) seem to view it that way instead of as a serious/legal
matter. It wouldn't be amusing at all if lawyers got involved, and I
commend both of them for a mature approach to this incident (a few of
the whiners around here could certainly learn something from them).

SCOTT19U.ZIP_GUY

unread,
Apr 17, 2001, 12:07:31 PM4/17/01
to
jsch...@bellsouth.net (Joshua Schmier) wrote in
<N0WC6.152$yL1....@news4.mco>:

>He did use some bit of trickery to get there, though. Goldman should
>have had some rest, he probably wouldn't have slipped up if he had.
>Trickery is distasteful regardless of the context. :)

Then you live in an anal retentive world. The world does not
conform to your rules. The bet was over file length, period.
If it were otherwise, then when Mike sent the file of the desired length
he would have had to say: for the contest to be as in the FAQ, I am
sending a file of X data bits, but I am also adding a number equal
to the base-two log of the bit or byte length, to account for the extra
information in the file. He did not. He and Pat both strictly used
the data length as the length. Mike should pay up. At the very least,
if Patrick gives Mike a break, he should pay the 100 dollars back
plus 100 dollars for the lesson Mike should have learned.

>

>You must admit that this is the first time anyone has accepted the
>challenge. I, for one, don't expect the challenger to execute everything
>perfectly. The wording is absolutely fine. No one can send him _a_ file
>and _a_ decompressor and still compress the file. It's that plain and
>simple. The wording is absolutely 100% fine.
>

Then you can't read and examine the true facts the emails show.

Dror Baron

unread,
Apr 17, 2001, 3:06:47 PM4/17/01
to
> Oh come on. Who was trying to trick who? Wasn't Mike trying to trick
> some naive person into accepting a challenge that couldn't be met? So
> Patrick out-tricked the tricker.... that's the thing that great tales
> are made of.
>
> Personally, I think it's a great story, and both people (Mike and
> Patrick) seem to view it that way instead of as a serious/legal
> matter. It wouldn't be amusing at all if lawyers got involved, and I
> commend both of them for a mature approach to this incident (a few of
> the whiners around here could certainly learn something from them).

Precisely my previous point.
Mike and Patrick now have something to laugh over
when having their beers :)

Dror

Dror Baron

unread,
Apr 17, 2001, 3:09:07 PM4/17/01
to

On 17 Apr 2001, SCOTT19U.ZIP_GUY wrote:
> >He did use some bit of trickery to get there, though. Goldman should
> >have had some rest, he probably wouldn't have slipped up if he had.
> >Trickery is distasteful regardless of the context. :)
>
> Then you live in a anal retentive world. The world does not
> conform to your rules. The bet was over file length period.
> If it was otherwise when Mike sent the file of the desired length
> he would have had to say for the contest to be as in the FAQ I am
> sending a file of X data bits but I am also adding a number equal
> to base two of the bits or byte length to account for the extra
> information in the file. He did not. He and Pat both strictly used
> the data length as length. Mike should pay up. At the very least
> if Patricj gives Mike a break he should pay the 100 dollars back
> plus 100 hunred dollars for the leason Mike should have learned.

I for one wouldn't be surprised if Patrick is
one of David's paid agents...

Probably the trick he used can be formulated as
a bijection, which is clearly the best compression
and/or cryptographic method ever devised by man :)

Dror

Dale King

unread,
Apr 17, 2001, 3:40:23 PM4/17/01
to
"Tim Tyler" <t...@cryogen.com> wrote in message news:GBxM4...@bath.ac.uk...
> Patrick Craig <pat...@nag-j.co.jp> wrote:
>
> : http://www.geocities.com/patchnpuki/other/compression.htm
>
> Pat: ``can I send you a compressor and several compressed files whose
> total file size is less than the original uncompressed file and
> from which I can regenerate the original uncompressed file''
>
> Mike:``Sure''

In addition to Joshua's pointing out the ambiguity of the word several,
there is also the question of the definition of "total file size". That
could easily be construed to also take into account the file header
information. One could also construe that to include the unused portion of
the last sector of the file, since that is part of the size of the file.

Quoting the FAQ, "If external information such as a file name, a number of
iterations, or a bit length is necessary to decompress the data, the bits
necessary to provide the extra information must be included in the bit count
of the compressed data." Since the file lengths for all of the 218 files are
necessary to decompress the data, the bits necessary to provide that extra
information must be included in the bit count of the compressed data. That
is the common, accepted method for computing compression ratio (except for
David), and it is the method given in the FAQ where Patrick read about the
challenge. I see no place where Patrick asked that this be changed.
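
To put a rough number on that (my own back-of-the-envelope arithmetic, using
the 3MB size and 218 files mentioned elsewhere in this thread; Python 3.8+
for math.comb):

  import math

  # Choosing where to cut one N-byte file into k non-empty pieces is a
  # choice among C(N-1, k-1) possibilities, so the piece lengths alone
  # carry about log2 of that many bits of "free" information.
  N = 3 * 1024 * 1024     # assumed size of the original file, in bytes
  k = 218                 # number of "compressed" files
  print(round(math.log2(math.comb(N - 1, k - 1))))   # roughly 3300 bits
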
--
Dale King

Joshua Schmier

unread,
Apr 17, 2001, 4:15:52 PM4/17/01
to
Hey, I did say in previous threads that Patrick gave us a good show. I was
just defending the legal standpoint the others ridiculously proposed. I
totally agree, this is a great story, but if anyone was expecting anything
real out of it, they're more deluded than anyone who honestly accepts the
challenge (see: those who believe Goldman should pay up).

History was made. Nothing was done or ever intended to be done. I'm both
elated and disappointed.

<s...@nospam.unt.edu> wrote in message
news:9bhmi5$7q5$1...@hermes.acs.unt.edu...

Joshua Schmier

unread,
Apr 17, 2001, 4:24:10 PM4/17/01
to
Pfft, you don't think Patrick tricked Goldman into accepting guidelines that
were unlike the original challenge? It seems like you're the one attempting
to conform the world to your rules. The challenge and bet were completely
different creatures; Patrick got to where he did merely by trickery (I'm not
claiming that Goldman's bet was not a trick in itself). If you can't see
that then you're more deluded than I thought before.

I saw the e-mails. I saw the word compression in there many times. However,
no compression took place. The wording for the challenge is fine. The
loosely defined correspondence is where the mix-up occurred. (Intentionally
or not? I personally believe Patrick wanted the money and thought $100 was a
nice gamble since his rules were predefined and accepted, IMHO.)

"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message

news:908662A4AH110W...@207.36.190.226...

SCOTT19U.ZIP_GUY

unread,
Apr 17, 2001, 5:12:13 PM4/17/01
to
jsch...@bellsouth.net (Joshua Schmier) wrote in
<3E1D6.167$yL1....@news4.mco>:


>I saw the e-mails. I saw the word compression in there many times.
>However, no compression took place. The wording for the challenge is
>fine. The loosely defined correspondence is where the mix up occured.
>(Intentionally or no? I personally believe Patrick wanted the money and
>thought $100 was a nice gamble since his rules were predefined and
>accepted, IMHO.)

No, there was no mix-up in the emails. The emails were to clarify
the actual rules. In the emails Mike allowed for more files.


The bet was over file sizes, period. Besides, just what do
you call compression? Since there are no real compressors for
general files, each one can only do some shuffling of file sizes.
No compressor really exists. So what the hell is your definition of
a compressor? Do you know of any general compressors that can actually
be counted on to make general files smaller? No. The best they can do is
break even, if they're fully bijective that is.

Dale King

unread,
Apr 17, 2001, 5:53:42 PM4/17/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:908696BC5H110W...@207.36.190.226...

> jsch...@bellsouth.net (Joshua Schmier) wrote in
> <3E1D6.167$yL1....@news4.mco>:
>
> No compressor really exist. So what the hell is your deination of
> compressor. Do you know of any general compressors that can actaully
> be count to make general files smaller no.

I can't believe I have to explain this basic information to you.

No compressor can make all bit sequences smaller, which I am sure you agree
with.

A compressor can make a certain subset of all possible bit sequences
smaller. If certain sequences are known or presumed to be more likely to
occur then those can be targeted to be smaller. When one factors in the
probability of these being more likely then one can achieve compression
averaged over all files if they occur with those probabilities.

> The best they can do is
> break even if there fully bijective that is.

Sorry, but your permutation does not break even (unless one lets you get away
with ignoring the file length). You can only break even (assuming all
sequences are equally likely) if you make no sequences smaller. By the
counting theorem, "if a lossless compression program compresses some files,
it must expand others".
--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 17, 2001, 6:34:44 PM4/17/01
to
Ki...@TCE.com (Dale King) wrote in <3adc...@news.tce.com>:

>No compressor can make all bit sequences smaller, which I am sure you
>agree with.

Yes you are honest about this one point.

>
>A compressor can make a certain subset of all possible bit sequences
>smaller. If certain sequences are known or presumed to be more likely to
>occur then those can be targeted to be smaller. When one factors in the
>probability of these being more likely then one can achieve compression
>averaged over all files if they occur with those probabilities.
>
>> The best they can do is
>> break even if there fully bijective that is.

Let me reword this in a way you can understand. It can't break
even unless it's bijective. I hope you had enough real logic
courses to understand this. But I doubt it.

>
>Sorry, but your permutation does not break even (unless one let you get
>away with ignoring the file length). You can only break even (assuming
>all sequences are equally likely) if you make no sequences smaller. By
>the counting theorem, "if a lossless compression program compresses some
>files, it must expand others"

Again, as with bijective compression, break even does not mean it makes
all files smaller (maybe it does to you, though). Suppose I have a program
that maps every file to itself except files A and B: file A gets mapped
to file B and file B gets mapped to file A. I would say the compression
described by the transform is a break-even thing. You may not.


Yes, if a lossless compression program compresses some files
it must expand others (assuming we are talking about general
file compressors). However, if it's your kind of compressor, then since it
fails to make full use of the file space it must expand even more
files than a similar bijective compressor, because of the gaps
introduced by your wasteful file compression that fails to use all
the data space the way normal files use it.

Tom Gutman

unread,
Apr 17, 2001, 7:28:00 PM4/17/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:908561603H110W...@207.36.190.226...
> s...@nospam.unt.edu wrote in <9bf6iv$5p5$1...@hermes.acs.unt.edu>:
>
> >That said, as soon as I saw the request that multiple "compressed
> >files" be allowed, it was pretty obvious what game was going to be
> >played. Unfortunately, Mike didn't see it coming, and didn't object
> >to the "multiple files" requirement (in fact in an earlier posting
> >here, I pointed out that you should require the decompressor and data
> >to be bundled together in one file, and the size of that piece should
> >determine the "compressed size").
> >
>
> So what is the bottom line in your view? Besides the fact that Mike was
> stupid and so eager for the 100 dollars that he allowed the multiple
> files. I agree that he should have required one program which internally
> must contain all info so that it expands to one file. But he did not.
> So in your view should he pay the 5g's or not??
> The rules that both men agreed to were a series of files, the total
> length being less than the random file. I think that means Mike lost.
> Are people here going to vote? I would like to know what Gutman and
> Matt have to say after they read the rules the two men agreed to in
> their correspondence, which explicitly allowed for many files.
>
> I already expect King to say Mike won, since it is obvious to him that the
> length of a file is not the length but some number added to the length.

As long as my name is mentioned here, although mostly it's all been
already said.

The analysis in the original thread still stands. The original
challenge remains unmet. It still has the weakness of allowing two
files (and therefore some additional information), but it is unlikely
that a combination of a sufficiently short decompressor and a
sufficiently long (but still practical) file can be found.

Mike got snookered into accepting a change in the rules. Why he did
not see the implications of multiple files is quite beyond me, but
that is what happened. But that's the heart of the situation --
once multiple files are allowed, there is no way to identify or
isolate the length information.

Patrick was clearly (beyond his statement) not interested in
actually collecting the money. He was quite conservative in
implementing his scheme. His scheme as implemented allows for
removing a byte in every 256 bytes -- about 12,000 bytes in a 3MB
file. He only did 200 bytes. Carried to an extreme, splitting the
file into segments could yield close to one extra bit per byte (but
the average file length would then be around two -- that means a
*LOT* of files). If he wanted to, he could have hidden the
underlying method by adding some fluff -- perhaps an MTF followed by
adaptive Huffman (which would do minimal expansion for
non-compressible files) would make it difficult to identify the
actual source of the compression (if the split were complete, the
Huffman coding could actually save a bit, as the segments then use
only 255 symbols).
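
A minimal sketch of that kind of split (my own reconstruction in Python,
not Patrick's actual code):

  # Every occurrence of one chosen byte is dropped and replaced by a file
  # boundary, so the stored bytes shrink by one per split while the lost
  # information hides in the file lengths.
  SEP = b"\x35"                      # arbitrary choice of separator byte

  def split(data):
      # "compress": cut at each separator byte (the byte itself is dropped)
      return data.split(SEP)

  def join(pieces):
      # "decompress": re-insert the separator at every boundary
      return SEP.join(pieces)

  original = bytes(n % 251 for n in range(10000))   # stand-in "random" data
  pieces = split(original)
  assert join(pieces) == original
  saved = len(original) - sum(len(p) for p in pieces)
  print(len(pieces), "pieces,", saved, "data bytes removed")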

As to who owes whom what, Patrick already relinquished claims to the
$5000, and even to the $100 fee. So nothing is legally or morally
owed. A gentleman would return the $100, along with a case of wine
(or maybe just a case of very good wine).


I notice further down some nut nattering about bijections. I don't
see where bijections come into this, and since the poster did not
explicate, it seems completely nonsensical.


--
Tom Gutman

Tom Gutman

unread,
Apr 17, 2001, 7:35:47 PM4/17/01
to
"Dale King" <Ki...@TCE.com> wrote in message
news:3adb...@news.tce.com...
[snip]

>
> If the supposed new definition were allowed, I have already shown how it can
> be done by using the archive and readonly flags from a large number of
> files.
>
I would reject that, since the resulting files would not be normal
files and would not withstand the normal vicissitudes of file
maintenance. The archive bits disappear at the next backup, and the
read only bits do not survive copying to CD-ROM medium.

If you really want some extra information from the file system, I
would recommend the last modified date. That provides considerably
more bits, is generally kept intact by file maintenance routines,
has no functional implications, and is already used in some cases to
store extra information (such as Windows version).

However any of this puts you in the context of a specific file
system with whatever its specific facilities and quirks. The scheme
actually used requires nothing more than the standard abstraction of
a file -- a finite sequence of bytes.


--
Tom Gutman

Tom Gutman

unread,
Apr 17, 2001, 7:41:40 PM4/17/01
to
"Dale King" <Ki...@TCE.com> wrote in message
news:3adc...@news.tce.com...
I would like to point out that the files used came with only byte
lengths, not bit lengths. There is a reason why the FAQ
specifically specifies "bit length" and not just "length".


--
Tom Gutman

Phil Norman

unread,
Apr 18, 2001, 3:35:54 AM4/18/01
to
On Tue, 17 Apr 2001 13:24:10 -0700,
Joshua Schmier <jsch...@bellsouth.net> wrote:
>
>I saw the word compression in there many times. However, no
>compression took place.

The word 'compression' is a poorly chosen one anyway. It implies
that a so-called 'compressor' will make information take up less
storage space. This is not, in general, the case.

It is, in fact, possible to devise a storage medium in which
multiple files take up no extra storage space. For example, suppose my
filesystem stores 9 bits per byte, and sets the high bit to '1'
only when that byte is the start of a new file. Unlikely, yes,
and not a particularly useful storage system, but in this case
such a compressor, using multiple files, would be a good way to
pack the information.
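
A toy model of that medium, in Python of my own (empty files aside):

  # The ninth bit marks "this byte starts a new file", so storing many
  # files costs exactly as many 9-bit cells as there are data bytes.
  def pack(files):
      cells = []
      for f in files:
          for i, byte in enumerate(f):
              cells.append((i == 0, byte))   # (start-of-file flag, data byte)
      return cells

  def unpack(cells):
      files = []
      for start, byte in cells:
          if start:
              files.append(bytearray())
          files[-1].append(byte)
      return [bytes(f) for f in files]

  fs = [b"hello", b"wor", b"ld"]
  assert unpack(pack(fs)) == fs
  assert len(pack(fs)) == sum(len(f) for f in fs)   # boundaries cost nothing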

Anyway, that's not the point at all. It was a good joke - I
enjoyed it a lot.


>The wording for the challenge is fine.

Wrong. The wording of the challenge allows for two files, a
decompressor and a file containing compressed data. That
'expansion' of one file to two in itself ignores the principle of
per-file overhead within the filesystem. Yes, with a large file
(and considering the size of the smallest piece of code possible
for reassembly - see Patrick's aforementioned URL for examples)
this expansion of filesystem overhead (and its inherent ability to
store information about the length of files without adding to the
perceived file length) is irrelevant. However, the principle was
not mentioned.


>loosely defined correspondence is where the mix up occured. (Intentionally
>or no? I personally believe Patrick wanted the money and thought $100 was a
>nice gamble since his rules were predefined and accepted, IMHO.)

Did you read the web page which Patrick posted? I won't bother
requoting it; see my previous post on the subject.

Cheers,
Phil

Mikael Lundqvist

unread,
Apr 18, 2001, 4:36:53 AM4/18/01
to
>===== Original Message From "Matt Timmermans" <ma...@timmermans.nospam-remove.org> =====

>And what do you mean he wasted all of our time? What were you reading the
>thread for? We know that you can't compress random data, Mike knows it, and
>Patrick knows it. Nobody was expecting a miracle, so what were you hoping
>for that you didn't get here?
>
One might just wonder why some individuals are so inclined to return to
these so utterly hopeless issues.
I suppose solving real problems like how to make PPM faster/better/more
efficient is too much of a challenge...

Regards,

Mikael Lundqvist
mailto:mikael.l...@spray.se
http://hem.spray.se/mikael.lundqvist/
Occum's Razor:
"The simplest solution is probably the correct one."

Mikael Lundqvist

unread,
Apr 18, 2001, 4:48:28 AM4/18/01
to
>===== Original Message From Dror Baron <dba...@chopin.ifp.uiuc.edu> =====
> ...

>I for one wouldn't be surprised if Patrick is
>one of David's paid agents...
>
Extremely funny!

>Probably the trick he used can be formulated as
>a bijection, which is clearly the best compression
>and/or cryptographic method ever devised by man :)
>

Data security problems (not just cryptography) are part of Computer Science
because they ARE a problem (unlike compressing random data). Viruses,
hackers, fire ... the list continues.
The greatness in David's ideas may be debatable, but even the slightest
improvement is still an improvement, isn't it?

Keep an open mind,

Phil Norman

unread,
Apr 18, 2001, 4:52:46 AM4/18/01
to
On Wed, 18 Apr 2001 01:00:46 -0700,
Joshua Schmier <jsch...@bellsouth.net> wrote:
>
>Give me an example where two files can be exploited to hide data in a
>current filesystem. If you can do that then I will conceed the argument that
>the wording itself is flawed.

Have you read the web page mentioned at the start of this thread?
It describes exactly how one can hide information in the file
length entry of a filesystem. The example given splits one file
into roughly 250 parts, but the principle holds for splitting into
2 parts.

Cheers,
Phil

Joshua Schmier

unread,
Apr 18, 2001, 6:11:07 AM4/18/01
to
How can that be? You still need space to store the decompression script. The
probability of finding a sequence of bytes within a file that matches the
last bytes of the file is near zero. It would require a huge file to
compress the data in this fashion. So huge, in fact, that I do not think current
hard drives or file systems could hold it.

"Phil Norman" <Phil....@genedata.com> wrote in message

news:slrn9dqleu.hi...@jalua.ch.genedata.com...

Phil Norman

unread,
Apr 18, 2001, 6:31:39 AM4/18/01
to
On Wed, 18 Apr 2001 03:11:07 -0700,

Joshua Schmier <jsch...@bellsouth.net> wrote:
>"Phil Norman" <Phil....@genedata.com> wrote in message
>news:slrn9dqleu.hi...@jalua.ch.genedata.com...
>> On Wed, 18 Apr 2001 01:00:46 -0700,
>> Joshua Schmier <jsch...@bellsouth.net> wrote:
>> >
>> >Give me an example where two files can be exploited to hide
>> >data in a current filesystem. If you can do that then I will
>> >concede the argument that the wording itself is flawed.
>>
>> Have you read the web page mentioned at the start of this thread?
>> It describes exactly how one can hide information in the file
>> length entry of a filesystem. The example given splits one file
>> into roughly 250 parts, but the principle holds for splitting into
>> 2 parts.
>How can that be? You still need space to store the decompression script. The
>probability of finding a sequence of bytes within a file that matches the
>last bytes of the file is near zero. It would require a huge file to
>compress the data in this fashion. So huge, in fact, that I do not think
>current hard drives or file systems could hold it.

Hello? Read the web page with which this thread began! Read also
my posting two times ago (which you followed up to), where I
specifically mention that the saving which can be had by splitting
a file into just two parts would not offset the size of the
decompression script. Wake up, drink some coffee, whatever, just
please read and try to understand before reiterating what I've
already said, or claiming that the idea behind the *working
implementation* that started this thread could not possibly work.
I'm out of patience; if you continue talking rubbish I'll just
ignore your postings.

Cheers,
Phil

Patrick Craig

unread,
Apr 18, 2001, 7:27:30 AM4/18/01
to

> >===== Original Message From "Matt Timmermans"
> <ma...@timmermans.nospam-remove.org> =====
> >And what do you mean he wasted all of our time? What were you reading the
> >thread for? We know that you can't compress random data, Mike knows it, and
> >Patrick knows it. Nobody was expecting a miracle, so what were you hoping
> >for that you didn't get here?
> >
> One might just wonder why some individuals are so inclined to return to
> these
> so utterly hopeless issues.
> I suppose solving real problems like how to make PPM faster/better/more
> efficient is too much of a challenge...

What's the prize money for solving these problems?

Mikael Lundqvist

unread,
Apr 18, 2001, 8:14:37 AM4/18/01
to
>===== Original Message From pat...@nag-j.co.jp (Patrick Craig) =====

>> >===== Original Message From "Matt Timmermans"
>> <ma...@timmermans.nospam-remove.org> =====
>> >And what do you mean he wasted all of our time? What were you reading the
>> >thread for? We know that you can't compress random data, Mike knows it,
and
>> >Patrick knows it. Nobody was expecting a miracle, so what were you hoping
>> >for that you didn't get here?
>> >
>> One might just wonder why some individuals are so inclined to return to
>> these
>> so utterly hopeless issues.
>> I suppose solving real problems like how to make PPM faster/better/more
>> efficient is too much of a challenge...
>
>What's the prize money for solving these problems?
>
If your solution is good, better than most, I'm sure some will pay money for
using your software.
But making computing better should be reason enough. (just ask the
scientists)
What is the use in discussing something we all know is a waste of time?
Random data can't be compressed!!! This should be reason enough to end this
discussion.

Regards,

George Johnson

unread,
Apr 18, 2001, 9:58:56 AM4/18/01
to
"Remzi Seker" <re...@norm.dpo.uab.edu> wrote in message
news:9bhkj2$eio$1...@SonOfMaze.dpo.uab.edu...

| You cannot compress pure random data. However, even random number
| generators have some underlying statistical dependence, so one may achieve
| some compression, depending on how bad the random number generator is.
| Remzi

Actually, I disagree. A pure random data compressor that works efficiently
on ONLY random data should be able to exist. The downside is that it would be
entirely worthless for non-random data and probably rather slow on the Kilobyte
per minute rate. It also wouldn't compress near-random data very well (like
the kind found in ordered compression files).

There should always be an inverse to any function.

Patrick BROSSET

unread,
Apr 18, 2001, 10:11:46 AM4/18/01
to
hi

i agree !

Patrick

"George Johnson" <matr...@voyager.net> wrote in message
news:3add9f23$0$18890$272e...@news.execpc.com...

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 10:21:07 AM4/18/01
to
mika...@MailAndNews.com (Mikael Lundqvist) wrote in
<3AE5...@MailAndNews.com>:

>>===== Original Message From pat...@nag-j.co.jp (Patrick Craig) =====
>>> >===== Original Message From "Matt Timmermans"
>>> <ma...@timmermans.nospam-remove.org> =====
>>> >And what do you mean he wasted all of our time? What were you
>>> >reading the thread for? We know that you can't compress random
>>> >data, Mike knows it,
>and
>>> >Patrick knows it. Nobody was expecting a miracle, so what were you
>>> >hoping for that you didn't get here?
>>> >
>>> One might just wonder why some individuals are so inclined to return
>>> to these
>>> so utterly hopeless issues.
>>> I suppose solving real problems like how to make PPM
>>> faster/better/more efficient is too much of a challenge...
>>
>>What's the prize money for solving these problems?
>>
>If your solution is good, better than most, I'm sure some will pay money
>for using your software.

Don't count on the money. Just like with Mike, you would most likely
never see a penny. First of all, IBM would use their overpaid lawyers
to say you infringed on an existing patent. Of course it would be a lie,
but money is what counts in this game, not honesty, as you are already
learning.

>But making computing better should be reason enough. (just ask the
>scientists)

This is a good reason, but don't expect your name to go down in
history, since odds are someone else will get credit for the idea.
But for one's own satisfaction, go ahead.

>What is the use in discussing something we all know is a waste of time?
>Random data can't be compressed!!! This should be reason enough to end
>this discussion.

Actually this is a good discussion. Why should the thread end?
I think some may actually learn something.

Dale King

unread,
Apr 18, 2001, 10:31:45 AM4/18/01
to
"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
news:9bijtb$os4$1...@sshuraaa-i-1.production.compuserve.com...

> "Dale King" <Ki...@TCE.com> wrote in message
> news:3adb...@news.tce.com...
> [snip]
> >
> > If the supposed new definition were allowed, I have already shown
> how it can
> > be done by using the archive and readonly flags from a large
> number of
> > files.
> >
> I would reject that, since the resulting files would not be normal
> files and would not withstand the normal vicissitudes of file
> maintenance. The archive bits disappear at the next backup, and the
> read only bits do not survive copying to CD-ROM medium.
>
> If you really want some extra information from the file system, I
> would recommend the last modified date. That provides considerably
> more bits, is generally kept intact by file maintenance routines,
> has no functional implications, and is already used in some cases to
> store extra information (such as Windows version).

Good point! But my general point was that if you are allowed to make use of
information in the file header it is trivial to simply produce a gazillion
files all of which have zero "length". According to David's logic if I
shrink a file down to zero length it has no information and therefore I
don't have to count it.

> However any of this puts you in the context of a specific file
> system with whatever its specific facilities and quirks. The scheme
> actually used requires nothing more than the standard abstraction of
> a file -- a finite sequence of bytes.

I make no distinction between the file length or any part of the header. If
it is outside of the data portion of the file and it is necessary for
decompressing you have to count it. That is what I keep trying to get David
to understand.
--
Dale King

Dale King

unread,
Apr 18, 2001, 10:35:57 AM4/18/01
to
"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
news:9bik8c$qqs$1...@sshuraac-i-1.production.compuserve.com...

>
> I would like to point out that the files used came with only byte
> lengths, not bit lengths. There is a reason why the FAQ
> specifically specifies "bit length" and not just "length".

No, a file length IS a bit length! Multiply by 8 and you have the bit length
of the file.

The reason the FAQ says bit length is that it covers both lengths with bit
resolution and lengths with only byte resolution.
--
Dale King

Dale King

unread,
Apr 18, 2001, 10:48:33 AM4/18/01
to
"Tim Tyler" <t...@cryogen.com> wrote in message news:GBz9B...@bath.ac.uk...
> Dale King <Ki...@tce.com> wrote:
> : "SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote:
>
> :> The best they can do is break even if there fully bijective that is.

>
> : Sorry, but your permutation does not break even (unless one let you get
away
> : with ignoring the file length). You can only break even (assuming all
> : sequences are equally likely) if you make no sequences smaller. By the
> : counting theorem, "if a lossless compression program compresses some
files,
> : it must expand others"
>
> I'd say a compressor that produced no average change in the length of
> files of any specified size was "breaking even".
>
> What makes you think "you can only break even ... if you make no
> sequences smaller"...?

First off, let's not use the word files. And forgive me for not being
precise: what I really should have said was smaller than the entropy of the
data source. If you encode any sequence in fewer bits than the entropy of the
source, you will have to expand others. The amount of expansion will be
larger than the amount saved, so you will actually increase the average
compressed length. The only way to break even is to map all sequences to a
sequence equal in length to the entropy of the source. You cannot do any
better than that when averaging over all sequences that can be generated by
the source. See the FAQ and the counting theorem.

As I said above though that assumes that all sequences are equally likely.
That is not usually the case.
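
A quick worked version of that counting argument, for anyone following
along: there are 2^n bit sequences of length n, but only

    1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1

sequences of any length strictly less than n (counting the empty one). So a
lossless coder cannot shorten every n-bit sequence, and if it shortens some
of them while also handling the shorter sequences, the pigeonhole principle
forces some input to expand. For n = 3 that is 8 inputs against only 7
shorter outputs.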

--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 10:53:17 AM4/18/01
to
I am one to pay off barroom bets. And this whole thread has my
interest, so I am willing to step in and offer 50 American dollars
to Patrick if he can meet my challenge by the end of May. After that,
anyone else is eligible to win by the end of June. If no one solves
it, it may mean that the only solution is no solution.

Here is my contest. I have not checked what you sent, so
you may already be a winner. If you're a winner that's OK, because it
proves that random.org is generating long strings that are not
very complex according to K (I can't spell or pronounce it anyway)
theory. It also implies that the random data may be a front for
groups such as the NSA.

take the "current random file that was used in last contest"
Use my code to add the number "zero". This just tacks a long
value of zero to file 32 bits of zero. Than run DSC to meld
the length to the previously given random file. It will change
the original file by a few bytes. It could even get shorter.
Take this new length call it X. However it will get no longer
than 4 more than the oringal file.

The goal is to archive the files you used together into a single
file and see if this resulting file is less than X bytes long.
Data can't be hidden in file names or flags; all data used must be
in the file. However, that archive is only data; no file names are
in it except in your script.

First step: build the archive. This is a first-in, last-out type
of archive. So, just as with the original data file, feed the file to
the program, add a long zero, then run DSC. The output file is the archive
at this point. Next, take the first data file you use and add it to
the archive by appending the archive file to the end of the
data file. Then write a long to the end of the file and run DSC again
to get the new archive.
Repeat for each additional data file as stated here:
append the archive to the current data file, add a long
value to the end of the current file, then run DSC to create the new
archive file.
At this point you should have an 8-bit byte type of file that, in a sense,
contains the complexity of the original file. If this resulting
file can be made shorter than X, you win.

First let me say that DSC bijectively combines a number with a file;
the number is in the range 0 to N-1, where N is the number of bytes
in the file. For the purposes of archiving, any 8-bit binary
file can be thought of as a file of N bytes with a pointer of
0 to N-1 attached. Therefore one can consider any file an archive
file. If the pointer is zero in value, then there is only one
file in the archive. If it is not zero, then the first N bytes can
be thought of as the first file out of the archive, and UNDSC takes
the number from the file, leaving it at the end as a long. You then have
to remove the long from the end of the file to check for more files
to be removed.


Any questions or thoughts that need to be cleared up?

Dale King

unread,
Apr 18, 2001, 10:58:52 AM4/18/01
to
"Mikael Lundqvist" <mika...@MailAndNews.com> wrote in message
news:3AE2...@MailAndNews.com...

> Data security problems (not just cryptography) are part of Computer Science
> because they ARE a problem (unlike compressing random data). Viruses,
> hackers, fire ... the list continues.
> The greatness of David's ideas may be debatable, but even the slightest
> improvement is still an improvement, isn't it?
>
> Keep an open mind,

Unfortunately one must make sure that any attempt at improvement is grounded
in correct science and mathematics. In computer science that is information
theory and computational theory. It is wrong to let someone ignore the
basics of those and use sleight of hand to "disprove" what has already been
established as fact. The counting theorem, for instance, is proven. It
can't be disproved. That is why it is a theorem. Note the difference between
a theorem and a theory. Theorems are proven facts, theories are hypotheses.

I think I've lost what my point was.

Note I have explained the encryption benefits that David gets in a way that
does not rely on a permutation compressor. The key is simply don't compress
the length information with the data.

--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 11:03:10 AM4/18/01
to
Ki...@TCE.com (Dale King) wrote in <3add...@news.tce.com>:

>Good point! But my general point was that if you are allowed to make use
>of information in the file header it is trivial to simply produce a
>gazillion files all of which have zero "length". According to David's
>logic if I shrink a file down to zero length it has no information and
>therefore I don't have to count it.

Actually, for my BIJECTIONS I don't include the NULL file.
However, I don't think the contest was structured so that they
were ruled out; I think Patrick could have used empty files
if he wished to. In my challenge of today, empty files are
ruled out.

>I make no distinction between the file length or any part of the header.
>If it is outside of the data portion of the file and it is necessary for
>decompressing you have to count it. That is what I keep trying to get
>David to understand.

Even DOS text files don't have the length in the data portion
of the file. Why on earth would one force a compressed file to include
such data that other files don't have, especially since the goal
of the compressor is to make the file shorter? You seem to be
forgetting what compression of files is meant to be or not to be.

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 11:14:50 AM4/18/01
to
Ki...@TCE.com (Dale King) wrote in <3add...@news.tce.com>:

>"Tom Gutman" <Tom_G...@compuserve.com> wrote in message

Actually the FAQ does not go far enough. It really should have
only one basis for the length, a basis under which files of any
size, from 1 to n bits, are treated alike: the distance to the last
"one" bit when the file is viewed as an infinite, finitely odd file.
One could easily take an input file and find the distance to the last
one bit after it is converted to an infinite file that is finitely odd,
and one could likewise take the converted compressed file and find the
distance to its last one bit in the same format. Multiplying a byte
file's length by 8 to get a number of bits does not really compare it
to a string of that many bits, since there are strings of bits whose
lengths are not multiples of 8. Come on, King, use your mind.

Dror Baron

unread,
Apr 18, 2001, 1:43:59 PM4/18/01
to
On Wed, 18 Apr 2001, George Johnson wrote:

> Actually, I disagree. A pure random data compressor that works efficiently
> on ONLY random data should be able to exist. The downside is that it would be
> entirely worthless for non-random data and probably rather slow on the Kilobyte
> per minute rate. It also wouldn't compress near-random data very well (like
> the kind found in ordered compression files).
>
> There should always be an inverse to any function.

That data was generated by hardware measurements and then
unbiased. If you guys sincerely think that "function" has
an inverse, you need to consult medical and not scientific
people.

Dror

Mikael Lundqvist

unread,
Apr 18, 2001, 1:58:35 PM4/18/01
to
>===== Original Message From "Dale King" =====
> ...

>Unfortunately one must make sure that any attempt at improvement is grounded
>in correct science and mathematics. In computer science that is information
>theory and computational theory. It is wrong to let someone ignore the
>basics of those using a sleight of hand to disprove what has already been
>established to be fact. The counting theorem for instance is proven. It
>can't be disproved. That is why it is a theorem. Note the difference between
>a theorem and a theory. Theorems are proven facts, theories are hypotheses.
>
You're correct of course!

>I think I've lost what my point was.
>

Maybe it's time to move on?

>Note I have explained the encryption benefits that David gets in a way that
>does not rely on a permutation compressor. The key is simply don't compress
>the length information with the data.
>

You and Mr Scott should have a beer or two together
some day and have a great laugh about it.
You two seem to have a lot in common.

Peace,

Dror Baron

unread,
Apr 18, 2001, 2:04:37 PM4/18/01
to
On Wed, 18 Apr 2001, Patrick Craig wrote:

> > One might just wonder why some individuals are so inclined to return to
> > these
> > so utterly hopeless issues.
> > I suppose solving real problems like how to make PPM faster/better/more
> > efficient is too much of a challenge...
>
> What's the prize money for solving these problems?

If you want to play "hide the information in the
header" preschool games, the payoff will be small.

If you have serious contributions:
1. Compression superior to PPM / BWT with LZ-like
complexity in software.
2. Compression similar to PPM / BWT in hardware.
In these cases the payoff could be nice.

Mikael wants us to stop playing games and start
coming up with real ideas.

Dror

Dale King

unread,
Apr 18, 2001, 2:25:29 PM4/18/01
to
"George Johnson" <matr...@voyager.net> wrote in message
news:3add9f23$0$18890$272e...@news.execpc.com...

I agree with what you meant, but your terminology and understanding of
information theory is a bit wrong. There really isn't such a thing as random
and non-random data. There are random and non-random data sources. It is
impossible to compress all data from a random source. What presumably you
are talking about is data that has no patterns to it. In other words all
symbols appear with equal frequency and sequences of symbols do not repeat,
etc. If you had a source that emitted sequences that were always or even
usually like this then you can compress that because the entropy is lower
than the size of the messages. You are saying that it will never generate a
sequence consisting of all the same symbol, or at least that its likelihood
is much lower than that of any other sequence. That is not, however, a random
source. A random source generates all sequences with equal likelihood.

> There should always be an inverse to any function.

All functions have inverse relationships, but that inverse relationship may
not be a function. Many functions do not have an inverse function: f(x) =
x^2; g(x) = 0; ceil(x); abs(x); signum(x) to name just a few for the set of
real numbers. Only when a function is bijective does it have an inverse
function. If the function is non-bijective it can have an inverse, but the
inverse is not a function.

--
Dale King


s...@nospam.unt.edu

unread,
Apr 18, 2001, 2:26:36 PM4/18/01
to
George Johnson <matr...@voyager.net> wrote:

> Actually, I disagree. A pure random data compressor that works
> efficiently on ONLY random data should be able to exist. The
> downside is that it would be entirely worthless for non-random data
> and probably rather slow on the Kilobyte per minute rate. It also
> wouldn't compress near-random data very well (like the kind found in
> ordered compression files).

Nope, I'm afraid that's not true. Of course, it depends on how you
want to define "works" (what compression is significant?) and
"random". I could probably define both of those words in such a way
as to make this possible, but it would be stretching the definitions a
bit.

Here's a situation -- you write a compressor as you describe, that
"works efficiently for random data". I'll start throwing random data
streams at it. If you look at the average compressed length over all
my random data files, it will be at least as large as the original
random data length (in other words, no compression takes place). Sure
there might be a file here or there that compresses by maybe a whole
byte, but probably never more than a byte, and even that would be
pretty rare.

> There should always be an inverse to any function.

But random data is not a "function", so there's nothing to invert...

--
Steve Tate --- srt[At]cs.unt.edu | Gratuitously stolen quote:
Dept. of Computer Sciences | "The box said 'Requires Windows 95, NT,
University of North Texas | or better,' so I installed Linux."
Denton, TX 76201 |

s...@nospam.unt.edu

unread,
Apr 18, 2001, 3:01:36 PM4/18/01
to
Dale King <Ki...@tce.com> wrote:

> I agree with what you meant, but your terminology and understanding of
> information theory is a bit wrong. There really isn't such a thing as random
> and non-random data. There are random and non-random data sources.

Actually, that's not true. That's the Shannon Information Theory view
of things, but it's not the whole picture.

While Kolmogorov complexity is usually mentioned here in the context
of data compression, the original work was done to define precisely
what is meant by "random data" (*NOT* random sources). Kolmogorov
complexity in fact gives a very nice theory and definitions for what
we mean by "random data" versus "non-random data"...

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 3:09:53 PM4/18/01
to
s...@nospam.unt.edu wrote in <9bkm8s$act$1...@hermes.acs.unt.edu>:

>George Johnson <matr...@voyager.net> wrote:
>
>> Actually, I disagree. A pure random data compressor that works
>> efficiently on ONLY random data should be able to exist. The
>> downside is that it would be entirely worthless for non-random data
>> and probably rather slow on the Kilobyte per minute rate. It also
>> wouldn't compress near-random data very well (like the kind found in
>> ordered compression files).
>
>Nope, I'm afraid that's not true. Of course, it depends on how you
>want to define "works" (what compression is significant?) and
>"random". I could probably define both of those words in such a way
>as to make this possible, but it would be stretching the definitions a
>bit.
>

The word "random" has always bothered me. Everybody realizes
or should realize that a random bit string can be anything. But
if it just happend to be the string of all zeros then most would
intuitively say that result was not a random string of bits.
So maybe we need another type of string to call random. Instead
call strings or files since most deal with byte oriented files.
K random if they can't be resented by a series of simalar files.
With simple combining

Example: take the file of the contest. Look for the largest
repeated string and call it R. Then the file is:
Part 1 + Part R + Part 2 + Part R + Part 3
Now suppose a simple script exists that does not use hidden storage or
special file names. One could do as Patrick did and rebuild the so-called
random file. Suppose one combines all his DATA files and the script
file into an archive that is bijective, like my archive description.
If the resulting file is smaller than the actual so-called random
file, we could reject the original for reasons of cryptographic
insecurity, or call it rejected for low complexity, or because it is
too compressible. Obviously one can look for many repeats and break
this into several files, but the more files one has, the bigger the
penalty you pay when you make the archive. I am not sure whether the
file given to Patrick is actually very complex. He may meet the
challenge. Actually, if he meets it this way he has totally
succeeded in the original challenge; that is the way the original
challenge should have been worded: give back a single file that
represents the larger file. That single file can be the archive
of the program or script file that uses the resulting data files
from the archive. I believe even King might think this would have
been a fairer test, since in theory one knows files exist that
can't be compressed to a file that represents the larger
file. The question is whether or not the test file given to Patrick
is one that is complex enough not to be represented by the smaller
archive file. Please correct me if I am missing something.
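
For anyone who wants to try that test, a naive sketch in C of a
longest-repeated-string scan is below. This has nothing to do with DSC
itself, and it is far too slow for a multi-megabyte file as written, but it
shows the kind of check being proposed:

    /* lrs.c: naive longest-repeated-substring scan.
       Reads the file named on the command line into memory and reports
       the longest string of bytes that occurs at least twice. */
    #include <stdio.h>
    #include <stdlib.h>

    static size_t longest_repeat(const unsigned char *buf, size_t len,
                                 size_t *pos1, size_t *pos2)
    {
        size_t best = 0, i, j;

        for (i = 0; i < len; i++) {
            for (j = i + 1; j < len; j++) {
                size_t k = 0;
                while (j + k < len && buf[i + k] == buf[j + k])
                    k++;               /* extend the match */
                if (k > best) {
                    best = k;
                    *pos1 = i;
                    *pos2 = j;
                }
            }
        }
        return best;
    }

    int main(int argc, char **argv)
    {
        FILE *f;
        unsigned char *buf;
        size_t len, n, p1 = 0, p2 = 0;

        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL) return 1;
        fseek(f, 0, SEEK_END);
        len = (size_t)ftell(f);
        rewind(f);
        buf = malloc(len ? len : 1);
        if (buf == NULL || fread(buf, 1, len, f) != len) return 1;
        fclose(f);

        n = longest_repeat(buf, len, &p1, &p2);
        printf("longest repeat: %lu bytes, at offsets %lu and %lu\n",
               (unsigned long)n, (unsigned long)p1, (unsigned long)p2);
        free(buf);
        return 0;
    }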

Dale King

unread,
Apr 18, 2001, 3:16:01 PM4/18/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:90875FF69H110W...@207.36.190.226...

> Ki...@TCE.com (Dale King) wrote in <3add...@news.tce.com>:
>
> >I make no distinction between the file length or any part of the header.
> >If it is outside of the data portion of the file and it is necessary for
> >decompressing you have to count it. That is what I keep trying to get
> >David to understand.
>
> Even DOS text files don't have the length in the data portion
> of the file. Why on earth would one force a compressed file to include
> such data that other files don't have, especially since the goal
> of the compressor is to make the file shorter? You seem to be
> forgetting what compression of files is meant to be or not to be.

First you assume all compressors are concerned with compressing to a "file".
Most compressors aren't. Most compressors instead compress to a prefix-free
code so that the output can be stored in a file or sent in a serial
stream without encoding the length through some external means. Of course, if
you store the output of one of these compressors in a file it takes a few
more bytes, because it is not taking advantage of the storage available in
the file header, including a file length that is stored automatically. But
considering that files are allocated in terms of sectors or blocks, you
really only see a difference if you average over a great many files. Why
take advantage of something that will decrease the "length" of a file by a few
bytes when that will rarely have any effect on disk space used? It makes
more sense to go for the more general purpose solution that works whether
you have a file or not.

And I disagree with the statement that the goal of a compressor is to make a
file shorter. The goal of a compressor is to decrease the amount of
information that must be presented to the decompressor to recreate the
original data. Relying on the storage in the file header makes the file
length smaller (and occasionally the amount of disk space taken up), but it
does not decrease the amount of information that must be presented to the
decompressor to recreate the original file. Compression ratio does not
concern itself with where the information is stored, only with whether it is
needed for decompression.
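
As a generic illustration of what "prefix-free" buys you (this is only a
sketch, not how any particular compressor mentioned in this thread does it),
the length can be written into the stream itself with a continuation bit, so
the decoder knows where the data ends without asking the file system:

    /* A length written as base-128 digits, low bits first; every byte
       but the last has its high bit set.  The cost is one byte per 7
       bits of length, paid inside the stream itself. */
    #include <stdio.h>

    void put_length(unsigned long n, FILE *out)
    {
        while (n >= 0x80) {
            putc((int)(0x80 | (n & 0x7F)), out);
            n >>= 7;
        }
        putc((int)n, out);             /* final byte: high bit clear */
    }

    long get_length(FILE *in)
    {
        unsigned long n = 0;
        int c, shift = 0;

        do {
            c = getc(in);
            if (c == EOF) return -1;   /* truncated stream */
            n |= (unsigned long)(c & 0x7F) << shift;
            shift += 7;
        } while (c & 0x80);
        return (long)n;
    }

A 3 MB payload costs four such bytes, which is the whole price of not
leaning on the filesystem's length field.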
--
Dale King

Dale King

unread,
Apr 18, 2001, 3:45:41 PM4/18/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:90875551BH110W...@207.36.190.226...

> Ki...@TCE.com (Dale King) wrote in <3add...@news.tce.com>:
>
> >"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
> >news:9bik8c$qqs$1...@sshuraac-i-1.production.compuserve.com...
> >>
> >> I would like to point out that the files used came with only byte
> >> lengths, not bit lengths. There is a reason why the FAQ
> >> specifically specifies "bit length" and not just "length".
> >
> >No, a file length IS a bit length! Multiply by 8 and you have the bit
> >length of the file.
> >
> >The reason the FAQ says bit length is that it covers both lengths with
> >bit resolution or those with only byte resolution.
>
> Actually the FAQ does not go far enough. It really should have
> only one basis for the length, a basis under which files of any
> size, from 1 to n bits, are treated alike: the distance to the last
> "one" bit when the file is viewed as an infinite, finitely odd file.
> One could easily take an input file and find the distance to the last
> one bit after it is converted to an infinite file that is finitely odd,
> and one could likewise take the converted compressed file and find the
> distance to its last one bit in the same format. Multiplying a byte
> file's length by 8 to get a number of bits does not really compare it
> to a string of that many bits, since there are strings of bits whose
> lengths are not multiples of 8. Come on, King, use your mind.

A file of length n bytes contains 8*n bits. The file length tells you how
many bits are present in the file. Whether you use all of those bits is
beside the point. The file length tells you the bit length of the file
simply by multiplying by 8. Are you trying to say a file doesn't have a
multiple of 8 bits? Come on Scott, use your mind.

Looking at your compressor in h2unc.c we see that is exactly how your
decompressor uses the file length. Look in the get_bit function which
returns one bit from the file. You have calls of the form:

t = fread(buff+2, sizeof(char), 128, fin);

Which says read up to 128 bytes from the file and t is set to the number of
bytes read (the first time you only read 2 bytes).

The next statement (after the assert which you conditionally compile out
because it will be false when you hit the EOF) is:

cch +=t*8;

Or in the case of the first read you just say:

cch = t*8;

cch is the count of bits you currently have in your buffer. You decrement it
for every bit you return.

If you were to sum all the values that get stored in t by the fread function
(not counting where you reuse t later in the function for something else),
the total would be exactly equal to the length of the file. If you total the
amounts by which cch is increased you will find it is exactly 8 times the
file length.

So you are using the file length information to determine the bit length of
the file. And once again we are back to the quote from the FAQ, "If external
information such as a file name, a number of iterations, or a bit length is
necessary to decompress the data, the bits necessary to provide the extra
information must be included in the bit count of the compressed data."

--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 3:54:09 PM4/18/01
to
s...@nospam.unt.edu wrote in <9bkoag$adl$1...@hermes.acs.unt.edu>:

>Dale King <Ki...@tce.com> wrote:
>
>> I agree with what you meant, but your terminology and understanding of
>> information theory is a bit wrong. There really isn't such a thing as
>> random and non-random data. There are random and non-random data
>> sources.
>
>Actually, that's not true. That's the Shannon Information Theory view
>of things, but it's not the whole picture.
>
>While Kolmogorov complexity is usually mentioned here in the context
>of data compression, the original work was done to define precisely
>what is meant by "random data" (*NOT* random sources). Kolmogorov
>complexity in fact gives a very nice theory and definitions for what
>we mean by "random data" versus "non-random data"...
>

I think the problem with King is as if he was taught Euclidean
geometry in school, so everything is formulated towards that. Then,
when he gets confronted with something more suited to projective
geometry, he gets lost. It's hard to mix the two geometries, so
I try to be the type who uses whichever one fits the problem.
One has to be flexible.

The word random is a nasty word whose meaning is very vague.
Many times people think of it in the Shannon sense and yet they
are using it in the K-complexity sense, but the fact is they are
not the same. And I think it leads to strange results.
Example: suppose one is using an OTP. Should one check to see if
the string is K-complex enough? I feel most would say yes.
They may even eye the data to see if it looks random. They
may then apply the OTP to the data and then do another check
to see if it "looks" random. But if Shannon is correct, should
one check at all, since any sequence is as likely as any other
to be the output of a "uniformly random source"? Also,
if you make the set of possible OTPs smaller, then when one
decrypts you are reducing the set of possible messages to
choose from. What is one really to do or make of this?

Dror Baron

unread,
Apr 18, 2001, 4:15:06 PM4/18/01
to
On 18 Apr 2001, SCOTT19U.ZIP_GUY wrote:

> I think the problem with King is as if he was taught Euclidean
> geometry in school, so everything is formulated towards that. Then,
> when he gets confronted with something more suited to projective
> geometry, he gets lost. It's hard to mix the two geometries, so
> I try to be the type who uses whichever one fits the problem.
> One has to be flexible.

Dude, I rest my case. That's like cheating: use whichever
way works best for any given problem.

Dror


SCOTT19U.ZIP_GUY

unread,
Apr 18, 2001, 4:19:27 PM4/18/01
to

I am trying to say that if one is looking at file compressors
and is really as concerned as you pretend to be, one should count
the bits in a more consistent way that has meaning for bit files
and for files on systems that are laid out in fields other than
8-bit bytes. You seem so obsessed with the idea that the data length
should count as bits used, but this is really only part of the picture.
The 8-bit field used in most files is part of the picture. One should
transform the data to a common playing field before the bits are
counted as you wish. But I see the concept is over your head.
In which case, if you're only dealing with 8-bit byte files to the
exclusion of anything else, then for this extra crap you think is
overhead you should use only the BYTE count and not the bit count.



>
>Looking at your compressor in h2unc.c we see that is exactly how your
>decompressor uses the file length. Look in the get_bit function which
>returns one bit from the file. You have calls of the form:
>
> t = fread(buff+2, sizeof(char), 128, fin);
>
>Which says read up to 128 bytes from the file and t is set to the number
>of bytes read (the first time you only read 2 bytes).
>
>The next statement (after the assert which you conditionally compile out
>because it will be false when you hit the EOF) is:
>
> cch +=t*8;
>
>Or in the case of the first read you just say:
>
> cch = t*8;
>
>cch is the count of bits you currently have in your buffer. You
>decrement it for every bit you return.
>
>If you were to sum all the values that get stored in t by the fread
>function (not counting where you reuse t later in the function for
>something else), the total would be exactly equal to the length of the
>file. If you total the amounts by which cch is increased you will find
>it is exactly 8 times the file length.

That's because my code is designed for only 8-bit byte types of
files. I think I made that clear. However, it's not that hard to
make it work on strings of bits that aren't a multiple of 8 bits,
if one desired. But I tend to work with 8-bit byte types of files
where any value is possible. And I use "t" a lot; it's just a temporary
variable. I hate long names; they're too damn hard to type repeatedly
and they make it harder for me to read the code.

>
>So you are using the file length information to determine the bit length
>of the file. And once again we are back to the quote from the FAQ, "If

Again, yes, I am using the EOF as the indicator of the end of the file,
as most normal programs that I have looked at seem to do.

>external information such as a file name, a number of iterations, or a
>bit length is necessary to decompress the data, the bits necessary to
>provide the extra information must be included in the bit count of the
>compressed data."

Yes, Mr King, I read that: IF EXTERNAL INFORMATION IS NEEDED.
I don't need external information. Some compressors do, such as
the length of the file they are compressing, or the big problem in
many poorly implemented Huffman and arithmetic compressors where
the actual compression ends at a boundary inside the last byte.
Let me put it another way, which you can't seem to understand: I
need the external length like I need the name. I can't decompress
the file till I open it. I can't open it till I have the name.
So by your logic one should count the bits used to describe the
name, since I can't decompress a file till I know its name.

Dale King

unread,
Apr 18, 2001, 5:06:36 PM4/18/01
to
"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:908784DFFH110W...@207.36.190.226...

> s...@nospam.unt.edu wrote in <9bkoag$adl$1...@hermes.acs.unt.edu>:
>
> >Dale King <Ki...@tce.com> wrote:
> >
> >> I agree with what you meant, but your terminology and understanding of
> >> information theory is a bit wrong. There really isn't such a thing as
> >> random and non-random data. There are random and non-random data
> >> sources.
> >
> >Actually, that's not true. That's the Shannon Information Theory view
> >of things, but it's not the whole picture.
> >
> >While Kolmogorov complexity is usually mentioned here in the context
> >of data compression, the original work was done to define precisely
> >what is meant by "random data" (*NOT* random sources). Kolmogorov
> >complexity in fact gives a very nice theory and definitions for what
> >we mean by "random data" versus "non-random data"...
> >
>
> I think the problem with King is as if he was taught Euclidean
> geometry in school, so everything is formulated towards that. Then,
> when he gets confronted with something more suited to projective
> geometry, he gets lost. It's hard to mix the two geometries, so
> I try to be the type who uses whichever one fits the problem.
> One has to be flexible.

You don't need to go beyond Euclidean geometry to show that something is
untrue according to Euclidean geometry. Kolmogorov did not disprove or
replace Shannon. Your claims are incorrect according to Shannon; going to
Kolmogorov won't make them right any more than going to projective geometry
will allow you to disprove the Pythagorean theorem.

And I do know a bit more than Euclidean geometry. I do have a Master's
degree in computer science, so quit treating me like a kid that doesn't know
Shannon from Kolmogorov.

> The word random is a nasty word whose meaning is very vague

Which is why I responded as I did, to try to clear up the difference.
Kolmogorov is concerned with how to represent one given file; compression is
based on information theory, which tries to compress data from a source.

> Many times people think of it in the Shannon sense and yet they
> are using it in the K-complexity sense, but the fact is they are
> not the same. And I think it leads to strange results.
> Example: suppose one is using an OTP. Should one check to see if
> the string is K-complex enough? I feel most would say yes.
> They may even eye the data to see if it looks random. They
> may then apply the OTP to the data and then do another check
> to see if it "looks" random. But if Shannon is correct, should
> one check at all, since any sequence is as likely as any other
> to be the output of a "uniformly random source"? Also,
> if you make the set of possible OTPs smaller, then when one
> decrypts you are reducing the set of possible messages to
> choose from. What is one really to do or make of this?

But my point was that if you had a source that produced only strings that
had a high K complexity it is possible to write a compressor that compressed
the data from that source. The entropy is basically decreased because while
any particular file seems random as it has a high K complexity, the source
is not random because it will not produce messages with low K complexity.
And also Kolmogorov included the size of the program, but in the compression
case we are not counting the size of the decompressor.

That is basically what I thought the OP was trying to say, but didn't quite
word it correctly.
--
Dale King

Tom Gutman

unread,
Apr 18, 2001, 5:38:26 PM4/18/01
to
"Joshua Schmier" <jsch...@bellsouth.net> wrote in message
news:DIbD6.5987$3e.42...@news3.mco...
> I will not argue the validity of the bet as it's open to anyone's opinion.
> I've already stated that it wasn't the same as the challenge itself. If you
> can't see that Patrick indeed tricked Goldman into accepting a 'multiple
> file' challenge, then there is no point in trying to prove it to you.
>
> Give me an example where two files can be exploited to hide data in a
> current filesystem. If you can do that then I will concede the argument that
> the wording itself is flawed.

Theoretically or practically? If theoretically, see Patrick's web
page for a demonstration of exactly how this can be done. See that
same page for a discussion of why the file size required for this is
impractically large. But even there, that depends on how small the
decompressor can be made. If you can get the decompressor down to
three bytes or so, then it becomes practical. If I can fully
specify/build the execution machine, I can get the decompressor down
to that size.

So I could meet the slightly modified challenge: Before seeing the
data, I will send an interpreter for some language of my choosing.
After getting the data, I will send two files, a decompressors and a
data file. The sum of the file sizes for the two files will be less
than the size of the original data file. When the previously
provided interpreter is used to run the decompressor with the data
file as input, it will recreate the original file. Note that by
providing the interpreter before seeing the data, the interpreter
cannot contain any information relating to the content of the file.
Its only purpose is to allow a sufficiently compact representation
of the logic of the decompressor so that the small amount of
compression available with a single file split is sufficient to
yield a net compression.

>
[snip]

--
Tom Gutman

Tom Gutman

unread,
Apr 18, 2001, 5:53:11 PM4/18/01
to

"Dale King" <Ki...@TCE.com> wrote in message
news:3add...@news.tce.com...

Sorry, I don't buy that. If it wanted to cover any length, whether
bit or byte resolution, it would not have needed any qualifier. A
bit length means specifically a length with bit resolution. Length
without any qualification is somewhat ambiguous and depends on the
context. In principle it could be of any resolution, but so much is
done with byte files that very often it is clearly a byte length
that is meant.

In a purely theoretic context files are commonly modeled as finite
sequences of bits. As such, they implicitly come with bit lengths.
But to conform better with commonly available files systems in a
practical or semi-practical context files are usually modeled as
finite sequences of bytes. As such, they implicitly come with byte
lengths. If one has a scheme (such as a naive Huffman encoding)
that generates arbitrary bit files, then in practice one needs to
find a way to map these bit file into byte files. A bit length
(plus padding) is one way to do that. Some people have found better
ways.


--
Tom Gutman

Tom Gutman

unread,
Apr 18, 2001, 5:58:18 PM4/18/01
to
"Dror Baron" <dba...@chopin.ifp.uiuc.edu> wrote in message
news:Pine.GSO.4.21.01041...@chopin.ifp.uiuc.edu...
That's kind of weird! So you think it's like cheating for me to
decide whether I should use a hammer or a screwdriver after
examining the fastener?

--
Tom Gutman

Tom Gutman

unread,
Apr 18, 2001, 6:04:42 PM4/18/01
to
"Patrick Craig" <pat...@nag-j.co.jp> wrote in message
news:3ADD797C...@nag-j.co.jp...

>
>
> > >===== Original Message From "Matt Timmermans"
> > <ma...@timmermans.nospam-remove.org> =====
> > >And what do you mean he wasted all of our time? What were you reading the
> > >thread for? We know that you can't compress random data, Mike knows it, and
> > >Patrick knows it. Nobody was expecting a miracle, so what were you hoping
> > >for that you didn't get here?
> > >
> > One might just wonder why some individuals are so inclined to return to
> > these so utterly hopeless issues.
> > I suppose solving real problems like how to make PPM faster/better/more
> > efficient is too much of a challenge...
>
> What's the prize money for solving these problems?
>
> Patrick
>
Sorry, that is not acceptable. You already stated that you did not
expect to collect the prize money, and even though you had a
possible legal case you chose not to pursue it.

While you are under no obligation to tell us your motivation, if you
do choose to do so at least be truthful.


--
Tom Gutman

Willem-Jan Monsuwe

unread,
Apr 18, 2001, 5:15:05 PM4/18/01
to
Just one thing that bothers me:

) ...<snip>...
) considering that files are allocated in terms of sectors or blocks, you
) really only see a difference if you average over a great many files. Why
) take advantage of something that will decrease the "length" of a file by a few
) bytes when that will rarely have any effect on disk space used? It makes
) more sense to go for the more general purpose solution that works whether
) you have a file or not.
) ...<snip>...

If I compare two compressors, A and B, and (on my data set) A averages two
bytes smaller than B, then that will _also_ be true on a filesystem that
allocates in sectors. If file A(x) is two bytes smaller than file B(x), and
the sector size is, say, 32Kb, then there will be a chance of 1/16384 that
A(x) will take one less block than B(x). That is, it will take _32Kb_ less.
So, _on_average_, A(x) _will_be_two_bytes_shorter_ ! So _please_ don't give
me this 'but filesystems store in terms of sectors' _crap_!


SaSW,
--
Willem (at stack dot nl)


Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !

#EOT

Tom Gutman

unread,
Apr 18, 2001, 6:35:51 PM4/18/01
to
"Dale King" <Ki...@TCE.com> wrote in message
news:3add...@news.tce.com...
You continue to misrepresent what David claims, and to misunderstand
what bijectiveness is all about. I have not seen David claim that
zero length files contain no information and can be ignored. I have
not seen anything to suggest that he is not aware that k files whose
lengths add up to n are not the same as a single file of length n.

For normal usage, I define a file as a finite sequence of bytes. As
such, it implicitly has a byte length. It matters not how it
happens to be represented, that may or may not be a disk file
system, and the length may or may not be stored in some associated
data structure. If you choose to work with a different definition
of a file, then please state it clearly and accurately. If your
definition does not include all of what one typically finds on a
computer hard disk, please explain any discrepancy.

One can find subsets of files that have the prefix property (no file
in the subset is a prefix of any other file in the subset). Any
such subset will be a relatively small fraction of all files,
although they may be useful in some contexts.

I consider compressors as functions that map files to files.
Generally I require that the domain be all files. Usually the
codomain is also all files, although it may sometimes be desirable
to restrict the codomain to some subset with the prefix property.
It doesn't make much difference, although as shown below a
compressor with a restricted codomain cannot be optimal with respect
to compressors without such a restriction.

If one rates compressors purely on their ability to compress, then
an optimum compressor will be a bijection. This is independent of
whether the codomain is all files (in which case the optimal
compressor will be a permutation) or whether the codomain is
restricted to a subset. Since a bijection to a restricted codomain
will no longer be surjective when looked at with respect to the full
codomain, such a compressor will no longer be optimal in that
context.


--
Tom Gutman

x...@anadigi.com

unread,
Apr 18, 2001, 6:33:12 PM4/18/01
to
On Wed, 18 Apr 2001 09:58:56 -0400, "George Johnson"
<matr...@voyager.net> wrote:

>
> There should always be an inverse to any function.
>
>
>

This is not always true. One of the first counterexamples is

sin(0) = 0

The inverse "function", arcsine(0) = ..., -3pi, -2pi, -pi, 0, pi, 2pi, 3pi, ...

The inverse has an infinite number of answers, so it is not a function.

Some people use the principal inverse, which limits the answer to
the range (-pi .. pi], or (-pi/2 .. pi/2], or some other standard.

But basically, the periodic trigonometric functions do not have true inverses.
/nit-picking mode off


Dror Baron

unread,
Apr 18, 2001, 8:00:46 PM4/18/01
to
On Wed, 18 Apr 2001, Tom Gutman wrote:

> > > geometry, he gets lost. It's hard to mix the two geometries, so
> > > I try to be the type who uses whichever one fits the problem.
> > > One has to be flexible.
> >
> > Dude, I rest my case. That's like cheating whichever
> > way works best for any given problem.
> >
> That's kind of weird! So you think it's like cheating for me to
> decide whether I should use a hammer or a screwdriver after
> examining the fastener?

He isn't only changing his tools. He is changing his
problem definitions. Each time he adapts the rules,
the operating system, the way he determines the length;
each time he uses the most convenient set of rules.

Dror


Phil Norman

unread,
Apr 19, 2001, 2:30:04 AM4/19/01
to
On Wed, 18 Apr 2001 14:16:01 -0500, Dale King <Ki...@TCE.com> wrote:
>"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
>news:90875FF69H110W...@207.36.190.226...
>> Ki...@TCE.com (Dale King) wrote in <3add...@news.tce.com>:
>>
>> >I make no distinction between the file length or any part of the header.
>> >If it is outside of the data portion of the file and it is necessary for
>> >decompressing you have to count it. That is what I keep trying to get
>> >David to understand.
>>
>> Even DOS text files don't have the length in the data portion
>> of the file. Why on earth would one force a compressed file to include
>> such data that other files don't have, especially since the goal
>> of the compressor is to make the file shorter? You seem to be
>> forgetting what compression of files is meant to be or not to be.
>
>First you assume all compressors are concerned with compressing to a "file".
>Most compressors aren't. Most compressors instead compress to a prefix-free
>code so that the output can be stored in a file or sent in a serial
>stream without encoding the length through some external means.

Wouldn't it be more sensible to include the length within the
transmission protocol? Given that uncompressed files do not
contain their own length within their data portion, if you need to
transmit an uncompressed file you'll need to have some external
(to the file) method of sending the length within the transmission
protocol. If you can also sent a compressed file which contains
its length, to remove redundancy you need to remove the length
from the transmission protocol and rely just on the one within the
compressed file. I can easily imagine a situation where coding a
program to receive such transmitted data would be considerably
more difficult if the data length has to be extracted from the
compressed data, rather than known on the level of the
transmission protocol.

So it makes perfect sense to never include the data length within
the compressed data, and instead ensure that whatever storage
medium, transmission protocol or whatever you're using has the
capability to transmit a length. If I found an existing protocol
which doesn't send lengths by default, I'd be very surprised.
Even early HTTP included this functionality, although it was
admittedly implemented by simply closing the connection to mark
the EOF.

Of course, there could be highly specialised cases where it would
be slightly more efficient the other way around, but these are
probably quite rare, and in such cases it's generally more
worthwhile using an existing protocol, since that can reduce the
development time.
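
And the other way round, carrying the length in the transport rather than
inside the compressed data, is just as easy. A rough sketch of a made-up
framing (a 4-byte big-endian length ahead of the payload, loosely in the
spirit of HTTP's Content-Length header):

    /* Length-prefixed framing over any byte stream (a FILE* here).
       The receiver learns the payload size from the frame header, so
       the payload itself never needs to carry its own length. */
    #include <stdio.h>

    int send_frame(FILE *out, const unsigned char *data, unsigned long len)
    {
        unsigned char hdr[4];

        hdr[0] = (unsigned char)(len >> 24);
        hdr[1] = (unsigned char)(len >> 16);
        hdr[2] = (unsigned char)(len >> 8);
        hdr[3] = (unsigned char)len;
        if (fwrite(hdr, 1, 4, out) != 4) return -1;
        if (fwrite(data, 1, len, out) != len) return -1;
        return 0;
    }

    long recv_frame(FILE *in, unsigned char *data, unsigned long max)
    {
        unsigned char hdr[4];
        unsigned long len;

        if (fread(hdr, 1, 4, in) != 4) return -1;
        len = ((unsigned long)hdr[0] << 24) | ((unsigned long)hdr[1] << 16)
            | ((unsigned long)hdr[2] << 8) | (unsigned long)hdr[3];
        if (len > max) return -1;
        if (fread(data, 1, len, in) != len) return -1;
        return (long)len;
    }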

Cheers,
Phil

Phil Norman

unread,
Apr 19, 2001, 2:38:45 AM4/19/01
to
>On Wed, 18 Apr 2001, George Johnson wrote:
>
>> There should always be an inverse to any function.

Here's a function. It inputs two chars (a, b) and outputs two
chars (res_a, res_b) (it outputs two chars simply to underline the
fact that the same number of bits go out as go in):

char a = some_value;
char b = some_other_value;

short result = a * b; // This is the function itself.

char res_a = result & 0xff;
char res_b = (result >> 8) & 0xff;

Now, what's the inverse of the function? 'a*b' is a particularly
simple function, so it *must* have an inverse, don't you think?
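
(To spell the answer out: it has none, because the inputs are not
recoverable from the output. For example, a = 2, b = 6 and a = 3, b = 4 both
give result = 12, i.e. res_a = 12 and res_b = 0, so an "inverse" would have
to return two different input pairs for the same output.)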

Cheers,
Phil

Patrick Craig

unread,
Apr 19, 2001, 4:57:22 AM4/19/01
to

>
> > > One might just wonder why some individuals are so inclined to return to
> > > these so utterly hopeless issues.
> > > I suppose solving real problems like how to make PPM faster/better/more
> > > efficient is too much of a challenge...
> >
> > What's the prize money for solving these problems?
> >
> > Patrick
> >
> Sorry, that is not acceptable. You already stated that you did not
> expect to collect the prize money, and even though you had a
> possible legal case you chose not to pursue it.

I was just trying to point out why there is so much interest in
this "utterly hopeless issue".


> While you are under no obligation to tell us your motivation, if you
> do choose to do so at least be truthful.

I've just updated my web page.

Patrick

--
Posted from IDENT:ro...@ns.nag-j.co.jp [211.0.18.242]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Patrick Craig

unread,
Apr 19, 2001, 5:02:36 AM4/19/01
to
Hello

Excuse my ignorance, but I really don't know much about
compression theory.


> I am one to pay off barroom bets.

I can believe that.

> And this whole thread has my
> interest so I am willing to step in and offer 50 american dollars
> to Patrick if he can met my challenge by end of May.

Sounds good. If I fail I don't pay anything, right?


> Here is my contest. I have not checked what what you sent so
> you may already be a winner.

Could you check, and if I've already won send me the money?


> take the "current random file that was used in last contest"
> Use my code to add the number "zero". This just tacks a long
> value of zero to file 32 bits of zero. Than run DSC to meld
> the length to the previously given random file. It will change
> the original file by a few bytes. It could even get shorter.
> Take this new length call it X. However it will get no longer
> than 4 more than the oringal file.

Take Mike Goldman's original.dat, append 4 zero bytes and run the
result through DSC (Dave Scott's Compressor?). X is the byte
length of the resulting file.


> The goal is to archive the files you used together into a single
> file and see if this resulting file is less than X bytes long.
> Data can't be hidden in file names or flags; all data used must be
> in the file. However, that archive is only data; no file names are
> in it except in your script.

So I have to create a single archive file that contains all my files
(dfi,comp,comp.0,comp.1,...,comp.217) whose byte length
is shorter than X without storing data in the file header, but
I don't have to store the filenames in the archive.


> First step: build the archive. This is a first-in, last-out type
> of archive. So, just as with the original data file, feed the file to
> the program, add a long zero, then run DSC. The output file is the archive
> at this point. Next, take the first data file you use and add it to
> the archive by appending the archive file to the end of the
> data file. Then write a long to the end of the file and run DSC again
> to get the new archive.
> Repeat for each additional data file as stated here:
> append the archive to the current data file, add a long
> value to the end of the current file, then run DSC to create the new
> archive file.
> At this point you should have an 8-bit byte type of file that, in a sense,
> contains the complexity of the original file.

This sounds like instructions on how to create an archive using
your software.


> If this resulting
> file can be made shorter than X, you win.

By what method?

Dale King

unread,
Apr 19, 2001, 10:15:50 AM4/19/01
to
<x...@anadigi.com> wrote in message news:3ade1477....@news.mnsinc.com...

More precise nit picking. He did not say there should be an inverse function
to any function. Every binary relation (whether it is a function or not)
does have an inverse relation. The question is whether that inverse relation
is a function or not. Recall the definition of a function:

A function from set A to a set B is a binary relation R on A and B with the
special property that for each element a in A there is exactly one ordered
pair in R with first component a.

That means that a function must be defined for each element in the domain
and each element in the domain can only correspond to one element in the
codomain. Sin meets this qualification, but arcsine doesn't.

If a relation R and its inverse relation are both functions for a given
domain and codomain, then they are bijective with respect to that domain and
codomain.
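
As a quick check of that definition in Python (my own illustration, nothing
from the thread; the sample points are arbitrary): sine restricted to a few
points satisfies the "exactly one ordered pair per domain element" condition,
while its inverse relation does not.

import math

# A relation is a set of (a, b) pairs; it is a function on a domain exactly
# when every domain element is the first component of exactly one pair.
def is_function(relation, domain):
    return all(sum(1 for (a, b) in relation if a == x) == 1 for x in domain)

# sin restricted to a few sample points is a function ...
xs = [0.0, math.pi / 6, math.pi / 2, 5 * math.pi / 6]
sin_rel = [(x, round(math.sin(x), 9)) for x in xs]
print(is_function(sin_rel, xs))                          # True

# ... but its inverse relation is not: 0.5 is paired with both pi/6 and 5*pi/6.
inv_rel = [(y, x) for (x, y) in sin_rel]
print(is_function(inv_rel, [y for (y, x) in inv_rel]))   # False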
--
Dale King

Joshua Schmier

unread,
Apr 19, 2001, 10:46:53 AM4/19/01
to
My final word to the thread: to meet Goldman's challenge you must meet one
very simple and obvious requirement. You must compress the data in accordance
with the definition(s) in the comp.compression FAQ. There is no loophole that
allows you to exploit file systems, as the definition clearly states that
such exploits must be counted towards the size of the compressed data.

It is true that Patrick accomplished the task, and I do believe that he
actually won the bet which was proposed to him (be it through trickery or
otherwise - we must admit that the challenge is a trick, as we know it is
impossible). However, the challenge itself was never in danger of being
met, nor do I believe that it was.

Is the bet still open? ;)


SCOTT19U.ZIP_GUY

unread,
Apr 19, 2001, 11:40:05 AM4/19/01
to
pat...@nag-j.co.jp (Patrick Craig) wrote in <3ADEA908.FAA92774@nag-
j.co.jp>:

>Hello
>
>Excuse my ignorance, but I really don't know much about
>compression theory.
>
>
>> I am one to pay off barroom bets.
>
>I can believe that.
>
>> And this whole thread has my
>> interest so I am willing to step in and offer 50 american dollars
>> to Patrick if he can meet my challenge by end of May.
>
>Sounds good. If I fail I don't pay anything, right?

Correct. And there is only one winner. After your
time period expires it is open for a month to anyone.
I also think that if no one wins I may change more and
more rules until someone does win. "I may" -- and the
reason I may is to see if the so-called random
string is random in the Kolmogorov sense. I really have my
doubts, and it has caught my interest.

>
>
>> Here is my contest. I have not checked what you sent so
>> you may already be a winner.
>
>Could you check, and if I've already won send me the money?

Yes, I will check. I don't have the archiver yet. I need
to write a few more routines and then will write a batch stream
to do it. But DSC is the heart of it. It will be at least
a week before I get the archiver, but you can do a partial
check yourself, since the maximum expansion when combining
is pretty easy to check.

There are four main reasons for the bet. One, I don't
trust that the data is random. Two, you should get a chance
at some money, since you really did win the bet with Mike.
Three, I want some public exposure for the routine DSC,
which bijectively combines a number 0 to N-1 with a file N
such that any byte file can be thought of as a number
plus some file. It is an optimal way to combine such
a number. For example, take all files of 256 bytes in
length; the pointer would seem to be 256 values. Yet
when I add the data to the file in the optimal way, no file
is longer than 257 bytes and many are shorter than the
original file of 256 bytes, though the vast majority
are going to be 257 bytes. Four, I like Japanese beer
almost as much as German beer, and you're in Japan.

Actually, it is highly unlikely you have already won, since
as the archive is built each new file added would "most of the time"
add to the length in this way: take the current length of the
archive, add the length of the file that is to be added, then add
the number of bytes needed to represent that number in binary.
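
As a rough illustration of that growth rule (the piece sizes below are made
up; only the "length plus bytes needed to state the new total" estimate comes
from the post):

# Piece sizes here are hypothetical; only the "add the file length, then add
# the bytes needed to state the new total length" rule comes from the post.
def estimate_archive_size(file_sizes):
    total = 0
    for size in file_sizes:
        total += size
        total += max(1, (total.bit_length() + 7) // 8)  # bytes to state the new total
    return total

pieces = [14336] * 219                 # 219 pieces of ~14 KB each, made up
print(estimate_archive_size(pieces))   # a few hundred bytes over the raw total
print(sum(pieces))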

>
>
>> take the "current random file that was used in last contest"
>> Use my code to add the number "zero". This just tacks a long
>> value of zero to file 32 bits of zero. Than run DSC to meld
>> the length to the previously given random file. It will change
>> the original file by a few bytes. It could even get shorter.
>> Take this new length call it X. However it will get no longer
>> than 4 more than the oringal file.
>
>Take Mike Goldman's original.dat, append 4 zero bytes and run the
>result through DSC (Dave Scott's Compressor?). X is the byte
>length of the resulting file.

Actually DSC is "Data Sensisitive Combiner" but I use to use
DS a lot at work making up terms like "Data Scaler Constant" if
one evers looks at code I wrote it was my common practice.
I used others people intials it they were invloved in the
projects. It helped me remember names and associates with what
part of project I had to interface with them on. Sometimes
I used long names like CHADS for "Carl Hall and David Scott"
I think for docs called it something like "Cordinate Handler
and Direction Scaler" or something close to that.

I said four zero bytes since that is what a long is. The
code I have is all GNU C using DJGPP. When adding another
file to the archive, say 50 bytes, you put the 50 bytes in
front of the current archive file and then add the long value of 50
at the end of the archive file. DSC then combines the long with
the file in an optimal way, which in 99% of cases costs just
the minimum length in bytes that would be needed to describe
the total length of the current archive. So while for dfi you
could add one byte for each routine; when the total length is
greater than 256 you need 2 bytes, and when greater than
64K think of 3 bytes. This approximation will most likely
be within one byte of the final file length.

>
>> The goal is to archive the files you used together into a single
>> file and see if this resulting file is less than X bytes long.
>> Data can't be hidden in file names or flags; all data used must be
>> in the file. However, the archive is only data; no file names are
>> in it except in your script.
>
>So I have to create a single archive file that contains all my files
>(dfi,comp,comp.0,comp.1,...,comp.217) whose byte length
>is shorter than X without storing data in the file header, but
>I don't have to store the filenames in the archive.

Well, the archive method I describe doesn't use file names.
However, your "script does", so I am assuming that as you unarchive
it will put the first output in a file called comp, then comp.0 ...
comp.n, and the last into dfi. Sure, the archiver needs the name
to store the data, but only the data is stored, and on unarchiving
comp, comp.0, etc. come out such that comp.n is followed by comp.n+1.
The archive reaches the last file when UNDSC puts
a long 0 on the file; then that file minus the long 0 is the dfi
file.

>
>
>> First step in building the archive. This is a first-in, last-out type
>> of archive. So, just as with the original data file, feed it to the
>> program, add a long zero, then run DSC. The output file is the archive
>> at this point. Next, take the first data file you use and add it to
>> the archive by appending the archive file to the end of the
>> data file. Then write a long to the end of the file and run DSC again
>> to get the new archive.
>> Repeat for each additional data file as stated here:
>> append the archive to the current data file, add a long
>> value to the end of the current file, then run DSC to create the new
>> archive file.
>> At this point you should have an 8-bit-byte type of file that sort of
>> contains the complexity of the original file.
>
>This sounds like instructions on how to create an archive using
>your software.

Yes, that's what it is.
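
One possible reading of those steps, sketched in Python. DSC itself is not
available here, so dsc_combine below is only a stand-in that returns its
input unchanged; the point is just the first-in, last-out loop structure and
where the long values go.

import struct

def dsc_combine(data):
    # Stand-in only: the real DSC bijectively melds the trailing 32-bit value
    # into the file; this stub leaves the bytes untouched.
    return data

def build_archive(first_file, remaining_files):
    # Innermost layer: the first file plus a long 0, which is what the
    # unarchiver later uses to recognise the last file when unpacking.
    archive = dsc_combine(first_file + struct.pack("<I", 0))
    # Every further file goes in front of the current archive, its byte
    # length is appended as a long, and DSC is run again.
    for data in remaining_files:
        archive = dsc_combine(data + archive + struct.pack("<I", len(data)))
    return archive

# e.g. dfi would go in first so that it comes out last:
# archive = build_archive(dfi_bytes, [comp_217, ..., comp_0, comp_bytes])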

>
>
>> If this resulting
>> file can be made shorter than X you win.
>
>By what method?

I don't understand the question. We are counting the archive
file in BYTES, period. I should have used the byte LENGTH of the
data file as X, but I was generous and used the LENGTH of the file
after the zero long and DSC. In fact, it's possible that X
is smaller than the length of the data file Mike sent. I haven't
looked at the end of the file, but if it ends in, say, 5 bytes
with the same value, then X could be smaller than the length of the
file, which would make me think the random data was not very
random.


David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip
My website http://members.nbci.com/ecil/index.htm
My crypto code http://radiusnet.net/crypto/archive/scott/
MY Compression Page http://members.nbci.com/ecil/compress.htm
**NOTE FOR EMAIL drop the roman "five" ***

Dale King

unread,
Apr 19, 2001, 12:12:42 PM4/19/01
to
"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
news:9bl4oo$c6h$1...@sshuraaa-i-1.production.compuserve.com...

> "Dale King" <Ki...@TCE.com> wrote in message
> > I make no distinction between the file length or any part of the
> header. If
> > it is outside of the data portion of the file and it is necessary
> for
> > decompressing you have to count it. That is what I keep trying to
> get David
> > to understand.
> >
> You continue to misrepresent what David claims, and to misunderstand
> what bijectiveness is all about. I have not seen David claim that
> zero length files contain no information and can be ignored. I have
> not seen anything to suggest that he is not aware that k files whose
> lengths add up to n are not the same as a single file of length n.

I have seen plenty to suggest that he is not aware of that. While not
specifically mentioning zero length files, he has said basically that a k
byte message contains only k bytes or 8*k bits of information. If k can vary
and is not known beforehand a k byte message carries more than k bytes of
information. In addition to the k bytes of information it contains the
information that it is only k bytes long. David disputed that. Extending
that logic to k = 0, says that a variable length message of 0 bytes contains
zero information.

The information content of a sequence is defined as the number of bits
required to transmit that sequence using an optimal encoding. If the length
of a message can vary, then it requires more than k bytes of information to
transmit a message of k bytes.

That is the fundamental flaw in David's argument. He says basically that for
any value of k, any sequences of k bytes only requires k bytes to represent
it. That disagrees with basic information theory.
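
A small worked check of that claim (my own arithmetic, not from the post): if
the length can be anything from 0 to k bytes, there are more than 256^k
possible messages, so slightly more than 8*k bits are needed to tell them
apart.

import math

k = 100
messages = sum(256 ** i for i in range(k + 1))   # every length from 0 to k bytes
print(math.log2(messages) > 8 * k)               # True
print(math.log2(messages) - 8 * k)               # a small fraction of a bit extra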

> For normal usage, I define a file as a finite sequence of byte. As
> such, it implicitly has a byte length. It matters not how it
> happens to be represented, that may or may not be a disk file
> system, and the length may or may not be stored in some associated
> data structure. If you choose to work with a different definition
> of a file, then please state it clearly and accurately. If your
> definition does not include all of what one typically finds on a
> computer hard disk, please explain any discrepancy.

Well in information theory one doesn't talk in terms of files because that
can lead you into fallacies like David's. The correct way to view it is as
data sources. That way there is no file headers to store data in. A data
source is basically an infinite source of symbols. There is no finite
sequence of symbols. A finite sequence of symbols is basically a source that
sources the sequence and thereafter only sources the last symbol. Therefore
when the last symbol is transmitted or read the information content of that
source drops to zero because nothing new will be learned by reading from the
source at that point.

In order for the reader of the data source to know that there is no more
information that must somehow be part of the stream of symbols or else it is
known beforehand for the case of fixed length sequences. The reason you
don't want to think of the stream of symbols as having an associated length
is that length is also information. If the receiver of the data source needs
to know that information it must be transmitted and the transmission must be
counted. If you try to treat it as a separate thing it is easy to fool
yourself into thinking that it doesn't count as David does.

In order to be able to uniquely decipher a message, it must be prefix free.
Which means that no message can be the prefix of another sequence. Consider
if a source could have both a message "abcd" and another message "abcdd".
After one reads "abcd" how do you know whether this is the message "abcd" or
just the start of the message "abcdd"? David would say you just see if there
are any more symbols. But how do you know if there are any more symbols? The
only way is to look at the length. But if you need to look at the length to
decode it that length must be counted as it is necessary information that
must be transmitted. That is why the idea of data sources is so fundamental
to information theory. It only gives you one way to transmit the
information. If you talk about files you have two ways to transmit
information, in the data portion of the file and in the file header.

There are basically 2 ways to view a file as a data source. One can view it
as a length that is prefixed to the data. This is basically how it is
usually stored on a hard disk. Note that the finite sequence for a disk file
is often longer than the actual "file length" due to the fact that files are
allocated in terms of sectors.

Or one can view it as a source that has a 257 symbol alphabet which has the
additional EOF symbol. Once an EOF symbol is sent by the data source, it
will continue to be sent forever. That is the way a file is usually viewed
in a programming language, such as C. These are entirely consistent with
hard drive technology.
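
A minimal sketch of the point that the length has to be paid for somewhere:
the varint length prefix below is just one arbitrary way to make
variable-length byte strings uniquely decodable from a bare byte stream, not
anything proposed in the thread.

# The varint length prefix below is just one arbitrary self-delimiting scheme.
def encode(msg):
    n, prefix = len(msg), bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        prefix.append(byte | (0x80 if n else 0))   # high bit: more length bytes follow
        if not n:
            break
    return bytes(prefix) + msg

def decode(stream, pos=0):
    n = shift = 0
    while True:
        byte = stream[pos]
        pos += 1
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return stream[pos:pos + n], pos + n

blob = encode(b"abcd") + encode(b"abcdd")   # two messages in one bare byte stream
first, p = decode(blob)
second, _ = decode(blob, p)
print(first, second)                        # b'abcd' b'abcdd'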

> One can find subsets of files that have the prefix property (no file
> in the subset is a prefix of any other file in the subset). Any
> such subset will be a relatively small fraction of all files,
> although they may be useful in some contexts.

Quit assuming compression is to files. It can be but often isn't. If you
look at a file correctly as a data source, you see that a file is a subset
of all possible bit sequences. David thinks it is all possible bit sequences
because he splits off the length information and pretends it doesn't count.

> I consider compressors as functions that map files to files.
> Generally I require that the domain be all files. Usually the
> codomain is also all files, although it may sometimes be desirable
> to restrict the codomain to some subset with the prefix property.
> It doesn't make much difference, although as shown below a
> compressor with a restricted codomain cannot be optimal with respect
> to compressors without such a restriction.

I consider a compressor a function that maps one sequence of symbols to
another sequence of symbols. I don't assume files. I could compress a file
into an 8VSB stream of symbols which are not even bytes. In that case the
set of outputs is completely disjoint from the set of inputs. How do you
compare that compressor to some other compressor? The only way is to look at
the information content.

> If one rates compressors purely on their ability to compress, then
> an optimum compressor will be a bijection.

You mean permutation.

> This is independent of
> whether the codomain is all files (in which case the optimal
> compressor will be a permutation) or whether the codomain is
> restricted to a subset.

That is where the problem is. You consider a set of prefix-free sequences to
be a subset of the set of files. If you store a prefix-free string into a
file it will be a subset of all files. In other words a prefix-free code
encoded as a file will be a subset of all files.

Most compressors do not assume they will be stored in a file. If you look at
it in the light of information theory you see that the two sets can actually
be disjoint (they could overlap, or be the same). Consider the view of
files as strings over a 257 symbol alphabet with EOF as the last symbol in a
string. Prefix-free codes are a language over a 256 symbol alphabet. In that
case they are disjoint sets. For the view of files with a prefixed length a
set of prefix-free strings could be disjoint, could overlap, could be the
same as, or could be a subset of the set of "files".

> Since a bijection to a restricted codomain
> will no longer be surjective when looked at with respect to the full
> codomain, such a compressor will no longer be optimal in that
> context.

Right. A prefix free code stored in the data portion of a file is not the
optimal encoding in the context of "files". But a prefix-free code can be
optimal in the context of prefix-free codes. (A prefix-free language could
be suboptimal if there exists any string that is not the prefix of a string
in the language and no string in the language is a prefix of it. This would
make it a subset of some other prefix-free language).

Most compressors are compressing to the context of prefix-free codes,
because that is what is most generally useful. It makes no assumptions
about the context. The fallacy is to get locked into this file mindset and
think that is the only context of any importance. David is basically
criticizing other compressors because they don't restrict themselves to the
file context.

Why don't more compressors restrict themselves to the file context?
Basically because the gain is minimal, 8 bytes maximum per compression when
averaged over a large number of compressions. (Note that is a more optimal
encoding, not a more optimal compression). But the fact is why worry about
that piddly amount when a file system is already so far from optimal to
begin with. If a 1 byte file can take 32K of disk space, why worry about 8
bytes. Particularly when doing so restricts you such that you only work when
stored as a file. It is better to go for a slightly less optimal encoding in
the file domain when that opens you up to all other domains, like serial
streams, compression to memory, etc.
--
Dale King

Dror Baron

unread,
Apr 19, 2001, 12:11:25 PM4/19/01
to
On 19 Apr 2001, SCOTT19U.ZIP_GUY wrote:

> I also think if no one wins I may change more and
> more rules till someone does win. "I may" and the
> reason I may is to see if the so called random
> string is random in the k sense. I really have my
> doubts and it has caught my interest.

Read about two things in any information theory text,
e.g., "Elements of Information Theory" by Cover and Thomas,
which I cite in most of my papers:
1. Kraft inequality (also known as the "counting argument").
2. Kolmogorov complexity.

Your conclusion will be that "truly random" data cannot
be compressed. Not "on average", anyway.
Kolmogorov complexity is more of a philosophy than a
practical tool...
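
The counting argument itself fits in a few lines (my own illustration, not
from the post): there are 2^n bit strings of length n but only 2^n - 1
shorter ones, so no lossless scheme can shrink every length-n input.

# There are 2**n bit strings of length n, but only 2**n - 1 strictly shorter
# ones, so no lossless scheme can map every length-n input to a shorter output.
for n in range(1, 9):
    inputs = 2 ** n
    shorter = sum(2 ** k for k in range(n))   # strings of length 0 .. n-1
    print(n, inputs, shorter, inputs > shorter)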

> Three I want some public exposure to the routine DSC
> which bijectively combines a number 0 to N-1 to file N

David, trust me, you have provided it lots of
exposure. You are quite famous by now, at least on
this newsgroup :)

Dror

Dale King

unread,
Apr 19, 2001, 12:27:35 PM4/19/01
to
"Phil Norman" <Phil....@genedata.com> wrote in message
news:slrn9dt1fc.3e...@jalua.ch.genedata.com...

It is perfectly fine to rely on the transmission protocol to send the
length. Nowhere have I disputed that. That forms a more optimal encoding for
that protocol. BUT it is wrong to say that you have achieved better
compression by doing so. The information must be sent to the decompressor to
reconstruct the data. You can rely on the transmission protocol to send it
for you, but that doesn't mean you can ignore it when computing compression
ratio. If I sent it using a protocol that did not send the length for me I
would have to encode the length within the data. Does the fact that you used
a protocol that sent the length for you, mean that you compressed the data
better than me? No, because the length information is being sent in both
cases, the only difference is where it is being sent.

> So it makes perfect sense to never include the data length within
> the compressed data, and instead ensure that whatever storage
> medium, transmission protocol or whatever you're using has the
> capability to transmit a length. If I found an existing protocol
> which doesn't send lengths by default, I'd be very surprised.

Connect a null modem between 2 PC's, you have a transmission protocol that
only sends bytes. If you want to send a length you have to do it yourself.
Or consider compressing something to RAM. If you have a decompressor function
that only accepts a pointer to the beginning of the data. You could add a
length parameter, but that constitutes sending the information so it must be
counted.

> Of course, there could be highly specialised cases where it would
> be slightly more efficient the other way around, but these are
> probably quite rare, and in such cases it's generally more
> worthwhile using an existing protocol, since that can reduce the
> development time.

The reason to encode the length within the data instead of relying on the
protocol is that it doesn't depend on the protocol; I could then send the
data with any protocol. I could target the encoding for a particular
protocol, but I would gain little in the encoding while restricting myself
to working only with that protocol.

--
Dale King

Dale King

unread,
Apr 19, 2001, 12:43:59 PM4/19/01
to
"Phil Norman" <Phil....@genedata.com> wrote in message
news:slrn9dt1vl.3e...@jalua.ch.genedata.com...

Simple: the inverse relation for that function is the relation R over the
sets A, B, C, D such that the tuple ((c,d),(a,b)) where a is in A, etc. is
in the relation R iff c = (a*b)%256 and d = floor( a * b / 256 ).

Over the sets A, B, C, D all equal to the set of integers from 0 to 255 that
relation would include these tuples for example:

( (6,0), (1,6) ), ( (6,0), (6,1) ), ( (6,0), (2,3) ), ( (6,0), (3,2) )

This relation would not be a function since your function is not a bijection
for these sets. Over different sets it would be a function. For instance for
A,B,C,D = { 0 } the inverse is a function. Over this domain and codomain the
function is a bijection, more specifically a permutation (actually the function
is an equivalence relation).
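
For what it's worth, the relation can be enumerated directly (a check of the
example above, nothing new):

# Collect, for each output pair (c, d), every input (a, b) that produces it.
preimages = {}
for a in range(256):
    for b in range(256):
        key = ((a * b) % 256, (a * b) // 256)
        preimages.setdefault(key, []).append((a, b))

print(preimages[(6, 0)])                        # [(1, 6), (2, 3), (3, 2), (6, 1)]
print(max(len(v) for v in preimages.values()))  # some outputs have hundreds of preimages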
--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 19, 2001, 12:46:36 PM4/19/01
to
dba...@jadzia.ifp.uiuc.edu (Dror Baron) wrote in
<Pine.GSO.4.21.010419...@jadzia.ifp.uiuc.edu>:

>On 19 Apr 2001, SCOTT19U.ZIP_GUY wrote:
>
>> I also think if no one wins I may change more and
>> more rules till someone does win. "I may" and the
>> reason I may is to see if the so called random
>> string is random in the k sense. I really have my
>> doubts and it has caught my interest.
>
>Read about two things in any information theory text,
>i.e., "elements of information theory" by Cover Thomas
>which I cite in most my papers:
>1. Kraft inequality (also known as the "counting argument").
>2. Kolmogorov complexity.

I understand the counting argument, and I see why there
are strings that cannot be compressed by given methods.
What I also see is that if one is limited to a single file,
there do exist files that can't be expanded to, if one
is locked into a set of rules. However, when one gets
the data first, as in the challenge of Mike and Patrick,
the rules for the program part of a file are
not fixed. For example, I might say it's a script, or I might pick
from several different meanings. In this case, if one
considers an archive file, it may be shorter than the
random file simply because it is broken into the data
components and the so-called program components. One need
only realize that if one has, say, 1 byte of program space,
you could say there are only 256 programs. But that is not
true. You could say it's a script for method X or a script for
method Y; since you seem to be free to pick the type of program,
there are many more programs than what the one byte implies,
since I could arbitrarily pick from several types. This means
that to consider a string complex one must lock in the
rules of the "program", because if you don't, then no
long string is complex enough that it can't be reduced.

And I will look for the book at the store, thanks.


>
>Your conclusion will be that "truly random" data cannot
>be compressed. Not "on average", anyway.
>Kolmogorov complexity is more of a philosophy than a
>practical tool...
>
>> Three I want some public exposure to the routine DSC
>> which bijectively combines a number 0 to N-1 to file N
>
>David, trust me, you have provided it lots of
>exposure. You are quite famous by now, at least on
>this newsgroup :)
>
>Dror
>
>

Paul Mitchell

unread,
Apr 19, 2001, 1:24:23 PM4/19/01
to
We have an ascii file on a solaris (2.7) system:

file ./li11to20
./li11to20: ascii text

of considerable size:

head ./li11to20
head: ./li11to20: Value too large for defined data type

gzip ./li11to20
./li11to20: Value too large for defined data type

ls -l
total 2486440
-rw-r----- 1 cporter scprogs 2215677248 Apr 16 14:43 li11to20

The user wants to compress the file in such a way that they can move it to
a (gulp) windows box and uncompress it there. The ideal solution would
be:

zip -l ./li11to20

however, this inevitably produces:

zip error: Nothing to do! (./li11to20.zip)

echo $?
12 (no surprise here!)

I'm assuming that zip is unable to deal with the size as well.
Apart from just cutting the file into smaller pieces, is there some
Solaris-compatible method of compression which would substitute the CR LF
string for the Unix LF?
Paul
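
One possible workaround, sketched in Python with made-up piece sizes (not a
claim about what zip or gzip on that particular box will accept): split the
file into pieces, converting LF to CRLF on the way and gzip-compressing each
piece; the pieces can then be gunzipped and concatenated on the Windows side.

import gzip

SRC = "li11to20"      # hypothetical path
PIECE = 1 << 30       # roughly 1 GB of input text per piece (made up)
CHUNK = 1 << 20       # read 1 MB at a time

with open(SRC, "rb") as src:
    part = 0
    block = src.read(CHUNK)
    while block:
        with gzip.open("%s.%03d.gz" % (SRC, part), "wb") as out:
            written = 0
            while block and written < PIECE:
                # the Unix file has bare LFs only, so a plain replace is safe
                out.write(block.replace(b"\n", b"\r\n"))
                written += len(block)
                block = src.read(CHUNK)
        part += 1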


==============================================================================
Paul Mitchell
email: pmit...@email.unc.edu
phone: (919) 962-9778
office: I have an office, room 28, Phillips Hall
==============================================================================

SCOTT19U.ZIP_GUY

unread,
Apr 19, 2001, 1:42:54 PM4/19/01
to
Ki...@TCE.com (Dale King) wrote in <3adf...@news.tce.com>:

>That is the fundamental flaw in David's argument. He says basically that
>for any value of k, any sequences of k bytes only requires k bytes to
>represent it. That disagrees with basic information theory.


Actually, that is not what I say. What I say is that if one wishes to
represent a K-byte file as a K-byte file, then one only needs
K bytes to describe it, NAMELY THE IDENTITY. Just because a Mandarin
orange is an orange doesn't mean it's wrong to call it a citrus fruit.

If one wishes to express the information of a K-byte file as a
string of bits, it usually takes a smaller string than 8*K bits.
That is what I say. But then again you never really bother to
look at the other side.

What pisses me off is that you over and over say I say this and
that, because you want to warp it into something it is not.
Again I must remind you that I am talking about file compression;
you can talk about something else if you wish, but I don't wish to.

Dale King

unread,
Apr 19, 2001, 2:43:27 PM4/19/01
to
"Willem-Jan Monsuwe" <wil...@toad.stack.nl> wrote in message
news:slrn9ds106....@toad.stack.nl...

> Just one thing that bothers me:
>
> ) ...<snip>...
> ) considering that files are allocated in terms of sectors or blocks, you
> ) really only see a difference if you average over a great many files. Why
> ) take advantage of something that will decrease the "length" of a file a few
> ) bytes when that will rarely have any effect on disk space used. It makes
> ) more sense to go for the more general purpose solution that works whether
> ) you have a file or not.
> ) ...<snip>...
>
> If I compare two compressors, A and B, and (on my data set) A averages two
> bytes smaller than B, then that will _also_ be true on a filesystem that
> allocates in sectors. If file A(x) is two bytes smaller than file B(x), and
> the sector size is, say, 32Kb, then there will be a chance of 1/16384 that
> A(x) will take one less block than B(x). That is, it will take _32Kb_ less.
> So, _on_average_, A(x) _will_be_two_bytes_shorter_ ! So _please_ don't give
> me this 'but filesystems store in terms of sectors' _crap_!

If you plan to disagree with me that usually involves giving an opinion that
contradicts mine. All you did was restate my opinion. Note that I said you
will "see a difference if you average over a great many files."

Reread my sentence again. While you can make files a few bytes shorter on
average, the problem is that it limits you so that your output only works
when stored to a file. It makes more sense to accept that when your
compressor output is stored to a file it will be a few bytes larger on
average, if that allows your compressor output to work no matter how it is
transmitted.

It is sort of like having a design for a car that improves gas mileage by
one mile per gallon, but only works on specially designed roads. I'd give up
the 1 mpg for the ability to drive wherever I wanted.
--
Dale King

Tom Gutman

unread,
Apr 19, 2001, 4:06:06 PM4/19/01
to
"Dale King" <Ki...@TCE.com> wrote in message
news:3adf...@news.tce.com...

As stated, that is meaningless. Any sequence can be transmitted
with only one bit provided one has a suitably prepared receiver.
You need a lot of context before you can make a statement like that.


>
> That is the fundamental flaw in David's argument. He says basically that
> for any value of k, any sequences of k bytes only requires k bytes to
> represent it. That disagrees with basic information theory.

Exactly which argument is that? I see you saying that he says that,
but I don't see him saying that. And this has nothing to do with
the issue of bijection. The concept and optimality of bijection has
nothing to do with how a file length is stored.

If you do not wish to work with files, that is fine. But then you
have nothing to say about systems that work with files. You are
inhabiting a different universe with different rules.

I also take exception to your characterization of a finite sequence.
In general it is not knowable whether a source has produced its last
symbol and is repeating itself. Thus if I consider sources that
produce letters as symbols, I cannot say that "ab" and "abb" are the
same because the last symbol is repeated.

Until you provide some form of redundancy, restriction, out of band
signals, or similar you cannot talk about the last symbol and hence
cannot talk about a finite source. The C run time library get
character routine is not the ultimate nor fundamental definition of
how a finite source (such as a file) is delimited. And it works
only by restricting the significant data to using 256 out of a
possible 4294967296 symbols.

But just about all real sources do have some redundancy that allows
them to delimit the data. A common encoding scheme used not all
that long ago on disks and still used for many networks actually has
an alphabet of four symbols, but uses only two of them for normal
data. The remaining two are used only for special markers. Another
approach, commonly used on communication links is to use a
multi-symbol escape sequence and then use escapes to bracket
segments of data. This is because while infinite sources are good
for defining and calculating information rates, people really want
to work with files (finite sequences of bytes) and for other reasons
actually want to transmit smaller units (typically called records or
packets).


>
> In order for the reader of the data source to know that there is no more
> information that must somehow be part of the stream of symbols or else it is
> known beforehand for the case of fixed length sequences. The reason you
> don't want to think of the stream of symbols as having an associated length
> is that length is also information. If the receiver of the data source needs
> to know that information it must be transmitted and the transmission must be
> counted. If you try to treat it as a separate thing it is easy to fool
> yourself into thinking that it doesn't count as David does.

When dealing with compression I usually don't want to work with
streams at all. I want to work with files -- and files have
lengths. I am concerned with the data on my disk, not with the
output of my Geiger counter.


>
> In order to be able to uniquely decipher a message, it must be prefix free.
> Which means that no message can be the prefix of another sequence. Consider
> if a source could have both a message "abcd" and another message "abcdd".
> After one reads "abcd" how do you know whether this is the message "abcd" or
> just the start of the message "abcdd"? David would say you just see if there
> are any more symbols. But how do you know if there are any more symbols? The
> only way is to look at the length. But if you need to look at the length to
> decode it that length must be counted as it is necessary information that
> must be transmitted. That is why the idea of data sources is so fundamental
> to information theory. It only gives you one way to transmit the
> information. If you talk about files you have two ways to transmit
> information, in the data portion of the file and in the file header.

Only if you choose to embed the message into a stream with no
delimiters. David does not choose to do that. I do not choose to
do that.


>
> There are basically 2 ways to view a file as a data source. One can view it
> as a length that is prefixed to the data. This is basically how it is
> usually stored on a hard disk. Note that the finite sequence for a disk file
> is often longer than the actual "file length" due to the fact that files are
> allocated in terms of sectors.
>
> Or one can view it as a source that has a 257 symbol alphabet which has the
> additional EOF symbol. Once an EOF symbol is sent by the data source, it
> will continue to be sent forever. That is the way a file is usually viewed
> in a programming language, such as C. These are entirely consistent with
> hard drive technology.

So? I choose to view files as files (finite sequences of bytes). I
see no reason to map them into data sources, at least in terms of
any theory. Some file interfaces may be viewed as sources, but that
is an issue for that particular interface, not for the concept of a
file itself.

However, if you do choose to represent files as sources that have a
257 character alphabet but choose to waste bandwidth by not using
the 257'th character for data and further guaranteeing that once
they output the 257'th character they will output an infinite string
of such characters, it doesn't make any difference. David's
compressors are designed to use the same set as domain and codomain.
So you can design them to use one of your 256+1 symbol sources as
input, and their output will be a similar 256+1 source. He claims
that his methods improve the compression, as measured by the number
of characters preceding the EOF, using the same basic algorithm.

You seem to be claiming that the only compressors worthy of
consideration take a 256+1 source as input and produce a pure 256
symbol source as output, with the behaviour of the output source
undefined past some point (where the decompressor starts producing
EOFs). While there might be special cases where this is desirable,
it seems rather bizarre to make it a general requirement.


>
> > One can find subsets of files that have the prefix property (no file
> > in the subset is a prefix of any other file in the subset). Any
> > such subset will be a relatively small fraction of all files,
> > although they may be useful in some contexts.
>
> Quit assuming compression is to files. It can be but often isn't. If you
> look at a file correctly as a data source, you see that a file is a subset
> of all possible bit sequences. David thinks it is all possible bit sequences
> because he splits off the length information and pretends it doesn't count.

David is working with compressors from files to files. That is his
universe of discourse. If you do not find that interesting and wish
to consider compressors in other contexts, fine. But then don't
bother making irrelevant comments on David's work.


>
> > I consider compressors as functions that map files to files.
> > Generally I require that the domain be all files. Usually the
> > codomain is also all files, although it may sometimes be desirable
> > to restrict the codomain to some subset with the prefix property.
> > It doesn't make much difference, although as shown below a
> > compressor with a restricted codomain cannot be optimal with respect
> > to compressors without such a restriction.
>
> I consider a compressor a function that maps one sequence of symbols to
> another sequence of symbols. I don't assume files. I could compress a file
> into an 8VSB stream of symbols which are not even bytes. In that case the
> set of outputs is completely disjoint from the set of inputs. How do you
> compare that compressor to some other compressor? The only way is to look at
> the information content.

Are you talking about finite or infinite sequences? And are you
talking about the same set of symbols for the input and output, or
different alphabets? If finite sequences, then it doesn't much
matter -- although if the alphabet is not closely related to bytes
you may have the problem of mapping them into bytes for
representation on most hardware. But the basic properties are quite
independent of any particular representation, so if you never need
to map anything into bytes you still get the same conclusions, just
with somewhat different terminology.


>
> > If one rates compressors purely on their ability to compress,
then
> > an optimum compressor will be a bijection.
>
> You mean permutation.

No, I mean bijection. While if the domain and codomain are the
same, then it will be a permutation, if you read my preceding
paragraph you will notice that I am allowing them to be different.
The conclusion stated still holds.


>
> > This is independent of
> > whether the codomain is all files (in which case the optimal
> > compressor will be a permutation) or whether the codomain is
> > restricted to a subset.
>
> That is where the problem is. You consider a set of prefix-free sequences to
> be a subset of the set of files. If you store a prefix-free string into a
> file it will be a subset of all files. In other words a prefix-free code
> encoded as a file will be a subset of all files.

I have defined what a file is. Since that definition allows one
file to be a prefix of another file, clearly any set of files where
this is not true must be a subset. In general, the only way you are
going to get any additional properties in a set of files (whether
the property is one for each individual file or only for the set as
a whole) is to have a proper subset of all possible files.


>
> Most compressors do not assume they will be stored in a file. If you look at
> it in the light of information theory you see that the two sets can actually
> be disjoint (they could overlap, or be the same). Consider the view of
> files as strings over a 257 symbol alphabet with EOF as the last symbol in a
> string. Prefix-free codes are a language over a 256 symbol alphabet. In that
> case they are disjoint sets. For the view of files with a prefixed length a
> set of prefix-free strings could be disjoint, could overlap, could be the
> same as, or could be a subset of the set of "files".

Your representation of files as finite sequences of bytes followed
by one non-byte EOF character is trivially homomorphic to my
definition of files as finite sequences of bytes. Your chosen prefix
free subset of all files will therefore map back to your
representation of files+EOF, and still be a subset. As a practical
matter, with current technology building an efficient 257 state
storage cell is something of a challenge.


>
> > Since a bijection to a restricted codomain
> > will no longer be surjective when looked at with respect to the full
> > codomain, such a compressor will no longer be optimal in that
> > context.
>
> Right. A prefix free code stored in the data portion of a file is not the
> optimal encoding in the context of "files". But a prefix-free code can be
> optimal in the context of prefix-free codes. (A prefix-free language could
> be suboptimal if there exists any string that is not the prefix of a string
> in the language and no string in the language is a prefix of it. This would
> make it a subset of some other prefix-free language).

I have not even considered the issue of optimality of a prefix free
subset -- only the optimality of a compressor. A compressor that
restricts itself to a prefix free subset for its output will be
suboptimal with respect to all compressors without such a
restriction. A compressor that is optimal with respect to
compressors restricting themselves to such an output set will be a
bijection.


>
> Most compressors are compressing to the context of prefix-free codes,
> because that is what is most generally useful. It makes no assumptions
> about the context. The fallacy is to get locked into this file mindset and
> think that is the only context of any importance. David is basically
> criticizing other compressors because they don't restrict themselves to the
> file context.

Not at all. He is pointing out the advantages of bijectivity, and,
as I have repeatedly pointed out, that holds even if you restrict
the output to a limited subset of files.


>
> Why don't more compressors restrict themselves to the file context?
> Basically because the gain is minimal, 8 bytes maximum per compression when
> averaged over a large number of compressions. (Note that is a more optimal
> encoding, not a more optimal compression). But the fact is why worry about
> that piddly amount when a file system is already so far from optimal to
> begin with. If a 1 byte file can take 32K of disk space, why worry about 8
> bytes. Particularly when doing so restricts you such that you only work when
> stored as a file. It is better to go for a slightly less optimal encoding in
> the file domain when that opens you up to all other domains, like serial
> streams, compression to memory, etc.

I've never claimed that the gains were significant. You are
claiming that they are completely illusory, and I am disagreeing
with that claim. If you've read my postings to David you might have
noticed that I already pointed out that while the theoretical gains
are there, they may not have any practical significance.

Where did you get the limit on the gains by not imposing a prefix
free subset restriction? It would depend on the technology used.
If one uses Huffman encoding and the 257 symbol alphabet approach,
at least one of the 256 data symbols must have its code increased by
at least one bit. Since that code could appear at a frequency
approaching one in 256, the theoretical limit on the expansion
entailed by the EOF encoding would appear to be around one part in
four thousand. That can be quite a bit more than eight bytes, even
for reasonably sized files.

For memory compression, I have no problems with always passing a
length along with an address when specifying a string. I need to do
that for the input file anyway, as there is no way to insure that a
random input file is a member of a known prefix free subset.

Another case of storage compression has recently come up here. The
requirement was to compress a large number of relatively short
strings. Thus even a fairly small saving per string could be
significant. There was also a need to be able to quickly access any
individual string. If a table of pointers to the start of each
compressed string were used to implement the access capability, that
table would also provide the length for each compressed string. In
such a context it would be wasteful to insist on a prefix free
encoding to provide redundant data.

If you have to output the compressed file (and only compressed
files) to a very primitive serial stream (more primitive than a
standard teletype, where the out of band break signal can be used to
indicate an end of file) a prefix free code can be useful. I would
expect that I would be more likely to want to send it over a TCP
connection, where there are several ways to mark an end of file
(including the primitive but effective one of simply closing the
connection at the end of the file).
>
>
>

--
Tom Gutman

Tom Gutman

unread,
Apr 19, 2001, 4:10:40 PM4/19/01
to
"Dror Baron" <dba...@chopin.ifp.uiuc.edu> wrote in message
news:Pine.GSO.4.21.01041...@chopin.ifp.uiuc.edu...
Could you supply a more specific example? As far as I can tell, he
consistently works with the concept of a file as a finite sequence of
bytes. He doesn't care how the sequence is delimited (a passed
length, a return code, an extension character) or how it is
externally stored.

--
Tom Gutman

SCOTT19U.ZIP_GUY

unread,
Apr 19, 2001, 3:58:26 PM4/19/01
to
Ki...@TCE.com (Dale King) wrote in <3adf...@news.tce.com>:

>Reread my sentence again. While you can make files a few bytes shorter
>on average, the problem is that it limits you so that your output only
>works when stored to a file. It makes more sense to accept that when your
>compressor output is stored to a file it will be a few bytes larger on
>average, if that allows your compressor output to work no matter how it
>is transmitted.
>

Gee, I transmit my files via FTP and use whatever protocols, as they
have changed over the years. Funny, I have not had a problem; maybe
you're not transmitting files properly?


>It is sort of like having a design for a car that improves gas mileage
>by one mile per gallon, but only works on specially designed roads. I'd
>give up the 1 mpg for the ability to drive wherever I wanted.

Actually, it's more like unhitching the trailer that was attached to
your car. When you take it off, your gas mileage goes up.


David A. Scott
--
SCOTT19U.ZIP NOW AVAILABLE WORLD WIDE "OLD VERSIOM"
http://www.jim.com/jamesd/Kong/scott19u.zip
My website http://members.nbci.com/ecil/index.htm
My crypto code http://radiusnet.net/crypto/archive/scott/
MY Compression Page http://members.nbci.com/ecil/compress.htm
**NOTE FOR EMAIL drop the roman "five" ***

Tom Gutman

unread,
Apr 19, 2001, 4:14:04 PM4/19/01
to
"Patrick Craig" <pat...@nag-j.co.jp> wrote in message
news:3ADEA7CF...@nag-j.co.jp...

>
>
> >
> > > > One might just wonder why some individuals are so inclined to
> > > > return to these so utterly hopeless issues.
> > > > I suppose solving real problems like how to make PPM
> > > > faster/better/more efficient is too much of a challenge...
> > >
> > > What's the prize money for solving these problems?
> > >
> > > Patrick
> > >
> > Sorry, that is not acceptable. You already stated that you did not
> > expect to collect the prize money, and even though you had a
> > possible legal case you chose not to pursue it.
>
> I was just trying to point out why there is so much interest in
> this "utterly hopeless issue".

My apologies. I was assuming that you were telling us why you were
interested, not speculating on why others appear so interested.


>
>
> > While you are under no obligation to tell us your motivation, if
you
> > do choose to do so at least be truthful.
>
> I've just updated my web page.
>

--
Tom Gutman

Dale King

unread,
Apr 19, 2001, 7:17:51 PM4/19/01
to
"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
news:9bngbq$1o2$1...@sshuraac-i-1.production.compuserve.com...

Which is exactly why information theory is built on the concept of data
sources and the entropy of those sources. A million byte message may
actually require zero bits to send if that is the only sequence a data
source produces. You don't need to actually read the message in that case,
because you know what it will be. Perhaps I should have been more explicit.
The context was a data source that could send a sequence of bytes of varying
length.

> > That is the fundamental flaw in David's argument. He says basically that
> > for any value of k, any sequences of k bytes only requires k bytes to
> > represent it. That disagrees with basic information theory.
>
> Exactly which argument is that? I see you saying that he says that,
> but I don't see him saying that. And this has nothing to do with
> the issue of bijection. The concept and optimality of bijection has
> nothing to do with how a file length is stored.

He said it a few weeks ago in a different thread. Here is the information to
which I refer:

> SCOTT19U.ZIP_GUY <this.a...@is.for.spamers> wrote:
> > Ki...@TCE.com (Dale King) wrote in <3ac3...@news.tce.com>:
> >>But the information content of a single byte file is more than a single
> >>byte. It is the byte of data + the information that there are no more
> >>bytes of data.
> > A single byte of file has 256 pieces of information.
> > if you want to consider less I guess any empty file as 2 pieces
> > of data. Either its exists or doesn't but in either case
> > it has 0 data assigned to it. I don't use these as files.
> > many you do in your compression methods.
> > I use as states 2**8N where N number of bytes in the file.
> > and to play games yes a byte is more that that its also the
> > fact there are no more in front of that byte to so don't
> > forget that.
>
> I think you both miss the point in a way. A single-byte file DOES
> have more than 8 bits of information -- the information that it is in
> fact a single byte is something that must be considered. I think
> Scott misses this.

In the context of a variable-length message, a single byte does carry more
than 8 bits of information. It also carries the information that it is
present.

I could not have said it better myself. David claims he is doing better than
a compressor that encodes the length in the data. All he is really saying is
that they are wrong for not following his rules. His rules say that there
exists a separate way to convey the file length information. Most
compressors don't assume that.

> I also take exception to your characterization of a finite sequence.
> In general it is not knowable whether a source has produced its last
> symbol and is repeating itself. Thus if I consider sources that
> produce letters as symbols, I cannot say that "ab" and "abb" are the
> same because the last symbol is repeated.

Correct and I never said that you can tell that a sequence has ended because
the symbol repeats. The point is that a data source is not allowed to not
send a symbol. If it were allowed to not send a symbol, that absence of a
symbol is basically another symbol as in the case of EOF view of files.

That is why the sender and receiver have to agree on some prefix-free
encoding. In the case of files that encoding consists of a length and the
sequence of data. The fact that the length is stored in a different place
than the data is immaterial. Both are stored and in David's case both are
needed to decompress.

> Until you provide some form of redundancy, restriction, out of band
> signals, or similar you cannot talk about the last symbol and hence
> cannot talk about a finite source. The C run time library get
> character routine is not the ultimate nor fundamental definition of
> how a finite source (such as a file) is delimited. And it works
> only by restricting the significant data to using 256 out of a
> possible 4294967296 symbols.

That's news to me! Last time I checked the get routine returned one of 257
different values. The size of the type used to return those values is
immaterial. The fact that there are only 257 possible return values is what
is important.

> But just about all real sources do have some redundancy that allows
> them to delimit the data. A common encoding scheme used not all
> that long ago on disks and still used for many networks actually has
> an alphabet of four symbols, but uses only two of them for normal
> data. The remaining two are used only for special markers. Another
> approach, commonly used on communication links is to use a
> multi-symbol escape sequence and then use escapes to bracket
> segments of data. This is because while infinite sources are good
> for defining and calculating information rates, people really want
> to work with files (finite sequences of bytes) and for other reasons
> actually want to transmit smaller units (typically called records or
> packets).

And each of these is simply another way of describing a prefix-free code.
You can make a sequence prefix-free in an infinite number of ways. The point
is that whatever way you use to make it prefix free must be included in the
count for the compressed data.

> > In order for the reader of the data source to know that there is no more
> > information that must somehow be part of the stream of symbols or else it
> > is known beforehand for the case of fixed length sequences. The reason you
> > don't want to think of the stream of symbols as having an associated length
> > is that length is also information. If the receiver of the data source needs
> > to know that information it must be transmitted and the transmission must be
> > counted. If you try to treat it as a separate thing it is easy to fool
> > yourself into thinking that it doesn't count as David does.
>
> When dealing with compression I usually don't want to work with
> streams at all. I want to work with files -- and files have
> lengths. I am concerned with the data on my disk, not with the
> output of my Geiger counter.

And that's fine. I will not dispute that David's compressors will make your
files a few bytes smaller on average than if you store the output of a stream
compressor into a file. But that does not make them better compressors than
a stream compressor, which is what David is claiming. That just means that
the stream compressor is targeted at a different use, namely where one
cannot rely on file headers or external means of determining the length.

Consider a compressor that is designed to compress to some form of
non-binary format. Perhaps I have some signal format we will call trinary.
It has three different states, so I need to compress to a sequence of values
from 0 to 2. That is easily done and is perfectly valid; Shannon even tells
you how to compute the entropy in this case. If I store the output into a
file as one value per byte, that wouldn't be a very efficient way to encode
it into a file. But while that isn't an efficient way to encode into a file,
the data isn't necessarily less compressed in a theoretical sense. The
compression could be optimal for a trinary stream even though it isn't for a
byte stream. What is the difference in this case? The number of symbols in
the alphabet.

But that is no different than the case we are discussing. GZIP targets a 256
symbol alphabet. Using the EOF view of a file, David's compressor targets a
257 symbol alphabet.
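
Some quick arithmetic for that trinary example (my own numbers): n symbols
drawn uniformly from {0, 1, 2} carry n*log2(3) bits, so storing one symbol
per byte is a poor encoding even if the trinary stream itself is optimally
compressed, while packing five trits per byte gets close to optimal.

import math

n = 1000                                # number of trinary symbols, made up
optimal_bits = n * math.log2(3)         # about 1585 bits for a uniform source
one_per_byte_bits = n * 8               # 8000 bits: legal, but a poor encoding
five_per_byte_bits = -(-n // 5) * 8     # 3**5 = 243 < 256, so 5 trits fit in a byte
print(round(optimal_bits), one_per_byte_bits, five_per_byte_bits)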

No, it still must be prefix free. The addition of those delimiters makes it
prefix free as presumably those delimiters cannot appear within the
sequence. And yes you will save some space by letting the system put the
delimiters in for you, but that is not compressing better than a compressor
that assumes there are no delimiters. When computing compressed size you
have to count the delimiters if they are necessary for decompression.

> > There are basically 2 ways to view a file as a data source. One can view it
> > as a length that is prefixed to the data. This is basically how it is
> > usually stored on a hard disk. Note that the finite sequence for a disk file
> > is often longer than the actual "file length" due to the fact that files are
> > allocated in terms of sectors.
> >
> > Or one can view it as a source that has a 257 symbol alphabet which has the
> > additional EOF symbol. Once an EOF symbol is sent by the data source, it
> > will continue to be sent forever. That is the way a file is usually viewed
> > in a programming language, such as C. These are entirely consistent with
> > hard drive technology.
>
> So? I choose to view files as files (finite sequences of bytes). I
> see no reason to map them into data sources, at least in terms of
> any theory. Some file interfaces may be viewed as sources, but that
> is an issue for that particular interface, not for the concept of a
> file itself.

As long as you count the data that tells you how long the sequence is when
computing the amount of compression, I have no problem with it. David is not
counting that information. The reason for looking at things as data sources
is that you are forced to count that information. I am not saying you have to
encode the EOF information in the file, merely that if you don't include it
you have to count the file length information when computing the compression
ratio.

> However, if you do choose to represent files as sources that have a
> 257 character alphabet but choose to waste bandwidth by not using
> the 257'th character for data and further guaranteeing that once
> they output the 257'th character they will output an infinite string
> of such characters, it doesn't make any difference. David's
> compressors are designed to use the same set as domain and codomain.
> So you can design them to use one of your 256+1 symbol sources as
> input, and their output will be a similar 256+1 source. He claims
> that his methods improve the compression, as measured by the number
> of characters preceding the EOF, using the same basic algorithm.

No, most compressors are designed for the 256-symbol domain. Note that is a
different domain from the one David designs for, just as the trinary
compression is designed for a 3-symbol domain. One can store the output in a
257-symbol domain, with the 257th symbol basically unused, but that says
nothing about the compression ratio. Note I could also map a file from
David's domain into the same domain as other compressors' by prefixing the
length, and we would see an increase in space used.

> You seem to be claiming that the only compressors worthy of
> consideration take a 256+1 source as input and produce a pure 256
> symbol source as output, with the behaviour of the output source
> undefined past some point (where the decompressor starts producing
> EOFs). While there might be special cases where this is desirable,
> it seems rather bizarre to make it a general requirement.

It isn't a requirement. But the fact that most compressors are designed for
this domain does not make them worse compressors than David's simply because
they are targeting a different domain. The way to make sure you are
comparing apples to apples when comparing compressors is to look not at the
size of the file, but at what data is necessary for decompression.

> > > One can find subsets of files that have the prefix property (no
> > > file in the subset is a prefix of any other file in the subset).
> > > Any such subset will be a relatively small fraction of all files,
> > > although they may be useful in some contexts.
> >
> > Quit assuming compression is to files. It can be but often isn't.
> > If you look at a file correctly as a data source, you see that a
> > file is a subset of all possible bit sequences. David thinks it is
> > all possible bit sequences because he splits off the length
> > information and pretends it doesn't count.
>
> David is working with compressors from files to files. That is his
> universe of discourse. If you do not find that interesting and wish
> to consider compressors in other contexts, fine. But then don't
> bother making irrelevant comments on David's work.

My complaint is that he is making irrelevant comments on others' work. If I
am targeting stream compression there is no such thing as David's
permutation property (I do not call it bijectivity, as all lossless
compressors are bijective). If one is compressing to a domain that does not
have file lengths, are any of David's ideas relevant? No, because by
definition the set of outputs is disjoint from the set of inputs.

If one assumes that the file length doesn't exist, that can basically be
viewed as a different alphabet: 257 vs. 256 symbols.
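
To put a number on that difference (my arithmetic, not anyone's in this
thread): with a 256-symbol alphabet and a known length N there are 256^N
possible files, but if the length is only known to be at most N there are

    1 + 256 + 256^2 + ... + 256^N = (256^(N+1) - 1) / 255

possibilities. Identifying one of those takes slightly more than 8N bits,
and the excess is exactly the length information that the 257th (EOF)
symbol would otherwise have to carry.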

> >
> > > If one rates compressors purely on their ability to compress,
> then
> > > an optimum compressor will be a bijection.
> >
> > You mean permutation.
>
> No, I mean bijection. While if the domain and codomain are the
> same, then it will be a permutation, if you read my preceding
> paragraph you will notice that I am allowing them to be different.
> The conclusion stated still holds.

If a compressor is not a bijection from the set of inputs to the set of
outputs then it is not lossless.
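
Spelled out (the standard one-line argument, not a quote from anyone
here): lossless means there is a decompressor D with

    D(C(x)) = x   for every input x,

so C(x1) = C(x2) forces x1 = D(C(x1)) = D(C(x2)) = x2. C is injective, and
therefore a bijection from its set of inputs onto its set of outputs.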

Sorry, that is all I have time to respond to today.
--
Dale King

SCOTT19U.ZIP_GUY

unread,
Apr 19, 2001, 8:23:18 PM4/19/01
to
Ki...@TCE.com (Dale King) wrote in <3adf...@news.tce.com>:

>"Tom Gutman" <Tom_G...@compuserve.com> wrote in message
>news:9bngbq$1o2$1...@sshuraac-i-1.production.compuserve.com...
>> "Dale King" <Ki...@TCE.com> wrote in message
>> news:3adf...@news.tce.com...
>> > "Tom Gutman" <Tom_G...@compuserve.com> wrote in message
>> > news:9bl4oo$c6h$1...@sshuraaa-i-1.production.compuserve.com...


<cut>

>
>He said it a few weeks ago in a different thread. Here is the
>information to which I refer:
>
>> SCOTT19U.ZIP_GUY <this.a...@is.for.spamers> wrote:
>> > Ki...@TCE.com (Dale King) wrote in <3ac3...@news.tce.com>:
>> >>But the information content of a single byte file is more than a
>> >>single byte. It is the byte of data + the information that there are
>> >>no more bytes of data.
>> > A single byte of file has 256 pieces of information.
>> > if you want to consider less I guess any empty file as 2 pieces
>> > of data. Either it exists or doesn't, but in either case
>> > it has 0 data assigned to it. I don't use these as files.

I was trying to make sense of what you said by the comment
"if you want", just as you continually say that my logic
allows one to store info in the name. This is false.

>> > many you do in your compression methods.
>> > I use as states 2**(8N) where N is the number of bytes in the file.
>> > And to play games, yes, a byte is more than that: it's also the
>> > fact that there are no more in front of that byte, so don't
>> > forget that.


< more cutting>

>I could not have said it better myself. David claims he is doing better
>than a compressor that encodes the length in the data. All he is really
>saying is that they are wrong for not following his rules. His rules say
>that there exists a separate way to convey the file length information.
>Most compressors don't assume that.
>

Let me tell you what I am saying. I say if you take your stream compressor
that encodes the ending and use it basically unmodified as a file
compressor, then one, you're wasting space, and two, it would weaken a
following encryption pass, since if a false key is used it will confuse the
decompressor, which is counting on the EOF being encoded in the data. That
gives an attacker the information that the wrong key was used. Is it clear
yet, Mr. King? And yes, most likely very little space would be saved; again
I stress some, but not much. The big effect is when one encrypts the FILE
after compressing. I personally feel your type of compressor would be bad
to use if encryption is occurring afterwards. Yes, you can make a
workaround by modifying the compressor and putting the length outside the
compressed data. But why go to that trouble when it's not needed and you
can most likely fix the compressor in the first place by using the fact
that the file ends.
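
A sketch of the attack being described, using a toy end-marker format and
a single-byte XOR in place of a real cipher (nothing here is a real
compressor or a real cryptosystem):

#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a compressed format with an in-band end marker:
   valid data is any run of bytes != 0xFF terminated by a single 0xFF.
   Purely illustrative -- no real compressor works like this. */
static bool toy_decode_with_inband_eof(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == 0xFF)
            return i == n - 1;   /* the marker must be the final byte */
    return false;                /* no end marker at all: reject */
}

/* The attacker's test: "decrypt" with a guessed key, then see whether the
   decoder accepts the result.  A rejection tells the attacker the guess
   was wrong -- information a decoder that accepts *any* byte string, like
   a bijective file compressor's, never leaks. */
static bool key_obviously_wrong(const unsigned char *ct, size_t n,
                                unsigned char key_guess, unsigned char *buf)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(ct[i] ^ key_guess);
    return !toy_decode_with_inband_eof(buf, n);
}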

Randy Styka

unread,
Apr 20, 2001, 12:33:40 AM4/20/01
to
Paul Mitchell wrote:
>
> We have an ascii file on a solaris (2.7) system:
>
> file ./li11to20
> ./li11to20: ascii text
>
> of considerable size:
>
> gzip ./li11to20
> ./li11to20: Value too large for defined data type
>
> ls -l
> total 2486440
> -rw-r----- 1 cporter scprogs 2215677248 Apr 16 14:43 li11to20
>
> The user wants to compress the file in such a way that they can move it to
> a (gulp) windows box and uncompress it there. The ideal solution would
> be:
> zip -l ./li11to20
>
> however, this inevitably produces:
>
> zip error: Nothing to do! (./li11to20.zip)
>
> I'm assuming that zip is unable to deal with the size as well.
> Independent of just cutting the file into smaller pieces, is there some
> Solaris-compatible method of compression which would provide the CR LF
> string for the Unix LF
>
Our company's ZIP for Unix can handle large files on
Solaris, and can do files and directories, and is PC
compatible. Download the zip_solaris_large version from our
web site for a free 30-day trial. This may be all you need
if this is a one-time thing. Send a message to
in...@computron.com if you have questions, or visit
www.computron.com/ne for the software and manual. Good
luck!

--
+-----------------------------------------------------------------+
| Computronics Randy Styka, ra...@computron.com |
| 4N165 Wood Dale Road Phone: 630/941-7767 |
| Addison, Illinois 60101 USA Fax: 630/941-7714 |
+-----------------------------------------------------------------+

Patrick Craig

unread,
Apr 20, 2001, 2:32:53 AM4/20/01
to

I'm sorry, but I still don't get it. It sounds like both the test file (length X) and the archive file are generated in a predefined way using your software. Where's my input?

As you probably know, I like to modify the rules of challenges to give myself more chance of winning...

> After your
> time period expires it's open for a month to anyone.
> I also think if no one wins I may change more and
> more rules till someone does win.

It sounds like you're intent on giving the money to somebody. I think you would prefer to give it to me as I didn't get anything from Mike. How about we make the contest open to everyone from now until the end of June? If nobody wins, you send the money to me. I still have the option of entering
the contest if I can work out what I have to do before it closes.

> Four, I like Japanese beer
> almost as much as German beer, and you're in Japan.

If I win, I'll buy you a Sapporo draft next time you're in Japan.
