Re: Hutter Participation - Babel (Library of Babel)

23 views
Skip to first unread message

Matt Mahoney

unread,
May 12, 2009, 5:09:03 PM5/12/09
to Ian Latter, Hutter-Prize

--- On Tue, 5/12/09, Ian Latter <ian.l...@midnightcode.org> wrote:

> From: Ian Latter <ian.l...@midnightcode.org>
> Subject: Hutter Participation - Babel (Library of Babel)
> To: matma...@yahoo.com
> Date: Tuesday, May 12, 2009, 2:30 PM
> Hello Matt,
>
>
> I would like to enter my compressor/decompressor,
> Babel, into the Hutter Prize per the Hutter Prize rules at
> this URL; http://prize.hutter1.net/hrules.htm
>
> I have three versions to submit. All three use the
> same code (babel-1.0), but each has had two
> variables tweaked for your ease of use. All sizes
> are in bytes/octets unless otherwise specified.
>
> The Babel Project page can be found here;
> http://midnightcode.org/projects/babel/
>
> 1. Babel Demo
> Name: Babel
> Version: 1.0 Hutter 5
> Binary size: 10,048 (dynamic) or 463,180 (static)
> Enwik8 size: 760,000,023
> Decompression: Hours, 1.3Gbytes RAM
> DL:
> http://midnightcode.org/projects/babel/code/hutter-1.0.5.tgz
>
> 2. Babel 10%
> Name: Babel
> Version: 1.0 Hutter 380
> Binary size: 10,048 (dynamic) or 463,276(static)
> Enwik8 size: 10,000,027
> Decompression: Years, 26Mbytes RAM
> DL:
> http://midnightcode.org/projects/babel/code/hutter-1.0.380.tgz
>
> 3. Babel 1%
> Name: Babel
> Version: 1.0 Hutter 3800
> Binary size: 10,048 (dynamic) or 463,276(static)
> Enwik8 size: 1,000,031
> Decompression: Decades/Centuries, 7Mbytes RAM
> DL:
> http://midnightcode.org/projects/babel/code/hutter-1.0.3800.tgz
>
>
> Background on the three versions;
>
> Babel Demo consumes too much memory (1.2Gbytes)
> and makes enwik8 almost 10 times larger, but will run
> fast enough to work as a demonstrator of the success
> of the compression principle.
> Babel 10% applies a reasonable use of the underlying
> principles and achieves ~90% compression of enwik8/
> Babel 1% may apply unreasonable use of the underlying
> principles (without modification) making it contentious
> in review, though it achieves ~99% compression of
> enwik8.
>
>
> Babel has been tested on an AMD Sempron 2200+
> under 32bit Linux. It has been released under the
> GPL.
>
> Babel has been programmed to run as both a
> compressor and decompressor. It takes no parameters,
> as these are set at compile time. When it starts it
> assumes that it is to compress, and tries to open
> enwik8 in the current directory. If that fails it will
> switch
> to decompressor mode, and will look for the Crimson
> Hexagon file enwik8.crimsonhexagon@{number}
> where {number} is the pre-configured block/chunk
> size.
>
> I have provided the Crimson Hexagon file for the
> 380 and 3800 versions, but the 5 version file is too
> large for my web server -- you shouldn't have any
> problems generating it at your end.
>
> Although each package contains the source code
> for that version, I have also provided pre-compiled
> Linux 32 bit binaries - one statically compiled and
> one dynamically, for your convenience. I was not
> clear on the use of standard crypto libraries (as
> the rules mention standard file libraries) so I have
> made both options available.
>
> Please let me know if you have any issues or
> queries with the submission, configuration or any
> technical detail.
>
>
> Thanks for your time,
>
>
>
>
>
> --
> Ian Latter
> Late night coder ..
> http://midnightcode.org/

Thank you for your Hutter prize submission. Unfortunately none of your three submissions qualifies for the prize. Submission #1 does not meet the 3% threshold improvement over the current best submission, in addition to failing the memory requirement. Submissions #2 and #3 fail to meet the time limits according to the rules.

Also, from your description of the algorithms and my examination of the code (which is very nicely documented BTW), decompression will fail for #2 and #3 because a brute force search of blocks with more than 288 bits of entropy (equivalent to 36 random bytes) will with high probability produce both MD5 and SHA1 collisions with hashes of the original compressed blocks, although it will take on average 10^70 years to find each collision assuming a rate of one hash computation per nanosecond. If you disagree with my conclusion, then you will need to submit a version that runs within acceptable time limits so that compression can be verified.

Also, I should let you know that Alexander Ratushnyak has submitted a Hutter prize entry of 15,958,674 bytes on Apr. 22, 2009 which I verified meets the requirements and is currently in the 30 day waiting period for approval. To qualify, you would need to beat that by 3%. His entry, (decomp8b) can be found at http://cs.fit.edu/~mmahoney/compression/text.html#1323
(direct link, http://cs.fit.edu/~mmahoney/compression/decomp8b.zip )

-- Matt Mahoney, matma...@yahoo.com

Matt Mahoney

unread,
May 13, 2009, 10:48:17 AM5/13/09
to Ian Latter, Hutter-Prize

So if I understand, the rule,

- Programs should run in less than 10 hours on a 2GHz Pentium 4 with 1GB RAM and 10GB free HD for temporary files.

is unclear because it says "programs should" rather than "programs must"?

Just to clarify, that is a condition for the prize.

In any case, the burden of proof is on you to show that your program will decompress correctly, i.e. that none of the blocks will find any simultaneous MD5-SHA1 collisions. The fact that such collisions are hard to find (10^70 years) is not a proof.


-- Matt Mahoney, matma...@yahoo.com


--- On Tue, 5/12/09, Ian Latter <ian.l...@midnightcode.org> wrote:

> From: Ian Latter <ian.l...@midnightcode.org>
> Subject: Re: Hutter Participation - Babel (Library of Babel)
> To: "Matt Mahoney" <matma...@yahoo.com>
> Cc: "Hutter-Prize" <Hutter...@googlegroups.com>
> Date: Tuesday, May 12, 2009, 11:04 PM
>
> Thanks for your response Matt,
>
>
> I am disappointed that you won't accept #2 and #3
> on a time basis as I didn't perceive an imposed time
> limit from the rules;
>
>   http://prize.hutter1.net/hrules.htm
>     - Publish a program [...]
>     - If run, it produces a [...]
>     - Programs must be [...]
>     - Programs must run [...]
>     - Programs should run in less than 10 hours
> [...]
>
>   My program satisfies the four mandatory
> requirements (publish, produces and must).
>
>   As the creator of CHAOS;
>     http://midnightcode.org/projects/chaos/
>
>   .. I am happy to produce a version that will
> horizontally scale (per the program comments)
> so that I can work through a proof of concept
> for the numerical proof (and make tweaks to
> it where necessary), but obviously only if the
> program will eventually be accepted.
>
>
>   If you are adamant that the program must
> complete in under 10 hours on the test hardware,
> then will you please also revise the Hutter Prize
> Rules page so that others do not burn
> significant time for a semantic bug?
>
>
> Thanks,

Reply all
Reply to author
Forward
0 new messages