
Hutter prize entries?


Michael Goldshteyn

Aug 21, 2006, 10:08:17 AM
Why were the Hutter Prize entries removed from the Hutter Prize page? There
is no longer any mention of the candidate entries at either:

http://prize.hutter1.net/

or

http://cs.fit.edu/~mmahoney/compression/text.html


Thanks,

Mike


moogie

Aug 21, 2006, 7:46:32 PM
Hmm, quite odd!

Matt Mahoney

Aug 21, 2006, 8:28:36 PM

There is some controversy over what the rules are and who won a prize,
if anyone. Marcus Hutter will make the final decision on both matters,
since it's his money. So I removed this information from my website.

The controversy stems from how much time and memory should be allowed.
My benchmark was vague on this and simply said that a program should
use "reasonable" resources, otherwise nobody will test the program and
it won't be listed. This was adequate for an open benchmark where
anyone can submit results, but not for one where prize money is
involved and we want to test the claims ourselves.

The current rules require that a decompressor use no more than 1 GB of
memory, 10 GB of temporary files and run in under 8 hours on a "1 GHz
machine". However, these rules were not stated at the time of the
submission. After these programs were submitted, I had to upgrade my
PC to 2 GB and install Linux (dual boot) to verify the two claims.
Both were valid at the time of submission, but neither would be now.

A second controversy is over who should win the prize if we do allow 2
GB. The two submissions were a few hours apart. raq8g was first, with
a claimed 0.9% improvement using 1 GB (-7 option), which was below
the 1% threshold for a prize. (The author claimed the right amount but
mistakenly believed it was over 1%). I tested it with 2 GB (-8 option)
and found a 2.0% improvement (taking 2 x 9 hours on a 2.2 GHz Athlon 64
under 64 bit Ubuntu Linux). The controversy here is that the author
never claimed the higher amount because he did not test with 2 GB. The
new rules make it clear that we will only test as instructed by the
author and not grant more than the claimed amount. No such rule
existed at the time of submission. My policy was to experiment with
options, making a "reasonable" effort to improve compression. It is
reasonable to use as much memory as possible, since this almost always
helps.

durilca 0.5(Hutter) was submitted a few hours after raq8g. It improved
by 1.5% over the original amount using 2 GB, but by less than 1% using
1 GB. There was no claim for 1 GB, but since tests only took about 10
minutes, I was able to experiment with options to get what I think
is the best result. So even if we allowed 2 GB, only one of the two
would win a prize.

Results for both can be found at
http://cs.fit.edu/~mmahoney/compression/text.html but you have to look
for them because they are not in the main table, which is for enwik9.
durilca 0.5(Hutter) crashes on enwik9 even with 2 GB. raq8g would take
too long to test, so I probably will not. However, you can find enwik8
results at 1 GB and 2 GB in the detailed descriptions for each program.

Unfortunately, this doesn't give you the decompressor size using the
same measurement as the Hutter prize. I allow a zipped archive
containing either source code (any language) or .exe, plus dictionaries
and other files. Hutter allows only a single uncompressed .exe (self
extractors like UPX are OK). Such rule incompatibilities are necessary
when money is involved.

This is a messy situation, unfortunately.

It is not over yet. I am testing a third submission that should
improve by 2.6% with 1 GB.

-- Matt Mahoney

michael

Aug 21, 2006, 10:54:41 PM
A very tough situation. I find it interesting how something we would
normally chalk up to legal nonsense is now something we all wish had
been worked out beforehand.

I believe that Hutter offered the prize in good faith, and I believe
the first few successes offered their solutions in good faith and
within the spirit of the competition. It would be wise to suspend the
competition immediately, to prevent any further winners, until this is
worked out.

For what it is worth, I think Shkarin's entry should be honored as a
prize winner, because he cleared the 1% mark under the rules posted at
the time of his submission. To do otherwise would invite the idea of
'delayed submissions' being used to negate a prize win. Not that I'm
suggesting this is the case.

Also, removing the 1% improvement threshold will eliminate this
problem in the future. And I doubt that winners with little monetary
gain will bother to claim their prize.

At any rate, silence is a very bad idea now. Please ... suspend the
competition until the rules are worked out. Anything else will become
a very big problem.

Finally, I'm not surprised at all that Dmitry was the first 'winner'.
Even on your test, Matt, ppmonster is the clear 'winner' in my eyes.
(^:

- Michael Maniscalco

Michael Goldshteyn

Aug 22, 2006, 8:56:43 AM

"Matt Mahoney" <matma...@yahoo.com> wrote in message
news:1156206516....@74g2000cwt.googlegroups.com...

> There is some controversy over what the rules are and who won a prize,
> if anyone. Marcus Hutter will make the final decision on both matters,
> since it's his money. So I removed this information from my website.
> ...

I think another issue with the current rule set is the 1% rule to win
500 euros. In my opinion, especially now that winning is so dang
difficult with the rule change to 1 GB / 1 GHz machine / 8 hours, it
would be more reasonable to at least pay out 100 euro increments for
submissions in the range 0.5% <= improvement < 1%. In other words,
submissions that achieve at least a 0.5% improvement, but less than
1%, should get some sort of prize, since even that, given the current
state of affairs, is quite an accomplishment. On the subject of the
first submissions, it's a coin toss as to who, if anyone, should win,
although I would tip my hat to the first submission (raq8g with -8),
because he was, after all, first.

Mike


Michael Goldshteyn

Aug 22, 2006, 9:15:06 AM

"Matt Mahoney" <matma...@yahoo.com> wrote in message
news:1156206516....@74g2000cwt.googlegroups.com...
> It is not over yet. I am testing a third submission that should
> improve by 2.6% with 1 GB.
>
> -- Matt Mahoney

It looks like paq8hp1, which is available for download
(http://www.binet.com.ua/~artest/HKCC/) but not yet listed as a
submission on Hutter's page, may be the best one so far, beating even
paq8hkcc on the subset of the enwik8 file I tried it on. It is very
picky about file paths in its parameters, however, so you may want to
place both the archive and the file to compress in the current working
directory and specify them without path prefixes, at least on Windows.

Mike


Matt Mahoney

Aug 22, 2006, 6:28:50 PM

Results are here.
http://cs.fit.edu/~mmahoney/compression/text.html#1479

Yes, paq8hp1 must be run in the current directory so it can find its
own self-extracted dictionary. The build process is clever. After the
.exe is compiled, the dictionary is spliced into the middle of it.
Then the whole thing is compressed with a self-extracting .exe packer.
The dictionary is tuned to enwik8.
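
As a rough illustration of the general idea (a hypothetical Python
sketch, not paq8hp1's actual code or file layout; the marker string is
my own invention), one simple variant appends the dictionary to the
executable behind a unique marker and reads it back at run time:

# Hypothetical sketch: bundle a dictionary with an executable by
# appending it after a unique marker, then locate it at run time.
# (paq8hp1 reportedly splices its dictionary into the middle of the
# .exe instead; this only illustrates the idea.)
MARKER = b"--DICT-START--"   # assumed marker, not paq8hp1's

def embed(exe_path, dict_path, out_path):
    """Write a copy of the executable with the dictionary appended."""
    with open(exe_path, "rb") as f:
        exe = f.read()
    with open(dict_path, "rb") as f:
        dictionary = f.read()
    with open(out_path, "wb") as f:
        f.write(exe + MARKER + dictionary)

def extract_dictionary(self_path):
    """Read our own file and return the bytes after the marker."""
    with open(self_path, "rb") as f:
        data = f.read()
    pos = data.rfind(MARKER)
    if pos < 0:
        raise RuntimeError("embedded dictionary not found")
    return data[pos + len(MARKER):]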

The Hutter prize table at http://prize.hutter1.net/ has not yet been
updated but I expect that will happen soon.

-- Matt Mahoney

Phil Carmody

Aug 22, 2006, 7:08:31 PM
"Matt Mahoney" <matma...@yahoo.com> writes:
> There is some controversy over what the rules are and who won a prize,
> if anyone. Marcus Hutter will make the final decision on both matters,
> since it's his money. So I removed this information from my website.

If what you have is facts, then you should leave them on display.
I don't believe you ever put any non-facts on your site.

Let the interpreter take responsibility for how those facts should
be utilised.

Phil
--
"Home taping is killing big business profits. We left this side blank
so you can help." -- Dead Kennedys, written upon the B-side of tapes of
/In God We Trust, Inc./.

Matt Mahoney

Aug 22, 2006, 11:34:36 PM

Phil Carmody wrote:
> "Matt Mahoney" <matma...@yahoo.com> writes:
> > There is some controversy over what the rules are and who won a prize,
> > if anyone. Marcus Hutter will make the final decision on both matters,
> > since it's his money. So I removed this information from my website.
>
> If what you have is facts, then you should leave them on display.
> I don't believe you ever put any non-facts on your site.
>
> Let the interpreter take responsibility for how those facts should
> be utilised.

I realized I had removed too much information, so I put back the exact
submission times (we got two in one day) and the uncompressed .exe
sizes in the detailed descriptions of the Hutter Prize candidates. I
included results for both 1 GB and 2 GB of memory.

-- Matt Mahoney

Rudi Cilibrasi

Aug 23, 2006, 1:11:15 AM
Hi everybody,

I think the relaxed 0.5%-1% requirement is a good one; it would make
contestants happier, though it means more work for the judges. And if
we really intend to advance the state of the art, then we should set
up a Subversion repository somewhere, put the best open source
compressor in there, and offer a separate prize for open-source
entries only. Two arenas, open and closed source, with two prizes.
Otherwise a rational self-interested player is disincentivized from
releasing source under the current scheme, since releasing only an
executable prevents others from gaining the advantage of the improved
technique. With my repository suggestion, we can have an ongoing offer
of money to improve the open source compressor no matter how good
private compressors become. I don't think the requirement to publish
an executable decompressor solves the problem, because reverse
engineering is hard. To really accelerate the art beyond its natural
pace we have to share source, and there should be a contest that
essentially 'funds' development of an open source compressor for the
benefit of all by paying out according to progress on the goal. But we
also want to measure the absolute best compressor, so we need an arena
for closed source too. I think if we continue with the current ruleset
it will stifle cooperation, because we are being incentivized to
withhold our 'secrets'.
Cheers,

Rudi

michael

Aug 23, 2006, 8:35:44 AM

Rudi,

I agree. The contest actually encourages hoarding of knowledge to
maintain an advantage. But I think this is always the case where
financial rewards are the primary incentive.

For my two cents on the contest, I think it does very little to
advance the field of compression in any practical way. If there are no
restrictions on practical aspects (such as time and space), then this
is little more than a competition to build the biggest 'gas guzzler',
when what we should be aiming at is practical efficiency.

When you can download a copy of the data from a remote location over a
56K modem faster than you can decompress it ... then what value is the
compressor at all? After all, it takes ZERO space on my drive if I can
download it faster whenever I need it.

And compressors tuned to the test set in order to achieve better
compression are, at least to me, equally unimpressive. They certainly
do not address the competition in the spirit in which it was offered,
that is, to address AI.

A closed test set, with compulsory publication of source code upon
success, is a much stronger way to advance the science.

- Michael Maniscalco

Matt Mahoney

Aug 23, 2006, 11:42:06 PM

First, I don't think there is any way to enforce an open source
requirement. If a contestant wants to keep their algorithm secret, the
source code can be obfuscated so that it is no better than executable
code. What is really needed is a document explaining the algorithm.

I think that documented open source will have an advantage by the
simple fact that many people can work on it. I don't think any single
person could have produced the PAQ series. The GPL license was key to
its success.

Second, I must restate that the shared goal of both contests is not to
produce more practical compressors. It is to motivate research in
language modeling. We expect that the top compressors will be slow,
use lots of memory, and be tuned to the data set. That is OK. The
first few entries will pick this low hanging fruit. Future gains are
going to increasingly depend on solving harder problems in modeling
semantics, syntax, and higher level reasoning. It simply will not be
practical to encode the millions of rules that distinguish between
coherent English paragraphs and gibberish into the decompressor. A
solution will require new algorithms for language learning, and perhaps
a better understanding of how children do it. The data set size was
chosen large enough to make this possible, but not so large as to allow
a brute force approach.

-- Matt Mahoney

Michael Goldshteyn

Aug 24, 2006, 9:34:10 AM

"Matt Mahoney" <matma...@yahoo.com> wrote in message
news:1156390926....@74g2000cwt.googlegroups.com...

> Second, I must restate that the shared goal of both contests is not to
> produce more practical compressors. It is to motivate research in
> language modeling. We expect that the top compressors will be slow,
> use lots of memory, and be tuned to the data set. That is OK. The
> first few entries will pick this low hanging fruit. Future gains are
> going to increasingly depend on solving harder problems in modeling
> semantics, syntax, and higher level reasoning. It simply will not be
> practical to encode the millions of rules that distinguish between
> coherent English paragraphs and gibberish into the decompressor. A
> solution will require new algorithms for language learning, and perhaps
> a better understanding of how children do it. The data set size was
> chosen large enough to make this possible, but not so large as to allow
> a brute force approach.

It is unfortunate that a larger data set was not chosen, because, in my
opinion, a brute force approach, at least in defining "the millions of
rules that distinguish between coherent English paragraphs and
gibberish", would have been a far more valuable contribution to the
field of compression, since it would be useful for the general
compression of English text. In the end, the size of the
compressor/decompressor is less relevant if it can compress/decompress
many different files of a particular type, reusing the rules built into
its larger executable. I suppose this is why compressors/decompressors
with dictionaries would be more advantageous in the long term, at least
on the data they are meant to compress well.

Mike


Matt Mahoney

Aug 24, 2006, 1:43:33 PM

I estimate that the size of the set of rules needed to distinguish
properly formed English text (as well as a human proofreader can) is about
100-150 MB using an efficient representation, and that these rules can
be learned from 1 GB of text. If this is so, then the task of
compressing 1 GB of text can be approached in two ways. One is to
encode these rules into the compressor/decompressor (offline model),
and the other way is to start with no knowledge and learn the rules as
compression proceeds (online model). I believe that the second
approach will prove more fruitful, for the simple reason that people
have been using the first approach to AI for the last 50 years and it
hasn't worked. The effort of coding all human knowledge (e.g. Cyc) is
enormous.

As a simple example, paq8h uses a static English dictionary (offline
model). paq8hp1 uses a dictionary extracted from the input data
(learned model) and compresses better.
(paq8hp1 actually uses a static dictionary customized to enwik8 but it
would be easy to generalize the technique).
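
To make the "learned dictionary" idea concrete, here is a minimal
Python sketch (my own illustration, not paq8hp1's actual preprocessing;
the token framing bytes are arbitrary). It ranks words by frequency in
the input itself, then substitutes short tokens for the most frequent
words before the back-end compressor runs; the word list would have to
be stored with the output so the transform can be undone:

# Minimal sketch of a learned-dictionary transform (illustrative only).
from collections import Counter
import re

def build_dictionary(text, size=200):
    """Rank lowercase words by frequency in the input (the 'learned' part)."""
    words = re.findall(r"[a-z]+", text)
    return [w for w, _ in Counter(words).most_common(size)]

def encode(text, dictionary):
    """Replace frequent words with short numeric tokens framed by \x01...\x02."""
    codes = {w: "\x01%d\x02" % i for i, w in enumerate(dictionary)}
    return re.sub(r"[a-z]+",
                  lambda m: codes.get(m.group(0), m.group(0)),
                  text)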

The contest properly discourages brute force approaches. A brute force
approach to estimating the probability of a 5 word sequence would be to
count hits in Google, or equivalently, use the 5-gram statistics made
available by Google through the LDC on a 6 DVD set. This obviously
won't work for the Hutter prize because your decompressor would be over
25 GB. Nor will it work to collect statistics from a smaller data set
because the most likely hit count for most sequences will be 0. You
will have to be smarter and figure out how to learn and apply rules of
syntax and semantics. That knowledge would be more valuable than the
rules themselves.
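
A quick way to see the sparsity problem (a throwaway Python sketch,
assuming enwik8 is in the current directory): count the distinct word
5-grams in a sample of the file and check what fraction occur only
once.

# Rough illustration of 5-gram sparsity. Most distinct word 5-grams in
# a modest sample occur exactly once, so counts gathered from a small
# corpus say almost nothing about the probability of a new sequence.
from collections import Counter
import re

with open("enwik8", encoding="utf-8", errors="replace") as f:
    words = re.findall(r"[a-z]+", f.read(10**7).lower())  # first ~10 MB

grams = Counter(zip(*(words[i:] for i in range(5))))
once = sum(1 for c in grams.values() if c == 1)
print("distinct 5-grams: %d, seen only once: %.1f%%"
      % (len(grams), 100.0 * once / len(grams)))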

So why 1 GB? This is discussed in
http://cs.fit.edu/~mmahoney/compression/rationale.html
but boils down to (1) Turing's guess (2) Landauer's estimate of human
long term memory capacity and (3) the amount of text you can read in a
lifetime. I assume the entropy of text is 1 bpc as estimated by
Shannon. I know the Hutter Prize is 100 MB but that should still be
enough to learn a "child like" model far beyond what we can do now, so
it is a good intermediate goal.

I also assume (and this is controversial) that it is possible to learn
a language from unlabeled text. A simple example is extracting a
dictionary and word frequency statistics. Semantics and syntax are
harder, but can be done. For example, a model without vision can learn
that blue is a color by observing that the two words tend to occur near
each other. How much text do you need to learn this? Let's count
Google hits:

"the" 22,600,000,000 (at least this many English pages indexed)
"blue" 992,000,000
"color" 1,030,000,000
"color blue" 208,000,000

So "color" appears on 4.5% of all web pages, but on over 20% of pages
containing "blue". I could probably have gotten similar numbers with
10^-7 as many pages, maybe even 10^-8. Let's try enwik8 and take each
line (paragraph) as a separate document.

C:\res\data\wiki>grep -c . enwik8
File enwik8:
919075 lines match
enwik8: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c "the" enwik8
File enwik8:
222186 lines match
enwik8: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c "blue" enwik8
File enwik8:
2619 lines match
enwik8: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c "color" enwik8
File enwik8:
7496 lines match
enwik8: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i "blue" enwik8 | grep -i -c "color"
File STDIN:
357 lines match

Looks like more than enough data. If nothing more, your compressor
ought to predict "color" whenever it has seen "blue" recently, and vice
versa.
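
The same counts can be scripted rather than piped through grep (a
quick Python sketch, assuming enwik8 is in the current directory and
treating each line, i.e. paragraph, as a document):

# Measure the "blue"/"color" co-occurrence from line counts, like the
# grep commands above.
def cooccurrence(path="enwik8", a="blue", b="color"):
    total = with_a = with_b = both = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            low = line.lower()
            total += 1
            has_a, has_b = a in low, b in low
            with_a += has_a
            with_b += has_b
            both += has_a and has_b
    print("P(%s)    = %.4f" % (b, float(with_b) / total))
    print("P(%s|%s) = %.4f" % (b, a, float(both) / with_a))

cooccurrence()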

What about syntax? If you see a novel word X in a context like "the X
is", you already know that X is a noun and you can predict sequences
like "a X was" or "Xs" but not "he X" or "Xed". I suppose that future
Hutter entrants will want to duplicate such learning.
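
In miniature, such learning might look like the following toy Python
sketch (purely illustrative, not anything an actual entrant uses):
record the (previous word, next word) frames each word appears in,
then guess that a novel word behaves like whichever known word shares
the most frames with it.

# Toy context-based generalization: words seen in the same frames get
# grouped together, so a never-before-seen noun is matched with other
# nouns.
from collections import defaultdict

def context_frames(tokens):
    frames = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        frames[word].add((prev, nxt))
    return frames

def most_similar(novel, frames):
    return max((w for w in frames if w != novel),
               key=lambda w: len(frames[w] & frames[novel]),
               default=None)

tokens = "the cat is here and the dog is there and the wug is new".split()
frames = context_frames(tokens)
print(most_similar("wug", frames))  # prints "cat"; "dog" ties, both share the frame ("the", "is")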

-- Matt Mahoney
