Introducing the Hutter Prize for Lossless Compression of Human Knowledge

222 views
Skip to first unread message

jabowery

unread,
Aug 6, 2006, 10:10:07 AM8/6/06
to
Introducing the Hutter Prize for Lossless Compression of Human
Knowledge

Researchers in artificial intelligence are being put to the test by a
new competition: The Hutter Prize.

The Hutter Prize challenges researchers to demonstrate their programs
are intelligent by finding simpler ways of representing human knowledge
within computer programs. The researcher that can produce the smallest
program that outputs a selected large sample of Wikipedia wins money.
Moreover, progressively smaller such programs win incrementally more
money. The formula for winnings is modeled after the M-Prize or
Methuselah Mouse Prize, which awards money to longevity researchers for
progress in keeping mice alive the longest.

The purse for the Hutter Prize was initially underwritten with a 50,000
Euro commitment to the prize fund by Marcus Hutter of the Swiss Dalle
Molle Institute for Artificial Intelligence, affiliated with the
University of Lugano and The University of Applied Sciences of Southern
Switzerland.

The theoretic basis of the Hutter Prize is related to an insight by the
12th century philosopher, William of Ockham, called "Ockham's Razor",
sometimes quoted as: "It is vain to do with more what can be done with
less." But it was not till the year 2000 that this was mathematically
proven*, by Marcus Hutter, to be a founding principle of intelligence.
Indeed, Hutter's Razor** might be phrased, "It is truer to explain with
less that which can be explained with more."

Finding a more succinct way to represent human knowledge is not merely
an exercise in aesthetics or efficiency -- it is the pursuit of truth
itself.

For details of the Hutter Prize see:

http://www.hutter1.net/prize.htm

For discussion of the Hutter Prize see:

http://groups.google.com/group/Hutter-Prize

-- Jim Bowery

* http://www.hutter1.net/ai/uaibook.htm

** Hutter's Razor has some caveats relating to the nature of the
universe and computability, but those conditions must be met for any
computer-based intelligence.

Ben Rudiak-Gould

unread,
Aug 6, 2006, 11:48:56 AM8/6/06
to
jabowery wrote:
> The Hutter Prize challenges researchers to demonstrate their programs
> are intelligent by finding simpler ways of representing human knowledge
> within computer programs. The researcher that can produce the smallest
> program that outputs a selected large sample of Wikipedia wins money.

Though I understand the motivation behind this prize, I don't think that
this particular compression challenge correlates well with intelligence. The
problem is that lossless compression requires you to save not only the
informative content of the Wikipedia articles but also every meaningless
detail of how the information is presented, and I'd expect these details to
dominate the size of the compressed file. I think intelligence is more about
identifying what's important than it is about identifying what's not
redundant. You cite Ockham's Razor, but our simple physical theories do not
encode the results of past experiments, only their broad statistical
properties: a faint pattern hidden behind an enormous amount of noise. A
losslessly compressed history of those experiments might contain a few
kilobytes of Standard Model and terabytes of randomness. Intelligence buys
you essentially nothing here.

A lossy text-compression challenge would have to be judged, and 100MB of
text is far too much for a human judge to read. The solution is to judge a
random sample which isn't known to the compressor -- at which point you have
something similar to the Loebner Prize. So I think the Loebner Prize is a
better compression-based test of intelligence than the Hutter Prize. The
Loebner Prize has to my knowledge completely failed to inspire interesting
progress in artificial intelligence, and I very much doubt that the Hutter
Prize will be different.

-- Ben

James A. Bowery

unread,
Aug 6, 2006, 1:46:51 PM8/6/06
to
Ben Rudiak-Gould wrote:
> Though I understand the motivation behind this prize, I don't think that
> this particular compression challenge correlates well with intelligence. The
> problem is that lossless compression requires you to save not only the
> informative content of the Wikipedia articles but also every meaningless
> detail of how the information is presented, and I'd expect these details to
> dominate the size of the compressed file.

A primary reason Wikipedia sample was chosen is that that its
"meaningless details" are not so meaningless when viewed in light of
the fact that, unlike noise in physics experiments, they are
interesting in their own right as artifacts of human behavior. For
example, deriving the specific syntax and grammar of English may be
viewed as meaningless detail but it gives insight into how people
communicate. The enwik8 file (10e8 sample of Wikipedia) will probably
be big enough start to discover/confirm some interesting facts
pertaining to linguistics such as the most likely types of grammar
errors, etc. but also some interesting facts pertaining to formatting
of articles.

Whether it enwik8 is large enough to start to get into conceptual
frameworks or not is less likely but as the size of the file grows to
the full corpus, including the multiple languages, there is a lot more
that can be done at the margins of compressibility that is interesting.

It may at some point,especially when experimental noise actually
becomes part of the knowledge being compessed, it will probably be
necessary to reformulate the prize in terms of award per fractional
improvement so that tiny changes in compression ratio are rewarded
adequately but since the prize is an incremental one this should not
present a major difficulty for the participants.

Matt Mahoney

unread,
Aug 6, 2006, 9:23:24 PM8/6/06
to
Ben Rudiak-Gould wrote:
> jabowery wrote:
> > The Hutter Prize challenges researchers to demonstrate their programs
> > are intelligent by finding simpler ways of representing human knowledge
> > within computer programs. The researcher that can produce the smallest
> > program that outputs a selected large sample of Wikipedia wins money.
>
> Though I understand the motivation behind this prize, I don't think that
> this particular compression challenge correlates well with intelligence. The
> problem is that lossless compression requires you to save not only the
> informative content of the Wikipedia articles but also every meaningless
> detail of how the information is presented, and I'd expect these details to
> dominate the size of the compressed file. I think intelligence is more about
> identifying what's important than it is about identifying what's not
> redundant. You cite Ockham's Razor, but our simple physical theories do not
> encode the results of past experiments, only their broad statistical
> properties: a faint pattern hidden behind an enormous amount of noise. A
> losslessly compressed history of those experiments might contain a few
> kilobytes of Standard Model and terabytes of randomness. Intelligence buys
> you essentially nothing here.

Acutally Wikipedia is very low in noise compared to other text sources.
The text is high quality. Wikipedia pages typically rank high in
Google. Text is checked and corrected by many people. There are very
few misspellings and typos. The text contains line breaks only on
paragraph boundaries. There is some randomness in the ordering of
articles, timestamps, trailing digits in tables of numeric data, etc.
but the entropy of this noise is a tiny fraction of the data set. In
any case, the incompressible parts should have the same effect on
everyone so it is irrelevant to the contest. See
http://cs.fit.edu/~mmahoney/compression/textdata.html for an analysis
of the test data.

It is understandable that many people doubt the link between
compression and AI. (The comp.ai moderator rejected the OP). The
human brain cannot compress data in the sense that a program does.
Decompression requires an exact copy of the model or mapping used for
compression, which must run on a deterministic machine. The brain is
not deterministic and lacks a mechanism for exact copying.

There are many arguments that compression implies AI (but not vice
versa).

1. Hutter proved that the optimal behavior of a rational agent in a
computable environment (to maximize an accumulated reward signal from
the environment) is to assume the environment is controlled by the
shortest program consistent with all observations so far.

2. A natural language model (the ability to compute the probabilty p(s)
of any text string s in human dialog) enables one to pass the Turing
test for AI.

3. Humans exceed machines in the ability to predict successive
characters or words in running text, as measured by entropy bounds
(estimated from the Shannon game, or compressed size, respectively).
Text prediction requires vast, real-world knowledge that machines lack.

These arguments are described in more detail here.
http://cs.fit.edu/~mmahoney/compression/rationale.html

> A lossy text-compression challenge would have to be judged, and 100MB of
> text is far too much for a human judge to read. The solution is to judge a
> random sample which isn't known to the compressor -- at which point you have
> something similar to the Loebner Prize. So I think the Loebner Prize is a
> better compression-based test of intelligence than the Hutter Prize. The
> Loebner Prize has to my knowledge completely failed to inspire interesting
> progress in artificial intelligence, and I very much doubt that the Hutter
> Prize will be different.
>
> -- Ben

The contest is for lossless compression. There is no ambiguity in the
result.

It remains to be seen if the Hutter Prize will inspire new results in
AI. Data compressors usually model text at the character level, or at
most, the lexical level. Researchers in language modeling (primarily
for speech recognition) typically map words to tokens and model the
semantics and syntax to evaluate perplexity (compressed size), but
usually using offline models on a held back test set from different
corpora. These differences make it difficult to integrate these
methods or compare results, but it must be done if the language
modeling problem is to be solved.

-- Matt Mahoney

ianpa...@gmail.com

unread,
Aug 9, 2006, 11:48:10 AM8/9/06
to
In many ways I would have said the opposite. AI is certainly
compressive but compression by itself may or may not imply AI. The
algorithms being used to date are interesting but do not give any deep
insights.

AI implies compression, for Natural Language has words with more than
one meaning. I will be using a second language (Spanish) to describe
meanings.

Lock = eclusia, cerradura
Spring = primavera, ressorte, mamantal

These meanings are completely different, although us as intelligent
beings find no difficulty in getting the correct one. Let us construct
a compressed language. Suppose our language consists of 256 separate
words (including punctuation marks). These words have multiple meanings
which we try to separate as much as possible. Having put Wiki, or any
other text into this language we see if English can be reconstructed.
This reconstruction being done in an AI way. An AI way being the way
you would translate English into Spanish. Extra markers are put in, in
the rare event of an ambiguity. We can then use the compression
algorithms previously attempted.

In fact a data stream is incompressible iff the digits are completely
random. This definition is the basic information theoretic definition.
http://www.idsia.ch/~marcus/ai/paixi.pdf
It is clear that Hutter himself envisages an AI approach in his paper.
Consider

El barco attravesta una cerradura.

If we choose "eclusia" using "attravesta" our AI has not gained
anything over the word grouping approach used in existing algorithms.
In fact we can see that in information theoretic terms "attravesta" is
identical to the pairing stage. If AI is to have advantages it must
work on the general context and on words removed by at least 2.

It is obvious that AI compresses, although time will indeed tell
whether it has a significant advantage over the existing algorithm.
Eventually AI will be needed to the highest compression, this is
clearly implied by a basic information theoretic approach. How soon we
will get to this stage and whetger Hutter is going to directly advance
AI I just don't know.

Ian Parker

Matt Mahoney

unread,
Aug 9, 2006, 12:18:00 PM8/9/06
to

To use your example, a good data compression program will assign a
higher probability, and hence a shorter code, to "the boat went through
the lock" than "the boat went through the padlock". To do this, the
compressor must know about boats and the different meanings of locks.

The purpose of a compression contest is to test how much the compressor
knows, or alternatively, how fast it can learn. Knowledge can either
be hard coded into the decompressor, or it can be learned from the data
itself. In the first case, adding knowledge to the decompressor will
improve compression but will increase the size of the decompression
program. In the second case, a smarter learning algorithm will result
in better compression.

It is my hope that this compression contest will lead to smarter
learning algorithms and a better understanding of how children acquire
language. The best compressors currently model text at the lexical
level. There has been experimental work in language modeling at the
semantic and syntactic levels (mostly in speech recognition research)
on cleaned text, i.e text reduced to a stream of tokens from a fixed
vocabulary. These two approaches have not yet been integrated. I
believe the reason is that for small text files, it is not worthwhile
to add syntactic and semantic modeling. This will not be the case for
data sets large enough to enable such learning. The 100 MB data set is
equivalent to the amount of verbal training data experienced by a young
child. The 1 GB data set is probably equivalent to the knowledge
attained by an average educated adult.

-- Matt Mahoney

ianpa...@gmail.com

unread,
Aug 9, 2006, 3:12:26 PM8/9/06
to

Matt Mahoney wrote:

> It is my hope that this compression contest will lead to smarter
> learning algorithms and a better understanding of how children acquire
> language. The best compressors currently model text at the lexical
> level. There has been experimental work in language modeling at the
> semantic and syntactic levels (mostly in speech recognition research)
> on cleaned text, i.e text reduced to a stream of tokens from a fixed
> vocabulary. These two approaches have not yet been integrated. I
> believe the reason is that for small text files, it is not worthwhile
> to add syntactic and semantic modeling. This will not be the case for
> data sets large enough to enable such learning. The 100 MB data set is
> equivalent to the amount of verbal training data experienced by a young
> child. The 1 GB data set is probably equivalent to the knowledge
> attained by an average educated adult.

When you talk about educational attainment you are talking about lossy
compression. If someone reads Wikipea and understands it and reproduces
in in their own words it will have the same meaning but not be
identical to Wiki. Lossy compression is indeed about AI uniquely so.

My remarks on the meaning of words and the construvtion of a 256
character language was loss free. I think that only AI can achieve the
ultimate in loss free compression, this is a necessary result of
information theory, but that for various reasons AI will have to be
pretty good to beat the non AI approach adopted now.

BTW eclusia and cerradura are totally different meanings. eclusia means
water that goes up and down. sim primavera, ressorte, mamantial. If we
have a languasge of 256 words each word has a number of meanings. The
Eucidean distance can be dynamically adjusted.

I agree with you that AI will only really work with large databases.
Only with a large database can meaninful statistical information be
extracted.

Ian Parker

niels.f...@seies.de

unread,
Aug 9, 2006, 3:36:30 PM8/9/06
to
> To use your example, a good data compression program will assign a
> higher probability, and hence a shorter code, to "the boat went through
> the lock" than "the boat went through the padlock". To do this, the
> compressor must know about boats and the different meanings of locks.

The problem is, if a AI driven self-learning compressor will receive a
meaning
that is analogue to what a human understands, without ever seeing a
book
or opening a lock.
Human expression (in the form of human language) is based in
describing
various aspects of 'reality' (from different perception-domains [sight,
hearing,
smelling, touching, ...]), which (I would say) is impossible to
reverse-engineer
without a connection to that 'reality'.

> The purpose of a compression contest is to test how much the compressor
> knows, or alternatively, how fast it can learn. Knowledge can either
> be hard coded into the decompressor, or it can be learned from the data
> itself. In the first case, adding knowledge to the decompressor will
> improve compression but will increase the size of the decompression
> program. In the second case, a smarter learning algorithm will result
> in better compression.

You forget, that the algorithms nessesarily _must_ grow with it's
complexity
(Pigeon-hole applies to algorithms too). It took some time to converge
to
pure statistical compression-boundaries. So it will with the
super-algorithm
descriptivity. In the end we're reaching Kolmogorov-Complexity, which
(for
all possible sources best compressor) will probaby an algorithm that is

far more bigger than all know algorithms stuffed together (because it's
the best possible on everything). The question is, what does it mean
when
we detect that KC has no practical meaning, and not at all the
expected.

Anyway I would say in most cases for the last decade and for the
forseeable
future, data-compression has nothing to do with true AI (un-supervised
algorithm development, self-modifying code and/or parameter-space), but

is a competition of HI (human intelligence) or rather HS (human
smartness).
It's maybe arrogant (but because I don't kow it better) but for me
finite-state
machines do not carry the spark of AI.

> -- Matt Mahoney

Ciao
Niels

Matt Mahoney

unread,
Aug 9, 2006, 6:15:13 PM8/9/06
to
ianpa...@gmail.com wrote:
> Matt Mahoney wrote:
>
> > It is my hope that this compression contest will lead to smarter
> > learning algorithms and a better understanding of how children acquire
> > language. The best compressors currently model text at the lexical
> > level. There has been experimental work in language modeling at the
> > semantic and syntactic levels (mostly in speech recognition research)
> > on cleaned text, i.e text reduced to a stream of tokens from a fixed
> > vocabulary. These two approaches have not yet been integrated. I
> > believe the reason is that for small text files, it is not worthwhile
> > to add syntactic and semantic modeling. This will not be the case for
> > data sets large enough to enable such learning. The 100 MB data set is
> > equivalent to the amount of verbal training data experienced by a young
> > child. The 1 GB data set is probably equivalent to the knowledge
> > attained by an average educated adult.
>
> When you talk about educational attainment you are talking about lossy
> compression. If someone reads Wikipea and understands it and reproduces
> in in their own words it will have the same meaning but not be
> identical to Wiki. Lossy compression is indeed about AI uniquely so.

Humans are better than machines at lossy compression, for example,
describing a picture in words, then reproducing an approximation of the
picture. This task requires vast knowledge about the world, which
machines lack. However, machines have a capability that humans lack,
the ability to perform repeatable, deterministic computation, such as
lossless copying of information in memory. Lossless text compression
requires both of these abilities. This is why I claim that lossless
text compression implies AI, but not vice versa.

There are many ways to measure intelligence: passing the Turing test,
word error rate in speech recognition, OCR, quality of language
translation, precision/recall in information retrieval, and so on. All
of these tests have the problem that they require human judges to
assign scores or grading criteria subjectively. But text compression
is an objective test. There is no ambiguity in the result. It
measures exactly the ability of a machine to predict successive
characters or words in text. This is a hard AI problem, which people
can do well but machines cannot. What other test would you suggest?

A shortcoming of the test is that the machine must learn all of the
vast knowledge that humans possess from the data set itself (because
the decompressor size is included). Of course, this knowledge is not
all present. However, I believe that much of the knowledge needed for
the task of *text prediction* (or equivalently, passing the Turing
test) is present. For these tasks, the meanings of words do not need
to be grounded, or associated with nonverbal perceptions and actions.
There are many things that people learn nonverbally: how to swim, what
a peach tastes like, what somebody's face looks like. But there is no
way to test for this knowledge verbally. What question could you ask
about swimming that a nonswimmer could not answer correctly after
having read about it?

Regarding the different translations of "lock" into Spanish, I believe
that 1 GB of text is sufficient to resolve word sense ambiguity
problems like this. One form would be associated by proximity in
running text to "key", "door", etc. another to "ship", "canal", etc.
There are LSA models that do this now after training on a few hundred
megabytes.

It is an open question whether 1 GB of Wikipedia is the best data set
for this test. Probably not, but I think it is close enough.

-- Matt Mahoney

Matt Mahoney

unread,
Aug 9, 2006, 6:52:21 PM8/9/06
to
niels.f...@seies.de wrote:
> > To use your example, a good data compression program will assign a
> > higher probability, and hence a shorter code, to "the boat went through
> > the lock" than "the boat went through the padlock". To do this, the
> > compressor must know about boats and the different meanings of locks.
>
> The problem is, if a AI driven self-learning compressor will receive a
> meaning
> that is analogue to what a human understands, without ever seeing a
> book
> or opening a lock.
> Human expression (in the form of human language) is based in
> describing
> various aspects of 'reality' (from different perception-domains [sight,
> hearing,
> smelling, touching, ...]), which (I would say) is impossible to
> reverse-engineer
> without a connection to that 'reality'.

For text based AI problems (language modeling), the meanings of words
does not need to be grounded. It is only necessary to know that one
form of the word "lock" is statistically associated syntactically with
"went through" and semantically with "boat" to derive the correct
meaning.

> > The purpose of a compression contest is to test how much the compressor
> > knows, or alternatively, how fast it can learn. Knowledge can either
> > be hard coded into the decompressor, or it can be learned from the data
> > itself. In the first case, adding knowledge to the decompressor will
> > improve compression but will increase the size of the decompression
> > program. In the second case, a smarter learning algorithm will result
> > in better compression.
>
> You forget, that the algorithms nessesarily _must_ grow with it's
> complexity
> (Pigeon-hole applies to algorithms too). It took some time to converge
> to
> pure statistical compression-boundaries. So it will with the
> super-algorithm
> descriptivity. In the end we're reaching Kolmogorov-Complexity, which
> (for
> all possible sources best compressor) will probaby an algorithm that is
> far more bigger than all know algorithms stuffed together (because it's
> the best possible on everything). The question is, what does it mean
> when
> we detect that KC has no practical meaning, and not at all the
> expected.

The general problem of algorithmic compression is not computable, and
not our goal. Text compression and text-based AI are problems in
language modeling, which I believe is tractable.

> Anyway I would say in most cases for the last decade and for the
> forseeable
> future, data-compression has nothing to do with true AI (un-supervised
> algorithm development, self-modifying code and/or parameter-space), but
> is a competition of HI (human intelligence) or rather HS (human
> smartness).
> It's maybe arrogant (but because I don't kow it better) but for me
> finite-state
> machines do not carry the spark of AI.

As I explained in my last post, lossless text compression requires both
AI and deterministic computation. The latter is not possible for
humans, but easy for machines. Therefore, text compression reduces to
text-based AI. This is all explained at
http://cs.fit.edu/~mmahoney/compression/rationale.html

I don't believe that the human brain does anything that can't be done
on a computer with the right algorithm. However, we don't know any
algorithm for learning language that performs as well as humans in many
tasks, such as distinguishing likely text sequences from unlikely. The
Hutter prize will reward incremental steps toward this goal.

-- Matt Mahoney

jacko

unread,
Aug 9, 2006, 9:49:18 PM8/9/06
to
i built therfore i drempt and so was
we're are digital, and imageno deterministic
rational zoning for conversing quantize omni

so does this not base three thing work then?

a base two solution is prefered hypbolically extream volt logically

http://indi.joox.net/

and look for the K Ring Compression Codec directory, SQUIRT look for
pdf

block coding included?

cheesr

ianpa...@gmail.com

unread,
Aug 10, 2006, 6:50:34 AM8/10/06
to

I agree that maximal compression requires AI. However I do not know
whether maximum compression is computable in a time shorter than the
age of the Universe. Human intellence will only compress losslessly to
a limited extent. As I said putting things into your own words is lossy
compression.

There is one issue in your reference which I really have to take issue
wth.

"Turing does not require language translaton" - It does. The way to
beat Turing is to put in a nonsense sentence which is grammatically
correct.

"El barco attravesta una cerradura" is just such a sentence. It is
nonsense but grammatically correct and arises from a perfectly sensible
English sentence. To discuss your boating holiday "bueno espagnol" is
needed.

Wiki is multilingual. You say rightly that if a multilingual text were
to be compressed competitors would install language translators. I say
that that would put Hutter on a definitively AI track from the word go.

I accept that if there is no Kolmogorov criterion built in people will
cheat, although it could be that you could screen with a small data set
without Kolmogorov (size of program) being an issue.

As I said earlier if perfect compression were obtained it would be AI
hard. However I feel that Hutter (monolingual) might be won by a
program that did not show AI and was hard to improve upon. That is to
say a better compression was either not computable or computable with
considerable difficulty.


>
> I don't believe that the human brain does anything that can't be done
> on a computer with the right algorithm. However, we don't know any
> algorithm for learning language that performs as well as humans in many
> tasks, such as distinguishing likely text sequences from unlikely. The
> Hutter prize will reward incremental steps toward this goal.
>

That is not really the issue. Human intelligence is lossy.

> -- Ian Parker

niels.f...@seies.de

unread,
Aug 10, 2006, 11:41:31 AM8/10/06
to
> For text based AI problems (language modeling), the meanings of words
> does not need to be grounded. It is only necessary to know that one
> form of the word "lock" is statistically associated syntactically with
> "went through" and semantically with "boat" to derive the correct
> meaning.

I think you're wrong when you say that AI is (only) statistical based.
For sure you will receive statistical bounds for your algorithm if
you're only working in the statistical domain. But as you say below
(that you think a computer can 'simulate' anything a human can 'think')
you _must_ leave the statistical domain to make AI, which for example
means to explore semantic (which is not statistical based).

> The general problem of algorithmic compression is not computable, and
> not our goal. Text compression and text-based AI are problems in
> language modeling, which I believe is tractable.

I understood you believe that AI algorithms are within a sub-domain of
compression algorithms (may be exagerrated -> compression implies AI).
I can hardly believe RLE possesses self-inteligence (not gifted by the
programmer). Compression is transform-coding nothing more, if you make
a NN mixer, you have to split your thing apart i definition, the mixer
might make /intelligent/ decisions, but doesn't compress, the driven
transform-coders compress. A human is not called or defined intelligent
in it's mere whole form (is my food intelligent?), but the brain only.
But then if we compare the complexity in self-organisation of your NN
mixer with the NN of an ant (which we mostly not call intelligent, but
an amazing math-machine), it's about magnitudes less flexible, less
/intelligent/, so it's less intelligent than something that is not
intelligent at all.

> As I explained in my last post, lossless text compression requires both
> AI and deterministic computation.

Order-0 Huffman possesses intelligence?

> The latter is not possible for
> humans, but easy for machines. Therefore, text compression reduces to
> text-based AI. This is all explained at
> http://cs.fit.edu/~mmahoney/compression/rationale.html
>
> I don't believe that the human brain does anything that can't be done
> on a computer with the right algorithm.

Creativity and Intuition (metaphor -> semantic/personality)? Sorry I
start to be a bit sarcastic, not because I put the humans so high on a
socket, but because you're too simplifistic for my taste. We possess a
brain with billions of neurons and helper-cells that pseudo-supervised
(by the brain-stem, so by genome) create un-understandable
connectivity. Which so far seems to be requirement for human language
(less complex systems have less complex languages).
My point of view is from the philosophical/logic one: Can a system
understand itself? To understand a system it's nessessary to have a
view of it, where comes the 'memory' from this view in the self? Also
if the memory would be available unlimited, the growth of the memory
(which is part of the self) means recursive infinite expansion (of
self).
Can a system understand more complex things than itself?
How much more complex need a system to be than the thing to
understand?
Which computer-AI is complex enough to understand for example human
language (it's about understanding, not about statistical
distribution)?

> However, we don't know any
> algorithm for learning language that performs as well as humans in many
> tasks, such as distinguishing likely text sequences from unlikely. The
> Hutter prize will reward incremental steps toward this goal.

Without creativity and perception it's not very likely that an
algorithm that successfully 'learned' english, will ever produce
reasonable english text on it's own.

>> The problem is, if a AI driven self-learning compressor will
>> receive a meaning that is analogue to what a human understands,
>> without ever seeing a book or opening a lock.

And because of that inability (of creating language-fragments) it will
not be able to translate languages either.

Matt Mahoney

unread,
Aug 10, 2006, 1:56:10 PM8/10/06
to
ianpa...@gmail.com wrote:
> I agree that maximal compression requires AI. However I do not know
> whether maximum compression is computable in a time shorter than the
> age of the Universe. Human intellence will only compress losslessly to
> a limited extent. As I said putting things into your own words is lossy
> compression.

Maximal compression in general is not computable. The goal is better
compression. We do not know how much computation is required to
achieve any given compression level. That problem is not computable
either. That is why we have is a contest.

> There is one issue in your reference which I really have to take issue
> wth.
>
> "Turing does not require language translaton" - It does. The way to
> beat Turing is to put in a nonsense sentence which is grammatically
> correct.

As Turing described the test for AI, he did not require any of the
people or machines to be multilingual. Many people that you might
consider intelligent speak only their native language.

A monolingual language model has applications in language translation.
A person speaking only one language can clean up a bad machine
translation from Babelfish, for example. In general, if x is a string
in English, and y is a string in Spanish, then the problem of
translating English to Spanish is to find y maximizing p(y|x). By
Bayes law, this is the same as maximizing p(x|y)p(y). p(y) is a
language model that only knows Spanish.

> I accept that if there is no Kolmogorov criterion built in people will
> cheat, although it could be that you could screen with a small data set
> without Kolmogorov (size of program) being an issue.

This is an issue with many data compression benchmarks. For example,
maximumcompression.com has two benchmarks, one public and one not, to
prevent people from tuning their compressors to the benchmark. I felt
it was important that all data be public so that test results could be
verified. If you have only a public test set, then you have to include
the decompressor size, as in the Calgary challenge.

> > I don't believe that the human brain does anything that can't be done
> > on a computer with the right algorithm. However, we don't know any
> > algorithm for learning language that performs as well as humans in many
> > tasks, such as distinguishing likely text sequences from unlikely. The
> > Hutter prize will reward incremental steps toward this goal.
> >
> That is not really the issue. Human intelligence is lossy.

That is because the human brain is not deterministic. I believe it is
valid to use a test for AI that requires deterministic computation,
because that is not a hardship for machines.

-- Matt Mahoney

Matt Mahoney

unread,
Aug 10, 2006, 2:36:23 PM8/10/06
to
niels.f...@seies.de wrote:
> > For text based AI problems (language modeling), the meanings of words
> > does not need to be grounded. It is only necessary to know that one
> > form of the word "lock" is statistically associated syntactically with
> > "went through" and semantically with "boat" to derive the correct
> > meaning.
>
> I think you're wrong when you say that AI is (only) statistical based.
> For sure you will receive statistical bounds for your algorithm if
> you're only working in the statistical domain. But as you say below
> (that you think a computer can 'simulate' anything a human can 'think')
> you _must_ leave the statistical domain to make AI, which for example
> means to explore semantic (which is not statistical based).

Statistical models have been used to pass the word analogy section of
the SAT exams.
http://arxiv.org/ftp/cs/papers/0508/0508053.pdf

> > As I explained in my last post, lossless text compression requires both
> > AI and deterministic computation.
>
> Order-0 Huffman possesses intelligence?

Order 0 models do poorly on text.
http://cs.fit.edu/~mmahoney/compression/text.html
Isn't this evidence that good compression requires intelligence?

In order to predict sequences like "3+8=_", it should be apparent that
a good compressor needs to be able to solve arithmetic problems.
Likewise, it needs vast knowledge, the ability to reason and solve
problems, common sense, etc. Furthermore, the compresssor needs to be
able to learn most of what it knows from the test data itself, because
any knowledge directly coded into the decompressor will increase its
size. If you want to improve compression, you will need to figure out
how to model these things that the human brain does so well. That is
the goal.

> > I don't believe that the human brain does anything that can't be done
> > on a computer with the right algorithm.
>
> Creativity and Intuition (metaphor -> semantic/personality)? Sorry I
> start to be a bit sarcastic, not because I put the humans so high on a
> socket, but because you're too simplifistic for my taste. We possess a
> brain with billions of neurons and helper-cells that pseudo-supervised
> (by the brain-stem, so by genome) create un-understandable
> connectivity. Which so far seems to be requirement for human language
> (less complex systems have less complex languages).

Ancient people believed that the gods made the sun move around the
earth, because there was no other reasonable explanation. We do not
know how to model creativity and intuition, but this is not evidence
for mystical forces outside the laws of physics and the firing of
neurons.

> My point of view is from the philosophical/logic one: Can a system
> understand itself? To understand a system it's nessessary to have a
> view of it, where comes the 'memory' from this view in the self? Also
> if the memory would be available unlimited, the growth of the memory
> (which is part of the self) means recursive infinite expansion (of
> self).
> Can a system understand more complex things than itself?
> How much more complex need a system to be than the thing to
> understand?

A state machine cannot simulate itself. This is easily proven. The
simulation requires at least as many states as the simulator.

Likewise, a human will not be able to comprehend the workings of a
machine as smart as itself. This does not mean that humans cannot
build such a machine.

> Which computer-AI is complex enough to understand for example human
> language (it's about understanding, not about statistical
> distribution)?

How do you test if a machine "understands", as opposed to giving the
right answers?

-- Matt Mahoney

ianpa...@gmail.com

unread,
Aug 13, 2006, 11:32:18 AM8/13/06
to

Matt Mahoney wrote:
> Regarding the different translations of "lock" into Spanish, I believe
> that 1 GB of text is sufficient to resolve word sense ambiguity
> problems like this. One form would be associated by proximity in
> running text to "key", "door", etc. another to "ship", "canal", etc.
> There are LSA models that do this now after training on a few hundred
> megabytes.
>
Indeed, this is why I think a monolingual Hutter was a missed
opportunity. How good is each LSA model?

The examples I gave were all obtained fropm Google Translate. Google
would be well advised to incorporate these models into its search
pattern. The different "locks" and the different "springs" are surely
important in retrieving the right information.

> It is an open question whether 1 GB of Wikipedia is the best data set
> for this test. Probably not, but I think it is close enough.
>

Probably right. Hutter competitors could try other texts and the rests
would be of interest.

-- Ian Parker

Matt Mahoney

unread,
Aug 13, 2006, 2:18:11 PM8/13/06
to
ianpa...@gmail.com wrote:
> Matt Mahoney wrote:
> > Regarding the different translations of "lock" into Spanish, I believe
> > that 1 GB of text is sufficient to resolve word sense ambiguity
> > problems like this. One form would be associated by proximity in
> > running text to "key", "door", etc. another to "ship", "canal", etc.
> > There are LSA models that do this now after training on a few hundred
> > megabytes.
> >
> Indeed, this is why I think a monolingual Hutter was a missed
> opportunity. How good is each LSA model?
>
> The examples I gave were all obtained fropm Google Translate. Google
> would be well advised to incorporate these models into its search
> pattern. The different "locks" and the different "springs" are surely
> important in retrieving the right information.

LSA maps words into a vector space of a few hundred dimensions, such
that words that are likely to appear close together in running text
will also be close together in this space. Such words tend to be
related semantically as well, like "key" and "door". If a word has
more than one meaning, like "lock", it ends up somewhere between the
clusters corresponding to each of its meanings.

LSA can be used for word sense disambiguation. If you slide a window
over text, then average all of the LSA vectors in that window, it will
tell you what the text in that window is "about". If you plot these
window averages each time you encounter the word "lock" in the middle
of the window, these points will tend to form clusters, one for each
sense of the word.

LSA is generally expressed using matrices, but you can think of it as a
3 layer linear neural network that predicts nearby words given some of
the words in a window. There is one input neuron and one output neuron
for each word in the vocabulary. There are a few hundred neurons in
the hidden layer, which represents the semantic vector.

-- Matt Mahoney

ianpa...@gmail.com

unread,
Aug 14, 2006, 3:25:46 PM8/14/06
to
I have looked up the references to LSA. It appears that the initial
choice of dimensions is somewhat arbitary. In fact LSA is designed to
look at essays!

Suppose we take a monolingual text and try to predict the next word and
draw up a list of possibilities. During decompression the same
probabilities are calculated. We usually have an information content of
about 10 words. Hence a series of words is described by short bit
strings.

If we are looking at speech we may need a number of iterations to
converge to the correct answer. In the case of the two locks
convergence is in one stage.

A compressed string will be a string which is apparently random.
Getting to this stage implies AI and no one can really claim it does
not. In the case of multilingual texts of course human translators
usually find the right word. The additional information content is
therefore zero.

I accept what you say that the converse is not true. AI implies only
lossy compression.

BTW with large matrices only the standard metods of diagonalization
like Householder have guaranteed convergence.

- Ian Parker

ianpa...@gmail.com

unread,
Aug 16, 2006, 6:29:32 AM8/16/06
to

Matt Mahoney wrote:
> ianpa...@gmail.com wrote:
> > I agree that maximal compression requires AI. However I do not know
> > whether maximum compression is computable in a time shorter than the
> > age of the Universe. Human intellence will only compress losslessly to
> > a limited extent. As I said putting things into your own words is lossy
> > compression.
>
> Maximal compression in general is not computable. The goal is better
> compression. We do not know how much computation is required to
> achieve any given compression level. That problem is not computable
> either. That is why we have is a contest.
>

True, it remains to be seen whether in fact the best computable
solution implies AI or not. It is clear that the best solution is AI.
The best computable solution may or may not be. In the case of a
multilingual corpus it almost certainly is AI.

> > There is one issue in your reference which I really have to take issue
> > wth.
> >
> > "Turing does not require language translaton" - It does. The way to
> > beat Turing is to put in a nonsense sentence which is grammatically
> > correct.
>
> As Turing described the test for AI, he did not require any of the
> people or machines to be multilingual. Many people that you might
> consider intelligent speak only their native language.
>
> A monolingual language model has applications in language translation.
> A person speaking only one language can clean up a bad machine
> translation from Babelfish, for example. In general, if x is a string
> in English, and y is a string in Spanish, then the problem of
> translating English to Spanish is to find y maximizing p(y|x). By
> Bayes law, this is the same as maximizing p(x|y)p(y). p(y) is a
> language model that only knows Spanish.
>

If it is because you don't know Spanish - yes. but a "cerradura" or
"ressorte" implies a position in your 300 or do dimensional LSA space.
The Turing test implies correct positioning in multidimensional space.
Spanish, or another 2L is the best demonstrator of Eucidean
positioning.


> > I accept that if there is no Kolmogorov criterion built in people will
> > cheat, although it could be that you could screen with a small data set
> > without Kolmogorov (size of program) being an issue.
>
>

> > > I don't believe that the human brain does anything that can't be done
> > > on a computer with the right algorithm. However, we don't know any
> > > algorithm for learning language that performs as well as humans in many
> > > tasks, such as distinguishing likely text sequences from unlikely. The
> > > Hutter prize will reward incremental steps toward this goal.
> > >
> > That is not really the issue. Human intelligence is lossy.
>
> That is because the human brain is not deterministic. I believe it is
> valid to use a test for AI that requires deterministic computation,
> because that is not a hardship for machines.
>

It is also because the inherent meaning of Wiki can be expressed in
many different ways. Maybe I could explain something better than Wiki.

Determinism (of the computer) is quite interesting in the philosopical
sense. Another identical computer will arrive at the same result.
Computers can be chaotic. Let us for example simulate aerodynamic flow.
The outcome is critically dependent on the initial conditions. AI
programs might or might not behave chaotically.

BTW - If you asked a computer to discuss Katrina and why it happenned
and the computer actually went into simulation it would still be
determionistic in one sense, but given minute changes it would produce
a different answer. A hurricane after all is caused by a butterfly in
Japan.

-- Ian Parker

Reply all
Reply to author
Forward
0 new messages