Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

# Compression is Equivalent to General Intelligence

1,888 views

### James A. Bowery

Aug 13, 2006, 2:48:53 AM8/13/06
to
There is some confusion over the relationship between artificial
intelligence and compression that needs to be cleared up.

The following is an excerpt from Matt Mahoney's rationale for the
50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (
http://prize.hutter1.net ):

Compression is Equivalent to General Intelligence

In 2000, Hutter [21,22] proved that finding the optimal behavior of a
rational agent is equivalent to compressing its observations.
Essentially he proved Occam's Razor [23], the simplest answer is
usually the correct answer. The proof applies to any goal-seeking agent
in any unknown environment which can be simulated by a computer. For
example, an agent could be a person or computer whose goal is to
accumulate money or some other reward signal in an unknown world.

Formally, the agent and environment are a pair of interacting Turing
machines that pass messages back and forth at each time step. In
addition, the environment outputs a reward signal (a number) to the
agent, and the agent's goal is to maximize the accumulated reward. What
Hutter proved is that the optimal behavior of an agent is to guess that
the environment is controlled by the shortest program that is
consistent with all of the interaction observed so far.

The proof is based on the Solomonoff distribution of random programs
(encoded as strings). The only possible probability distribution over
an infinite set of possible strings is one in which shorter strings
have higher probabilities than longer ones. Given any string s with
probability p(s) > 0, there can only be a finite number of strings
longer than s with probability greater than p(s), but there must be an
infinite number of strings longer than s with smaller probability.
Therefore, given any infinite set of programs with nonzero probability,
with any a-priori distribution, shorter programs must be more probable
than longer ones.

The problem of finding the shortest program consistent with an agent's
observations and actions is known as AIXI. AIXI implies a solution to
optimal algorithmic compression: coding a string as the shortest
program that outputs it. The problem is worth studying because it is
hard. The general problem is not computable [11], although Hutter
proved that if we assume time bounds t and space bounds l on the
environment, then this restricted problem, known as AIXItl, can be
solved in O(t2l) time.

...
11. Chaitin, G. J., "Algorithmic Information Theory", IBM Journal of
Research and Development 21:350-359, 496, 1997.
...
21. Hutter, M., Towards a Universal Theory of Artificial Intelligence
based on Algorithmic Probability and Sequential Decisions, Proceedings
of the 12th European Conference on Machine Learning (ECML-2001)
226-238, 2000.

22. Hutter, M., Universal Artificial Intelligence: Sequential Decisions
based on Algorithmic Probability, EATCS, Springer, Berlin, 2004.

23. Wikipedia, Occam's Razor,
http://en.wikipedia.org/wiki/Occam's_Razor, 2006.

[ comp.ai is moderated ... your article may take a while to appear. ]

### Dmitry A. Kazakov

Aug 13, 2006, 6:54:13 AM8/13/06
to
On Sun, 13 Aug 2006 06:48:53 GMT, James A. Bowery wrote:

[...]

> The proof is based on the Solomonoff distribution of random programs
> (encoded as strings). The only possible probability distribution over
> an infinite set of possible strings is one in which shorter strings
> have higher probabilities than longer ones. Given any string s with
> probability p(s) > 0, there can only be a finite number of strings
> longer than s with probability greater than p(s), but there must be an
> infinite number of strings longer than s with smaller probability.
> Therefore, given any infinite set of programs with nonzero probability,
> with any a-priori distribution, shorter programs must be more probable
> than longer ones.

It is a logical fallacy. The same way one could prove that any numeric
algorithm should always consider lesser numbers as the most probable
outcome, just because the expectation of any distribution should be finite.

Also, for any given N (length) one can have an infinite number of
distributions with all programs shorter than N be wrong.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

### matma...@yahoo.com

Aug 14, 2006, 12:35:27 AM8/14/06
to

I guess I should refer to the original source, which states it more
formally.
http://www.idsia.ch/~marcus/ai/paixi.htm

-- Matt Mahoney

### David Kinny

Aug 14, 2006, 7:44:16 AM8/14/06
to
"James A. Bowery" <jabo...@gmail.com> writes:

> There is some confusion over the relationship between artificial
> intelligence and compression that needs to be cleared up.

> The following is an excerpt from Matt Mahoney's rationale for the
> 50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (
> http://prize.hutter1.net ):

> Compression is Equivalent to General Intelligence

> In 2000, Hutter [21,22] proved that finding the optimal behavior of a
> rational agent is equivalent to compressing its observations.

There is no mention of compression in his 2000 paper. But it
certainly follows that having an exact model of the true probability
of an agent's observations, which allows the maximization of reward,
also allows one to optimally compress them. But how does showing
that one can learn these probabilities by a brute force approach
which converges absurdly slowly have anything to tell us about how
to achieve machine intelligence with realistic resources?

> [...]

> The problem of finding the shortest program consistent with an agent's
> observations and actions is known as AIXI. AIXI implies a solution to
> optimal algorithmic compression: coding a string as the shortest
> program that outputs it. The problem is worth studying because it is
> hard. The general problem is not computable [11], although Hutter
> proved that if we assume time bounds t and space bounds l on the
> environment, then this restricted problem, known as AIXItl, can be
> solved in O(t2l) time.

O(t2^l) actually. Absurdly large.

Hutter's work, while undoubtedly clever, demonstrates more than
anything else how misleading results in asymptotic complexity can be.
AIXI essentially searches over all possible programs and simulates them
in parallel to find ones with desired properties. AIXI is optimal
because it can in theory eventually find any program and is "as good as"
the best one it finds, but only if we ignore the gargantuan additive and
multiplicative slowdown factors in its execution time. Of theoretical
interest, yes, but irrelevant to the problem of resource-bounded AI.

David

### matma...@yahoo.com

Aug 22, 2006, 8:56:11 PM8/22/06
to
David Kinny wrote:
> "James A. Bowery" <jabo...@gmail.com> writes:
>
> > There is some confusion over the relationship between artificial
> > intelligence and compression that needs to be cleared up.
>
> > The following is an excerpt from Matt Mahoney's rationale for the
> > 50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (
> > http://prize.hutter1.net ):
>
> > Compression is Equivalent to General Intelligence
>
> > In 2000, Hutter [21,22] proved that finding the optimal behavior of a
> > rational agent is equivalent to compressing its observations.
>
> There is no mention of compression in his 2000 paper. But it
> certainly follows that having an exact model of the true probability
> of an agent's observations, which allows the maximization of reward,
> also allows one to optimally compress them. But how does showing
> that one can learn these probabilities by a brute force approach
> which converges absurdly slowly have anything to tell us about how
> to achieve machine intelligence with realistic resources?

The goal of the Hutter prize is not to support AIXI, but to support the
lesser goal of language modeling. AIXI does not need experimental
support, since it is a proven, theoretical result. We mention it
because it helps justify our approach to the language modeling problem