782 views

Skip to first unread message

Aug 13, 2006, 2:48:53 AM8/13/06

to

There is some confusion over the relationship between artificial

intelligence and compression that needs to be cleared up.

intelligence and compression that needs to be cleared up.

The following is an excerpt from Matt Mahoney's rationale for the

50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (

http://prize.hutter1.net ):

Compression is Equivalent to General Intelligence

In 2000, Hutter [21,22] proved that finding the optimal behavior of a

rational agent is equivalent to compressing its observations.

Essentially he proved Occam's Razor [23], the simplest answer is

usually the correct answer. The proof applies to any goal-seeking agent

in any unknown environment which can be simulated by a computer. For

example, an agent could be a person or computer whose goal is to

accumulate money or some other reward signal in an unknown world.

Formally, the agent and environment are a pair of interacting Turing

machines that pass messages back and forth at each time step. In

addition, the environment outputs a reward signal (a number) to the

agent, and the agent's goal is to maximize the accumulated reward. What

Hutter proved is that the optimal behavior of an agent is to guess that

the environment is controlled by the shortest program that is

consistent with all of the interaction observed so far.

The proof is based on the Solomonoff distribution of random programs

(encoded as strings). The only possible probability distribution over

an infinite set of possible strings is one in which shorter strings

have higher probabilities than longer ones. Given any string s with

probability p(s) > 0, there can only be a finite number of strings

longer than s with probability greater than p(s), but there must be an

infinite number of strings longer than s with smaller probability.

Therefore, given any infinite set of programs with nonzero probability,

with any a-priori distribution, shorter programs must be more probable

than longer ones.

The problem of finding the shortest program consistent with an agent's

observations and actions is known as AIXI. AIXI implies a solution to

optimal algorithmic compression: coding a string as the shortest

program that outputs it. The problem is worth studying because it is

hard. The general problem is not computable [11], although Hutter

proved that if we assume time bounds t and space bounds l on the

environment, then this restricted problem, known as AIXItl, can be

solved in O(t2l) time.

...

11. Chaitin, G. J., "Algorithmic Information Theory", IBM Journal of

Research and Development 21:350-359, 496, 1997.

...

21. Hutter, M., Towards a Universal Theory of Artificial Intelligence

based on Algorithmic Probability and Sequential Decisions, Proceedings

of the 12th European Conference on Machine Learning (ECML-2001)

226-238, 2000.

22. Hutter, M., Universal Artificial Intelligence: Sequential Decisions

based on Algorithmic Probability, EATCS, Springer, Berlin, 2004.

23. Wikipedia, Occam's Razor,

http://en.wikipedia.org/wiki/Occam's_Razor, 2006.

[ comp.ai is moderated ... your article may take a while to appear. ]

Aug 13, 2006, 6:54:13 AM8/13/06

to

On Sun, 13 Aug 2006 06:48:53 GMT, James A. Bowery wrote:

[...]

> The proof is based on the Solomonoff distribution of random programs

> (encoded as strings). The only possible probability distribution over

> an infinite set of possible strings is one in which shorter strings

> have higher probabilities than longer ones. Given any string s with

> probability p(s) > 0, there can only be a finite number of strings

> longer than s with probability greater than p(s), but there must be an

> infinite number of strings longer than s with smaller probability.

> Therefore, given any infinite set of programs with nonzero probability,

> with any a-priori distribution, shorter programs must be more probable

> than longer ones.

It is a logical fallacy. The same way one could prove that any numeric

algorithm should always consider lesser numbers as the most probable

outcome, just because the expectation of any distribution should be finite.

Also, for any given N (length) one can have an infinite number of

distributions with all programs shorter than N be wrong.

--

Regards,

Dmitry A. Kazakov

http://www.dmitry-kazakov.de

Aug 14, 2006, 12:35:27 AM8/14/06

to

I guess I should refer to the original source, which states it more

formally.

http://www.idsia.ch/~marcus/ai/paixi.htm

-- Matt Mahoney

Aug 14, 2006, 7:44:16 AM8/14/06

to

"James A. Bowery" <jabo...@gmail.com> writes:

> There is some confusion over the relationship between artificial

> intelligence and compression that needs to be cleared up.

> The following is an excerpt from Matt Mahoney's rationale for the

> 50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (

> http://prize.hutter1.net ):

> Compression is Equivalent to General Intelligence

> In 2000, Hutter [21,22] proved that finding the optimal behavior of a

> rational agent is equivalent to compressing its observations.

There is no mention of compression in his 2000 paper. But it

certainly follows that having an exact model of the true probability

of an agent's observations, which allows the maximization of reward,

also allows one to optimally compress them. But how does showing

that one can learn these probabilities by a brute force approach

which converges absurdly slowly have anything to tell us about how

to achieve machine intelligence with realistic resources?

> [...]

> The problem of finding the shortest program consistent with an agent's

> observations and actions is known as AIXI. AIXI implies a solution to

> optimal algorithmic compression: coding a string as the shortest

> program that outputs it. The problem is worth studying because it is

> hard. The general problem is not computable [11], although Hutter

> proved that if we assume time bounds t and space bounds l on the

> environment, then this restricted problem, known as AIXItl, can be

> solved in O(t2l) time.

O(t2^l) actually. Absurdly large.

Hutter's work, while undoubtedly clever, demonstrates more than

anything else how misleading results in asymptotic complexity can be.

AIXI essentially searches over all possible programs and simulates them

in parallel to find ones with desired properties. AIXI is optimal

because it can in theory eventually find any program and is "as good as"

the best one it finds, but only if we ignore the gargantuan additive and

multiplicative slowdown factors in its execution time. Of theoretical

interest, yes, but irrelevant to the problem of resource-bounded AI.

David

Aug 22, 2006, 8:56:11 PM8/22/06

to

David Kinny wrote:

> "James A. Bowery" <jabo...@gmail.com> writes:

>

> > There is some confusion over the relationship between artificial

> > intelligence and compression that needs to be cleared up.

>

> > The following is an excerpt from Matt Mahoney's rationale for the

> > 50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (

> > http://prize.hutter1.net ):

>

> > Compression is Equivalent to General Intelligence

>

> > In 2000, Hutter [21,22] proved that finding the optimal behavior of a

> > rational agent is equivalent to compressing its observations.

>

> There is no mention of compression in his 2000 paper. But it

> certainly follows that having an exact model of the true probability

> of an agent's observations, which allows the maximization of reward,

> also allows one to optimally compress them. But how does showing

> that one can learn these probabilities by a brute force approach

> which converges absurdly slowly have anything to tell us about how

> to achieve machine intelligence with realistic resources?

> "James A. Bowery" <jabo...@gmail.com> writes:

>

> > There is some confusion over the relationship between artificial

> > intelligence and compression that needs to be cleared up.

>

> > The following is an excerpt from Matt Mahoney's rationale for the

> > 50,000 Euro, Hutter Prize for Lossless Compression of Human Knowledge (

> > http://prize.hutter1.net ):

>

> > Compression is Equivalent to General Intelligence

>

> > In 2000, Hutter [21,22] proved that finding the optimal behavior of a

> > rational agent is equivalent to compressing its observations.

>

> There is no mention of compression in his 2000 paper. But it

> certainly follows that having an exact model of the true probability

> of an agent's observations, which allows the maximization of reward,

> also allows one to optimally compress them. But how does showing

> that one can learn these probabilities by a brute force approach

> which converges absurdly slowly have anything to tell us about how

> to achieve machine intelligence with realistic resources?

The goal of the Hutter prize is not to support AIXI, but to support the

lesser goal of language modeling. AIXI does not need experimental

support, since it is a proven, theoretical result. We mention it

because it helps justify our approach to the language modeling problem

(which is admittedly controversial).

-- Matt Mahoney

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu