Google Founder: AI Will Be Solved by Brute Force

jabo...@gmail.com

Feb 23, 2007, 1:18:57 PM
to Hutter Prize
Larry Page, co-founder of Google, gave a talk in which he opines that
artificial intelligence will be solved by brute force and that Google
(which happens to be the biggest owner of computers in the world) is
working on it.

http://news.com.com/1606-2_3-6160334.html?tag=ne.vid

As usual the distinction between compression and decompression is
crucial. It may be that to _write_ the ultimate Kolmogorov complexity
code (compression) of human knowledge, hence the optimal knowledge
representation, will require huge computational resources, including a
great deal of _human_ analysis of the data. But the Hutter Prize's
computational limits apply to DEcompression -- that is, to generating
a "response". If you choose a knowledge representation that can't
decompress rapidly enough, then it won't be useful for Google's
responses to user queries.

Transcribing the video:

My prediction is that when AI happens, it's going to be a lot of
computation and not so much clever blackboard/whiteboard kind of stuff
-- clever algorithms -- but just a LOT of computation. My theory is
that if you look at your programming -- your DNA -- it's about 600
megabytes compressed. It's smaller than any modern operating system.
It's smaller than Linux, or Windows, or anything like that. Your whole
operating system. That includes booting up your brain, right? By
definition. And so your program code probably isn't that
complicated. It's probably more about the overall computation.
That's my guess.

We have some people at Google who are really trying to build
artificial intelligence. And to do it in a large scale and so on.
And in fact ... to do the perfect job of search you could ask any
query and it would give you the perfect answer and that would be
artificial intelligence... based on everything being on the Web, which
is a pretty close approximation.

So I think we're lucky enough to be working incrementally closer to
that, but again, very, very few people are working on this. I don't
think it's as far off as people think.
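
As a back-of-envelope check on Page's 600 MB figure, here is a
minimal sketch in Python. It assumes ~3.1 billion base pairs in the
haploid human genome at 2 bits per base; the compressed figure itself
is Page's, not something verified here:

base_pairs = 3.1e9                 # approximate haploid human genome
raw_bits = base_pairs * 2          # A/C/G/T fits in 2 bits per base
raw_mb = raw_bits / 8 / 1e6
print(f"raw genome: ~{raw_mb:.0f} MB")   # ~775 MB before compression
# Repeats and low-entropy regions compress well, so a figure in the
# hundreds of megabytes, like Page's 600 MB, is at least plausible.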

Matt Mahoney

Feb 23, 2007, 5:38:47 PM
to Hutter Prize
On Feb 23, 1:18 pm, jabow...@gmail.com wrote:
> Larry Page, co-founder of Google, gave a talk in which he opines that
> artificial intelligence will be solved by brute force and that Google
> (which happens to be the biggest owner of computers in the world) is
> working on it.
>
> http://news.com.com/1606-2_3-6160334.html?tag=ne.vid
>
> As usual the distinction between compression and decompression is
> crucial. It may be that to _write_ the ultimate Kolmogorov complexity
> code (compression) of human knowledge, hence the optimal knowledge
> representation, will require huge computational resources, including a
> great deal of _human_ analysis of the data. But the Hutter Prize's
> computational limits apply to DEcompression -- that is, to generating
> a "response". If you choose a knowledge representation that can't
> decompress rapidly enough, then it won't be useful for Google's
> responses to user queries.

I tend to agree with Page. Google has the money, brains, computing
power, and motivation to solve AI. They are working with Doug Lenat
from Cycorp. Google has a legitimate interest in answering natural
language queries, delivering more relevant ads, detecting porn and
spam, and making images, audio, and video searchable.

The compression approach is to reward the fastest way to learn a
language model. I picked 1 GB of text for the large text benchmark
because that's how much the average human is exposed to. To solve the
problem, you have to figure out how people learn language. But Google
doesn't need to go this route. Suppose you have the problem of
recognizing whether a sentence makes sense. You could analyze the
syntax and semantics, or you could count exact matches. With 1 GB it
is unlikely you will find an exact match to sequences longer than
about 3 words. With 10 TB you can match about 5 words. This is not
AI, but it gets you closer. If they can get away with inefficient AI
and make up for it with brute force, why wouldn't they?
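
A minimal sketch of the exact-match idea, in Python (the toy corpus
and whitespace tokenization are placeholder assumptions; a real system
would index terabytes):

from collections import Counter

def ngram_counts(tokens, n):
    # Count every n-word sequence in the corpus.
    return Counter(tuple(tokens[i:i+n])
                   for i in range(len(tokens) - n + 1))

def plausible(sentence, counts, n):
    # Crude test: does every n-gram of the sentence occur verbatim?
    w = sentence.split()
    return all(counts[tuple(w[i:i+n])] > 0
               for i in range(len(w) - n + 1))

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = ngram_counts(corpus, 3)
print(plausible("the cat sat on the rug", counts, 3))  # True
print(plausible("the mat sat on the cat", counts, 3))  # False

With more text, longer n-grams find verbatim matches -- the sense in
which 10 TB buys roughly 5-word matches where 1 GB buys about 3.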

As for compression vs. decompression, yes compression is harder.
Kolmogorov compression is not computable. But AI is. We know so
because the brain does it with finite resources. Predicting text is
not as hard as the general problem, which might also include
compressing encrypted data.

Most of the best compressors use the same resources to compress as to
decompress. They use predictive models where both the compressor and
decompressor perform the same computation to learn the language. In
theory the model for a fixed data set could be hand coded, in which
case decompression would be faster. I don't believe this will happen
because programming a learning model and training it is easier than
programming the learned knowledge directly.
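
A minimal sketch of that symmetry, assuming an order-0 adaptive byte
model (the arithmetic coder that would turn these probabilities into
actual bits is omitted):

import math

class AdaptiveModel:
    # Compressor and decompressor each run one of these and feed it
    # the same symbols in the same order, so their states never diverge.
    def __init__(self):
        self.counts = [1] * 256          # Laplace-smoothed byte counts
    def prob(self, byte):
        return self.counts[byte] / sum(self.counts)
    def update(self, byte):
        self.counts[byte] += 1

data = b"the quick brown fox"
encoder, decoder = AdaptiveModel(), AdaptiveModel()
bits = 0.0
for b in data:
    bits += -math.log2(encoder.prob(b))  # ideal code length for b
    encoder.update(b)                    # compressor learns...
    decoder.update(b)                    # ...decompressor learns the same
                                         # way from its decoded output
assert encoder.counts == decoder.counts  # same model, same cost each way
print(f"{bits:.1f} bits for {len(data)} input bytes")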

-- Matt Mahoney

jabo...@gmail.com

Feb 23, 2007, 6:43:56 PM
to Hutter Prize
On Feb 23, 2:38 pm, "Matt Mahoney" <matmaho...@yahoo.com> wrote:
> I tend to agree with Page. Google has the money, brains, computing
> power, and motivation to solve AI. They are working with Doug Lenat
> from Cycorp...

> The compression approach is to reward the fastest way to learn a
> language model.

The Cycorp approach is virtually the opposite of using automated
compression as the fastest way to learn a language model. He's got a
small army of philosophy-degree holders doing knowledge representation
work.

> As for compression vs. decompression, yes compression is harder.
> Kolmogorov compression is not computable. But AI is. We know so
> because the brain does it with finite resources.

Be careful here. What the brain does during language processing is
decode language containing previously "compressed" observations of the
world. Sometimes these compressed observations are the wisdom of the
ages -- the result of a very arduous process in which generations of
people try various models of the world to come up with knowledge --
frequently observations compressed as language -- and pass it on
verbally to the next generation.


Matt Mahoney

Feb 23, 2007, 8:24:57 PM
to Hutter Prize

On Feb 23, 6:43 pm, jabow...@gmail.com wrote:
> On Feb 23, 2:38 pm, "Matt Mahoney" <matmaho...@yahoo.com> wrote:
>
> > I tend to agree with Page. Google has the money, brains, computing
> > power, and motivation to solve AI. They are working with Doug Lenat
> > from Cycorp...
> > The compression approach is to reward the fastest way to learn a
> > language model.
>
> The Cycorp approach is virtually the opposite of using automated
> compression as the fastest way to learn a language model. He's got a
> small army of philosophy-degree holders doing knowledge representation
> work.

I don't buy the Cyc approach either, but I saw a talk by Lenat at
Google and he seems keenly aware of the limitations of the current
state of AI and what problems remain to be solved. I think what will
happen is that Google will figure out a way to use the Cyc knowledge
base much like using a dictionary or thesaurus to augment a
statistical language model. Cyc knows properties of many objects such
as size, weight, cost, etc. and can reason that a 747 is bigger than
your big toe. But after playing the FACTory game for a while
(http://www.cyc.com/) you quickly become aware of its limitations. At the
very least Cyc could convert their bucket of assertions into English
statements and dump them into Google's sea of text.
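
A toy illustration of that last point (the triple format and templates
below are invented for illustration; they are not Cyc's actual CycL
syntax):

assertions = [
    ("genls", "dog", "mammal"),              # hypothetical triples
    ("biggerThan", "a 747", "your big toe"),
]
templates = {
    "genls": "Every {} is a {}.",
    "biggerThan": "{} is bigger than {}.",
}
for pred, a, b in assertions:
    s = templates[pred].format(a, b)
    print(s[0].upper() + s[1:])
# Every dog is a mammal.
# A 747 is bigger than your big toe.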

> > As for compression vs. decompression, yes compression is harder.
> > Kolmogorov compression is not computable. But AI is. We know so
> > because the brain does it with finite resources.
>
> Be careful here. What the brain does during language processing is
> decode language containing previously "compressed" observations of the
> world. Sometimes these compressed observations are the wisdom of the
> ages -- the result of a very arduous process in which generations of
> people try various models of the world to come up with knowledge --
> frequently observations compressed as language -- and pass it on
> verbally to the next generation.

Let's not confuse the role of compression. In a deterministic machine
we can use lossless compression (or some other measure) to evaluate a
language model at the task of prediction. The human brain also
predicts, but does not compress, because neurons are noisy. The
prediction is not exactly repeatable.
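
The evaluation measure implied here, in a minimal sketch: a model's
skill at prediction is just the code length it assigns, -log2 P bits
per symbol. The unigram "model" below is a crude stand-in for a real
language model:

import math
from collections import Counter

def code_length(text, prob):
    # Total bits an ideal coder would emit under model `prob`.
    return sum(-math.log2(prob(c)) for c in text)

text = "abracadabra" * 20
uniform = lambda c: 1 / 26             # knows nothing about the text
freq, n = Counter(text), len(text)
unigram = lambda c: freq[c] / n        # knows letter frequencies
print(code_length(text, uniform))      # more bits: worse predictor
print(code_length(text, unigram))      # fewer bits: better predictor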

You are probably referring to pattern recognition or filtering of
sensory data. When we write a lossy compressor for images, video, or
audio, we aspire to duplicate as closely as possible the same
filtering operations, to discard precisely the same information that
the human brain would discard, so that the loss of information is not
perceptible.

I realize that lossy pattern recognition also occurs in language
modeling, but not in the role of evaluation. Rather, its role is to
map input to context for context modeling. Thus, the phrases "Alice
gave Bob money" and "Bob was paid by Alice" would ideally be mapped to
the same context. Any good compressor does such many-to-one "lossy"
context mapping, but at a lower level.
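
At that lower level, the many-to-one mapping is often literally a hash
of the recent context into a fixed-size table, as in this sketch (the
constants are arbitrary; PAQ-style context models do something
similar):

def context_slot(history, order=4, table_bits=20):
    # Map the last `order` bytes to one of 2**table_bits slots.
    # Distinct histories can collide, deliberately pooling their
    # statistics -- a "lossy", many-to-one context mapping.
    h = 0
    for b in history[-order:]:
        h = (h * 271 + b) & 0xFFFFFFFF
    return h % (1 << table_bits)

print(context_slot(b"Alice gave Bob money"))
print(context_slot(b"Bob was paid by Alice"))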

-- Matt Mahoney

jabo...@gmail.com

Feb 24, 2007, 4:11:01 PM
to Hutter Prize
I don't think what I'm bringing up is directly dependent on lossy vs.
lossless compression -- except insofar as universal learning theory
depends on assuming a lossless model with two parts (as in MML, MDL,
AS, etc.): the theory used to encode the data, and the data so encoded.
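
For concreteness, the two-part trade-off looks like this coin-flip
toy (the 32-bit cost of stating the parameter is an arbitrary
assumption):

import math

def two_part_cost(data, p, param_bits):
    # L(H) + L(D|H): bits to state the hypothesis plus bits to
    # encode the data under it.
    ones = sum(data)
    zeros = len(data) - ones
    return param_bits - ones * math.log2(p) - zeros * math.log2(1 - p)

data = [1] * 90 + [0] * 10              # a heavily biased bit string
print(two_part_cost(data, 0.5, 0))      # no theory: 100.0 bits
print(two_part_cost(data, 0.9, 32))     # theory pays off: ~78.9 bits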

What I'm talking about is more along the lines of thinking about
theory itself as consisting of two parts:

1) The theory that might be called the Bayesian prior transmitted to
and decoded by the individual's development -- in the form of first
genes and then words.

2) The theory that he develops on his own given 1 and further, more
direct, experience.

When a child learns a language model, he is using his genes (1) as the
Bayesian prior, with the sounds of speech as the experience, to develop
a theory on his own (2). However, once he has acquired his language
model, he is in a position to receive a further part of his Bayesian
prior -- words that convey his culture (1). These are his software
programming -- his culture -- his memes -- and they enable him to more
or less efficiently form a more elaborate theory from his further
direct experience (2).

The Hutter Prize allows the theory to be derived by any means
whatsoever -- but the computational load of "decompressing" the theory
to produce a response to a query is the constraint. The same is true
of Google's need for AI.

James Bowery

Feb 25, 2007, 11:28:05 AM
to Hutter Prize
I should add that it is not the case that Google can get away with any
old theory so long as it provides fast responses. A "brute force"
response consisting of memory-intensive lookups is not sufficient to
provide what Larry Page termed "the perfect answer" to a query. An
additional constraint is that these tables must have been generated
from a shortest program consistent with the content of the world wide
web and other corpora -- for the reasons described by Hutter et al.

This points to a potential relaxation of the Hutter Prize computation
constraint:

Allow tabling pre-expansion of the theory portion of the two-part
(theory, encoded-data) Kolmogorov approximation of the corpus.

The problem with this is that if the tabling pre-expansion of the
theory portion requires the computational resources of a Fortune 500
company, we will be in a new regime of competition.
