PS: Many people disagree with the posited definition of "artificial
intelligence quality", but it seems obvious to me. I'm not sure why
it's not obvious to everyone, but such is the case. Any suggestions on
how to get this across more effectively? Or are we dealing with
different cognitive types -- some will simply 'get it' and others
won't?
http://www.geocities.com/jim_bowery/cprize.html
The C-Prize
The most crucial technology prize of all.
By Jim Bowery
Copyright May 2005
The author grants the right to copy and distribute without
modification.
Since all technology prize awards are geared toward solving crucial
problems, the most crucial technology prize award of them all would be
one that solves the rest of them:
The C-Prize -- A prize that solves the artificial intelligence problem.
The C-Prize award criterion is as follows:
Let anyone submit a program that produces, with no inputs, one of the
major natural language corpora as output.
S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at noon GMT on day of new record
Winner receives: $Z * (X/(R0+X))
Compression program and decompression program are made open source.
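As a rough illustration of the award arithmetic (the numbers below are made up; only the formulas come from the rules above):

# Sketch of the C-Prize award arithmetic defined above.
# All figures are invented examples, not real records or fund sizes.

def compression_ratio(corpus_size, program_size):
    # R = S / P: uncompressed corpus size over the size of the program.
    return corpus_size / program_size

def award(prev_ratio, new_ratio, fund):
    # Winner receives Z * (X / (R0 + X)), where X = R1 - R0.
    x = new_ratio - prev_ratio
    if x <= 0:
        return 0.0                  # no new record, no payout
    return fund * (x / (prev_ratio + x))

# Example: previous record R0 = 5.0, new record R1 ~ 6.0, $100,000 in the fund.
r0 = 5.0
r1 = compression_ratio(corpus_size=1_000_000_000, program_size=166_666_666)
print(award(r0, r1, fund=100_000))  # about $16,667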
Explanation
A severe meta-problem with artificial intelligence is the question of
how one can define the quality of an artificial intelligence.
Fortunately, there is an objective technique for ranking that quality:
Kolmogorov Complexity
Kolmogorov Complexity is a mathematically precise formulation of
Ockham's Razor, which basically just says "Don't over-simplify or
over-complicate things." More formally, the Kolmogorov Complexity of a
given bit string is the minimum size of a Turing machine program
required to output, with no inputs, the given bit string.
Any set of programs that purport to set the standard for artificial
intelligence can be compared simply by comparing their Artificial
Intelligence Quality (AIQ), which can be measured precisely as follows:
Take an arbitrarily large corpus of writings sampled from the world
wide web. This corpus will establish the equivalent of an IQ test. Give
the AIs the task of compressing this corpus into the smallest
representation. This representation must be a program that, taking no
outside inputs, produces the exact sample it compressed. The AIQ of an
AI is simply the ratio of the size of the uncompressed writings to the
size of the program that, when executed, produces the uncompressed
writings.
In other words, the AIQ is the compression ratio achieved by the AI on
the AIQ test.
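As a sketch of how a submission might be checked and scored (the file names and the command used to run an entry are placeholders, not part of the proposal):

# Sketch of scoring an AIQ submission: run the program with no input,
# require that it reproduce the corpus byte for byte, and return S/P.
# "corpus.txt" and the run command are placeholders; a real contest
# would also fix the machine, language, and resource limits.

import os
import subprocess

def aiq(corpus_path, program_path, run_command):
    result = subprocess.run(run_command, stdin=subprocess.DEVNULL,
                            capture_output=True, check=True)
    with open(corpus_path, "rb") as f:
        corpus = f.read()
    if result.stdout != corpus:
        raise ValueError("output does not match the corpus exactly")
    return len(corpus) / os.path.getsize(program_path)

# Hypothetical usage with a self-contained entry:
# print(aiq("corpus.txt", "entry.py", ["python", "entry.py"]))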
The reason this works as an AI quality test is that compression
requires predictive modeling. If you can predict what someone is going
to say, you have modeled their mental processes and by inference have a
superset of their mental faculties.
Mechanics
The C-Prize is to be modeled after the Methuselah Mouse Prize, or
M-Prize, where people make pledges of money to the prize fund. If you
would like to help with the setup and/or administration of a prize
award similar to the M-Prize, let me know by email.
http://www.geocities.com/jim_bowery/bafar.html
Two years later, someone with a lot more money put up a quarter
million dollars for the same criterion (but with a time limit):
http://groups-beta.google.com/group/sci.space.tech/browse_frm/thread/f5ade396224b6e1/699cabefb2b5ee10?q=%22cats+prize%22&rnum=19&hl=en#699cabefb2b5ee10
aka
http://tinyurl.com/bo7py
Then some guys in St. Louis got the ball rolling on the X-Prize, with
somewhat different rules, which eventually paid out $10 million.
A similar story has happened with the M-Prize which is now quite
active. Indeed the money is rolling in:
http://mprize.org/index.php?pagename=releases
Prize legislation I drafted for fusion energy technology back in 1992
didn't get funded but it did pick up the support of one of the founders
of the Tokamak program:
http://www.geocities.com/jim_bowery/BussardsLetter.html
Prize awards for technology milestones are gaining credibility these
days and if people "get it" the C-Prize could take off in an even
bigger way than the rest of them.
Very interesting. There is a similar contest that has been going on
for 9 years. It might help you think about the details.
http://mailcom.com/challenge/
Have you picked out a text corpus? I think this is a good approach to
the AI problem, very objective, and easier to administer than the
Loebner prize.
Last year I proposed almost exactly the same thing to NSF with a $50K
prize, but they would not fund it.
The connection between text compression and AI is not obvious, but
there is a simple proof that it solves the Turing test. The
compression problem is to code text strings x with length log2 1/P(x)
bits. But if you know the probability distribution P(x) (specifically
the distribution over chat sessions) then you could build a machine
that could answer questions such that it is indistinguishable from a
human. Given a question q, you generate a response r with distribution
P(r|q) = P(qr)/P(q). You can generalize q to any sequence of questions
and responses up to the last question.
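A toy version of that construction, with a tiny character bigram model standing in for the true distribution P and the response restricted to a short candidate list to keep it small:

# Toy illustration of answering a question q by scoring candidate
# responses r with P(r|q) = P(qr) / P(q).  A tiny character bigram model
# stands in for the true distribution P; it is far too weak to really use
# the question (it only prefers fluent-looking text over gibberish), so
# this only demonstrates the conditional-probability construction itself.

import math
from collections import Counter

training_text = ("what is your name? my name is alice. "
                 "what is your quest? to seek the grail.")
alphabet = sorted(set(training_text))
bigrams = Counter(zip(training_text, training_text[1:]))
unigrams = Counter(training_text)

def log_p(s):
    # Approximate log P(s) under an add-one-smoothed character bigram model.
    lp = -math.log(len(alphabet))              # first character: uniform
    for a, b in zip(s, s[1:]):
        lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(alphabet)))
    return lp

def response_distribution(question, candidates):
    # P(r|q) = P(qr)/P(q), renormalized over the candidate responses.
    scores = [math.exp(log_p(question + r) - log_p(question)) for r in candidates]
    total = sum(scores)
    return [s / total for s in scores]

q = "what is your name? "
candidates = ["my name is alice.", "to seek the grail.", "zzzzz qqqqq zzzzz"]
for r, p in zip(candidates, response_distribution(q, candidates)):
    print(round(p, 4), repr(r))    # the gibberish candidate gets a tiny share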
I am glossing over some assumptions that need to be more clearly
stated. For example, I am assuming that the interrogator, machine, and
the machine's human opponent know the same distribution P(x), and that
people use the same distribution for writing text as they do for
recognition. I think these are reasonable approximations because the
machine only has to be close enough to fool the interrogator, not an
exact match to the "average" human.
Neither AI nor text compression is solved. In 1950 Shannon estimated
the entropy of written English to be about 1 bit per character (based
on how well humans can guess successive letters), but the best text
compressors achieve about 1.2-1.3 bpc. These are language models for
speech recognition that use a combination of word trigram models and
semantic models based on word proximity. I did some of my dissertation
work in this area before I switched to a less interesting topic where I
could get funding. http://cs.fit.edu/~mmahoney/dissertation/
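For what it's worth, here is how the bits-per-character number is computed; the general-purpose compressors in the standard library will come out well above the 1.2-1.3 bpc of the specialized models mentioned above (the file name is a placeholder):

# Compute bits per character for a text file under two general-purpose
# compressors.  "sample.txt" is a placeholder for any plain-text file.
import bz2, zlib

with open("sample.txt", "rb") as f:
    text = f.read()

for name, packed in [("zlib", zlib.compress(text, 9)),
                     ("bz2", bz2.compress(text, 9))]:
    print(name, round(8.0 * len(packed) / len(text), 3), "bits per character")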
-- Matt Mahoney
I ran across your poster session paper "Text Compression as a Test for
Artificial Intelligence" which is one of the things that inspired me to
look into the prize award approach. My original motivation is my
rather strong sense that Ockham's Razor is more than a nice rule of
thumb for scientists.
Your proof is elegant, simple, and convincing; however, people still
don't "get it". When people start arguing about intelligence tests they
can get pretty squirrelly, and I think the Turing test is a poison that has
impeded thought about artificial intelligence more than it has advanced
it. Your paper is crucial to overcoming that handicap.
However, since the 1960s there have been other arguments advanced,
primarily by the algorithmic complexity guys, that should have taken
root in the AI community but didn't. I don't understand exactly why
that intellectual failure occurred in the AI community but it most
obviously did.
As for which text corpus I would choose: that was one of the main
questions I wanted to discuss, because it is so crucial to the success
of such a prize. Right now, if I had to choose a
single text corpus I'd pick the 1 terabyte corpus that Peter Turney
used in his recent accomplishment of human level performance on the SAT
verbal analogies test -- a feat that is particularly interesting given
the exceptionally high correlation between that test and general
intelligence aka the 'g' factor.
In this regard, you may be interested in my recently published article:
"AI Breakthrough or the Mismeasure of Machine?"
http://www.kuro5hin.org/story/2005/5/26/192639/466
I'm thinking of writing another article on the C-Prize.
It would be hard to conduct a test with a 1 TB corpus. Few people have
enough computing power to do this, and I am sure you don't want to host
the corpus on your website.
I would suggest something on the order of 1 GB. Turing estimated in
1950 that the AI problem would take 10^9 bits of memory. He did not
say how he came up with this number. But consider a human exposed to
150 words per minute, 12 hours a day for 20 years. This is about 4 GB
of text, or a 1 GB zip file. So a machine learning algorithm should
have a similar constraint.
Even this may be too high. The average human vocabulary is 30K common
words plus an equal number of proper nouns. To learn a word you have
to be exposed to it about 20 times. By Zipf's law, the frequency of the
n'th most frequent word in English is about 0.1/n. This means that you
need about 12 million words, or 50 MB.
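The arithmetic behind both estimates, spelled out (the bytes-per-word figure is a rough assumption, not a measurement):

# Back-of-the-envelope arithmetic for the two estimates above.

BYTES_PER_WORD = 5.5                   # roughly 4.5 letters plus a space

# Lifetime exposure: 150 words/minute, 12 hours/day, 20 years.
words_heard = 150 * 60 * 12 * 365 * 20
print(words_heard / 1e6, "million words")                   # ~788 million
print(words_heard * BYTES_PER_WORD / 1e9, "GB of text")     # ~4 GB

# Zipf estimate: see the 60,000th most frequent word (30K common words plus
# as many proper nouns) about 20 times, with word n having frequency 0.1/n.
vocabulary = 60_000
exposures = 20
words_needed = exposures / (0.1 / vocabulary)
print(words_needed / 1e6, "million words")                  # 12 million
print(words_needed * BYTES_PER_WORD / 1e6, "MB of text")    # on the order of 50 MB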
Semantic models using LSA have used corpora of about 250 MB, such as
the WSJ corpus.
However, with a larger corpus there is a tendency to get lazy. For
example, with LSA you use the fuzzy equivalence A ~ B, meaning "A has
similar meaning to B" or "A and B appear in the same proximity".
A ~ A (reflexive, used by most data compressors)
A ~ B implies B ~ A (symmetric, used in distant bigram models)
A ~ B and B ~ C implies A ~ C (transitive, used by LSA)
LSA uses the transitive property to predict that C follows A even if
those two words were never seen together before. However with a larger
corpus you can collect the statistics directly rather than use LSA to
infer them, so you tend to take the easy route. In the above paper on
word analogy solving, I see a lot of steps where data was discarded, but
for a 1 TB corpus it didn't matter.
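A toy version of that transitive step, using nothing more than shared context words (this is not LSA itself, just the ~ relation it exploits):

# Toy illustration of the A ~ B relation and its transitive use.
# This is not LSA (no SVD, no weighting); it only shows how the
# transitive step links two words that never co-occur directly.
# The co-occurrence table is invented for the example.

cooccur = {                     # word -> context words it appears near
    "car":     {"road", "engine"},
    "truck":   {"engine", "cargo"},
    "freight": {"cargo", "shipping"},
}

def similar(a, b):
    # A ~ B: A and B appear in the same proximity (share a context word).
    return bool(cooccur.get(a, set()) & cooccur.get(b, set()))

def transitively_similar(a, c):
    # A ~ C inferred from A ~ B and B ~ C for some B (the transitive step).
    return any(similar(a, b) and similar(b, c) for b in cooccur)

print(similar("car", "truck"))                  # True: both occur near "engine"
print(similar("car", "freight"))                # False: never share a context
print(transitively_similar("car", "freight"))   # True: linked through "truck"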
You might be able to construct a high quality (diverse) corpus by
finding the minimum set of documents that contain all the words in a
dictionary. You might also consider a corpus of several languages to
test whether a model can learn a generic language without hardcoded
rules for English.
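A greedy sketch of that minimum-cover idea (exact minimum set cover is intractable, and the documents and dictionary here are placeholders):

# Greedy sketch of building a diverse corpus: repeatedly take the document
# that covers the most dictionary words not yet covered.  Exact minimum set
# cover is NP-hard; greedy selection is a standard approximation.
# The documents and dictionary below are placeholders.

def greedy_corpus(documents, dictionary):
    remaining = set(dictionary)
    chosen = []
    while remaining:
        best_id, best_hits = None, set()
        for doc_id, text in documents.items():
            hits = remaining & set(text.lower().split())
            if len(hits) > len(best_hits):
                best_id, best_hits = doc_id, hits
        if not best_hits:         # the rest of the dictionary never appears
            break
        chosen.append(best_id)
        remaining -= best_hits
    return chosen

docs = {"a": "the cat sat on the mat",
        "b": "dogs chase cats",
        "c": "a mat for dogs"}
print(greedy_corpus(docs, {"cat", "mat", "dogs", "chase"}))   # ['a', 'b']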
-- Matt Mahoney
Perhaps Wikipedia would suffice:
http://download.wikimedia.org/#wikipedia
The current English-language edition, gzipped, is about 1 GB, but as
you can see, there are many Wikipedias in different languages. I doubt
they total over 4 GB gzipped.
Compressing multiple languages would have a very good side effect:
Machine translation technology could be dramatically enhanced and
properly constructed compressors would discover things about the
various cultures that might not otherwise be obvious.
Part of my preference for picking a big corpus to begin with is that if
the contest goes big time -- which it would if proper funding levels
were applied to it -- rules could be constructed to split the contest
into divisions: some for large computer systems to crunch away at
brute-force statistics such as Turney used, and others working from a
fairly sampled subset of the total corpus. But this is something we can
deal with later, when the competition (and prize) gets much bigger.
PS: Another reason I'd like to see huge corpora ultimately is that I'd
really like to see this system beat humans by a substantial amount:
We're so stupid we desperately need all the help we can get.
Nice choice. Lots of contributors, lots of topics, high quality.
> PS: Another reason I'd like to see huge corpora ultimately is that I'd
> really like to see this system beat humans by a substantial amount:
> We're so stupid we desperately need all the help we can get.
When computers are smarter than humans, how will we know?
-- Matt Mahoney
He gives a formal proof, but it basically says that the only possible
distribution of the infinite set of programs (or strings) with nonzero
probability is one which favors shorter programs over longer ones.
Given any string of length n with probability p > 0, there is an
infinite set of strings longer than n, but only a finite number of
these can have probability higher than p.
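Written out a little more explicitly (nothing here beyond the counting in the paragraph above): since all the probabilities must sum to at most one,
\[ \sum_x P(x) \le 1, \]
fewer than $1/p$ strings can satisfy $P(x) > p$. So of the infinitely many strings longer than $n$, all but finitely many must have probability at most $p$, which is what forces any such distribution to favor shorter strings (programs) over longer ones on the whole.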
-- Matt Mahoney
Wow.
If what you say is easily verifiable by consulting mathematicians then
you've just dramatically strengthened the case for the C-Prize. A
philanthropist can have a trusted consultant verify Hutter's proof and
the C-Prize's salience to it. The only remaining question is whether
the C-Prize is properly administered -- something the philanthropist
can verify with a combination of business and engineering consultants.
I had run across Hutter's AIXI paper before, when searching for
Solomonoff's algorithmic probability theory, but I read just the
abstract and didn't have the background for it so I sent it off to a
friend who is a doctoral candidate in statistical mechanics to review.
Being a doctoral candidate, he doesn't have much time left to review
papers outside his area, so I never got the low-down you just gave.
Thanks for the clear synopsis.
A proof like this should have been available to researchers from the
1960s work on minimum length description (AP, KC, etc.) and decision
theory but apparently no one bothered to put 2 and 2 together for what
-- 40 years now?
This is going to be a major question for science historians:
Why did the AI community fail to formalize this proof for so long?
Given Moore's Law originated at about that time, 40 years ago, there's
a good argument that this ranks as one of the greatest losses of
potential to befall humanity.
Here are the drawbacks I see to using the "cur" download of Wikipedia:
1) The purported "neutral point of view" is subject to systemic bias.
The process supported by Wikipedia biases the content toward people who
are able and willing to contribute under those circumstances. It's not
clear how to go about identifying this bias let alone its impact.
2) If the snapshot of the "cur" downloads is delayed, it will be
subject to gaming by people who submit changes to articles whose
Kolmogorov complexity is low in a way known only to the gamers.
Wikipedia keeps prior archives of the 'cur' downloads, and it is
unlikely anyone has gamed the system so far, but now that we've
broached the subject there is the potential that future versions of the
'cur' downloads will have been gamed in this way.
3) The entire history of edits is about a factor of 10 larger than the
current version. This would be a superior corpus for the purpose of
ferreting out how various points of view bias content -- and be very
relevant to epistemology, social and political sciences -- thereby
creating a superior AI capable of considering the source of various
assertions. If this really is beyond the capacity of computers that
would be available to viable contestants then it might be necessary to
defer use of that larger corpus until more capacity or more funding for
the prize is available.
For a discussion of the downloads see:
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Weekly_database_dumps
> When computers are smarter than humans, how will we know?
Given Hutter's proof my guess is we'll know when they are capable of
more compression than humans given a comparable base of knowledge.
There is always the question of how we know how much compression humans
are capable of. I don't have an answer to that question.
I think as long as there is enough text in the data that a compressor
has to be able to learn semantics, syntax, coherence, etc. to compress
it effectively, then it should be a good test of AI. As long as
everyone works with the same data, it should be fair. Whatever version
is used will become outdated, but this should not matter to a
compressor that starts with no knowledge. Wikipedia is not completely
representative of all possible human communication, but I think it is
close enough. I don't think the edits are that useful because you end
up with lots of nearly identical texts that are easy to compress.
-- Matt Mahoney
I don't think there is much problem with the proof so much as there is
with arguing whether it means anything. Hutter gives a formal
definition of an agent, but then we have to make the leap that an agent
is a good model of human behavior. (It seems so). Another assumption
is that the universe is computable. Again it seems like a reasonable
assumption, but others might not agree.
> I had run across Hutter's AIXI paper before, when searching for
> Solomonoff's algorithmic probability theory, but I read just the
> abstract and didn't have the background for it so I sent it off to a
> friend who is a doctoral candidate in statistical mechanics to review.
> Being a doctoral candidate, he doesn't have much time left to review
> papers outside his area, so I never got the low-down you just gave.
> Thanks for the clear synopsis.
>
> A proof like this should have been available to researchers from the
> 1960s work on minimum length description (AP, KC, etc.) and decision
> theory but apparently no one bothered to put 2 and 2 together for what
> -- 40 years now?
>
> This is going to be a major question for science historians:
>
> Why did the AI community fail to formalize this proof for so long?
>
> Given Moore's Law originated at about that time, 40 years ago, there's
> a good argument that this ranks as one of the greatest losses of
> potential to befall humanity.
I think the AI community ignored it because it does not lead to a
practical solution to AI. Finding the shortest program consistent with
some data is not computable. Hutter describes a restricted version
which runs in exponential time, but again this is not very useful.
What it does is formalize the goal of machine learning to generalize to
unseen data, something researchers have already been doing. Hutter's
work puts this in a consistent framework.
-- Matt Mahoney
It will be necessary for contestants to use file systems that support
larger than 2GiB files but most Linux distros default to file systems
such as ext3 which have a 2GiB file size limit. The current distros
come with more capable file systems such as Reiserfs but they don't
default your partitions to them.
Also, these files are in SQL dump format so you can load up MySQL with
them. It may not be necessary to convert them to plain text files for
the contest because the vast bulk of the content is straight Wiki
markup natural language text.
Finding the shortest program isn't as important as having a simple
metric like compression ratio that can act as a measure of
intelligence. Engineering is largely about progressive optimization of
some figure of merit. If you don't have a figure of merit you are
going to be awash in subjective valuations. I did quite a bit of
practical early work in neural network image segmentation. While
pruning was crucial for generalization -- and led me to my belief there
was a lot more to Ockham's Razor than a nice rule of thumb -- I didn't
see much evidence of anyone using lossless compression ratio as a test.
Moreover the assumptions you point out are as reasonable as any people
routinely adopt for massive investments in technologically risky
enterprises. I think Hutter's proof is really important, if for no
other reason than that it provides a strong basis for using compression
ratio as the figure of merit that has been missing from AI.
I downloaded the English version, about a 949 MB gzip file,
uncompressing to 2.7 GB. Took a few minutes with a cable modem. WinME
can handle files over 2GB but a lot of programs like "type" apparently
can't, and give strange errors like "access denied". gzip 1.2.4 dies.
gzip 1.3.5 works OK when used normally, but gzip -dc redirected to a
file gives an error, "GZIP.EXE: stdout: Bad file descriptor", and
truncates the output to 2 GB (which is actually what I want because I
can read it).
The data is not exactly plain text. There is a lot of structure,
links, tables, etc. Mostly this just adds complexity to the model, but
it still requires AI to solve, so it should serve as a benchmark. Here are
a few random snips.
o extend the cultural, educational and other benefits of the Framework
Convention for the Protection of National Minorities to the
Cornish.\n[[User:Bretagne 44| Bretagne 44]] 23/3/05\n\nIf there truly
is a Cornish nation, then it shouldn\'t even be on the [[English
(people)|article]], which is about the English nation. A nation which
is (linguistically, culturally and possibly genetically) related to the
[[Germans]], the [[Dutch]], the [[Lowland Scots]], the [[Protestant]]s
of
450;ナキズム]]\n[[no:Anarkisme]]\n[[pl:Anarchizm]]\n[[pt
:Anarquismo]]\n[[ru:Анархизм]]\n
[[simple:Anarchism]]\n[[sr:Анархиз

72;м]]\n[[sv:Anarkism]]\n[[th:ลัทธิอ&#
3609;าธิปไตย]]','spelling',0,'205.206.
13.77','20050516015104','move=:edit=',5252,0,0,0,0.786172332974311,'799494839848
95','20050516015104'),(13,0,'AfghanistanHistory','#REDIRECT [[History
of Afghanistan]]','whoops',4,'Magnus
Manske','20020827030744','',5,1,1,0,0.062150286568468
Retardation and Developmental Disabilities Research Reviews | Volume=8
| Issue=3 | Year=2002 | Pages=151–61}}
[[http://www.ncbi.nlm.nih.gov/entrez/query.
fcgi?cmd=Retrieve&db=PubMed&list_uids=12216059&dopt=Abstract
abstract])\n\n* {{Journal reference issue | Author=Croen LA, Grether
JK, Hoogstrate J, Selvin S. |
-- Matt Mahoney
I had written:
> such as ext3 which have a 2GiB file size limit.
This is wrong. I got this erroneous information from the first hit
that google returned on the subject:
http://www.linuxgeek.net/beginners/node101.html
The truth is that the maximum file size varies with the block size of
the ext3 file system, as given in the following table, available from:
http://www.novell.com/documentation/suse91/suselinux-adminguide/html/apas04.html
Ext2 or Ext3 (1 kB block size): max file size 2^34 bytes (16 GB), max file system size 2^41 bytes (2 TB)
Ext2 or Ext3 (2 kB block size): max file size 2^38 bytes (256 GB), max file system size 2^43 bytes (8 TB)
Ext2 or Ext3 (4 kB block size): max file size 2^41 bytes (2 TB), max file system size 2^44 bytes (16 TB)
Ext2 or Ext3 (8 kB block size, on systems with 8 kB pages, like Alpha): max file size 2^46 bytes (64 TB), max file system size 2^45 bytes (32 TB)
My Ubuntu system came with a 4 kB block size (according to dumpe2fs),
for a max file size of 2 TB.
The problem was with the wget command. Apparently the Linux distro
guys aren't as religious as they should be about updating their basic
utilities to handle larger file sizes given the new world of cheap
massively large disks.
Lots of programs probably assume files are never larger than 2 GB.
fseek() and ftell() use 32 bit signed ints. Maybe your program doesn't
call these directly, but maybe the library does. Maybe some other part
of the program does this implicitly, like counting bytes with an int or
long, or using 32 bit pointers or array indexes. Probably this stuff
doesn't get tested a lot.
It wouldn't hurt to split up the big files into pieces for the contest.
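A trivial way to do the splitting (the chunk size and naming are arbitrary):

# Split a large dump into fixed-size pieces so that tools with 32-bit
# file-offset assumptions can still handle them.  The 1 GB chunk size
# and the output naming are arbitrary choices; the input file name is a
# placeholder.

CHUNK = 1_000_000_000              # 1 GB per piece, comfortably under 2 GB

def split_file(path, chunk=CHUNK):
    part = 0
    with open(path, "rb") as src:
        while True:
            data = src.read(chunk)
            if not data:
                break
            with open(f"{path}.part{part:03d}", "wb") as dst:
                dst.write(data)
            part += 1
    return part

# e.g. split_file("enwiki-cur-dump.sql")    # placeholder file name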
-- Matt Mahoney