What are some of the potential practical problems with running the following proposed prize competition?
PS: Many people disagree with the posited definition of "artificial intelligence quality", but it seems obvious to me. I'm not sure why its not obvious to everyone but such is the case. Any suggestions on how to get this across more effectively? Or are we dealing with different cognitive types -- some will simply 'get it' and others won't?
The most crucial technology prize of all. By Jim Bowery Copyright May 2005 The author grants the right to copy and distribute without modification.
Since all technology prize awards are geared toward solving crucial problems, the most crucial technology prize award of them all would be one that solves the rest of them:
The C-Prize -- A prize that solves the artificial intelligence problem.
The C-Prize award criterion is as follows:
Let anyone submit a program that produces, with no inputs, one of the major natural language corpora as output.
S = size of uncompressed corpus P = size of program outputting the uncompressed corpus R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0 New record ratio: R1=R0+X Fund contains: $Z at noon GMT on day of new record Winner receives: $Z * (X/(R0+X))
Compression program and decompression program are made open source.
Explanation
A very severe meta-problem with artificial intelligence is the question of how one can define the quality of an artificial intelligence.
Fortunately there is an objective technique for ranking the quality of artificial intelligence:
Kolmogorov Complexity
Kolmogorov Complexity is a mathematically precise formulation of Ockham's Razor, which basically just says "Don't over-simplify or over-complicate things." More formally, the Kolmogorov Complexity of a given bit string is the minimum size of a Turing machine program required to output, with no inputs, the given bit string.
Any set of programs which purport to be the standards of artificial intelligence can be compared by simply comparing their Artificial Intelligence Quality. Their AIQs can be precisely measured as follows:
Take an arbitrarily large corpus of writings sampled from the world wide web. This corpus will establish the equivalent of an IQ test. Give the AIs the task of compressing this corpus into the smallest representation. This representation must be a program that, taking no outside inputs, produces the exact sample it compressed. The AIQ of an AI is simply the ratio of the size of the uncompressed writings to the size of the program that, when executed, produces the uncompressed writings.
In other words, the AIQ is the compression ratio achieved by the AI on the AIQ test.
The reason this works as an AI quality test is that compression requires predictive modeling. If you can predict what someone is going to say, you have modeled their mental processes and by inference have a superset of their mental faculties.
Mechanics
The C-Prize is to be modeled after the Methusela Mouse Prize or M-Prize where people make pledges of money to the prize fund. If you would like to help with the set up and/or administration of this prize award similar to the M-Prize let me know by email. 1
Prize legislation I drafted for fusion energy technology back in 1992 didn't get funded but it did pick up the support of one of the founders of the Tokamak program:
Prize awards for technology milestones are gaining credibility these days and if people "get it" the C-Prize could take off in an even bigger way than the rest of them.
jim_bow...@hotmail.com wrote: > What are some of the potential practical problems with running the > following proposed prize competition?
> PS: Many people disagree with the posited definition of "artificial > intelligence quality", but it seems obvious to me. I'm not sure why > its not obvious to everyone but such is the case. Any suggestions on > how to get this across more effectively? Or are we dealing with > different cognitive types -- some will simply 'get it' and others > won't?
Very interesting. There is a similar contest that has been going on for 9 years. It might help you think about the details. http://mailcom.com/challenge/
Have you picked out a text corpus? I think this is a good approach to the AI problem, very objective, and easier to administer than the Loebner prize.
Last year I once proposed almost exactly the same thing to NSF with a $50K prize, but they would not fund it.
The connection between text compression and AI is not obvious, but there is a simple proof that it solves the Turing test. The compression problem is to code text strings x with length log2 1/P(x) bits. But if you know the probability distribution P(x) (specifically the distribution over chat sessions) then you could build a machine that could answer questions such that it is indistinguishable from human. Given a question q, you generate response r with distribution P(r|q) = P(qr)/P(q). You can generalize q to any sequence of questions and responses up to the last question.
I am glossing over some assumptions that need to be more clearly stated. For example, I am assuming that the interrogator, machine, and the machine's human opponent know the same distribution P(x), and that people use the same distribution for writing text as they do for recognition. I think these are reasonable approximations because the machine only has to be close enough to fool the interrogator, not an exact match to the "average" human.
Neither AI nor text compression is solved. In 1950 Shannon estimated the entropy of written English to be about 1 bit per character (based on how well humans can guess successive letters), but the best text compressors achieve about 1.2-1.3 bpc. These are language models for speech recognition that use a combination of word trigram models and semantic models based on word proximity. I did some of my dissertation work in this area before I switched to a less interesting topic where I could get funding. http://cs.fit.edu/~mmahoney/dissertation/
Right. I thought I'd ask here before bothering Leonid Broukhis since people often have ideas about how to exploit weaknesses in contest rules that they are willing to share. I'm sure his experience will turn out to be invaluable.
I ran across your poster session paper "Text Compression as a Test for Artificial Intelligence" which is one of the things that inspired me to look into the prize award approach. My original motivation is my rather strong sense that Ockham's Razor is more than a nice rule of thumb for scientists.
Your proof is elegant, simple and convincing however people still don't "get it". When people start arguing about intelligence tests they can get pretty squirrley and I think the Turing test is a poison that has impeded thought about artificial intelligence more than it has advanced it. Your paper is crucial to overcoming that handicap.
However, since the 1960s there have been other arguments advanced, primarily by the algorithmic complexity guys, that should have taken root in the AI community but didn't. I don't understand exactly why that intellectual failure occurred in the AI community but it most obviously did.
As for the text corpus I would choose:
That was one of the main questions I wanted to discuss because it is so crucial to the success of such a prize. Right now if I had to choose a single text corpus I'd pick the 1 terabyte corpus that Peter Turney used in his recent accomplishment of human level performance on the SAT verbal analogies test -- a feat that is particularly interesting given the exceptionally high correlation between that test and general intelligence aka the 'g' factor.
In this regard, you may be interested in my recently published article:
jim_bow...@hotmail.com wrote: > As for the text corpus I would choose:
> That was one of the main questions I wanted to discuss because it is so > crucial to the success of such a prize. Right now if I had to choose a > single text corpus I'd pick the 1 terabyte corpus that Peter Turney > used in his recent accomplishment of human level performance on the SAT > verbal analogies test -- a feat that is particularly interesting given > the exceptionally high correlation between that test and general > intelligence aka the 'g' factor.
> In this regard, you may be interested in my recently published article:
> I'm thinking of writing another article on the C-Prize.
It would be hard to conduct a test with a 1 TB corpus. Few people have enough computing power to do this, and I am sure you don't want to host the corpus on your website.
I would suggest something on the order of 1 GB. Turing estimated in 1950 that the AI problem would take 10^9 bits of memory. He did not say how he came up with this number. But consider a human exposed to 150 words per minute, 12 hours a day for 20 years. This is about 4 GB of text, or a 1 GB zip file. So a machine learning algorithm should have a similar constraint.
Even this may be too high. The average human vocabulary is 30K common words plus an equal number of proper nouns. To learn a word you have to be exposed to it about 20 times. By Zipf's law, the n'th most frequent word in English is 0.1/n. This means that you need about 12 million words or 50 MB.
Semantic models using LSA have used about 250 MB, like the WSJ corpus. However, with a larger corpus there is a tendency to get lazy. For example, with LSA you use the fuzzy equivalence A ~ B, meaning "A has similar meaning to B" or "A and B appear in the same proximity".
A ~ A (reflexive, used by most data compressors) A ~ B implies B ~ A (symmetric, used in distant bigram models) A ~ B and B ~ C implies A ~ C (transitive, used by LSA)
LSA uses the transitive property to predict that C follows A even if those two words were never seen together before. However with a larger corpus you can collect the statistics directly rather than use LSA to infer them, so you tend take the easy route. In the above paper on word analogy solving, I see a lot of steps were data was discarded, but for a 1 TB corpus it didn't matter.
You might be able to construct a high quality (diverse) corpus by finding the minimum set of documents that contain all the words in a dictionary. You might also consider a corpus of several languages to test whether a model can learn a generic language without hardcoded rules for English.
The current english language edition, gzipped, is about 1GB, but as you see, there are many Wikipedias in different languages. I doubt they total over 4GB gzipped.
Compressing multiple languages would have a very good side effect: Machine translation technology could be dramatically enhanced and properly constructed compressors would discover things about the various cultures that might not otherwise be obvious.
Part of my preference for picking a big corpus to begin with is so that if the contest goes big time -- which it would if proper funding levels were applied to it -- some sort of rules might be constructed to allow some divisions of the contest for large computer systems to crunch away doing more brute-force statistics such as Turney used, while other divisions could do a fairly sampled subset of the total corpus. But this is something we can deal with later when the competition (and prize) gets much bigger.
PS: Another reason I'd like to see huge corpora ultimately is that I'd really like to see this system beat humans by a substantial amount: We're so stupid we desperately need all the help we can get.
> The current english language edition, gzipped, is about 1GB, but as you > see, there are many Wikipedias in different languages. I doubt they > total over 4GB gzipped.
Nice choice. Lots of contributors, lots of topics, high quality.
> PS: Another reason I'd like to see huge corpora ultimately is that I'd > really like to see this system beat humans by a substantial amount: > We're so stupid we desperately need all the help we can get.
When computers are smarter than humans, how will we know?
Hutter's AIXI, http://www.idsia.ch/~marcus/ai/paixi.htm makes another argument for the connection between compression and AI that is more general than the Turing test. He proves that the optimal behavior of an agent (an interactive system that receives a reward signal from an unknown environment) is to guess that the environement is most likely computed by the shortest possible program that is consistent with the behavior observed so far. In other words, the most likely outcome for any experiment is the one with the simplest explanation, where "simplest" means the smallest program that could model what you currently know about the universe.
He gives a formal proof, but it basically says that the only possible distribution of the infinite set of programs (or strings) with nonzero probability is one which favors shorter programs over longer ones. Given any string of length n with probability p > 0, there are an infinite set of strings longer than n, but only a finite number of these can have probability higher than p.
Matt Mahoney writes: > He proves... the most likely outcome for > any experiment is the one with the simplest explanation, where > "simplest" means the smallest program that could model what you > currently know about the universe.
Wow.
If what you say is easily verifiable by consulting mathematicians then you've just dramatically strengthened the case for the C-Prize. A philanthropist can have a trusted consultant verify Hutter's proof and the C-Prize's salience to it. The only remaining question is whether the C-Prize is properly administered -- something the philanthropist can verify with a combination of business and engineering consultants.
I had run across Hutter's AIXI paper before, when searching for Solomonoff's algorithmic probability theory, but I read just the abstract and didn't have the background for it so I sent it off to a friend who is a doctoral candidate in statistical mechanics to review. Being a doctoral candidate he doesn't have much time left to review papers ourside his area so I never got the low-down you just gave. Thanks for the clear synopsis.
A proof like this should have been available to researchers from the 1960s work on minimum length description (AP, KC, etc.) and decision theory but apparently no one bothered to put 2 and 2 together for what -- 40 years now?
This is going to be a major question for science historians:
Why did the AI community fail to formalize this proof for so long?
Given Moore's Law originated at about that time, 40 years ago, there's a good argument that this ranks as one of the greatest losses of potential to befall humanity.
Matt Mahoney writes: > Nice choice. Lots of contributors, lots of topics, high quality.
Here are the drawbacks I see to using the "cur" download of Wikipedia:
1) The purported "neutral point of view" is subject to systemic bias. The process supported by Wikipedia biases the content toward people who are able and willing to contribute under those circumstances. It's not clear how to go about identifying this bias let alone its impact.
2) If a snapshot of the "cur" downloads is delayed, it will be subject to gaming by people who submit changes to articles that have a Kolmogorov complexity that is known to be low just to the gamers. Wikipedia keeps prior archives of the 'cur' downloads and it is unlikely anyone has gamed the system so far but now that we've broached the subject there is the potential that future versions of the 'cur' downloads will have been so-gamed.
3) The entire history of edits is about a factor of 10 larger than the current version. This would be a superior corpus for the purpose of ferreting out how various points of view bias content -- and be very relevant to epistemology, social and political sciences -- thereby creating a superior AI capable of considering the source of various assertions. If this really is beyond the capacity of computers that would be available to viable contestants then it might be necessary to defer use of that larger corpus until more capacity or more funding for the prize is available.
> When computers are smarter than humans, how will we know?
Given Hutter's proof my guess is we'll know when they are capable of more compression than humans given a comparable base of knowledge. There is always the question of how do we know how much compression humans are capable of. I don't have that question answered.