Teaching Common Sense to Computers

Push Singh

Sep 7, 2000, 12:22:12 PM

Hi everyone!

We have recently started a project here at MIT to try to build a
computer with the basic intelligence of a person. To this end we have
built a web site at http://commonsense.media.mit.edu that allows people
across the web to participate in a vast experiment: to construct a giant
repository of general, "commonsense" knowledge. This includes facts
like:

- every person is younger than their mother
- one hundred dollars is a lot to pay for a sandwich
- snow is cold and is made of millions of snowflakes
- a week is longer than a minute
- computers need a source of power to operate
- most birds can fly, except for penguins and birds with broken wings
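The last item is a default with exceptions, which a flat list of true sentences handles poorly. Below is a minimal sketch of one standard way to represent it, assuming a hand-built is-a hierarchy. The names `IS_A`, `DEFAULTS`, `EXCEPTIONS` and `lookup` are hypothetical and are not part of the Open Mind site:

```python
# Hypothetical toy knowledge base; the categories and rules are illustrative only.
IS_A = {"robin": "bird", "penguin": "bird", "bird": "animal"}

# Default properties stored on categories; more specific entries override general ones.
DEFAULTS = {
    "bird": {"can_fly": True},
    "penguin": {"can_fly": False},
}

# Individual-level exceptions as (trait, property, value) triples.
EXCEPTIONS = [("broken_wing", "can_fly", False)]


def lookup(kind, prop, traits=frozenset()):
    """Return the most specific known value of prop for an individual of this kind."""
    # 1. An exception triggered by the individual's own traits wins outright.
    for trait, target, value in EXCEPTIONS:
        if target == prop and trait in traits:
            return value
    # 2. Otherwise walk up the is-a chain and take the first default found.
    node = kind
    while node is not None:
        if prop in DEFAULTS.get(node, {}):
            return DEFAULTS[node][prop]
        node = IS_A.get(node)
    return None  # nothing known about this property


print(lookup("robin", "can_fly"))                   # True
print(lookup("penguin", "can_fly"))                 # False
print(lookup("robin", "can_fly", {"broken_wing"}))  # False
```

The most specific applicable value wins. An individual-level exception is checked first, and after that the nearest category on the is-a chain is used.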

It is exactly these simple pieces of knowledge that computers lack, that
prevent them from being more friendly and human. This repository of
knowledge will enable us to create more intelligent and sociable
software, build human-like robots, and better understand the structure
of our own minds.

We invite you all to come visit our project web page, and teach our
computer some of the things all us humans know about the world, but that
no computer knows!

Best regards,

Push Singh

Jorn Barger

Sep 7, 2000, 12:26:42 PM

[xpost widened]

How is it any less moronic than Mindpixel?

--
http://www.robotwisdom.com/ "Relentlessly intelligent
yet playful, polymathic in scope of interests, minimalist
but user-friendly design." --Milwaukee Journal-Sentinel

Anatoly Vorobey

Sep 7, 2000, 1:28:23 PM

On Thu, 07 Sep 2000 12:22:12 -0400,
Push Singh <pu...@mit.edu> wrote:

>We have recently started a project here at MIT to try to build a
>computer with the basic intelligence of a person. To this end we have
>built a web site at http://commonsense.media.mit.edu that allows people
>across the web to participate in a vast experiment: to construct a giant
>repository of general, "commonsense" knowledge. This includes facts
>like:

Should be obvious by now that these guys (the classic-AI people, the
AI lab, the Media lab etc.) are the most successful con artists of our
time. And lo, they are having another go at it. They are the Energizer
Bunnies of the CS world.

--
Anatoly Vorobey,
mel...@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

Push Singh

Sep 7, 2000, 3:04:19 PM

MindPixel isn't moronic, it's courageous. I disagree with how
McKinstrey is doing it (as a company, giving out "shares" that will
never have any value, instead of making it public immediately), but from
a large database of sentences one can obtain lexical selection
restrictions (what verbs take what arguments), the simplest kinds of
relations like is-a, part-of, is-for, happens-during, and others, and
also general associations between ideas. Some of that can be absorbed
from the web, but much of it just isn't clearly written down anywhere,
hence the need for human teachers.
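To make "obtaining relations from a database of sentences" concrete, here is a minimal sketch. It assumes that submissions follow a few fixed surface templates. The patterns and the `extract` function are hypothetical, only three of the relations mentioned above are covered, and real free-text submissions would need an actual parser:

```python
import re

# Hypothetical surface templates mapping simple English sentences to relations.
PATTERNS = [
    (re.compile(r"^an? (\w+) is an? (\w+)$"), "is-a"),
    (re.compile(r"^an? (\w+) is part of an? (\w+)$"), "part-of"),
    (re.compile(r"^an? (\w+) is for (\w+)$"), "is-for"),
]


def extract(sentence):
    """Return a (subject, relation, object) triple, or None if no template matches."""
    text = sentence.lower().strip().rstrip(".")
    for pattern, relation in PATTERNS:
        match = pattern.match(text)
        if match:
            return (match.group(1), relation, match.group(2))
    return None


print(extract("A dog is an animal."))       # ('dog', 'is-a', 'animal')
print(extract("A wheel is part of a car"))  # ('wheel', 'part-of', 'car')
```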

In any case, some major differences are that our database is freely
available, that we will soon go beyond sentences to acquire stories,
diagrams, emotional impressions, and other forms of mental
representation, and that we have a number of applications in store that
will use the database. The MindPixel idea of "training up a neural
network" with the database is clearly ridiculous; we are instead
developing a reasoning system in some way comparable to Cyc's to make
use of the knowledge. I also believe our interface is better -- but in
the end it doesn't matter because the database is publicly available
and MindPixel can absorb it if they wish.

I would appreciate more specific feedback: please go to
http://commonsense.media.mit.edu and try it for yourself. The system
itself is still very much under development, and comments at this stage
are likely to affect the architecture of the system in deep ways.

-- Push Singh

Ron Hardin

Sep 7, 2000, 3:18:44 PM

Push Singh wrote:
> I would appreciate more specific feedback: please go to
> http://commonsense.media.mit.edu and try it for yourself. The system
> itself is still very much under development, and comments at this stage
> are likely to affect the architecture of the system in deep ways.

Snork.

Try reading, yourself, _A Comprehensive Grammar of the English Language_,
Quirk, Greenbaum, Leech, and Svartvik.

You should find after the first thousand pages that none of your hunches
survive.
--
Ron Hardin
rhha...@mindspring.com

On the internet, nobody knows you're a jerk.

Jorn Barger

Sep 7, 2000, 3:23:22 PM

Push Singh <pu...@mit.edu> wrote:
> we will soon go beyond sentences to acquire stories,
> diagrams, emotional impressions, and other forms of mental
> representation

I'll start holding my breath, then!

>;^/

Phyllis Chamberlain

Sep 8, 2000, 9:28:57 PM

Jorn Barger wrote in message
<1egleko.129...@207-229-151-195.d.enteract.com>...

>Push Singh <pu...@mit.edu> wrote:
>> we will soon go beyond sentences to acquire stories,
>> diagrams, emotional impressions, and other forms of mental
>> representation
>
>I'll start holding my breath, then!

Yes, there's common sense, but there's also folk wisdom (If you want the cat
to stay in the new house, butter her paws) and expressions for superlatives
(It's too hot to hold the baby) and the like. Until we can explain to
ourselves how we understand the meaning and intent of these words, how can
we teach a computer??

Phyllis Chamberlain

mind...@my-deja.com

Sep 9, 2000, 5:44:35 PM

In article <39B7E6B3...@mit.edu>,

Push Singh <pu...@mit.edu> wrote:
> MindPixel isn't moronic, it's courageous. I disagree with how
> McKinstrey is doing it (as a company, giving out "shares" that will
> never have any value, instead of making it public immediately),

first, no 'e' in McKinstry and second, your statement is misleading.
the database is publicly available right now, just not for commercial
use. the commercial rights to the system belong to the people that
created it and rightfully so.

> but from
> a large database of sentences one can obtain lexical selection
> restrictions (what verbs take what arguments), the simplest kinds of
> relations like is-a, part-of, is-for, happens-during, and others, and
> also general associations between ideas. Some of that can be absorbed
> from the web, but much of it just isn't clearly written down anywhere,
> hence the need for human teachers.

you can get much more than this. in fact you can get ALL the implicit
knowledge, provided the corpus is large enough. this is just Radon's
1917 theory of image reconstruction (applied to high dimensionality)
that keeps getting rediscovered over and over again as each branch of
science learns to map their problems into tomographic terms.

> In any case, some major differences are that our database is freely
> available, that we will soon go beyond sentences to acquire stories,
> diagrams, emotional impressions, and other forms of mental
> representation, and that we have a number of applications in store that
> will use the database.

the problem is that the net is a VERY open place. how do you keep
garbage out without any form of validation mechanism? as you recall
push, i tried to do much as you are doing now with MISTIC in 1994-97
and ended up with so much crap that even the statistics were of no
value... all you have to do is try to imagine slashdot without the
moderation system to see what's going to happen to your database...
it's like going into the deepest, darkest jungle with no clothing, bug
spray or immune system!
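One common answer to the validation problem is to have several independent users rate each submitted item and keep only the items with strong agreement. Below is a minimal sketch of that idea. The function name and both thresholds (`min_votes`, `min_agreement`) are arbitrary assumptions, not the actual rules of MindPixel or Open Mind:

```python
from collections import defaultdict


def validated(votes, min_votes=3, min_agreement=0.8):
    """Keep items that enough raters saw and mostly agreed on.

    votes: iterable of (item, is_true) pairs from independent users.
    Returns {item: consensus truth value} for items passing both thresholds.
    """
    tally = defaultdict(lambda: [0, 0])  # item -> [true votes, false votes]
    for item, is_true in votes:
        tally[item][0 if is_true else 1] += 1

    result = {}
    for item, (yes, no) in tally.items():
        total = yes + no
        if total < min_votes:
            continue  # too few ratings to trust either way
        if max(yes, no) / total >= min_agreement:
            result[item] = yes > no
    return result
```

Items that raters split on, or that too few people have seen, are simply held back. They are not marked false.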

> The MindPixel idea of "training up a neural
> network" with the database is clearly ridiculous;

really? thems fightin' words! you symbolic guys make me puke... no,
seriously, there is nothing ridiculous about it. have you read
elman, 'finding structure in time', cognitive science, 1990? recurrent neural
networks are VERY good at automatically extracting lexical data from
exactly the kind of data mindpixel is collecting.
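Elman's simple recurrent network (SRN) copies the previous hidden layer back in as "context" units. When it is trained to predict the next word, its hidden states tend to cluster by lexical category. Below is a forward-pass-only sketch in plain Python. The layer sizes, random weights, softmax output layer and function names are illustrative assumptions, and training by backpropagation is omitted:

```python
import math
import random


def make_srn(n_in, n_hidden, n_out, seed=0):
    """Random small weights for an Elman-style simple recurrent network."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),       # input -> hidden
        "W_ctx": mat(n_hidden, n_hidden),  # context (previous hidden) -> hidden
        "W_out": mat(n_out, n_hidden),     # hidden -> output
    }


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def run(net, sequence):
    """Forward pass only: a next-word distribution and hidden state per input word."""
    context = [0.0] * len(net["W_ctx"])  # context units start at zero
    outputs, hiddens = [], []
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(net["W_in"], x),
                                     matvec(net["W_ctx"], context))]
        hidden = [logistic(v) for v in pre]
        outputs.append(softmax(matvec(net["W_out"], hidden)))
        hiddens.append(hidden)
        context = hidden  # copy-back: this hidden state is the next step's context
    return outputs, hiddens


net = make_srn(n_in=4, n_hidden=3, n_out=4)
outs, hids = run(net, [one_hot(i, 4) for i in (0, 2, 1)])
```

The `hiddens` vectors are the ones that would be clustered after training to look for lexical structure.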

in fact, with under 160,000 items i am seeing VERY interesting high-d
clusters in the data! there is already enough data for an ANN to easily
discover a great deal of lexical information. i definitely feel like
i'm looking at the first lowres hypertomographic (cool word, no?)
images of the human mind... when complete they will make a great cover
for an issue of science.

care to wager who has a system that can pass a 1,000 item MIST first?

> we are instead
> developing a reasoning system in some way comparable to Cyc's to make
> use of the knowledge. I also believe our interface is better -- but in
> the end it doesn't matter because the database is publicly available
> and MindPixel can absorb it if they wish.

yep. your interface is better... you didn't have to write it all by
your lonesome... but mindpixel is an evolving community which at the
end of this weekend will have literally TENS OF THOUSANDS of users
(okay, only 2 tens... but that's still tens)

and ditto for openmind sucking in mindpixel, save the commercial caveat
above.

chris
http://www.mindpixel.com
The World's Largest AI Effort

ps. jorn... how many years have you been fighting this idea of mine
here in these news groups? now i guess the whole mit media lab is crazy
too? hmmmm... maybe you should do some careful thinking...

Jorn Barger

Sep 10, 2000, 3:44:17 AM

Push Singh <pu...@mit.edu> wrote:
> we will soon go beyond sentences to acquire stories,
> diagrams, emotional impressions, and other forms of mental
> representation

Isn't it trivially obvious that if your system _can_ eventually work for
the-entire-universe-of-knowledge-taken-in-one-gulp, that it will work
many, many, many times sooner if you limit your domain to some discrete
subset like 'common sense about toenails'?

Chris F Clark

Sep 10, 2000, 10:37:02 AM

Jorn asked:

> Isn't it trivially obvious that if your system _can_ eventually work for
> the-entire-universe-of-knowledge-taken-in-one-gulp, that it will work
> many, many, many times sooner if you limit your domain to some discrete
> subset like 'common sense about toenails'?

Only if you are a "reductionist" and think that knowledge can be
separated into discrete components that can be individually mastered.
In the extreme, a "wholist" would say that the above result is not
true, that learning any one domain is as difficult as learning
the entire universe.

The truth is probably somewhere between those views, and where the
truth lies depends on how interconnected knowledge is. In
particular, most of the common sense about toenails is either trivial
or connected to common sense about other "related" things, common
sense about feet, common sense about keratin (the material of which
toenails are made), common sense about fungi (the major source of
toenail diseases). These are tied to larger groupings: common sense
about body parts, common sense about materials in general, common
sense about diseases. This extends outward (perhaps indefinitely) to
common sense about biology, psychology (lest one forget the important
role of the painted toenail in foot-fetishes :-)), philosophy.
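The point about interconnection can be made concrete as graph reachability: start from one topic and follow "related-to" links outward. Below is a minimal sketch over a small hypothetical graph that loosely follows the chain in the paragraph above. The links and the `concepts_needed` function are illustrative assumptions. It shows how the set of required concepts grows with each hop:

```python
from collections import deque

# Hypothetical "related-to" links, loosely following the toenail example.
RELATED = {
    "toenail": ["foot", "keratin", "fungus"],
    "foot": ["body part"],
    "keratin": ["material"],
    "fungus": ["disease"],
    "body part": ["biology", "psychology"],
    "disease": ["biology"],
    "biology": ["philosophy"],
    "psychology": ["philosophy"],
}


def concepts_needed(start, max_hops):
    """Breadth-first search: every concept reachable within max_hops links."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in RELATED.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


for hops in range(5):
    print(hops, len(concepts_needed("toenail", hops)))
```

With this toy graph the reachable set grows 1, 4, 7, 9, 10 for hops 0 through 4. Two or three links from toenails already pull in biology and psychology.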

Most common sense axioms in any area can be derived from the common
sense axioms of related areas with the right theorems. Of course, we
humans have spent eons discovering the right theorems and still cannot
recreate the universe from "first principles".

Greeks who were much smarter than you and me managed to give us
centuries of blood-letting with leeches by misapplying a simple first
principle (the four humors). Einstein and Hawking couldn't even agree
on whether God likes to gamble with dice.

Now, since the question was taken out of context, it might either
strengthen Jorn's argument or weaken it against the proposed
be-all-end-all system.

However, years of prior examples have shown that radical proposals can
yield amazing progress and insights. Eliza could do wonders in
simulating conversations from an extremely trivial (once discovered)
concept (keyword scanning). At the same time, each radical proposal
never succeeds in being the answer to "life, the universe, and
everything". To my knowledge, no one yet has made an Eliza derivative
that has successfully passed a Turing Test against any well-informed
judge, despite years of improvements.
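The keyword-scanning idea behind Eliza fits in a few lines. Below is a minimal sketch with made-up rules. `RULES`, `DEFAULT` and `respond` are hypothetical and are not Weizenbaum's DOCTOR script. The sketch also omits the decomposition and reassembly rules that let Eliza echo fragments of the input back to the user:

```python
import re

# Hypothetical keyword -> canned reply rules; earlier rules take priority.
RULES = [
    ("mother", "Tell me more about your family."),
    ("computer", "Do computers worry you?"),
    ("always", "Can you think of a specific example?"),
]
DEFAULT = "Please go on."


def respond(utterance):
    """Scan the input for the first known keyword and return its canned reply."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for keyword, reply in RULES:
        if keyword in words:
            return reply
    return DEFAULT


print(respond("My mother hates me"))  # Tell me more about your family.
```

The program has no model of mothers or computers at all. It only reacts to the presence of a word.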

It is like the Goedel incompleteness theorem. Any system of knowledge
that is powerful enough to describe everything must be inconsistent
and any consistent system must be incomplete and leave facts unknown.
Thus, any system can make remarkable strides in covering a new domain
of knowledge, but once the weakness of the system is found, there are
infinite numbers of trivial examples that will escape its grasp. The
speed at which new systems are being discovered is increasing and so
is the speed at which the holes are found.

The point is that any new system may provide powerful new insights.
However, the claims of its evangelists will surely be overstated.

-Chris Clark

Jorn Barger

Sep 10, 2000, 11:10:18 AM

Chris F Clark <c...@world.std.com> wrote:
> Jorn asked:
> > Isn't it trivially obvious that if your system _can_ eventually work for
> > the-entire-universe-of-knowledge-taken-in-one-gulp, that it will work
> > many, many, many times sooner if you limit your domain to some discrete
> > subset like 'common sense about toenails'?
>
> Only if you are a "reductionist" and think that knowledge can be
> separated into discrete components that can be individually mastered.
> In the extreme, a "wholist" would say that the above result is not
> true, that learning any one domain is as difficult as learning
> the entire universe.

So the [neural net or whatever] will know nothing... and then in the
next instant it will know everything?

> ...most of the common sense about toenails is either trivial
> or connected to common sense about other "related" things,...
> These are tied to larger groupings: common sense
> about body parts, common sense about materials in general, common
> sense about diseases. This extends outward (perhaps indefinitely) to
> common sense about biology, psychology... philosophy.

Yes, good, but I don't believe this weakens my case.

> Most common sense axioms in any area can be derived from the common
> sense axioms of related areas with the right theorems. Of course, we
> humans have spent eons discovering the right theorems and still cannot
> recreate the universe from "first principles".

But the gigantic 'gap' is the psychology-part.

> However, years of prior examples have shown that radical proposals can
> yield amazing progress and insights. Eliza could do wonders in
> simulating conversations from an extremely trivial (once discovered)
> concept (keyword scanning). At the same time, each radical proposal
> never succeeds in being the answer to "life, the universe, and
> everything". To my knowledge, no one yet has made an Eliza derivative
> that has successfully passed a Turing Test against any well-informed
> judge, despite years of improvements.

Last I checked, the improvements were insignificant when the
topic-domain was unlimited. Thom Whalen's program could work wonders when the
domain was limited, though-- and that seems to me to support my
'start-with-just-toes' argument.

> It is like the Goedel incompleteness theorem. Any system of knowledge
> that is powerful enough to describe everything must be inconsistent
> and any consistent system must be incomplete and leave facts unknown.
> Thus, any system can make remarkable strides in covering a new domain
> of knowledge, but once the weakness of the system is found, there are
> infinite numbers of trivial examples that will escape its grasp. The
> speed at which new systems are being discovered is increasing and so
> is the speed at which the holes are found.

This sounds suspiciously like 'it will never be perfect so why bother?'

Answering 90% (or even 20%) of questions about toes is a much more
reasonable starting-goal than everything about everything, or even
everything about [toes].

Chris F Clark

Sep 10, 2000, 12:11:22 PM

Jorn wrote:
> Last I checked, the improvements were insignificant when the
> topic-domain was unlimited. Thom Whalen's program could work wonders when the
> domain was limited, though-- and that seems to me to support my
> 'start-with-just-toes' argument.

. . .

> Answering 90% (or even 20%) of questions about toes is a much more
> reasonable starting-goal than everything about everything, or even
> everything about [toes].

Perhaps.

However, the point of the posting is that you may not be able to get
to everything about everything using the "start with the expert in one
domain and work outward" approach. This is the "reductionist" flaw
(to think that one can).

The fact that one can build a system that works wonders in a limited
domain, but does not scale to unlimited domains exactly supports what
I was saying. You can't get there by building up.

It is also equally true that one cannot get there "by miracle" by
inventing some new scheme that resolves all the problems (although
each proponent of a new scheme will think that their scheme has done
so).

> This sounds suspiciously like 'it will never be perfect so why bother?'

That was not the intent. However, I can see how one could draw that
conclusion.

Winning the game is impossible (there will always be things we cannot
understand and formalize). However, the point is not to win, but to
continue playing. The only way to lose is to stop trying. Living is
to attempt the impossible knowing that one ultimately will fail. It's
like the proverb where one doesn't have to outrun the bear, just the
other hiker.

I did not mean to disparage your point. I just saw a reductionistic
argument that could not go completely unchallenged.

Advocates always think their solution is ultimate and they are always
wrong. However, it is not attempting to solve too large a problem
that makes them wrong. It is just that they are doomed to be wrong.

I was merely trying to point out the flaw in using reductionism to
point out that they are wrong. Just because the "devil is in the
details", does not mean that the way to solve the problem is by
starting with the details.

Some partial solutions do not work well when applied to the details.
They only work as broad brush strokes. Attempting to use them in
domain limited areas actually makes them weaker.

-Chris Clark

Jorn Barger

Sep 10, 2000, 12:22:34 PM

Chris F Clark <c...@world.std.com> wrote:

> The fact that one can build a system that works wonders in a limited
> domain, but does not scale to unlimited domains exactly supports what
> I was saying. You can't get there by building up.

If you're referring to Whalen's, it's not entirely clear what 'scaling
up' would mean. His approach was to compile a 'faq' on a given topic,
and then have enough ability at parsing the questions to make a
reasonable guess which answer corresponded to each new question.
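Whalen's actual question parser is not described here. As a rough stand-in, the sketch below illustrates the FAQ-lookup idea: it scores each stored question by the content words it shares with the new question and returns the answer with the best score. The FAQ entries, the stopword list and the `best_answer` function are hypothetical, and the answers are placeholder labels:

```python
import re

# Hypothetical stopword list; real systems would use a larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "does", "what", "how",
             "i", "you", "to", "of", "my"}

# Hypothetical FAQ on one narrow topic: (stored question, answer label).
FAQ = [
    ("How do I cut my toenails?", "trimming"),
    ("Why are my toenails yellow?", "discoloration"),
    ("How fast do toenails grow?", "growth"),
]


def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def best_answer(question, faq):
    """Return the answer whose stored question shares the most content words."""
    query = content_words(question)
    best, best_score = None, 0
    for stored_question, answer in faq:
        score = len(query & content_words(stored_question))
        if score > best_score:
            best, best_score = answer, score
    return best  # None when nothing overlaps


print(best_answer("my toenails turned yellow", FAQ))  # discoloration
```

Without stemming, "toenail" and "toenails" would not match. This hints at how much of the real work lies in parsing the questions.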

You can-- in theory, at least-- scale this up by compiling enough faqs
on different topics, and I think that that approach would be much more
plausible (i.e., likely to lead to useful knowledge) than GIGO-databases
of user-submitted 'facts'.

Ron Hardin

Sep 10, 2000, 12:25:17 PM

Chris F Clark wrote:
> Winning the game is impossible (there will always be things we cannot
> understand and formalize). However, the point is not to win, but to
> continue playing. The only way to lose is to stop trying.

You can take up another interest.

Try philosophy along the lines of Derrida (Levinas, Jabes), or
Cavell (Wittgenstein, Cavell's Austin [not Searle's]) and you
might find you're ahead in a way.

Push Singh

Sep 10, 2000, 5:30:26 PM

Jorn Barger wrote:

> Isn't it trivially obvious that if your system _can_ eventually work for
> the-entire-universe-of-knowledge-taken-in-one-gulp, that it will work
> many, many, many times sooner if you limit your domain to some discrete
> subset like 'common sense about toenails'?

A lot of people think that (including me, sometimes), but what worries
me is how interconnected commonsense is. No matter which pieces you try
to cut out (like toenails), you soon need to start telling the system
about so many other kinds of things -- physical objects, bodies,
biology, space, time, events, materials, etc. -- that you're back to
describing the whole rest of the universe.

I'm not advocating against zooming in on a domain, and trying to study
"common sense" within that domain, but I guess I see that as what AI has
tried to do until now, without huge success. It has repeatedly failed
to achieve human-level competence even within a single realm.

Now there could be many reasons for this:

- we might be lacking appropriate "cognitive architectures", e.g. to
support both multiple kinds of algorithms (planning, explaining, analogy
making), and also to support multiple representations (sentences,
stories, diagrams, movies, etc.)

- maybe there is a subset of commonsense (perhaps just a few hundred
thousand items) which, once you have them, you can start modeling
everything more easily (beginning with something like the Cyc upper
level ontology)

- maybe we are missing some essential new organizational idea, like what
societies-of-agents gave us over and above central reasoning systems.

Who knows for sure. I do have another project, to make a simulated
robot that learns to walk and move, which does zoom in on the
spatial/physical/body domain, so I guess I'm trying both approaches.
Within Open Mind there is some limited support for zooming in on topics,
and we may choose to go that path (e.g. commonsense about sports, or
computers), but haven't had to make that particular decision yet.

Push

Jorn Barger

Sep 10, 2000, 5:48:16 PM

Push Singh <pu...@mit.edu> wrote:
> - maybe there is a subset of commonsense (perhaps just a few hundred
> thousand items) which, once you have them, you can start modeling
> everything more easily (beginning with something like the Cyc upper
> level ontology)

I vote for this one, but the subset has to be psychological-- the
universal human drama, with its archetypal forms for emotion and
motivation. (cf Finnegans Wake).

Ron Hardin

Sep 10, 2000, 6:17:48 PM

Push Singh wrote:
> - maybe we are missing some essential new organizational idea, like what
> societies-of-agents gave us over and above central reasoning systems.

Maybe we are not taking into account the structure of the wish for
artificial intelligence, producing an unwitting total system that operates
roughly like Charlie Chaplin on a skating rink.

The audience is not taken into account either.

ch...@mindpixel.com

Sep 12, 2000, 10:34:58 PM

> Isn't it trivially obvious that if your system _can_ eventually work for
> the-entire-universe-of-knowledge-taken-in-one-gulp, that it will work
> many, many, many times sooner if you limit your domain to some discrete
> subset like 'common sense about toenails'?

my god jorn, how many times are you going to ask the same question? over
how many years? though i must say you sure are a hell of a lot more
polite about it now than you were when i launched MISTIC 6 years ago...

anyway, you can't talk about toenails without talking about the body,
the mind, the emotional state and the world of the person whose toenails
you are referring to. commonsense is holographic and commonsense itself
should tell you that.

for someone who spends so much time with his head in fw, you should know
better. it's all about connections. holographic connections.

Join The World's Largest AI Effort
http://www.mindpixel.com