On Wed, May 2, 2012 at 12:23 PM, YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com> wrote:
> On Wed, May 2, 2012 at 4:09 AM, Matt Mahoney <mattma...@gmail.com>
> wrote:
>>
>> How do you plan to test whether the program can "learn English"?
>
> Demonstrate that it can parse simple sentences and be able to answer simple
> questions based on facts parsed from English and stored as logic.
A lot of questions can be answered without parsing. For example, if
the database contains "John loves Mary", then most questions like "Who
loves Mary?" could be answered just by matching terms without
considering word order. This is already a solved problem (Google), so
I guess it is not very interesting.
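To illustrate answering by term matching alone, here is a minimal sketch assuming a tiny in-memory list of stored sentences. The function `answer_by_overlap` and the stop-list `QUESTION_WORDS` are hypothetical names for illustration; the code discards word order and simply returns the stored fact that shares the most content words with the question.

```python
import re

def tokens(text):
    # Lowercase and split into alphabetic words; word order is discarded.
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical stop-list of question words that carry no content to match.
QUESTION_WORDS = {"who", "what", "where", "does", "do", "is"}

def answer_by_overlap(question, facts):
    """Return the stored fact sharing the most content words with the question."""
    q = tokens(question) - QUESTION_WORDS
    # max() keeps the first fact among ties.
    return max(facts, key=lambda fact: len(q & tokens(fact)))

facts = ["John loves Mary", "Bob lives in New York", "New York is in the USA"]
print(answer_by_overlap("Who loves Mary?", facts))  # John loves Mary
```

The same sketch also returns "John loves Mary" for "Who does Mary love?", since matching cannot tell who loves whom once word order is thrown away.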
Parsing becomes more important when making inferences involving
relations between objects, such as in space, time, causality, or some
other attribute. For example, "Bob lives in New York", "New York is in
the USA", therefore Bob lives in the USA. An AI should be able to do
this, just because humans can. But I wonder if there is any value in
solving inference problems much deeper than a human could solve. The
only place where I can imagine this being true is when the component
facts are extremely reliable, such as in mathematics or programming.
But then we would be better off developing a specialized tool with a
restricted grammar, like a programming language or WolframAlpha, than a
general-purpose natural language AI. The set of highly precise facts
is very small compared to the whole set of human knowledge.
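The Bob/New York/USA inference can be sketched as forward chaining over triples. This assumes the sentences have already been parsed into (subject, relation, object) form, which is the hard step under discussion. The relation names `lives_in` and `is_in` and the function `infer` are hypothetical. The rules applied are: `is_in` is transitive, and `lives_in(x, y)` with `is_in(y, z)` gives `lives_in(x, z)`.

```python
# Facts as (subject, relation, object) triples, assumed already extracted by a parser.
facts = {
    ("Bob", "lives_in", "New York"),
    ("New York", "is_in", "USA"),
}

def infer(facts):
    """Apply two rules until nothing new is derived:
    is_in is transitive, and lives_in(x, y) plus is_in(y, z) gives lives_in(x, z)."""
    known = set(facts)
    while True:
        new = set()
        for (a, r1, b) in known:
            for (c, r2, d) in known:
                # Chain a fact ending at b with an is_in fact starting at b.
                if b == c and r2 == "is_in" and r1 in ("is_in", "lives_in"):
                    new.add((a, r1, d))
        if new <= known:  # fixed point reached
            return known
        known |= new

derived = infer(facts)
print(("Bob", "lives_in", "USA") in derived)  # True
```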
There has been a lot of linguistic work in natural language parsing,
mostly supervised approaches that learn from annotated text. Most of this
work I would consider a failure, both because the error rate is high
and because a parse tree is not the final result that we want. Solving
the parsing problem requires building a semantic model first. You have
to know what a sentence means before you can parse it. Any ideas on
how to do this?
Also, any ideas on what to use as a test database? I was thinking
Wikipedia. This was 4 GB of text in 2006, probably much larger now.
-- Matt Mahoney, mattma...@gmail.com