On Thu, May 3, 2012 at 1:48 AM, YKY (Yan King Yin, 甄景贤)
<generic.in...@gmail.com> wrote:
> For example, the KB would have the following facts:
>
> A tokenized sentence represented as a list in Clojure:
> 1. input = ["john", "loves", "mary"]
>
> 2. "love" is a verb
> 3. "john" is a noun
> 4. "mary" is a noun
>
> 5. A verb followed by a noun is a verb phrase.
> 6. A subject can be a noun.
> 7. A subject followed by a verb phrase forms a sentence.
>
> 8. The meaning of a noun is <noun>.
> 9. The meaning of a verb phrase is <verb ◦ noun>.
> 10. In general, the meaning of a grammatical constituent is its
> constituents' meanings composed together.
>
> Desired conclusion:
> "john loves mary" is a sentence, and its meaning is <john ◦ loves ◦
> mary>.
>
> There are of course some missing pieces of knowledge. What I want to do is
> to fill in the knowledge so that the above translation works (via forward
> chaining).
>
> Maybe this can be done using ~50-100 formulas...
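To make the quoted proposal concrete, here is a minimal sketch of what
rules 2-10 might look like as forward chaining in Clojure. It is purely
illustrative: the lexicon, the rule encoding, and the function names
(step, parse, compose) are assumptions rather than anything from the
original post, and listing the inflected form "loves" in the lexicon
papers over the missing morphology knowledge.

;; Facts 2-4 as a lexicon (inflected "loves" is an assumption).
(def lexicon {"john" :noun, "loves" :verb, "mary" :noun})

;; Rules 5 and 7 as [category-pair result] entries.
(def rules
  [[[:verb :noun]        :verb-phrase]    ; rule 5
   [[:noun :verb-phrase] :sentence]])     ; rule 7, reading "subject" as noun (rule 6)

;; Rule 10: compose two constituent meanings.
(defn compose [m1 m2] (str m1 " ◦ " m2))

;; One forward-chaining step over a sequence of [category meaning] pairs:
;; rewrite the first adjacent pair that matches a rule, or return nil.
(defn step [items]
  (first
   (for [i (range (dec (count items)))
         [[a b] result] rules
         :let [[[ca ma] [cb mb]] [(nth items i) (nth items (inc i))]]
         :when (= [a b] [ca cb])]
     (concat (take i items)
             [[result (compose ma mb)]]
             (drop (+ i 2) items)))))

;; Forward chain to a fixed point, starting from the lexical categories
;; (rule 8: a word's meaning is just the word itself).
(defn parse [words]
  (loop [items (mapv (fn [w] [(lexicon w) w]) words)]
    (if-let [items' (step items)]
      (recur (vec items'))
      items)))

;; (parse ["john" "loves" "mary"])
;; => [[:sentence "john ◦ loves ◦ mary"]]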
There are millions of such formulas. Allow me to demonstrate. Complete the following:
"p_____ and salt"
"salt and p_____"
The first one is hard. For the second one, you probably guessed "pepper".
That is because there is a grammar rule in English that makes "salt
and pepper" more correct than "pepper and salt". Not that "pepper and
salt" is wrong, just less likely. If you needed to solve this problem
in a speech recognition system, where the _____ was inaudible, you
would need this knowledge. Likewise if you were reading text and some
of the text was blurry, or you were translating from another language
and the word for "pepper" could be translated in more than one way, or
you were correcting a document and "pepper" was misspelled. Humans can
solve all of these problems, so our AI should also have this
capability.
You could code this rule by hand, but I don't think you would want to
do this millions of times. And you don't have to. Counting Google
hits:
"salt and pepper", 42,300,000
"pepper and salt", 4,830,000.
So I think we need to think about:
- How do we represent these kinds of rules?
- How do we induce this knowledge from raw text?
- What do we use for training data?
- How much computing power do we need?
- How do we measure success?
-- Matt Mahoney,
mattma...@gmail.com