wanted: 2 tasks


YKY (Yan King Yin, 甄景贤)

Aug 10, 2012, 10:27:41 PM
to general-in...@googlegroups.com, AGI mailing list
Hi all,

Our project needs to complete two more tasks, both of which are quite specific.

1.  Teach Genifer English in logic
==========================
Teach Genifer to parse English sentences.  Start with a toy grammar of 10-20 rules, of the kind found in many A.I. textbooks.  The difficult part is that the grammar rules must be encoded in logic, so that Genifer performs parsing as logic inference.

We need syntactic as well as semantic parsing.  Semantic parsing means that the output consists of logic formulas representing the meanings of the sentences.
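
To make this concrete, here is a minimal sketch in ordinary Prolog DCG notation (not Genifer's logic -- the grammar and predicate names are invented for illustration):

    % Toy semantic grammar: parsing a sentence means proving the goal
    % s(Sem) against the word list, so parsing is literally inference.
    s(Sem)               --> np(X), vp(X, Sem).
    np(john)             --> [john].
    np(mary)             --> [mary].
    vp(X, Sem)           --> tv(X, Y, Sem), np(Y).
    vp(X, Sem)           --> iv(X, Sem).
    iv(X, sleeps(X))     --> [sleeps].
    tv(X, Y, loves(X,Y)) --> [loves].

    % ?- phrase(s(F), [john, loves, mary]).
    % F = loves(john, mary).

The real task is to redo this kind of thing inside Genifer's own logic, with a grammar of 10-20 rules.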

Our aim is just to *seed* the KB with basic English ability.  It does not matter if the rules are slightly incorrect, because they will be corrected by machine learning later.  This makes the task much easier.

The challenge is to re-engineer the parsing process entirely in logic, from text input to logic-formula output.  The logic engine is now working.

This will be a proof-of-concept that Genifer can understand English, and will help us get more funding from investors =)

Requirements:  Some familiarity with computational linguistics, and basic logic programming such as Prolog (you'll need some time to get acquainted with our logic, but it is very easy to learn and use).

2.  Ontology and similarity learner
==========================
This is the component that organizes the logic KB.  First, we can use hierarchical clustering to build a tree, based on a distance metric (which we need to provide somehow; see below).  This tree is essentially the traditional ontology based on the subset ("is") and set-membership ("is-a") relations.
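
As a sketch of the clustering step, here is a naive single-linkage agglomerative version in plain Prolog (all predicate names are invented for illustration, and distance/3 stands in for whatever metric we end up providing):

    :- use_module(library(lists)).            % member/2, select/3

    % Treat the supplied metric as symmetric.
    dist(X, Y, D) :- distance(X, Y, D).
    dist(X, Y, D) :- distance(Y, X, D).

    % cluster(+Clusters, -Tree): wrap each KB item as leaf(Item), then
    % repeatedly merge the two closest clusters until one tree remains.
    cluster([Tree], Tree).
    cluster(Clusters, Tree) :-
        Clusters = [_, _ | _],
        closest_pair(Clusters, A, B),
        subtract_pair(Clusters, A, B, Rest),
        cluster([node(A, B) | Rest], Tree).

    % Find the pair of clusters with the smallest linkage distance.
    closest_pair(Clusters, A, B) :-
        findall(D-(X, Y),
                ( select(X, Clusters, Others),
                  member(Y, Others),
                  linkage(X, Y, D) ),
                Pairs),
        keysort(Pairs, [_-(A, B) | _]).

    subtract_pair(Clusters, A, B, Rest) :-
        select(A, Clusters, Rest0),
        select(B, Rest0, Rest).

    % Single linkage: distance between clusters = min over member leaves.
    linkage(leaf(X), leaf(Y), D)    :- dist(X, Y, D).
    linkage(node(L, R), T, D)       :- linkage(L, T, D1),
                                       linkage(R, T, D2),
                                       D is min(D1, D2).
    linkage(leaf(X), node(L, R), D) :- linkage(node(L, R), leaf(X), D).

    % ?- cluster([leaf(cat), leaf(dog), leaf(car)], T).
    % (Given suitable distance/3 facts, cat and dog merge first.)

This is naive and slow, but the clustering algorithm is standard, so any off-the-shelf implementation would do as well.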

On the other hand, the similarity relation (i.e., the distance metric) can be based on a logical definition known as Leibniz extensionality, which defines equality between two functions; we generalize it to be fuzzy.  We map atomic concepts to the basis set of a matrix space.  The matrices for composite concepts and sentences can then be computed by matrix multiplication, which yields a similarity (distance) metric.
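
Spelled out (with notation invented here): each atomic concept a gets a matrix M(a) drawn from the basis set, and a composite concept or sentence w1 w2 ... wk is represented by the product

    M(w1 w2 ... wk) = M(w1) M(w2) ... M(wk)

and the distance between two sentences s and t can be taken as, say, the matrix-norm difference

    d(s, t) = || M(s) - M(t) ||

so that d(s, t) = 0 recovers strict Leibniz-style equality, and small d(s, t) is its fuzzy generalization.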

For example, the following would be similar:
    1.  Don't judge a book by its cover
    2.  All that glitters is not gold
    3.  Clothes don't make the man

We "learn" the similarity metric by tweaking the basis set -- the algorithm has yet to be designed.

(I'm not 100% sure that this matrix technique works; I'm still searching for counter-examples that might break it.)

The matrix-based similarity metric and the ontology can influence each other bidirectionally:
A) The distance metric can be used to cluster the KB, giving the ontology;
B) We can also import ready-made ontologies into the KB; doing so imposes some similarities among KB items, which we can then use to tweak the similarity basis set.

One use of the ontology is to speed up information retrieval, and thus inference, which is critical to the performance of the whole system.

Requirements:  Ideally some facility with matrix analysis (I'm not very good at it and am studying it now).  The clustering algorithm is pretty standard.  Some familiarity with ontologies and logic.

--
KY
"The ultimate goal of mathematics is to eliminate any need for intelligent thought" -- Alfred North Whitehead

Matt Mahoney

Aug 12, 2012, 12:05:56 AM
to general-in...@googlegroups.com
On Fri, Aug 10, 2012 at 10:27 PM, YKY (Yan King Yin, 甄景贤)
<generic.in...@gmail.com> wrote:
> Hi all,
>
> Our project needs to complete two more tasks, both of which are quite specific.
>
> 1. Teach Genifer English in logic
> ==========================
> Teach Genifer to parse English sentences. Start with a toy grammar of
> 10-20 rules, of the kind found in many A.I. textbooks. The difficult part
> is that the grammar rules must be encoded in logic, so that Genifer
> performs parsing as logic inference.

How does this solve the well-known problems of natural language parsing?

> Our aim is just to *seed* the KB with basic English ability.

That was the theory behind patching Cyc. It didn't work. You need to
build the lexical model first, then semantics, then grammar, and
finally math and logic. Language is structured to be learned in that
order. You need gigabytes of text, a statistical model, and lots of
computing power.


-- Matt Mahoney, mattma...@gmail.com