Updates of NELL's Ontology

Xiaolin Zhang

Oct 10, 2016, 1:40:50 PM
to NELL: Never-Ending Language Learner

Hi Bryan and other NELL team members,

I am an independent researcher. I have the following questions regarding NELL:

(1) Does the system have any built-in mechanism to update the ontology of categories and relations? If yes, please send me a link so that I can read more about it. If no, how is the ontology updated? Manually by operators? Never updated?

(2) Does NELL have any natural language or quasi-natural language interface for anyone to ask it about its accumulated beliefs/knowledge/ideas?

(3) If the answer to (1) is "yes", how are the integrity and consistency of the beliefs and ontology maintained and enforced, if they are maintained and enforced at all?

I appreciate your reply, Jerry Zhang

Bryan Kisiel

Oct 12, 2016, 2:42:07 PM
to NELL: Never-Ending Language Learner
Hi Jerry,

I'll try to not give you answers that are too long. Please feel free to
ask for more details.

(1)

NELL first began with an ad hoc ontology of 123 categories and 55
relations that we as a group put together in an attempt to cover a wide
and general range of potentially interesting topics. Over the first
couple of years, the size of that ontology roughly doubled for a variety of
reasons. Sometimes, we saw it was necessary to add something to help
NELL's learning, such as when it didn't know what a "county" was, leading
to confusion between "city" and "state", or trying to get it to do a
better job with games by adding subcategories, or wanting to see how badly
NELL would do trying to learn what continent a person lived in via
category learning vs. relation learning. Other times, there would be a
particular grant or collaboration that necessitated particular additions,
or even just suggestions from this Google Group -- we have an open
invitation to the world to request that specific categories or relations
be manually added to NELL's ontology.

From the beginning, we always intended for NELL to be able to extend its
own ontology automatically as a key component of self-learning. We've had
a handful of tries at this, but it's proven to be a difficult problem, and
although maybe about 1/3 of NELL's current ontology is the product of
automated extension, none of these systems have gone into continued use.
http://rtw.ml.cmu.edu/papers/mohamed-emnlp11.pdf would be an early example
of this that made it to publication; I'm not sure if we've published any
of the other approaches.

One of our most recent attempts at automatic extension comes from a
somewhat unexpected area. Work on using verbs as a venue for ontology alignment
as in http://www.cs.cmu.edu/~dwijaya/mappingverbs.pdf has led to the
automatic generation of several thousand new relations based on verb
clustering that we are in the process of experimentally adding to NELL's
ontology.

(2)

We do not have a natural language interface to NELL's KB, although that is
one obvious use of a KB. However, we do offer a web service for querying
NELL's KB programmatically. You can see a small example of this at
http://rtw.ml.cmu.edu/rtw/kbbrowser/special:asknell which links to more
detailed documentation about how to use the API to build something more
sophisticated.

(3)

Maintaining consistency of NELL's learned knowledge via the constraints of
the ontology is a key principle in the design of NELL. Probably the best
comprehensive explanation of the constraints is in the first four pages of
http://www.cs.cmu.edu/~tom/pubs/NELL_aaai15.pdf and these are currently
treated as hard constraints that are enforced automatically in the final
stage of each iteration of learning where NELL considers all the evidence
collected thus far and makes decisions about which evidence to promote as a
believed fact to bootstrap off of for the next iteration of learning.

At present, these decisions are made in a rather simplistic way. This is
another obvious area for improvement that a number of people have looked
at, but where no lasting replacement has yet emerged.
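
To make the promotion step a little more concrete, here is a minimal
Python sketch. It is not NELL's code: the score threshold, the greedy
ordering, the function names, and the single "city"/"state" mutual
exclusion pair are all assumptions chosen for illustration. It only
shows the general shape of promoting well-supported candidates while
treating ontology constraints as hard rules.

```python
# Illustrative only: a toy promotion loop, not NELL's implementation.
# Hypothetical constraint: "city" and "state" are mutually exclusive.
MUTEX = {frozenset(("city", "state"))}


def violates(belief, beliefs):
    """True if `belief` = (entity, category) conflicts with a promoted belief."""
    entity, category = belief
    return any(
        e == entity and frozenset((category, c)) in MUTEX
        for e, c in beliefs
    )


def promote(candidates, beliefs, threshold=0.9):
    """Greedily promote candidates, best evidence first.

    `candidates` maps (entity, category) -> evidence score in [0, 1].
    A candidate is promoted only if its score reaches `threshold`
    (an assumed cutoff) and it breaks no hard constraint.
    """
    promoted = []
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for belief, score in ranked:
        if score >= threshold and not violates(belief, beliefs):
            beliefs.add(belief)
            promoted.append(belief)
    return promoted
```

With candidates ("Pittsburgh", "city") at 0.95 and ("Pittsburgh",
"state") at 0.92, only the first would be promoted under these
assumptions, because the second conflicts with it.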

bki...@cs.cmu.edu

Xiaolin Zhang

Oct 20, 2016, 10:00:02 PM
to NELL: Never-Ending Language Learner
Bryan, in response to your email: yes, I am asking how your team ensures that the ontology itself makes sense. But "make sense" in your email is far too vague a phrase. That's why I asked about "integrity", which minimally means "consistency" of the ontology itself, although it might mean more than that. You said you have a lot to say about this. Feel free to say as much as you can.

Jerry

Bryan Kisiel

Oct 24, 2016, 2:20:08 PM
to NELL: Never-Ending Language Learner, xzhan...@yahoo.com
Hi Jerry,

I think "integrity" and "consistency" are kind of vague words as well, but
I'll try to explain further and we can go into additional detail if I'm
not answering what you're asking.

There is some minimal amount of consistency that is enforced
programmatically and by way of the process we use to define the ontology
itself. We start with two spreadsheets where each row declares a set of
required attributes for a single predicate -- one spreadsheet for
categories and one spreadsheet for relations. You can see most of these
attributes on our website. For instance, if you navigate to a particular
category, e.g.:

http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:disease

Then at the top there will be a "metadata" link, e.g.:

http://rtw.ml.cmu.edu/rtw/kbbrowser/predmeta:disease

The most important field is "generalizations" which defines the set of
parent predicates, and this defines the hierarchy of categories and
hierarchy of relations. Because of this, NELL is able to enforce the
constraint that every "disease" is also a "physiologicalcondition" and an
"abstractthing". This also enforces the constraint that no "disease" may
also be a "nondiseasecondition" because that is a sibling category.
(Although an additional field "mutexexceptions" allows us to specify
situations where a given learned instance may belong to multiple sibling
predicates.)
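
As a rough illustration of these two constraints, here is a minimal
Python sketch. It is not NELL's code: the dictionary layout, the
function names, and the exact parent links are assumptions. Only the
category names and the "generalizations" and "mutexexceptions" field
names come from the description above. It also simplifies by checking
only direct siblings, not their descendants.

```python
# Illustrative only: a toy ontology, not NELL's real data or code.
# Each category maps to its assumed direct parents ("generalizations").
GENERALIZATIONS = {
    "disease": ["physiologicalcondition"],
    "nondiseasecondition": ["physiologicalcondition"],
    "physiologicalcondition": ["abstractthing"],
    "abstractthing": [],
}

# Hypothetical stand-in for "mutexexceptions": pairs allowed to co-occur.
MUTEX_EXCEPTIONS = set()


def ancestors(category):
    """Return every category implied by `category` through its parents."""
    seen = set()
    stack = list(GENERALIZATIONS[category])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(GENERALIZATIONS[parent])
    return seen


def are_siblings(a, b):
    """Two distinct categories are siblings if they share a direct parent."""
    return a != b and bool(set(GENERALIZATIONS[a]) & set(GENERALIZATIONS[b]))


def violates_mutex(existing, candidate):
    """True if `candidate` is a non-excepted sibling of a category in `existing`."""
    return any(
        are_siblings(candidate, c)
        and frozenset((candidate, c)) not in MUTEX_EXCEPTIONS
        for c in existing
    )
```

Under these assumptions, ancestors("disease") yields
"physiologicalcondition" and "abstractthing", and an instance already
believed to be a "disease" would be rejected as a
"nondiseasecondition" unless that pair is listed as an exception.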

There are a number of other fields that constrain and define the ontology,
such as whether a relation is functional or not, or whether a category
should admit only proper nouns or not, but these are probably less
relevant to discuss here. In general these additional fields merely lead
to hard rules about what candidate beliefs NELL will disallow from
becoming promoted.
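
For a sense of what such hard rules might look like, here is a small
Python sketch. It is only a guess at the general shape, not NELL's
code: the relation name "countrycapital", the capitalization heuristic
for proper nouns, and all function names are made up for illustration.

```python
# Illustrative only: hypothetical metadata, not NELL's real settings.
FUNCTIONAL_RELATIONS = {"countrycapital"}  # at most one value per subject
PROPER_NOUN_ONLY = {"city"}                # categories admitting only proper nouns


def looks_like_proper_noun(phrase):
    """Crude stand-in test: every word starts with an uppercase letter."""
    words = phrase.split()
    return bool(words) and all(w[0].isupper() for w in words)


def relation_allowed(relation, subject, value, believed):
    """Disallow a second, different value for a functional relation.

    `believed` maps (relation, subject) -> the value already promoted.
    """
    if relation not in FUNCTIONAL_RELATIONS:
        return True
    return believed.get((relation, subject), value) == value


def category_allowed(category, phrase):
    """Disallow common-noun phrases for proper-noun-only categories."""
    if category in PROPER_NOUN_ONLY:
        return looks_like_proper_noun(phrase)
    return True
```

A candidate failing either check would simply never be promoted, no
matter how much evidence supports it.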

This roughly is the level of internal consistency that humans define and
that NELL must enforce. The choice of how to define the above settings is
entirely up to humans or, in the case of automatically generated
relations, at least supervised by humans.

How do we ensure that the humans have designed an ontology that makes
sense? We don't, really, and it quickly turns into a difficult
philosophical question that is not within our interests to try to answer.
For instance, for the categories, we initially thought we would divide
things at the top level into "abstract thing", "agent", "item", and
"location". That seemed reasonable at first, but it's not difficult to
start wondering whether a building should count as an item or a location,
and many other things like that. Having put some time myself into trying
to reorganize NELL's ontology to be more philosophically consistent, I can
say that I think it's a big mess and too difficult to resolve.

Interestingly, early on, we met with some of the people running Freebase
at the time, and I thought one of the interesting main points that came
out of that discussion was that in the case of Freebase, much more human
effort goes into developing a sensible ontology than into populating it.
In my opinion, this correctly reflects the relative difficulty of
establishing a consistent ontology vs. populating it. Considering this,
it's no surprise to me that it has been a much greater challenge to create
automated ways of extending the ontology than to create automated
ways to populate a given ontology.

All that being said, the particular choice of ontology and the quality of
the ontology is not really central to the NELL problem. One of the main
ideas behind NELL is that the ontology is entirely variable -- NELL is
meant to be a system that can learn successfully given any sufficiently
reasonable ontology. We have done some experiments using alternate
ontologies (e.g. a biomedical domain) and the results tend to confirm that
indeed NELL will work using ontologies that are entirely different,
structurally. So for us it is sufficient at this point to have an
ontology that is "good enough" as far as philosophical consistency goes
even though it has a number of obvious flaws.

bki...@cs.cmu.edu

Olfert Rahbek

Oct 27, 2016, 9:10:14 AM
to cmu...@googlegroups.com, xzhan...@yahoo.com
Hi Jerry and Bryan,

I am inspired to chip in, Bryan. I think your comment about the variable nature of ontologies is essential;
at the same time, it has been my experience that it is possible to work towards usefulness even under the premise of continued variability.
E.g. it is not necessary to know the "final name" of the parent category of two subcategories; however, it is often very useful
to get to a deeper understanding of the border between two sister categories. And sometimes the nature of such a border becomes clearer
when the presumed sister categories are moved elsewhere, together or even independently.

Integrity and consistency discussions easily become egg-hen (which comes first) discussions.
As always, such discussions are best solved by clarifying: WHICH egg, WHICH hen?

Our key word is this: what is the PURPOSE of the ontology/topology? We should remember that the model is made to assist the work of humans.
Therefore different humans with different purposes may work in parallel with different versions of the models.

best, Olfert

//

Kind regards,

Olfert Rahbek

WordMaps
Margrethevej 28, DK-2900 Hellerup
Att. Olfert Rahbek
word...@interaction.dk
+45 4052 3114
skype: orahbek28
VAT no. 3756 6020
Homepage: https://wordmaps.org/

Bryan Kisiel

Oct 27, 2016, 10:51:22 AM
to cmu...@googlegroups.com, xzhan...@yahoo.com
Hi Olfert,

I agree very much that it is key to hone the boundaries between sibling
categories. We have seen that this is important, for instance, to reduce
misclassification error when NELL begins to learn things that lie outside
the proper decision boundaries of existing categories. In this case, it
can begin to extend the decision boundaries of existing
sufficiently-similar categories until they overlap and then cross-pollute
each other. One can imagine that detecting this failure mode could be
used to automatically detect a case where NELL has discovered a new kind
of thing that necessitates an additional category. And it would certainly
be an interesting problem, as you suggest, to look into potentially
rearranging the ontology to better fit the learned knowledge.

This brings up an important additional component to your question about
the purpose of the ontology. While it is fundamentally true that the
purpose of the ontology is determined ultimately by the consumer of the
learned knowledge, the nature of the ontology also has an impact on the
quality of NELL's learning. The mutual-exclusion constraints among the
categories allow them to help define each other's decision boundaries --
the main way that NELL is able to determine what is not an amphibian is to
learn what is a reptile.

One obvious instance of this is in NELL's "game" categories. Most or all
of them relentlessly learn variations on names of online poker, even
though online poker is not a valid instance of any of them, and in spite
of many human-supplied negative examples (which in many cases hold
substantially more weight than automatically learned examples). I
strongly suspect that the best way to solve this is to introduce an additional
category dedicated to learning instances of online poker; then mutual
exclusion would turn NELL's proclivity for learning these things into a much
more powerful pushback on the decision boundaries for the other game
categories.

bki...@cs.cmu.edu

Bryan Kisiel

Mar 13, 2017, 12:42:25 PM
to cmu...@googlegroups.com
Hi All,

For those with an interest in this thread, it turns out that Jerry was
working toward an entry for IBM's Watson AI XPRIZE competition. An
ambitious and interesting proposal:

https://dl.dropboxusercontent.com/u/64278456/Competition%20Plan%20for%20IBM%20Watson%20Xprize%20AI%20Competition.pdf

bki...@cs.cmu.edu

Olfert Rahbek

Mar 13, 2017, 12:52:35 PM
to cmu...@googlegroups.com

Hi Bryan,

Thanks a lot, very interesting read.

Here is what I got out of it:

Best, Olfert

Xiaolin Zhang

Mar 13, 2017, 1:32:18 PM
to NELL: Never-Ending Language Learner
Any criticisms, comments, suggestions or help on the project will be greatly appreciated.

Help needed,

Jerry

Xiaolin Zhang

Mar 15, 2017, 7:14:33 PM
to NELL: Never-Ending Language Learner