Hi Jerry,
I think "integrity" and "consistency" are kind of vauge words as well, but
I'll try to explain further and we can go into additional detail if I'm
not answering what you're asking.
There is some minimal amount of consistency that is enforced
programmatically and by way of the process we use to define the ontology
itself. We start with two spreadsheets where each row declares a set of
required attributes for a single predicate -- one spreadsheet for
categories and one spreadsheet for relations. You can see most of these
attributes on our website. For instance, if you navigate to a particular
category, e.g.:
http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:disease
Then at the top there will be a "metadata" link, e.g.:
http://rtw.ml.cmu.edu/rtw/kbbrowser/predmeta:disease
The most important field is "generalizations", which defines the set of
parent predicates; this in turn defines the hierarchy of categories and the
hierarchy of relations. Because of this, NELL is able to enforce the
constraint that every "disease" is also a "physiologicalcondition" and an
"abstractthing". This also enforces the constraint that no "disease" may
also be a "nondiseasecondition" because that is a sibling category.
(Although an additional field "mutexexceptions" allows us to specify
situations where a given learned instance may belong to multiple sibling
predicates.)
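To make that concrete, here is a minimal sketch in Python of how the
"generalizations", sibling mutual-exclusion, and "mutexexceptions"
constraints could be checked. It is purely illustrative -- the field
layout and helper names are my own invention, not NELL's actual code:

    # Hypothetical sketch, not NELL's actual code or schema.
    from dataclasses import dataclass, field

    @dataclass
    class Predicate:
        name: str
        generalizations: set                          # parent predicates
        mutexexceptions: set = field(default_factory=set)

    # A toy fragment of the category hierarchy.
    ontology = {
        "abstractthing": Predicate("abstractthing", set()),
        "physiologicalcondition":
            Predicate("physiologicalcondition", {"abstractthing"}),
        "disease": Predicate("disease", {"physiologicalcondition"}),
        "nondiseasecondition":
            Predicate("nondiseasecondition", {"physiologicalcondition"}),
    }

    def ancestors(name):
        """All predicates implied by membership in `name` (transitive)."""
        result = set()
        stack = [name]
        while stack:
            for parent in ontology[stack.pop()].generalizations:
                if parent not in result:
                    result.add(parent)
                    stack.append(parent)
        return result

    def consistent(categories):
        """Reject an instance assigned to sibling categories, unless an
        explicit mutexexception permits the overlap."""
        cats = list(categories)
        for i, a in enumerate(cats):
            for b in cats[i + 1:]:
                siblings = (ontology[a].generalizations &
                            ontology[b].generalizations)
                excepted = (b in ontology[a].mutexexceptions or
                            a in ontology[b].mutexexceptions)
                if siblings and not excepted:
                    return False
        return True

    # Every disease is also a physiologicalcondition and an abstractthing:
    assert ancestors("disease") == {"physiologicalcondition", "abstractthing"}
    # ...but nothing may be both a disease and a nondiseasecondition:
    assert not consistent({"disease", "nondiseasecondition"})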
There are a number of other fields that constrain and define the ontology,
such as whether a relation is functional or not, or whether a category
should admit only proper nouns, but these are probably less relevant to
discuss here. In general these additional fields merely lead to hard rules
about what candidate beliefs NELL will disallow from becoming promoted.
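As one example, here is a similarly hypothetical sketch of how declaring a
relation functional gives a hard rule for filtering candidate beliefs (the
relation name and function below are again my own illustration, not NELL's
actual code):

    # Hypothetical sketch: a functional relation allows at most one
    # value per subject, so conflicting candidates are not promoted.

    def promote(candidates, functional):
        """candidates: iterable of (relation, subject, value) beliefs.
        functional: relation names declared functional in the ontology.
        Returns the beliefs that survive the hard rule; a candidate that
        would give a functional relation a second value is disallowed."""
        promoted = []
        seen = {}  # (relation, subject) -> value already promoted
        for relation, subject, value in candidates:
            key = (relation, subject)
            if relation in functional and key in seen and seen[key] != value:
                continue  # conflicts with a promoted value; reject
            seen.setdefault(key, value)
            promoted.append(relation + "(" + subject + ", " + value + ")")
        return promoted

    # With "citylocatedincountry" declared functional, the second,
    # conflicting candidate for paris is disallowed from promotion:
    beliefs = [("citylocatedincountry", "paris", "france"),
               ("citylocatedincountry", "paris", "germany")]
    print(promote(beliefs, {"citylocatedincountry"}))
    # ['citylocatedincountry(paris, france)']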
This roughly is the level of internal consistency that humans define and
that NELL must enforce. The choice of how to define the above settings is
entirely up to humans, or, in the case of automatically-generated
relations, at least supervised by humans.
How do we ensure that the humans have designed an ontology that makes
sense? We don't, really, and it quickly turns into a difficult
philosophical question that is not within our interests to try to answer.
For instance, for the categories, we initially thought we would divide
things at the top level into "abstract thing", "agent", "item", and
"location". That seemed reasonable at first, but it's not difficult to
start wondering whether a building should count as an item or a location,
and many other things like that. Having put some time myself into trying
to reorganize NELL's ontology to be more philosophically consistent, I can
say that I think it's a big mess and too difficult to resolve.
Interestingly, early on, we met with some of the people running Freebase
at the time, and one of the more interesting points to come out of that
discussion was that, in the case of Freebase, much more human effort goes
into developing a sensible ontology than into populating it.
In my opinion, this correctly reflects the relative difficulty of
establishing a consistent ontology vs. populating it. Considering this,
it's no surprise to me that it has been a much greater challenge to create
automated ways of extending the ontology than to create automated ways to
populate a given ontology.
All that being said, the particular choice of ontology and the quality of
the ontology is not really central to the NELL problem. One of the main
ideas behind NELL is that the ontology is entirely variable -- NELL is
meant to be a system that can learn successfully given any sufficiently
reasonable ontology. We have done some experiments using alternate
ontologies (e.g. a biomedical domain) and the results tend to confirm that
indeed NELL will work using ontologies that are entirely different,
structurally. So for us it is sufficient at this point to have an
ontology that is "good enough" as far as philosophical consistency goes
even though it has a number of obvious flaws.
bki...@cs.cmu.edu