Do IDs indicate an entity?


Wang Sibo

Jun 21, 2016, 1:23:24 PM
to NELL: Never-Ending Language Learner
I have a question about how an entity is identified. 

I checked the NELL dataset, and consider the following two rows:

concept:person:A hasdaught concept:female:B
concept:female:B hasfather concept:male:A

For ease of exposition, I will refer to A and B as the IDs.

In this case, is the entity corresponding to A identified by the ID "A", or by "concept:person:A"?

If the first holds, then concept:person:A and concept:male:A represent the same entity, even though the "type" changes from person to male.

If the second holds, then "concept:person:A" and "concept:male:A" are regarded as different entities by NELL.



I have a gut feeling that the first holds. However, after reading a paper by Prof. William W. Cohen titled "Automatic gloss finding for a knowledge base using ontological constraints", it seems to imply that the second holds.

They use the example of "apple" to illustrate.

We may have concept:company:apple and concept:fruit:apple in NELL. This suggests that "apple" alone cannot identify an entity.

However, I checked NELL dataset 985 (the latest dataset) and found no such example. Instead, the two appear to be distinguished by ID: apple_company and apple_fruit.

Could NELL members help explain this? Thanks a lot.

Bryan Kisiel

Jun 21, 2016, 2:18:33 PM
to NELL: Never-Ending Language Learner
Hi Wang,

This is a common source of confusion. It's to do with how we originally
enabled NELL to model polysemy, and I'm not sure that we got it quite
right, but that's the way things are for the time being.

Basically, one can think of NELL's knowledge base as being divided into
two halves, which we internally refer to as the "token side" and the
"concept side." Maybe better wording would be something like "surface
form" and "latent form." Token knowledge is comprised of direct
observations from text. So for instance NELL might read three phrases,
"john ate the apple", "bob eats apples", and "apple sells iphones" and
then record token-side evidence that "apple" and "apples" belong in the
food category and also that "apple" belongs in the company category.

From this collected evidence NELL ideally would decide that there exist
two latent concepts, one that is a food that can be referred to by the
words "apple" and "apples" and one that is a company that can be referred
to by the word "apple." This abstract concept knowledge is what is
presented on the website and in the published files.
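
To make the token side / concept side split concrete, here is a minimal sketch in Python. It is not NELL's code: the function name and the rule of forming one latent concept per category are assumptions made purely for illustration, since the thread does not describe how NELL actually clusters surface forms into concepts.

```python
from collections import defaultdict

# Token-side evidence: (surface form, category) pairs observed in text.
# The three observations mirror the example phrases in the message above.
token_evidence = [
    ("apple", "food"),     # "john ate the apple"
    ("apples", "food"),    # "bob eats apples"
    ("apple", "company"),  # "apple sells iphones"
]


def group_by_category(evidence):
    """Hypothetical stand-in for concept discovery.

    Assumption: one latent concept per category. This is only enough to
    reproduce the apple example, not a description of NELL's method.
    """
    concepts = defaultdict(set)
    for surface_form, category in evidence:
        concepts[category].add(surface_form)
    return dict(concepts)


# Concept side: each latent concept lists the surface forms that refer to it.
concepts = group_by_category(token_evidence)
```

Under these assumptions the surface form "apple" ends up attached to two different latent concepts, which is the polysemy the split is meant to capture.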

The problem here is that there's not necessarily a good way to name these
latent concepts. As far as NELL is concerned, these may as well be named
concept:12345 and concept:67890, but a decision was made early on to try
to give them human-friendly names instead. So we have a greedy algorithm
that tries to pick a single most representative surface form and single
most representative category for each concept, and that's how we wind up
with concept:food:apples and concept:company:apple. The common pitfall is
that a human looks at this and thinks that these somewhat arbitrary names
indicate that NELL believes "apples" to be a food and "apple" to be a
company. While that's often close enough for many purposes, it's not
actually correct. I would not be surprised in the least if this is what
was done for the gloss-finding work.
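
A rough sketch of the naming step described above, in Python. The message says only that a greedy algorithm picks a single most representative surface form and category. Measuring "most representative" by an evidence count is an assumption here, and the function name and all counts are invented for illustration.

```python
def name_concept(surface_form_counts, category_counts):
    """Build a human-friendly concept name.

    Assumption: "most representative" means "highest evidence count".
    The real criterion is not stated in the thread.
    """
    form = max(surface_form_counts, key=surface_form_counts.get)
    category = max(category_counts, key=category_counts.get)
    return f"concept:{category}:{form}"


# Made-up counts for the two latent "apple" concepts.
food_name = name_concept({"apples": 5, "apple": 3}, {"food": 8})
company_name = name_concept({"apple": 4}, {"company": 4})
```

With these invented counts the food concept is named concept:food:apples even though "apple" also refers to it, which is why the name should be read as a label for the concept and not as a statement about one surface form.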

I believe the naming algorithm would not allow there to be both a
concept:person:A and concept:male:A because male implies person, but if
there are two different names then there are two different concepts in the
KB.

Let me know if that doesn't quite clear things up.


parag...@gmail.com

Sep 14, 2016, 11:28:30 AM
to NELL: Never-Ending Language Learner
Following up on the last point you made, "male implies person": does NELL have a mechanism to internally create a new representation "concept:person:male:A" from the two concepts "concept:person:A" and "concept:male:A", in a scenario where one of these two representations already existed in the KB and the other was found at a later point in time?

I am trying to understand NELL's self-improving mechanism.

Bryan Kisiel

Sep 14, 2016, 12:08:03 PM
to NELL: Never-Ending Language Learner
NELL uses a simpler mechanism than that: if evidence exists that A is a
person and that A is a male, then they must necessarily be the same
entity. So there could never be a case where there would be both a
concept:person:A and a concept:male:A.

If there is only evidence that A is a person, then there will be only a
concept:person:A.

If there is only evidence that A is a male, or if there is evidence that A
is both a person and a male, then there will be only a concept:male:A.
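
The three cases above can be sketched as a small Python function. This is an illustration under stated assumptions, not NELL's implementation: the PARENT map is a hypothetical one-entry ontology fragment encoding only "male implies person", and the helper names are invented.

```python
# Hypothetical ontology fragment: each category maps to the category it implies.
PARENT = {"male": "person"}


def ancestors(category):
    """Return every category implied by `category`, nearest first."""
    implied = []
    while category in PARENT:
        category = PARENT[category]
        implied.append(category)
    return implied


def concept_name(entity, evidenced_categories):
    """Name the single concept for `entity` after its most specific category.

    A category that is implied by another evidenced category is dropped,
    so evidence for both "person" and "male" yields one "male" concept.
    """
    implied = {a for c in evidenced_categories for a in ancestors(c)}
    specific = sorted(set(evidenced_categories) - implied)
    if len(specific) != 1:
        # Unrelated categories (e.g. food vs. company) are outside this sketch.
        raise ValueError(f"no single most specific category in {specific}")
    return f"concept:{specific[0]}:{entity}"


only_person = concept_name("A", {"person"})
only_male = concept_name("A", {"male"})
both = concept_name("A", {"person", "male"})
```

Here only_person is concept:person:A, while only_male and both are concept:male:A, matching the three cases described in the message.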
