Hi Wang,
This is a common source of confusion. It's to do with how we originally
enabled NELL to model polysemy, and I'm not sure that we got it quite
right, but that's the way things are for the time being.
Basically, one can think of NELL's knowledge base as being divided into
two halves, which we internally refer to as the "token side" and the
"concept side." Maybe better wording would be something like "surface
form" and "latent form." Token knowledge consists of direct
observations from text. So for instance NELL might read three phrases,
"john ate the apple", "bob eats apples", and "apple sells iphones", and
then record token-side evidence that "apple" and "apples" belong in the
food category and also that "apple" belongs in the company category.
From this collected evidence NELL ideally would decide that there exist
two latent concepts, one that is a food that can be referred to by the
words "apple" and "apples" and one that is a company that can be referred
to by the word "apple." This abstract concept knowledge is what is
presented on the website and in the published files.
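To make the two halves concrete, here is a rough Python sketch of the example above. It is only an illustration: the function name and data layout are made up for this email, and grouping observations by category is a stand-in for whatever NELL actually does to decide which latent concepts exist.

```python
from collections import defaultdict

# Token side: (surface form, category) pairs observed directly in text.
token_evidence = [
    ("apple", "food"),     # from "john ate the apple"
    ("apples", "food"),    # from "bob eats apples"
    ("apple", "company"),  # from "apple sells iphones"
]

def group_into_concepts(evidence):
    """Toy stand-in for concept discovery: one latent concept per category.

    This is an assumption made for illustration, not NELL's real method.
    """
    forms_by_category = defaultdict(set)
    for surface_form, category in evidence:
        forms_by_category[category].add(surface_form)
    # Concept side: opaque ids, each with a category and the surface
    # forms that can refer to it.
    return {
        f"concept:{i}": {
            "category": category,
            "surface_forms": forms_by_category[category],
        }
        for i, category in enumerate(sorted(forms_by_category))
    }

concepts = group_into_concepts(token_evidence)
for concept_id, concept in concepts.items():
    print(concept_id, concept["category"], sorted(concept["surface_forms"]))
```

Note that the surface form "apple" ends up attached to both concepts, which is the polysemy the split is meant to capture.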
The problem here is that there's not necessarily a good way to name these
latent concepts. As far as NELL is concerned, these may as well be named
concept:12345 and concept:67890, but a decision was made early on to try
to give them human-friendly names instead. So we have a greedy algorithm
that tries to pick a single most representative surface form and single
most representative category for each concept, and that's how we wind up
with concept:food:apples and concept:company:apple. The common pitfall is
that a human looks at this and thinks that these somewhat arbitrary names
indicate that NELL believes "apples" to be a food and "apple" to be a
company. While that's often close enough for many purposes, it's not
actually correct. I would not be surprised in the least if the
gloss-finding work read the names this literally.
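Here is a small Python sketch of why the names mislead. The scores, the data layout, and the function names are all invented for illustration, and picking the maximum score is only a guess at the spirit of the greedy naming step, not the actual algorithm.

```python
# Two latent concepts with opaque ids. All scores are made-up numbers.
concepts = {
    "concept:12345": {
        "categories": {"food": 1.0},
        "surface_forms": {"apple": 0.4, "apples": 0.6},
    },
    "concept:67890": {
        "categories": {"company": 1.0},
        "surface_forms": {"apple": 1.0},
    },
}

def pick_name(concept):
    """Choose the single best-scoring category and surface form."""
    best_category = max(concept["categories"], key=concept["categories"].get)
    best_form = max(concept["surface_forms"], key=concept["surface_forms"].get)
    return f"concept:{best_category}:{best_form}"

names = {concept_id: pick_name(c) for concept_id, c in concepts.items()}

def concepts_for_surface_form(form):
    """Return the friendly names of every concept the form can refer to."""
    return sorted(
        names[concept_id]
        for concept_id, c in concepts.items()
        if form in c["surface_forms"]
    )

print(names)
print(concepts_for_surface_form("apple"))
```

In this toy setup "apple" maps to both concept:company:apple and concept:food:apples, even though the second name only mentions "apples". Reading the surface form back out of the name loses that link.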
I believe the naming algorithm would not allow there to be both a
concept:person:A and concept:male:A because male implies person, but if
there are two different names then there are two different concepts in the
KB.
Let me know if that doesn't quite clear things up.
bki...@cs.cmu.edu