Different concepts for singular and plural forms


Manny

Aug 30, 2011, 8:18:25 AM
to cmu...@googlegroups.com
Hi,

I was wondering if it is intentional that NELL learns singular and plural forms of the same lemma as different concepts.
"mammal:hamster" and "mammal:hamsters" are a good example of this phenomenon. There is also no direct relation between the two forms from which one could conclude that one is the plural/singular form of the other. Personally, I would prefer to merge both concepts in cases like these.


Manuel Warum

Bryan Kisiel

Aug 30, 2011, 4:38:02 PM
to cmu...@googlegroups.com
Hi Manuel,

I don't know that we've really thought too much about singular vs. plural.
Even among the seed examples that get fed into NELL to start its learning,
there is no uniformity of pluralization.

There's a semi-related issue that NELL tends to not distinguish between
classes of things, e.g. "computer mouse", and instances, e.g., "Bryan's
computer mouse on his desk at work". And there are gradations here, like
"Dell computer mouse model 04P608", which could be a class or an instance
depending on what is being said about it. Not having this distinction can
make things difficult. Maybe all of these things are legitimate members
of the consumerElectronicItem category, and we might know that
consumerElectronicItems are physical objects and that physical objects
have locations, but actually only one of those three mice corresponds to a
single physical thing. I think we have more thinking to do on that one as
well.

I'm not sure if this class vs. instance issue would actually come up in
practice if we were to start merging singular and plural as a rule. I'd
want to keep an eye on that. But either way, we certainly could benefit
from linking singular and plural if we had a way to identify them, and
then it would be possible for NELL to learn to transfer knowledge between
them. There's probably a bunch of similar stuff we could be doing as
well, like maybe with tenses, identifying common prefixes, and maybe even
trying to predict etymology -- I know I do that when I'm trying to read a
word I don't know.

We'd need some real person-hours to go that far, and we might also need to
get a little more clever in distinguishing learning about words vs.
learning about what they refer to, but in the short run I think I'll add
an isMultipleOf relation and maybe NELL can learn to link easy cases like
hamster / hamsters.

bki...@cs.cmu.edu
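
A minimal sketch of how "easy cases like hamster / hamsters" might be proposed as candidates for an isMultipleOf relation. This is not NELL's method: the suffix rules, the function names, and the "category:name" string format (taken from the "mammal:hamster" example above) are all assumptions made for illustration.

```python
from collections import defaultdict

def naive_singular_candidates(name):
    """Return possible singular forms of `name` using crude English suffix rules."""
    candidates = []
    if name.endswith("ies") and len(name) > 3:
        candidates.append(name[:-3] + "y")   # berries -> berry
    if name.endswith("es") and len(name) > 2:
        candidates.append(name[:-2])         # foxes -> fox
    if name.endswith("s") and not name.endswith("ss") and len(name) > 1:
        candidates.append(name[:-1])         # hamsters -> hamster
    return candidates

def propose_is_multiple_of(concepts):
    """Return (plural, singular) pairs, pairing only within the same category."""
    by_category = defaultdict(set)
    for concept in concepts:
        category, _, name = concept.partition(":")
        by_category[category].add(name)
    pairs = []
    for category, names in by_category.items():
        for name in sorted(names):
            for singular in naive_singular_candidates(name):
                if singular in names:
                    pairs.append((f"{category}:{name}", f"{category}:{singular}"))
                    break  # keep only the first matching singular form
    return pairs

# Hypothetical toy concepts, not actual NELL output.
concepts = ["mammal:hamster", "mammal:hamsters", "company:apple", "fruit:apples"]
print(propose_is_multiple_of(concepts))
```

Pairing only within a category is a deliberately conservative choice: it links hamster / hamsters but leaves "company:apple" and "fruit:apples" unrelated. Irregular plurals (mouse / mice) are out of reach for suffix rules like these.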

Manny

Aug 31, 2011, 4:36:33 AM
to cmu...@googlegroups.com
Hi Bryan,

You are of course right, classes versus instances is always a tricky thing. But I believe the slightly simpler case of NELL learning a singular and a plural form of the same class (as I think is the case with hamster / hamsters) is a bit easier to tackle. Then again, it might require a different kind of learner. I do wonder if it would be possible to tap Oxford Dictionaries for that. Granted, there should also be a few restrictions as to when singular and plural classes become related (or merged), or Apple (Computers) could end up getting linked to apples.

By the way, we are in the process of trying to employ NELL to measure semantic relatedness between terms and use it as one of the knowledge bases our classifiers use to identify potential co-references within texts. The reason I brought this up is the following: I did notice that our path length algorithms could not link "hamster" and "rodent", but were quickly able to find a short path between "hamsters" and "rodents". But there is no need to "fix" this within NELL if you are running out of resources; we'll try to extend our KB import process with a reduction step that attempts to find such cases and merge concepts as necessary. If you want to, I'll report back and tell you how that went.

Manuel
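
A rough sketch of what such a reduction step could look like, assuming the KB is imported as an undirected graph of concept pairs and that a list of (plural, singular) merges is already available. The edge data, function names, and the breadth-first path length are illustrative assumptions, not the actual import process or path length algorithm described above.

```python
from collections import defaultdict, deque

def merge_concepts(edges, merge_pairs):
    """Build an undirected graph, replacing each plural concept by its singular."""
    canonical = dict(merge_pairs)  # plural -> singular
    graph = defaultdict(set)
    for a, b in edges:
        a, b = canonical.get(a, a), canonical.get(b, b)
        if a != b:  # drop self-loops created by merging
            graph[a].add(b)
            graph[b].add(a)
    return graph

def path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical toy edges: only the plural forms are linked to each other.
edges = [("mammal:hamsters", "mammal:rodents"), ("mammal:hamster", "mammal:pet")]
merges = [("mammal:hamsters", "mammal:hamster"), ("mammal:rodents", "mammal:rodent")]

before = merge_concepts(edges, [])
after = merge_concepts(edges, merges)
print(path_length(before, "mammal:hamster", "mammal:rodent"))
print(path_length(after, "mammal:hamster", "mammal:rodent"))
```

In this toy graph the singular forms are unreachable from each other until the merge is applied, which mirrors the hamster / rodent observation. Restricting `merges` to pairs within one category is one way to keep Apple (Computers) from being merged with apples.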

Bryan Kisiel

Sep 1, 2011, 12:38:44 PM
to cmu...@googlegroups.com
Hi Manny,

Reading the dictionary is a neat idea. NELL doesn't have much ability to
disentangle individual sentences at this point, although we could identify
bodies of text as being authoritative and crank up the learning gain, so
to speak, on them. I think that sort of thing is definitely in the cards,
since we want to make NELL more self-directed down the road. Building a
specialized extractor is of course an option (and maybe a good one in this
case) but we're particularly interested in techniques that depend on what
NELL has learned so far so that those techniques will perform better as
NELL learns more.

More generally, I think you've raised a good point: NELL tries to learn
and reason about the things that words and language represent, but not
about the words themselves -- or at least not in the same way. And the
plural/singular thing makes both a good test case for design and a
profitable way to learn more by association. One more thing that's going
to take some time to address...

Do keep us posted on how things work for co-reference! We've got our
hands full trying to produce the learned knowledge, so it's great to find
consumers to work with. What are you thinking of doing for a reduction
step?

bki...@cs.cmu.edu

Manny

Sep 28, 2011, 8:00:26 AM
to cmu...@googlegroups.com
Hi there,

I see that you seem to have introduced new relations, namely isoneoccurrenceof and ismultipleof. Thanks for that :-)
I'm curious to see how it turns out.

Bryan Kisiel

Sep 29, 2011, 2:25:54 PM
to cmu...@googlegroups.com
Looks like it's off to an iffy start, but that seems to be true of most
relations that do not have a very specific domain and range. If you think
up others to add, just drop a line.

bki...@cs.cmu.edu
