I don't know that we've really thought too much about singular vs. plural.
Even among the seed examples that get fed into NELL to start its learning,
there is no uniformity of pluralization.
There's a semi-related issue that NELL tends to not distinguish between
classes of things, e.g., "computer mouse", and instances, e.g., "Bryan's
computer mouse on his desk at work". And there are gradations here, like
"Dell computer mouse model 04P608", which could be a class or an instance
depending on what is being said about it. Not having this distinction can
make things difficult. Maybe all of these things are legitimate members
of the consumerElectronicItem category, and we might know that
consumerElectronicItems are physical objects and that physical objects
have locations, but actually only one of those three mice corresponds to a
single physical thing. I think we have more thinking to do on that one as
well.
I'm not sure if this class vs. instance issue would actually come up in
practice if we were to start merging singular and plural as a rule. I'd
want to keep an eye on that. But either way, we certainly could benefit
from linking singular and plural if we had a way to identify them, and
then it would be possible for NELL to learn to transfer knowledge between
them. There's probably a bunch of similar stuff we could be doing as
well, like maybe with tenses, identifying common prefixes, and maybe even
trying to predict etymology -- I know I do that when I'm trying to read a
word I don't know.
We'd need some real person-hours to go that far, and we might also need to
get a little more clever in distinguishing learning about words vs.
learning about what they refer to, but in the short run I think I'll add
an isMultipleOf relation and maybe NELL can learn to link easy cases like
hamster / hamsters.
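As a rough illustration of what "easy cases" could mean here, the sketch
below proposes candidate isMultipleOf pairs from a list of noun strings
using a few naive English suffix rules. This is not NELL's code: the
function names, the suffix rules, and the argument order (plural first,
singular second) are all assumptions made for the example.

```python
def candidate_singulars(word):
    """Yield naive singular guesses for a possibly-plural word (hypothetical rules)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        yield w[:-3] + "y"          # cities -> city
    if w.endswith("es") and len(w) > 3:
        yield w[:-2]                # boxes -> box
    if w.endswith("s") and not w.endswith("ss") and len(w) > 2:
        yield w[:-1]                # hamsters -> hamster


def propose_is_multiple_of(vocabulary):
    """Return sorted (plural, singular) pairs where both forms occur in the vocabulary.

    Requiring both forms to be already known is what keeps the naive
    rules from inventing words such as "bu" from "bus".
    """
    vocab = {w.lower() for w in vocabulary}
    pairs = set()
    for plural in vocab:
        for singular in candidate_singulars(plural):
            if singular in vocab and singular != plural:
                pairs.add((plural, singular))
    return sorted(pairs)


if __name__ == "__main__":
    seeds = ["hamster", "hamsters", "mouse", "mice", "box", "boxes", "glass", "bus"]
    for plural, singular in propose_is_multiple_of(seeds):
        print(f"isMultipleOf({plural}, {singular})")
```

Irregular pairs such as mouse / mice are missed by design; those are the
cases that would have to be learned rather than matched by rule.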
Reading the dictionary is a neat idea. NELL doesn't have much ability to
disentangle individual sentences at this point, although we could identify
bodies of text as being authoritative and crank up the learning gain, so
to speak, on them. I think that sort of thing is definitely in the cards,
since we want to make NELL more self-directed down the road. Building a
specialized extractor is of course an option (and maybe a good one in this
case) but we're particularly interested in techniques that depend on what
NELL has learned so far so that those techniques will perform better as
NELL learns more.
More generally, I think you've raised a good point: NELL tries to learn
and reason about the things that words and language represent, but not
about the words themselves -- or at least not in the same way. And the
plural/singular thing makes for both a good test case for design and a
profitable way to learn more by association. One more thing that's going
to take some time to address...
Do keep us posted on how things work for co-reference! We've got our
hands full trying to produce the learned knowledge, so it's great to find
consumers to work with. What are you thinking of doing for a reduction
step?