It might, but it has not been specifically configured to do that. There
are two main internet-based resources that NELL uses at the moment. One
is the ClueWeb09 corpus, which consists of about 1 billion web pages crawled
in 2009. The other is Google search, which NELL uses to look up web pages
where it might discover additional facts by associating them with things
that it has already learned. So it's possible that NELL has extracted
knowledge from online literature using one of those methods, but I don't
really know how much.
Someone once suggested using Project Gutenberg for NELL, which is a good
idea, but at this point I'm not sure that we would see a definitive
benefit from anything that isn't hundreds of gigabytes in size. My
thinking is that it has to make a significant impact on the ~17 terabytes
worth of ClueWeb09 that we use, because our current process throws away all
but the most commonly-stated things. (We'd like to do better on this in
the future, but we don't yet.) But if you happen to know of some
compilation that is that big, we can certainly look into hooking it up.
Also, does NELL need whole documents? If not, the 5-gram data
available from Google's Ngram Viewer seems useful:
http://ngrams.googlelabs.com/datasets
Do you know of a particular patent corpus that is freely available?
As for the ngram dataset, we happen to be using that already for some
research on scoping knowledge temporally. But now that you mention it,
the 5-grams are probably long enough that we could hope to get some
extractions out of them, at least for category instances. We'd surely
wind up with multiword noun phrases with their beginnings or endings
chopped off, but I think NELL would benefit from an additional subcomponent
dedicated to filtering those out anyway. Thanks for the suggestion!
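To make the truncation concern concrete, here is a rough sketch of what such a filter might look like. It is not NELL's code: the "such as" pattern, the capitalization test for instance tokens, and the name extract_candidates are all assumptions made for illustration. The idea is simply that a candidate instance running up to the last token of a 5-gram may have been cut off, so it is dropped; for simplicity only the right edge is checked.

```python
from collections import Counter

# Hypothetical extraction pattern: "<category> such as <Instance ...>"
PATTERN = ("such", "as")


def extract_candidates(ngrams):
    """Collect (category, instance) counts from (tokens, count) pairs.

    An instance is taken to be the run of capitalized tokens that follows
    the pattern. If that run reaches the end of the n-gram, the phrase may
    be truncated (e.g. "New York" cut from "New York City"), so it is skipped.
    """
    kept = Counter()
    for tokens, count in ngrams:
        for i in range(1, len(tokens) - 2):
            if tuple(tokens[i:i + 2]) != PATTERN:
                continue
            category = tokens[i - 1]
            start = i + 2
            end = start
            while end < len(tokens) and tokens[end][:1].isupper():
                end += 1
            if end == start:
                continue  # no capitalized token follows the pattern
            if end == len(tokens):
                continue  # touches the right edge: possibly chopped off
            kept[(category, " ".join(tokens[start:end]))] += count
    return kept


sample = [
    (["cities", "such", "as", "Pittsburgh", "and"], 30),
    (["cities", "such", "as", "New", "York"], 40),   # may be "New York City"
    (["in", "cities", "such", "as", "Boston"], 12),  # instance ends at the edge
    (["cities", "such", "as", "Paris", ","], 5),
]
candidates = extract_candidates(sample)
```

With this sample, only the Pittsburgh and Paris candidates survive. The cost is that plenty of complete phrases get thrown out too, which only seems acceptable if the n-gram data is large enough that the same instance also shows up away from the edge.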
NELL could also benefit from reading court cases. They're in the
public domain. See http://www.commonlii.org/ and related sites. You
will probably need to talk to AustLII (http://www.austlii.edu.au) for
bulk access, as even slow crawling alerts their sysadmins. My IP
address has been banned, for example.