Thank you for the suggestion. However, according to the website, the total
size of that corpus is a little over 5GB. The corpus that we're using
currently is maybe around 17,500GB. So I'm afraid the benefit would be
small compared to the cost of purchasing a copy and the time needed to
reformat it for NELL to read.
On the topic of books, we were recently given a copy of most or all of the
books from the Internet Archive, which looks to be about 1000GB in size.
Actually, it's still sitting on disk waiting to be reorganized and
reformatted... But it will be interesting to see if that is big enough to
make a noticeable difference in NELL's learning.