Hypothetically, if I were to start developing a web scraper and learner similar to NELL, which paper should I begin reading?


Lucifer Morningstar

Aug 23, 2018, 1:02:05 PM
to NELL: Never-Ending Language Learner
I have strong mathematical foundations and have recently completed my programming theory classes. As a mini project, I'd like to implement something similar to NELL, since I'm too inexperienced to contribute to NELL itself at the moment (I have a full year free). Projects like NELL will go a long way toward indexing the web and improving search through the use of AI. Where do I start? Thanks in advance.

Olfert Rahbek

Aug 23, 2018, 1:18:09 PM
to cmu...@googlegroups.com

Hi there,


You are raising a very interesting question. When I last talked with the good people at NELL (admittedly, some time ago), there were concerns relating to the fundamental structure. This is a tricky topic to discuss in general terms, but there are obvious candidate topics that might benefit from your input. I would suggest looking at high- or medium-frequency words such as:













These and other words like them are used in different domains of meaning, and an ontology would benefit from your well-researched input.


Best, Olfert




Olfert Rahbek



Aug 23, 2018, 1:22:48 PM
to NELL: Never-Ending Language Learner
Hi Lucifer,

As you may have seen, all of the relevant publications are on our website at http://rtw.ml.cmu.edu/rtw/publications and the best starting point would be the latest system overview paper from AAAI15, http://www.cs.cmu.edu/~tom/pubs/NELL_aaai15.pdf.  I must confess that I haven't seen any of the video lectures linked in the next section down, but I have seen some of Tom's other overview presentations, and they're generally quite good for introductions to the basic design philosophy and what the individual learning components are doing.

Some, but not all, of the individual learning components (CPL, SEAL, CMC, PRA, OpenEval, etc.) have papers of their own that I believe should be cited in the AAAI15 paper, but you can check through the publications page if you want to try to scour for more technical detail.  Most of these are more abstract descriptions of an underlying approach rather than technical rundowns of how exactly they are applied and implemented for NELL's use.  In this regard, the publications fall somewhat short, but the engineering and implementation to turn this gaggle of algorithms into something that runs and runs tractably is a sufficiently long and meandering story that we have had to make a decision not to release the source code simply because we just don't have the time and manpower to help people go through and understand it all.

However, quite a number of technical questions have been asked in this group over the years, and you're entirely welcome to ask additional ones.

Lucifer Morningstar

Aug 23, 2018, 5:27:25 PM
to NELL: Never-Ending Language Learner

Thanks a lot for the reply. "not all, of the individual learning components (CPL, SEAL, CMC, PRA, OpenEval, etc.) have papers of their own" - I guess I'll visit the GitHub repo (if available) and try to decipher how all of this connects. Also, "In this regard, the publications fall somewhat short, but the engineering and implementation to turn this gaggle of algorithms into something that runs and runs tractably is a sufficiently long and meandering story that we have had to make a decision not to release the source code simply because we just don't have the time and manpower to help people go through and understand it all." - I have done some work too, and I completely understand your reasons for not wanting to write up docs and explanations for the minutiae of the bigger NELL project.


Aug 28, 2018, 8:13:38 AM
to NELL: Never-Ending Language Learner
Hi Lucifer,

With minor exceptions, all of the source code lives in a private Subversion repository, so there's not much for the public to dig through.  In certain cases, a particular group member who developed one of these core algorithms might have a public GitHub repository or some other such thing, although that's not going to include the NELL-specific glue that determines how exactly each one of these algorithms is used by NELL.  We have at various times opened up access when it looked like there might be a good opportunity for collaboration, but, in any case, the better way to get started is to use a venue like this in case you have particular questions, and then we can take it from there as might be appropriate.

Similarly, we do have some relatively overview-level internal documentation on how a lot of the tinker toys work end-to-end, but we've never had enough spare time to translate that into something we could put on the web site, so that's another thing to potentially draw on depending on where things go.  The basic rule is that we'd like to try to be helpful, but that it has to start out with things that come at a pretty trivial cost for us.
