about NELL API


peter

May 17, 2012, 3:58:04 AM
to NELL: Never-Ending Language Learner
Dear NELL team:

I am a graduate student. My research area is analogy
retrieval and analogy mapping. I am fascinated by the large amount of
data NELL has read from the web. Is there an API that I can use to
access the dataset? I'd appreciate any information you can provide.
Thank you very much in advance.

Best,


peter

Bryan Kisiel

May 17, 2012, 6:30:03 PM
to NELL: Never-Ending Language Learner
Hi Peter,

Analogy retrieval and mapping sounds like a very good idea to me --
personally, I think being able to capture metaphor and analogy is
fundamentally essential in order to have any true understanding of
language.

But, unfortunately, we still don't have a nice public API. The easiest
way to access what NELL learns is to follow the "every belief in the KB"
link on http://rtw.ml.cmu.edu/rtw/resources, which will get you a
tab-separated-value file where each line is one belief that NELL has
learned. This link is automatically updated every so often with the
latest copy of NELL's KB -- usually about once a week. I'd be happy to
explain the content in more detail, and if it doesn't seem to have what
you're looking for then let me know and maybe I can set up a different
kind of dump.

What kind of API would you be interested in seeing?

bki...@cs.cmu.edu

peter

May 22, 2012, 10:02:43 AM
to NELL: Never-Ending Language Learner

Hi, Bryan,

Thank you for your reply.
I tried to open the file "NELL.08m.570.esv.csv" to look at its contents.
I think I need to read some papers or guides about this project to learn
more about its details.
My plan is to get this knowledge into some format like a
semantic network (a graph of nodes and edges, where nodes are concepts
and edges are relations).
Maybe I can write a parser to extract the concepts and relations,
and then build a network from them.
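
A minimal sketch of that idea in Python, assuming a parser already yields (concept, relation, concept) triples. The function name `build_network` and the plain-dictionary representation are illustrative choices here, not anything provided by NELL:

```python
from collections import defaultdict


def build_network(triples):
    """Build a simple directed, labeled graph from (source, relation, target) triples.

    Returns a set of nodes (concepts) and a dict that maps each source
    concept to its list of outgoing (relation, target) edges.
    """
    nodes = set()
    edges = defaultdict(list)
    for source, relation, target in triples:
        nodes.update((source, target))
        edges[source].append((relation, target))
    return nodes, dict(edges)
```

A dedicated graph library would probably be more convenient for real analogy-mapping work; the dictionary form is only meant to show the shape of the data.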

After I know more details, I'll discuss with you.
Thank you very much.

Best,


peter


Bryan Kisiel

May 23, 2012, 11:15:32 AM
to NELL: Never-Ending Language Learner
Hi Peter,

http://rtw.ml.cmu.edu/papers/carlson-aaai10.pdf is probably the best paper
to read. It describes the overall system architecture.

bki...@cs.cmu.edu

petera...@gmail.com

Jul 31, 2012, 10:36:52 PM
to cmu...@googlegroups.com

Hi, Alisa,

I did not find any APIs for NELL. If I write or find one, I will be glad to let you know!

Peter

On July 24, 2012, 09:04, Alisa_IPN wrote:
> Hi Peter!
>
> I am a graduate student working on relation inference in knowledge bases, and I have a question similar to yours.
>
> Did you manage to resolve the problem with APIs for NELL? Have you written or found an API to access the dataset?
>
> I'd appreciate your answer!
>
> Thanks,
> Alisa
>
> On Thursday, May 17, 2012, 0:58:04 UTC-7, user peter wrote:

Manny

Aug 21, 2012, 5:39:33 AM
to cmu...@googlegroups.com
"True" APIs don't exist yet. However, as indicated earlier in the discussion, you can simply download the compressed file and go from there.

If it helps, here is how I did it (in C#/.NET):
  • I used the Rhino ETL library to import the data. I wrote one operation that reads the file line by line and extracts the information I required along the way.
  • I stored the relations locally in a MongoDB instance (but any SQL or NoSQL solution should do the trick).
  • I then used MapReduce to create a string -> conceptId mapping.
Note that extracting the CSV file and opening it in a text editor is a bad idea; the file is way too big for Notepad et al. to open. Your best bet is the usual set of Linux command-line tools (e.g. less), or a text viewer specifically tailored to very large files. For Windows, the Large Text File Viewer comes to mind (http://www.swiftgear.com/ltfviewer/features.html).
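
For readers not on .NET, the same steps can be roughly sketched in Python. This is not the Rhino ETL / MongoDB pipeline described above, only an in-memory approximation under several assumptions: that the download is gzip-compressed, that fields are tab-separated, that the first line is a header, and that the "Entity literalStrings" column is the eighth field (index 7, counted from the header) and holds double-quoted strings. The names `open_dump` and `build_string_index` are made up for this example.

```python
import gzip
import re
from collections import defaultdict

ENTITY_COL = 0           # "Entity" column
ENTITY_LITERALS_COL = 7  # "Entity literalStrings" column (assumed position)
QUOTED = re.compile(r'"([^"]*)"')


def open_dump(path):
    """Open the dump as a text stream without decompressing it to disk first."""
    if path.endswith(".gz"):  # assumption: the compressed download is gzip
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def build_string_index(lines):
    """Map each lower-cased literal string to the set of concept IDs it names."""
    index = defaultdict(set)
    for line_number, line in enumerate(lines):
        if line_number == 0:
            continue  # skip the header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= ENTITY_LITERALS_COL:
            continue  # short or malformed line
        for literal in QUOTED.findall(fields[ENTITY_LITERALS_COL]):
            index[literal.lower()].add(fields[ENTITY_COL])
    return index
```

Because the file is read as a stream, only one line is held in memory at a time; the index itself still grows with the number of distinct strings, so a database remains the safer choice for the full KB. Usage would look like `with open_dump(path) as f: index = build_string_index(f)`, where `path` points at the downloaded file.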

I'll append the first few lines from the 620th iteration; hopefully this gives you an idea of how the file is structured:

Entity Relation Value Iteration of Promotion Probability Source Candidate Source Entity literalStrings Value literalStrings Best Entity literalString Best Value literalString Categories for Entity Categories for Value
concept:everypromotedthing generalizations everything 0 NaN (null) [OntologyModifier-Iter:603-2012/06/27-16:15:36-tsv_to_om_category.pl-categories.xls] "EVERYTHING" "Everything" "everything" EVERYTHING
concept:everypromotedthing:kareem concept:istallerthan concept:athlete:shaq 620 1.0 MBL-Iter:620-2012/08/01-09:57:01-From ErrorBasedIntegrator (OntologyModifier(kareem,shaq)) [OntologyModifier-Iter:603-2012/06/27-16:15:36-<token=kareem,shaq>-tsv_to_om_relation.pl-relations.xls-seed] "Kareem" "Shaq" "SHAQ" "shaq" Kareem Shaq concept:everypromotedthing concept:athlete
concept:everypromotedthing:kareem generalizations concept:everypromotedthing 603 1.0 MBL-Iter:603-2012/06/28-02:59:30-From ErrorBasedIntegrator (OntologyModifier(kareem,everypromotedthing)) [OntologyModifier-Iter:603-2012/06/27-16:15:36-<token=kareem,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "Kareem" Kareem concept:everypromotedthing
concept:everypromotedthing:gaggle generalizations concept:everypromotedthing 405 1.0 MBL-Iter:405-2011/09/07-15:42:20-From ErrorBasedIntegrator (OntologyModifier(gaggle,everypromotedthing)) [OntologyModifier-Iter:405-2011/09/06-15:36:21-<token=gaggle,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "gaggle" "Gaggle" gaggle concept:everypromotedthing
concept:everypromotedthing:svn concept:synonymfor concept:everypromotedthing:subversion 502 1.0 MBL-Iter:502-2012/02/04-23:01:34-From ErrorBasedIntegrator (OntologyModifier(svn,subversion), CPL(svn,subversion)) [OntologyModifier-Iter:130-2010/07/23-09:57:44-<token=svn,subversion>-tsv_to_om_relation.pl-relations.xls-seed, CPL-Iter:155-2010/09/28-18:02:06-<token=svn,subversion>-instances: "arg2 and Tortoise arg1"] "svn" "SVN" "subversion" "Subversion" svn subversion concept:everypromotedthing concept:everypromotedthing
concept:everypromotedthing:svn generalizations concept:everypromotedthing 130 1.0 MBL-Iter:130-2010/07/23-12:32:59-From ErrorBasedIntegrator (OntologyModifier(svn,everypromotedthing)) [OntologyModifier-Iter:130-2010/07/23-09:57:44-<token=svn,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "svn" "SVN" svn concept:everypromotedthing
concept:everypromotedthing:armistice_day concept:synonymfor concept:everypromotedthing:remembrance_day 130 1.0 MBL-Iter:130-2010/07/23-12:32:59-From ErrorBasedIntegrator (OntologyModifier(armistice_day,remembrance_day)) [OntologyModifier-Iter:130-2010/07/23-09:57:44-<token=armistice_day,remembrance_day>-tsv_to_om_relation.pl-relations.xls-seed] "Armistice Day" "Remembrance Day" Armistice Day Remembrance Day concept:everypromotedthing concept:everypromotedthing
concept:everypromotedthing:armistice_day generalizations concept:everypromotedthing 130 1.0 MBL-Iter:130-2010/07/23-12:32:59-From ErrorBasedIntegrator (OntologyModifier(armistice_day,everypromotedthing)) [OntologyModifier-Iter:130-2010/07/23-09:57:44-<token=armistice_day,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "Armistice Day" Armistice Day concept:everypromotedthing
concept:everypromotedthing:roberto_duran generalizations concept:everypromotedthing 389 1.0 MBL-Iter:389-2011/08/15-20:53:49-From ErrorBasedIntegrator (OntologyModifier(roberto_duran,everypromotedthing)) [OntologyModifier-Iter:0-2010/01/16-8:00:00--<token=roberto_duran,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "roberto duran" "Roberto Duran" roberto duran concept:everypromotedthing
concept:everypromotedthing:city_of_angels generalizations concept:everypromotedthing 620 1.0 MBL-Iter:620-2012/08/01-09:57:01-From ErrorBasedIntegrator (OntologyModifier(city_of_angels,everypromotedthing)) [OntologyModifier-Iter:67-2010/03/24-11:15:27-<token=city_of_angels,everypromotedthing>-tsv_to_om_relation.pl-relations.xls-seed] "city of angels" "City Of Angels" "City of Angels" city of angels concept:everypromotedthing

If you only require (entity, relation, entity) tuples, I think a simple regular expression like ^([^\t]+)\t([^\t]+)\t([^\t]+) should do.
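
As a rough illustration, that expression can be applied line by line in Python as follows. The helper names `parse_triple` and `iter_triples` are invented for this sketch, and it assumes the first line of the file is the header row:

```python
import re

# The pattern suggested above: the first three tab-separated fields of a line.
TRIPLE_RE = re.compile(r"^([^\t]+)\t([^\t]+)\t([^\t]+)")


def parse_triple(line):
    """Return (entity, relation, value), or None if the line does not match."""
    match = TRIPLE_RE.match(line.rstrip("\r\n"))
    return match.groups() if match else None


def iter_triples(lines, skip_header=True):
    """Yield triples from any iterable of lines, such as an open file handle."""
    for line_number, line in enumerate(lines):
        if skip_header and line_number == 0:
            continue
        triple = parse_triple(line)
        if triple is not None:
            yield triple
```

The trailing newline is stripped before matching because `[^\t]+` would otherwise swallow it on a line that has exactly three fields.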

If required, I'll try to refactor my .NET-based import application to make it more readable and then make it available on Bitbucket or GitHub.


 - Manuel