How to read the dataset

kemas wiharja

Jan 11, 2017, 12:24:01 PM
to NELL: Never-Ending Language Learner
Hi NELL,

My name is Kemas Wiharja. I am a researcher at the University of Aberdeen.
Currently I am working in the knowledge graph research area.

I want to ask how to read your dataset that contains the beliefs in the KB (the link is http://rtw.ml.cmu.edu/resources/results/08m/NELL.08m.995.esv.csv.gz ).
I've downloaded and uncompressed the dataset, but when I try to open it in an editor (Microsoft Excel, gedit, WordPad), the file doesn't show anything.

Thanks for the reply.

Kemas Wiharja


Bryan Kisiel

Jan 11, 2017, 1:31:02 PM
to NELL: Never-Ending Language Learner
Hi Kemas,

That link works for me and the content shows up in gedit. What I've
noticed a few times over the years is that Firefox for some reason will,
from what I can understand, misinterpret the headers returned by Apache
about the content being compressed and will fail to decompress it,
resulting in a downloaded file that has been gzipped twice.

If the file size is 705678641 then you have a copy that has been gzipped
once. If the file size is 3509291532 then you have the fully decompressed
version.

So as a first thing to try, I would suggest either using a different
browser or method of download (e.g. wget), or gunzipping the downloaded
file twice. If that doesn't work, let me know.
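
The "gunzip it twice" suggestion above can also be done programmatically. The following is only a minimal sketch, assuming Python 3 is available; the helper names `is_gzipped` and `gunzip_fully` are made up for illustration and are not part of any NELL tooling. It checks for the two-byte gzip signature and keeps stripping layers until the data is no longer gzipped:

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Return True if the file begins with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip_fully(src, dst, max_layers=5):
    """Strip gzip layers from src until plain data remains, writing it to dst.

    Returns the number of layers removed (0 if src was not gzipped at all).
    Hypothetical helper: the name and max_layers default are assumptions.
    """
    src, dst = Path(src), Path(dst)
    current = src
    layers = 0
    while is_gzipped(current):
        if layers >= max_layers:
            raise ValueError(f"more than {max_layers} gzip layers in {src}")
        tmp = dst.with_name(f"{dst.name}.layer{layers + 1}")
        # Stream in chunks so a multi-gigabyte file never sits in memory.
        with gzip.open(current, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        if current != src:
            current.unlink()  # drop the previous intermediate file
        current = tmp
        layers += 1
    if current == src:
        shutil.copyfile(src, dst)  # nothing to strip, just copy
    else:
        current.replace(dst)  # promote the last intermediate file to dst
    return layers
```

A call such as `gunzip_fully("NELL.08m.995.esv.csv.gz", "NELL.08m.995.esv.csv")` would return 2 for a doubly gzipped download and 1 for a normal one; the size of the output can then be compared against the figures quoted above.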

bki...@cs.cmu.edu
> --
> You received this message because you are subscribed to the Google Groups "NELL: Never-Ending Language Learner" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to cmunell+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

kemas wiharja

Jan 12, 2017, 9:16:09 AM
to NELL: Never-Ending Language Learner
Hi Bryan,
After splitting the big file into several smaller files, I could read all the contents :-)
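
As an alternative to splitting the file, the decompressed data can be read lazily, one line at a time, so that the roughly 3.5 GB file never has to fit in an editor or in memory. This is just a sketch under assumptions: the function names are hypothetical, and the tab delimiter is a guess about the file layout that should be checked against the actual data.

```python
from itertools import islice


def preview_rows(path, n=5, delimiter="\t", encoding="utf-8"):
    """Return the first n rows as lists of fields without loading the whole file.

    The tab delimiter is an assumption; pass delimiter="," if the file
    turns out to be comma-separated.
    """
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\r\n").split(delimiter) for line in islice(f, n)]


def count_rows(path):
    """Count lines by iterating over the file, which reads it in small pieces."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

For example, `preview_rows("NELL.08m.995.esv.csv", n=3)` would show the first three rows, which is usually enough to confirm the delimiter and the column layout before processing the rest.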

My next question: does NELL have a belief dataset in RDF or OWL format?

Best Regards,
Kemas

Bryan Kisiel

Jan 12, 2017, 4:02:43 PM
to NELL: Never-Ending Language Learner
Hi Kemas,

Strange as it might seem, when NELL began, the group involved was not very
well versed in existing semantic web work, including the use of RDF and
OWL. Consequently, NELL's KB is stored in a custom system we call Theo,
which was first conceived of here at CMU a few decades ago. Last time I
looked into it, it seemed to me that there should be a reasonably
straightforward mapping between Theo and RDF, but there's never been
enough of a need to establish an official one.

So, no. No RDF or OWL.

bki...@cs.cmu.edu

