Regarding the twitter interface
From: "Enki A. Waterhaus II" <john.o...@gmail.com>
Date: Tue, 26 Jul 2011 14:02:24 -0700 (PDT)
Local: Tues, Jul 26 2011 5:02 pm
Subject: Regarding the twitter interface
Judging by the types of misspellings I see from the assumption-ranking
mechanism, I get the impression that NELL learns from Twitter in
addition to posting some of its assumptions. Is this accurate?
Alternatively, what kinds of sources does it learn from aside from the
2009 web scrape mentioned in the methodology section of the web page?

 
From: Bryan Kisiel <bkis...@cs.cmu.edu>
Date: Wed, 27 Jul 2011 18:14:20 -0400 (EDT)
Local: Wed, Jul 27 2011 6:14 pm
Subject: Re: [cmunell] Regarding the twitter interface
Hi Enki,

NELL is not hooked up to read Twitter feeds specifically.  However, the SEAL
sub-learner uses a process of issuing queries to search engines,
retrieving pages from links that look interesting, and trying to extract
facts from the content of those pages.  So it's possible for NELL to read
things from twitter if SEAL downloads somebody's twitter page and finds
something good to extract.  But it turns out that there are only a handful
of things that NELL believes that are supported by something it read from
twitter.

At the moment, downloading pages from search results via SEAL is the only
significant source that NELL reads other than the 2009 web scrape.

There are other sub-learners that look at what has been read from these
two sources and look for patterns, or chains of inference, and things like
that.  The CMC sub-learner looks for orthographical regularities, and has,
for instance, noticed that names of rivers frequently end in the word
"river", "creek", or "brook" (see the first entry of
http://rtw.ml.cmu.edu/rtw/kbbrowser/predmeta:river).  It has also noticed
that person names tend to begin with capital letters.  It's not always so
clever, though, and we've noticed that it often latches on to strange
spellings, so that may be responsible for some of what you've noticed.
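
As a rough illustration of the kind of orthographic regularity described above, here is a minimal Python sketch. It is not CMC's actual implementation: the feature names, the `min_fraction` threshold, and the example river list are all assumptions made for the example. It counts simple spelling features (last word, capitalization) over known instances of a category and keeps the ones that most instances share.

```python
from collections import Counter


def orthographic_features(name):
    """Return simple spelling-based features for a noun phrase (hypothetical feature set)."""
    words = name.split()
    if not words:
        return set()
    features = {"last_word=" + words[-1].lower()}
    if all(w[0].isupper() for w in words):
        features.add("all_words_capitalized")
    return features


def frequent_features(instances, min_fraction=0.5):
    """Map each feature shared by at least min_fraction of the instances to its share."""
    if not instances:
        return {}
    counts = Counter()
    for name in instances:
        counts.update(orthographic_features(name))
    total = len(instances)
    return {f: c / total for f, c in counts.items() if c / total >= min_fraction}


# Illustrative instances only; not taken from NELL's knowledge base.
rivers = ["Ohio River", "Allegheny River", "Monongahela River", "Turtle Creek", "Nile"]
river_features = frequent_features(rivers)
print(river_features)
```

A learner of this kind has no notion of correct spelling, so if enough of its known instances share an odd spelling pattern, that pattern becomes a "regularity" just like the word "river" does.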

Another thing that I have seen happen is that SEAL will retrieve a page
that is designed to be highly ranked by search engines by containing many
common misspellings of a popular word.  Then it can get tricked into
thinking that it's seeing a list of different things that are all in the
same category.  That's what happened with all the misspellings of
"pregnancy" for
http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:nondiseasecondition
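
To see how a page of misspellings can look like a list of category members, consider this toy sketch. It is only an assumption about the general idea of list-based extraction, not SEAL's real algorithm; the function name, the `context` parameter, and the sample page are hypothetical. It learns the characters surrounding two seed terms and then extracts everything else on the page that appears in the same surroundings.

```python
import re


def expand_from_page(page, seeds, context=4):
    """Toy set expansion: learn the text around the seeds, then extract
    every other string on the page that sits in the same surroundings."""
    wrappers = set()
    for seed in seeds:
        for m in re.finditer(re.escape(seed), page):
            left = page[max(0, m.start() - context):m.start()]
            right = page[m.end():m.end() + context]
            # Skip seeds too close to the page edge to have full context.
            if len(left) == context and len(right) == context:
                wrappers.add((left, right))
    found = set()
    for left, right in wrappers:
        pattern = re.escape(left) + r"(.+?)" + re.escape(right)
        found.update(re.findall(pattern, page))
    return found - set(seeds)


# A made-up search-engine-bait page listing misspellings of one word.
page = "<li>pregnancy</li><li>pregnacy</li><li>pregnency</li><li>pragnancy</li>"
extracted = expand_from_page(page, ["pregnancy", "pregnacy"])
print(sorted(extracted))
```

Nothing in the page structure distinguishes a list of misspellings from a list of genuinely different items, so every entry is extracted as a new candidate for the category.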

bkis...@cs.cmu.edu

 
From: John Ohno <john.o...@gmail.com>
Date: Thu, 28 Jul 2011 08:19:32 -0400
Local: Thurs, Jul 28 2011 8:19 am
Subject: Re: [cmunell] Regarding the twitter interface
Have you looked into using some of the other sources of 'good' data?
For instance, the Wikipedia full text or the Project Gutenberg texts?
The two combined are smaller than Project Lemur's web scrapes and
arguably have a greater signal-to-noise ratio than arbitrary web
searches. While I realize that the goal is to read the web (with all
the parsing of dirty data that implies), starting off with sources
that have comparatively few typos (and other non-coding
irregularities) seems like a reasonable choice.

--
John Ohno
http://firstchurchofspacejesus.blogspot.com/

 
From: Bryan Kisiel <bkis...@cs.cmu.edu>
Date: Thu, 28 Jul 2011 15:58:03 -0400 (EDT)
Local: Thurs, Jul 28 2011 3:58 pm
Subject: Re: [cmunell] Regarding the twitter interface

Hi John,

I guess the real reason why we never gave much thought to pointing SEAL at
a higher-quality corpus is that it would take time and effort to get that
done.  SEAL isn't written to operate off of a corpus sitting on disk, and
we'd also have to index that corpus for searching, and then we might
discover that we don't have enough computing power to run those searches as
fast as we'd like.  NELL's biggest limitation is really that we don't have
enough person-hours to keep up with all the good ideas.  But it would be
valuable to do as you suggest and give SEAL access to local corpora, and
it would also be valuable to give CPL access to the terabytes' worth of
web pages that SEAL has downloaded in the past year and a half.  One of
these days (I hope)...
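
For a sense of what "indexing a corpus for searching" involves at the smallest scale, here is a minimal inverted-index sketch in Python. It is an illustration under stated assumptions, not anything NELL or SEAL uses: the tokenizer, the function names, and the three sample documents are invented for the example, and a real corpus of this size would need an on-disk index and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Stand-in documents; a local corpus would be read from disk instead.
docs = {
    "doc1": "The Ohio River flows past Pittsburgh.",
    "doc2": "Turtle Creek is a tributary of the Monongahela River.",
    "doc3": "Pittsburgh is a city in Pennsylvania.",
}
index = build_index(docs)
print(search(index, "river pittsburgh"))
```

Even this toy version shows where the cost goes: the index has to be built over every document before the first query can run, which is the up-front effort described above.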

I think that when we look at NELL, we tend to see so many opportunities
for doing things better or for getting more out of the text that it reads
that we are not inclined to worry too much about the mistakes that it
currently makes.

bkis...@cs.cmu.edu


 