Re: [cmunell] Data set with source page info

Bryan Kisiel

Apr 29, 2013, 12:01:43 PM
to cmu...@googlegroups.com
Hi Arnab,

Nominally, the "every belief in the KB" file linked from our resources page at
http://rtw.ml.cmu.edu/rtw/resources contains source information in its
"Candidate Source" column, although the formatting is irregular and this
can be a difficult column to parse.
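As a rough starting point for pulling that column out of the file, here is a minimal Python sketch. It assumes the download is tab-separated with a header row containing a column literally named "Candidate Source", and it also assumes columns named "Entity", "Relation" and "Value"; check these names against the actual header before relying on it. The function name `iter_candidate_sources` is hypothetical.

```python
import csv
import io

# Assumption: the "every belief in the KB" file is tab-separated and its first
# row is a header containing these column names. Verify against the real file.
SOURCE_COLUMN = "Candidate Source"


def iter_candidate_sources(lines):
    """Yield (entity, relation, value, candidate_source) for every data row.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    # QUOTE_NONE keeps quote characters inside the irregular source text from
    # being interpreted as CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or SOURCE_COLUMN not in reader.fieldnames:
        raise ValueError(f"no {SOURCE_COLUMN!r} column in header")
    for row in reader:
        yield (
            row.get("Entity"),
            row.get("Relation"),
            row.get("Value"),
            row.get(SOURCE_COLUMN) or "",  # short rows give None; normalize to ""
        )


if __name__ == "__main__":
    # Tiny made-up sample, not real NELL output.
    sample = "Entity\tRelation\tValue\tCandidate Source\nex:a\tex:rel\tex:b\t[SEAL-example]\n"
    for row in iter_candidate_sources(io.StringIO(sample)):
        print(row)
```

For the real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample; the encoding is an assumption.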

Most of NELL's evidence comes from two of its learning subcomponents, SEAL
and CPL. SEAL will provide a list of URLs for the facts that it proposes.
CPL will provide a list of textual extraction patterns (e.g. it might
provide a pattern like "mayor of _" for a noun phrase believed to be a
city).
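Because SEAL evidence shows up as URLs, one cheap way to get at source pages is to scan each "Candidate Source" string for URLs and note which component names appear in it. The sketch below assumes URLs appear verbatim (starting with http:// or https://) and that component names such as "SEAL" and "CPL" occur as literal substrings; the exact delimiters in the file are not described here, so it does not attempt to parse CPL patterns. `summarize_source` and the example string are hypothetical.

```python
import re

# Assumption: URLs appear verbatim in the Candidate Source text and end at
# whitespace, commas, brackets, quotes or braces.
URL_RE = re.compile(r"https?://[^\s,\[\]\"'<>{}]+")
# Assumption: component names occur as plain substrings; this is a heuristic
# and can produce false positives.
KNOWN_COMPONENTS = ("SEAL", "CPL", "CMC", "PRA")


def summarize_source(candidate_source: str) -> dict:
    """Return the component names mentioned and the distinct URLs found."""
    urls = [u.rstrip(".,;:)") for u in URL_RE.findall(candidate_source)]
    urls = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    components = [c for c in KNOWN_COMPONENTS if c in candidate_source]
    return {"components": components, "urls": urls}


if __name__ == "__main__":
    # Made-up string for illustration only; real entries are formatted differently.
    example = "[SEAL-x-[http://example.com/a, https://example.org/b?q=1.]] [CPL-x-mayor of _]"
    print(summarize_source(example))
```

A row whose result has no URLs is then a hint that its evidence came from CPL patterns or another non-URL source rather than from SEAL.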

The next two most common kinds of source information available are top-N
features from linear models that match against a given fact. In the case
of category instances, this comes from a subcomponent called CMC that uses
a feature space constructed from orthographical features of noun phrases,
like prefixes, suffixes, word length, and patterns of capitalization. In
the case of relation instances, this comes from a subcomponent called PRA
that is somewhat like a first-order logic rule learner in that each of the
features in its feature space is a chain of relations connecting the two
category instances that are the arguments to the relation instance in
question.
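To make the two feature spaces concrete, here is a toy Python sketch. It is a simplification, not NELL's CMC or PRA code: the orthographic features below are just a few illustrative ones, and real PRA also weights each relation path by random-walk probabilities and learns a linear model over those values, whereas this sketch only enumerates which relation chains (up to a hypothetical `max_len`) connect two instances. All function names and relation names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def orthographic_features(noun_phrase: str, affix_len: int = 3) -> set[str]:
    """Toy CMC-style features: prefix, suffix, lengths, capitalization shape."""
    lower = noun_phrase.lower()
    # Map characters to X (upper), x (lower), 9 (digit) or themselves, then
    # collapse runs, so "Pittsburgh" -> "Xx" and "New York" -> "Xx Xx".
    raw_shape = "".join(
        "X" if ch.isupper() else "x" if ch.islower() else "9" if ch.isdigit() else ch
        for ch in noun_phrase
    )
    shape = "".join(key for key, _ in groupby(raw_shape))
    return {
        f"prefix={lower[:affix_len]}",
        f"suffix={lower[-affix_len:]}",
        f"num_chars={len(noun_phrase)}",
        f"num_words={len(noun_phrase.split())}",
        f"shape={shape}",
    }


def relation_paths(triples, source, target, max_len=2):
    """Enumerate chains of relations leading from source to target (PRA-style paths)."""
    edges = defaultdict(list)
    for head, rel, tail in triples:
        edges[head].append((rel, tail))
        edges[tail].append((rel + "^-1", head))  # allow walking an edge backwards
    paths = set()
    frontier = [(source, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, path in frontier:
            for rel, nxt in edges[node]:
                new_path = path + (rel,)
                if nxt == target:
                    paths.add(new_path)
                next_frontier.append((nxt, new_path))
        frontier = next_frontier
    return paths


if __name__ == "__main__":
    kb = [
        ("pittsburgh", "citylocatedinstate", "pennsylvania"),
        ("pennsylvania", "statelocatedincountry", "usa"),
        ("pittsburgh", "citylocatedincountry", "usa"),
    ]
    print(orthographic_features("New York"))
    print(relation_paths(kb, "pittsburgh", "usa"))
```

When scoring a candidate fact r(s, t), the single-edge path made of r itself would normally be excluded, since it is the fact being predicted.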

Then there are some less-common sources, like seeds, human feedback, a
FOIL rule learner that we used to run, a component that attempts to match
category instances against geolocation databases, and a component that
attempts to match category instances against Wikipedia pages.

Are you interested in any of these sources in particular? Depending on
what you're looking for, I might be able to generate a file that would be
easier to process than the "every belief in the KB" file.

bki...@cs.cmu.edu


On Mon, 29 Apr 2013, Arnab Dutta wrote:

> Hi all,
> I am currently working with the NELL data set. Is it possible to get the
> source page from which a fact was extracted? The data set has the snippet
> that makes NELL believe the fact is true to some extent, but it would be
> nice if the source page information could also be retrieved readily.
> Any ideas or pointers?
>
> Thanks