I'd like to open the year with a question for the community.
We've been quietly continuing our work mapping our subject headings
onto DBPedia, Freebase and other ontologies. In this process we
realized that we could save some time because our subject headings for
publicly traded companies are associated with a stock symbol and an
exchange. For example our subject heading for 'Apple Inc.' is
associated with the stock symbol 'AAPL' on the 'NASDAQ' exchange.
Armed with this knowledge, we can automatically map our publicly-
traded company subject headings to their Freebase and DBPedia
counterparts through queries to those services' APIs.
Although this process has thus far proven accurate, there remains a
potential for errors to creep into the data. For this reason, I want
the NYT data to explicitly indicate if a sameAs relation was manually
or algorithmically derived.
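
A minimal sketch of the ticker-based mapping described above, recording how each link was derived. The Heading class, the FAKE_LOOKUP table, the placeholder heading URI and the "method" field are all assumptions made for illustration. A real run would query the Freebase and DBpedia services rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Heading:
    """Hypothetical record for a company subject heading."""
    uri: str
    label: str
    symbol: str
    exchange: str


# Stand-in for the answers a remote service would return, keyed by
# (exchange, symbol). The target URI is illustrative only.
FAKE_LOOKUP: Dict[Tuple[str, str], str] = {
    ("NASDAQ", "AAPL"): "http://dbpedia.org/resource/Apple_Inc.",
}


def map_heading(heading: Heading,
                lookup: Dict[Tuple[str, str], str]) -> Optional[dict]:
    """Return a sameAs candidate tagged as algorithmically derived, or None."""
    key = (heading.exchange.upper(), heading.symbol.upper())
    target = lookup.get(key)
    if target is None:
        return None  # no match: leave the heading for manual mapping
    return {"subject": heading.uri, "same_as": target, "method": "algorithmic"}


# Placeholder URI, not a real NYT identifier.
apple = Heading("http://data.nytimes.com/EXAMPLE_APPLE", "Apple Inc.",
                "AAPL", "NASDAQ")
print(map_heading(apple, FAKE_LOOKUP))
```

Headings with no match come back as None, so they can be routed to a person and recorded as manual mappings instead.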
As you recall, our subject headings are published in RDF files
containing two resources: a resource describing the subject heading
itself, and a resource describing the document containing the
resource. For example the document http://data.nytimes.com/N66220017142656459133.rdf
contains two resources:
http://data.nytimes.com/N66220017142656459133
http://data.nytimes.com/N66220017142656459133.rdf
The resource http://data.nytimes.com/N66220017142656459133 asserts two
sameAs relations:
owl:sameAs http://dbpedia.org/resource/Stephen_Colbert
owl:sameAs http://rdf.freebase.com/ns/en.stephen_colbert
One approach to indicating that these relations were determined
manually would be to add the following relations to the resource
http://data.nytimes.com/N66220017142656459133.rdf
nyt:manually_mapped http://dbpedia.org/resource/Stephen_Colbert
nyt:manually_mapped http://rdf.freebase.com/ns/en.stephen_colbert
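
The proposal above can be sketched as a small function that emits both sets of statements: owl:sameAs on the subject-heading resource, and nyt:manually_mapped on the ".rdf" document resource for links made by hand. The build_triples helper and the (target, was_manual) input shape are assumptions for illustration. Prefixed names are kept as plain strings, since the post gives no namespace URI for nyt:.

```python
OWL_SAME_AS = "owl:sameAs"
NYT_MANUALLY_MAPPED = "nyt:manually_mapped"  # property name proposed in the post


def build_triples(heading_uri, mappings):
    """Build (subject, predicate, object) tuples for one subject heading.

    mappings is a list of (target_uri, was_manual) pairs; this input
    shape is a hypothetical choice for the sketch.
    """
    document_uri = heading_uri + ".rdf"
    triples = []
    for target, was_manual in mappings:
        # The sameAs link always hangs off the subject-heading resource.
        triples.append((heading_uri, OWL_SAME_AS, target))
        if was_manual:
            # The provenance hint hangs off the document resource.
            triples.append((document_uri, NYT_MANUALLY_MAPPED, target))
    return triples


triples = build_triples(
    "http://data.nytimes.com/N66220017142656459133",
    [
        ("http://dbpedia.org/resource/Stephen_Colbert", True),
        ("http://rdf.freebase.com/ns/en.stephen_colbert", True),
    ],
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

In this sketch an algorithmically derived link carries no nyt:manually_mapped statement, so a consumer would have to read its absence as "not manually mapped".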
So my question for the group is: what do you think of this approach?
If you don't like it, how would you qualify a sameAs relation to
indicate the method by which it was derived?
Thanks for your feedback!
Evan Sandhaus
--
Semantic Technologist
NYT R+D
@kansandhaus
Happy New Year!
Evan:
See: http://trdf.sourceforge.net/provenance/ns.html, I think it will
help re. this matter. Basically, use it to provide provenance data for
the NYT Linked Data Space.
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com
So it's not so much a general annotation on a relation as it is a
description of the provenance, version, and error characteristics of the
annotations themselves.
And, as I write that sentence, Kingsley Idehen's note comes in!
--
Will
Thanks for the pointers, I will read and learn.
@Eric
We do keep track of provenance information internally. The
(relational) database that backs our internal webapp for mapping terms
indicates the user that created the mapping. We use a special user to
indicate that a mapping was automagically generated.
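
As a rough illustration of that setup, here is a sketch using an in-memory SQLite table. The table layout, the column names and the "auto-mapper" user name are all hypothetical and not our actual schema. The point is only that the creating user is enough to recover whether a mapping was manual or algorithmic.

```python
import sqlite3

AUTO_USER = "auto-mapper"  # hypothetical name for the special user


def make_db():
    """Create an in-memory database with a hypothetical mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping ("
        " heading_uri TEXT NOT NULL,"
        " target_uri TEXT NOT NULL,"
        " created_by TEXT NOT NULL)"
    )
    return conn


def mappings_with_method(conn):
    """Return (heading_uri, target_uri, method) rows in insertion order."""
    cursor = conn.execute(
        "SELECT heading_uri, target_uri,"
        " CASE WHEN created_by = ? THEN 'algorithmic' ELSE 'manual' END"
        " FROM mapping ORDER BY rowid",
        (AUTO_USER,),
    )
    return cursor.fetchall()


conn = make_db()
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?)",
    [
        ("http://data.nytimes.com/N66220017142656459133",
         "http://dbpedia.org/resource/Stephen_Colbert",
         "some-editor"),  # placeholder user name
        ("http://data.nytimes.com/EXAMPLE_APPLE",  # placeholder URI
         "http://dbpedia.org/resource/Apple_Inc.",
         AUTO_USER),
    ],
)
for row in mappings_with_method(conn):
    print(row)
```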
Keep the good advice coming.
~Evan
On Jan 5, 9:46 am, Eric Hellman <open...@gmail.com> wrote:
> Very interesting question! This is provenance information. Unfortunately the linked data community does not have a uniform approach to provenance.
>
> I'm guessing the most likely path will be for provenance info to be tied to named graphs along with licensing info. It would be interesting to see what people think about tying manually_mapped to the ".rdf" resource, and implicitly to the graph, rather than tying it to a particular triple.
>
> I'm assuming that you'll keep track of the provenance internally, though, whatever gets emitted.
>
> Eric
>
> On Jan 5, 2010, at 9:21 AM, Evan Sandhaus wrote:
>
>
>
> > Happy 2010!
>
> > I'd like to open the year with a question for the community.
>
> > We've been quietly continuing our work mapping our subject headings
> > onto DBPedia, Freebase and other ontologies. In this process we
> > realized that we could save some time because our subject headings for
> > publicly traded companies are associated with a stock symbol and an
> > exchange. For example our subject heading for 'Apple Inc.' is
> > associated with the stock symbol 'AAPL' on the 'NASDAQ' exchange.
> > Armed with this knowledge, we can automatically map our publicly-
> > traded company subject headings to their Freebase and DBPedia
> > counterparts through queries to those services' APIs.
>
> > Although this process has thus far proven accurate, there remains a
> > potential for errors to creep into the data. For this reason, I want
> > the NYT data to explicitly indicate if a sameAs relation was manually
> > or algorithmically derived.
>
> > As you recall, our subject headings are published in RDF files
> > containing two resources: a resource describing the subject heading
> > itself, and a resource describing the document containing the
> > resource. For example the document http://data.nytimes.com/N66220017142656459133.rdf
> > contains two resources:
>
> > http://data.nytimes.com/N66220017142656459133
> > http://data.nytimes.com/N66220017142656459133.rdf
>
> > The resource http://data.nytimes.com/N66220017142656459133 asserts two
> > sameAs relations:
>
> > owl:sameAs http://dbpedia.org/resource/Stephen_Colbert
> > owl:sameAs http://rdf.freebase.com/ns/en.stephen_colbert
>
> > One approach to indicating that these relations were determined
> > manually would be to add the following relations to the resource
> > http://data.nytimes.com/N66220017142656459133.rdf
>
> > nyt:manually_mapped http://dbpedia.org/resource/Stephen_Colbert
> > nyt:manually_mapped http://rdf.freebase.com/ns/en.stephen_colbert
>
> > So my question for the group is: what do you think of this approach?
> > If you don't like it, how would you qualify a sameAs relation to
> > indicate the method by which it was derived?
>
> > Thanks for your feedback!
>
> > Evan Sandhaus
> > --
> > Semantic Technologist
> > NYT R+D
> > @kansandhaus
>
> Eric Hellman
> President, Gluejar, Inc.
> 41 Watchung Plaza, #132
> Montclair, NJ 07042
> USA
>
> e...@hellman.net http://go-to-hellman.blogspot.com/