Qualifying sameAs relations


Evan Sandhaus

Jan 5, 2010, 9:21:41 AM
to The New York Times Linked Open Data Community
Happy 2010!

I'd like to open the year with a question for the community.

We've been quietly continuing our work mapping our subject headings
onto DBpedia, Freebase and other ontologies. In the process we
realized that we could save some time because our subject headings for
publicly traded companies are associated with a stock symbol and an
exchange. For example, our subject heading for 'Apple Inc.' is
associated with the stock symbol 'AAPL' on the 'NASDAQ' exchange.
Armed with this knowledge, we can automatically map our publicly
traded company subject headings to their Freebase and DBpedia
counterparts through queries to those services' APIs.
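
A minimal sketch of the ticker-based matching described above, in Python. It makes no real API calls: the `candidates` list is a hand-written stand-in for whatever a DBpedia or Freebase query might return, and the `Company` type, the `match_by_ticker` helper, and the placeholder NYT and "other company" URIs are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Company:
    uri: str
    symbol: str
    exchange: str


def match_by_ticker(heading: Company, candidates: Sequence[Company]) -> Optional[Company]:
    """Return the one candidate sharing the heading's (symbol, exchange), else None."""
    key = (heading.symbol.upper(), heading.exchange.upper())
    hits = [c for c in candidates if (c.symbol.upper(), c.exchange.upper()) == key]
    # Refuse to guess when there is no hit or more than one hit.
    return hits[0] if len(hits) == 1 else None


# Placeholder URI: the thread does not give the NYT identifier for Apple Inc.
heading = Company("http://data.nytimes.com/EXAMPLE_APPLE_HEADING", "AAPL", "NASDAQ")

# Stand-in for an API response; the records are illustrative only.
candidates = [
    Company("http://dbpedia.org/resource/Apple_Inc.", "aapl", "Nasdaq"),
    Company("http://dbpedia.org/resource/Example_Other_Company", "XMPL", "NYSE"),
]

match = match_by_ticker(heading, candidates)
# Record how the link was derived alongside the link itself.
mapping = None if match is None else (heading.uri, match.uri, "algorithmic")
print(mapping)
```

Returning `None` on an ambiguous match is a design choice in this sketch: an unmapped heading can be sent to a human, whereas a wrong sameAs link is harder to spot later.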

Although this process has thus far proven accurate, there remains a
potential for errors to creep into the data. For this reason, I want
the NYT data to explicitly indicate if a sameAs relation was manually
or algorithmically derived.

As you recall, our subject headings are published in RDF files
containing two resources: a resource describing the subject heading
itself, and a resource describing the document containing the
resource. For example the document http://data.nytimes.com/N66220017142656459133.rdf
contains two resources:

http://data.nytimes.com/N66220017142656459133
http://data.nytimes.com/N66220017142656459133.rdf

The resource http://data.nytimes.com/N66220017142656459133 asserts two
sameAs relations:

owl:sameAs http://dbpedia.org/resource/Stephen_Colbert
owl:sameAs http://rdf.freebase.com/ns/en.stephen_colbert

One approach to indicating that these relations were determined
manually would be to add the following relations to the resource
http://data.nytimes.com/N66220017142656459133.rdf

nyt:manually_mapped http://dbpedia.org/resource/Stephen_Colbert
nyt:manually_mapped http://rdf.freebase.com/ns/en.stephen_colbert
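
To make the proposal concrete, here is a small Python sketch that writes out the four statements as N-Triples: the two owl:sameAs links on the subject heading, and the two nyt:manually_mapped links on the .rdf document. The full URI behind the `nyt:` prefix is not given in this post, so `NYT_MANUALLY_MAPPED` below uses an example.org placeholder and should be read as an assumption, as should the helper names.

```python
SUBJECT = "http://data.nytimes.com/N66220017142656459133"
DOCUMENT = SUBJECT + ".rdf"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Placeholder: the post only uses the prefixed name nyt:manually_mapped.
NYT_MANUALLY_MAPPED = "http://example.org/nyt#manually_mapped"

TARGETS = [
    "http://dbpedia.org/resource/Stephen_Colbert",
    "http://rdf.freebase.com/ns/en.stephen_colbert",
]


def ntriple(s: str, p: str, o: str) -> str:
    """Format one statement whose subject, predicate and object are all URIs."""
    return f"<{s}> <{p}> <{o}> ."


def build_triples():
    triples = []
    for target in TARGETS:
        # The identity claim lives on the subject heading resource.
        triples.append((SUBJECT, OWL_SAME_AS, target))
        # The "how was it derived" claim lives on the document resource.
        triples.append((DOCUMENT, NYT_MANUALLY_MAPPED, target))
    return triples


lines = [ntriple(*t) for t in build_triples()]
print("\n".join(lines))
```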

So my question for the group is: what do you think of this approach?
If you don't like it, how would you qualify a sameAs relation to
indicate the method by which it was derived?

Thanks for your feedback!

Evan Sandhaus
--
Semantic Technologist
NYT R+D
@kansandhaus

Kingsley Idehen

Jan 5, 2010, 9:38:32 AM
to nyt_linked...@googlegroups.com
All,

Happy New Year!

Evan:

See: http://trdf.sourceforge.net/provenance/ns.html, I think it will
help re. this matter. Basically, use it to provide provenance data for
the NYT Linked Data Space.

--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com


Will Fitzgerald

Jan 5, 2010, 9:40:51 AM
to nyt_linked...@googlegroups.com
In my own experience, the distinction isn't between 'manually mapped'
and 'algorithmically mapped.' Eventually (one assumes) you'll want to
add checks to ensure that algorithmically mapped relations are 'good
enough,' and to correct them if they are not. And the same is true
for 'manually mapped,' right? And the same is true for relations other
than sameAs.

So it's not so much a general annotation on a relation as it is a
description of the provenance, version, and error characteristics of
the annotations themselves.
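
One way to picture this is standard RDF reification, where a statement gets its own node and the provenance hangs off that node. The Python sketch below is only an illustration: the rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are standard, but the `EX` vocabulary (`method`, `version`, `checked`) and the `reify` helper are hypothetical and not taken from the provenance vocabulary Kingsley linked.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/prov#"  # hypothetical provenance vocabulary


def reify(node_id, s, p, o, method, version, checked):
    """Describe one triple as an rdf:Statement and attach provenance to it."""
    node = f"_:{node_id}"  # blank node standing for the statement itself
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{s}> .",
        f"{node} <{RDF}predicate> <{p}> .",
        f"{node} <{RDF}object> <{o}> .",
        f'{node} <{EX}method> "{method}" .',
        f'{node} <{EX}version> "{version}" .',
        f'{node} <{EX}checked> "{str(checked).lower()}" .',
    ]


statement = reify(
    "m1",
    "http://data.nytimes.com/N66220017142656459133",
    OWL_SAME_AS,
    "http://dbpedia.org/resource/Stephen_Colbert",
    method="manual",
    version="1",
    checked=True,
)
print("\n".join(statement))
```

The same shape works for any predicate, not only sameAs, at the cost of several extra triples per annotated relation.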

And, as I write that sentence, Kingsley Idehen's note comes in!

--
Will

Eric Hellman

Jan 5, 2010, 9:46:56 AM
to nyt_linked...@googlegroups.com
Very interesting question! This is provenance information. Unfortunately the linked data community does not have a uniform approach to provenance. 

I'm guessing the most likely path will be for provenance info to be tied to named graphs along with licensing info. It would be interesting to see what people think about tying manually_mapped to the ".rdf" resource, and implicitly to the graph, rather than to a particular triple.
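
As a rough illustration of the named-graph idea, the Python sketch below emits N-Quads in which the sameAs statements carry the ".rdf" URI as their graph name, and a single statement about that graph records how its mappings were made. The `EX_METHOD` property and the `graph_with_provenance` helper are hypothetical; only the owl:sameAs URI and the NYT, DBpedia and Freebase URIs come from the thread.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX_METHOD = "http://example.org/prov#mappingMethod"  # hypothetical property


def graph_with_provenance(graph, subject, targets, method):
    """Put the sameAs links in a named graph and describe the graph once."""
    quads = [f"<{subject}> <{OWL_SAME_AS}> <{t}> <{graph}> ." for t in targets]
    # One statement about the graph as a whole (default graph, no fourth term).
    quads.append(f'<{graph}> <{EX_METHOD}> "{method}" .')
    return quads


quads = graph_with_provenance(
    graph="http://data.nytimes.com/N66220017142656459133.rdf",
    subject="http://data.nytimes.com/N66220017142656459133",
    targets=[
        "http://dbpedia.org/resource/Stephen_Colbert",
        "http://rdf.freebase.com/ns/en.stephen_colbert",
    ],
    method="manual",
)
print("\n".join(quads))
```

A consequence of this shape is that every mapping in one graph shares one method, so a file mixing manual and algorithmic links would need either separate graphs or per-triple annotation.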

I'm assuming that you'll keep track of the provenance internally, though, whatever gets emitted.

Eric
Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA




Evan Sandhaus

Jan 5, 2010, 9:55:08 AM
to The New York Times Linked Open Data Community
@Kingsley & @Will

Thanks for the pointers; I will read and learn.

@Eric

We do keep track of provenance information internally. The
(relational) database that backs our internal webapp for mapping terms
indicates the user that created the mapping. We use a special user to
indicate that a mapping was automagically generated.
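
For readers who want a picture of that arrangement, here is a small sketch using Python's built-in sqlite3 module. It is not our actual schema: the `mapping` table, its columns, the `auto-mapper` and `editor1` user names, and the placeholder Apple URI are all invented for illustration. The point is only that the derivation method can be computed from the creating user.

```python
import sqlite3

AUTO_USER = "auto-mapper"  # hypothetical name for the special "automatic" user

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mapping (
        nyt_uri    TEXT NOT NULL,
        target_uri TEXT NOT NULL,
        created_by TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?)",
    [
        ("http://data.nytimes.com/N66220017142656459133",
         "http://dbpedia.org/resource/Stephen_Colbert", "editor1"),
        # Placeholder NYT URI: the thread gives no identifier for Apple Inc.
        ("http://data.nytimes.com/EXAMPLE_APPLE_HEADING",
         "http://dbpedia.org/resource/Apple_Inc.", AUTO_USER),
    ],
)

# Derive the method from who created the row.
rows = conn.execute(
    """
    SELECT target_uri,
           CASE WHEN created_by = ? THEN 'algorithmic' ELSE 'manual' END
    FROM mapping
    ORDER BY target_uri
    """,
    (AUTO_USER,),
).fetchall()

method_by_target = dict(rows)
for target, method in rows:
    print(target, method)
```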

Keep the good advice coming.

~Evan

