Re: Major Update to RDF Structure on data.nytimes.com


Kingsley Idehen

Nov 10, 2009, 1:42:28 PM
to nyt_linked...@googlegroups.com
Evan Sandhaus wrote:
> Twelve days ago, we took our first step into linked open data. Since
> then we’ve received a great deal of feedback on how best to improve our
> Linked Data Service. Based on this feedback, we are making several
> changes to the structure of our linked data documents.
>
> The first change you’ll notice is that each document now contains two
> resources. The reason for this is as follows.
>
> Let’s pretend we have a resource with the URI http://data.nytimes.com/foo
> that is served from a file named http://data.nytimes.com/foo.rdf. In
> our original release this document contained a single resource
> http://data.nytimes.com/foo. Since we attached licensing information
> to this resource and declared it to be owl:sameAs external resources,
> an inference engine could conclude that The New York Times was
> asserting ownership and license terms over data that didn’t belong to
> us.
>
> Since it was never our intention to do anything of this sort, we have
> revised our documents to contain two resources. The document
> http://data.nytimes.com/foo.rdf now contains resources http://data.nytimes.com/foo
> and http://data.nytimes.com/foo.rdf. Licensing information is now
> attached to the resource ending in “.rdf” and owl:sameAs assertions
> are made in the resource http://data.nytimes.com/foo. To make clear
> the relation between these two resources, the resource ending in
> “.rdf” is asserted to have foaf:primaryTopic
> http://data.nytimes.com/foo. We believe that this approach both avoids
> unwanted propagation of license terms and preserves the clarity of the
> license information.
>
> So that’s the big change. We have also made several smaller updates.
>
> 1. The predicates 'time:start' and 'time:end' have been replaced with
> 'nyt:first_use' and 'nyt:last_use' respectively. The intent of the
> 'time:[start|end]' triples was to express the time a subject heading
> was first and last used in the Times. Unfortunately, these triples
> were ambiguous, so we have decided to extend the 'time:[start|end]'
> predicate with our own predicates which we will define to have the
> above semantics.
>
> 2. The 'nyt:topicPage', 'cc:attributionURL', and 'cc:license' triples
> now refer to resource URIs, rather than literal URLs.
>
> 3. The incorrectly stated 'cc:Attribution' predicate has been replaced
> with the correct 'cc:attributionURL' predicate.
>
> 4. The incorrectly stated 'cc:License' predicate has been replaced
> with the correct 'cc:license' predicate. (capitalization)
>
> 5. We have resolved issues with content negotiation on our server.
>
> 6. An XML declaration was added to the top of the RDF documents.
>
> 7. The Freebase namespace declaration
> 'xmlns:fb="http://rdf.freebase.com/ns/"' was removed from the RDF
> declaration as it is not used in any statements contained in our document.
>
> 8. Freebase resources are now linked using the URI structure
> http://rdf.freebase.com/ns/foo rather than http://rdf.freebase.com/rdf/foo.
> This URI structure permits Freebase’s servers to perform content
> negotiation on the requested URI, leading to a better user experience
> for human readers.
>
> 9. We have added a “dcterms:modified” triple to the http://data.nytimes.com/foo.rdf
> resource that indicates the time at which the resource was last
> updated.
>
> 10. Creative Commons branding has been added to the HTML renderings of
> our resources.
>
> So these are today’s changes, but there are several more updates still
> in the pipeline. These include:
>
> 1. New York Times namespace documentation
>
> 2. More mappings from subject headings to DBpedia and Freebase.
>
> 3. Sample applications of data.
>
> Almost every change announced today is the result of community
> feedback. We really mean it when we say that we appreciate and value
> your comments, criticisms and suggestions. So please, keep them
> coming.
>
>
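The licensing concern described in the announcement above can be sketched in a few lines of Python. This is only an illustration, not the Times' implementation: triples are plain tuples rather than a real RDF library, and the external URI and license URI are hypothetical placeholders. The naive reasoner treats owl:sameAs as symmetric and copies the license across it.

```python
# Illustrative sketch only: triples are plain (subject, predicate, object) tuples.
FOO = "http://data.nytimes.com/foo"
FOO_DOC = "http://data.nytimes.com/foo.rdf"
EXTERNAL = "http://example.org/external/foo"  # hypothetical external resource
LICENSE = "http://example.org/license"        # hypothetical license URI

SAME_AS = "owl:sameAs"
CC_LICENSE = "cc:license"


def licensed_resources(triples):
    """Return every resource a naive owl:sameAs reasoner would treat as licensed."""
    licensed = {s for s, p, o in triples if p == CC_LICENSE}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            # owl:sameAs is symmetric, so propagate in both directions.
            for a, b in ((s, o), (o, s)):
                if a in licensed and b not in licensed:
                    licensed.add(b)
                    changed = True
    return licensed


# Original release: license and owl:sameAs both hang off the same resource.
original_layout = [
    (FOO, CC_LICENSE, LICENSE),
    (FOO, SAME_AS, EXTERNAL),
]

# Revised release: license on the document, owl:sameAs on the thing it describes.
revised_layout = [
    (FOO_DOC, CC_LICENSE, LICENSE),
    (FOO_DOC, "foaf:primaryTopic", FOO),
    (FOO, SAME_AS, EXTERNAL),
]

print(sorted(licensed_resources(original_layout)))
print(sorted(licensed_resources(revised_layout)))
```

With the original layout the external resource ends up in the licensed set; with the revised layout only the ".rdf" document does, because foaf:primaryTopic does not carry the license across.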
Evan,

Great turnaround.

See:
http://linkeddata.uriburner.com/about/html/http/data.nytimes.com/N87331589133328408563

Note, you have the triple:

<http://data.nytimes.com/N87331589133328408563> owl:sameAs
<http://data.nytimes.com/N87331589133328408563.rdf>

Linked Data debugger helps highlight the problem re. the above:

1. http://tr.im/EGJ5 -- Shows functional Generic HTTP URI working fine
2. http://tr.im/EGIq -- Shows why the triple above is problematic.

--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com


Evan Sandhaus

Nov 10, 2009, 2:17:10 PM
to The New York Times Linked Open Data Community
The problem was as follows:

http://data.nytimes.com/N87331589133328408563 was declared owl:sameAs
http://data.nytimes.com/pacino_al_per
http://data.nytimes.com/pacino_al_per was declared owl:sameAs
http://data.nytimes.com/N87331589133328408563.rdf

So an inference engine would infer:
<http://data.nytimes.com/N87331589133328408563> owl:sameAs
<http://data.nytimes.com/N87331589133328408563.rdf>

This is a problem, but it is fixed now.

http://data.nytimes.com/pacino_al_per is now declared owl:sameAs
http://data.nytimes.com/N87331589133328408563

Which will prevent the inference.
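
The inference described above follows from owl:sameAs being symmetric and transitive. A minimal sketch, assuming the statements are reduced to plain URI pairs (the helper names `same_as_classes` and `entails_same_as` are made up for illustration, and a small union-find stands in for a real reasoner):

```python
def same_as_classes(pairs):
    """Group URIs into equivalence classes under owl:sameAs."""
    parent = {}

    def find(x):
        # Find the class representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


def entails_same_as(pairs, a, b):
    """True if a and b land in the same owl:sameAs equivalence class."""
    return any(a in cls and b in cls for cls in same_as_classes(pairs))


THING = "http://data.nytimes.com/N87331589133328408563"
NAME = "http://data.nytimes.com/pacino_al_per"
DOC = THING + ".rdf"

before_fix = [(THING, NAME), (NAME, DOC)]
after_fix = [(THING, NAME), (NAME, THING)]

print(entails_same_as(before_fix, THING, DOC))
print(entails_same_as(after_fix, THING, DOC))
```

Before the fix the thing and its ".rdf" document fall into one class; after the fix the document URI is no longer mentioned in any owl:sameAs statement, so the unwanted equivalence cannot be derived.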

Best,

Evan

Good news the


Eric Hellman

Nov 10, 2009, 2:29:10 PM
to nyt_linked...@googlegroups.com
Nice work!


Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA




Kingsley Idehen

Nov 10, 2009, 5:48:36 PM
to nyt_linked...@googlegroups.com
Evan,

If we add a few triples to the description of the RDF data container
(resource holding the data):

<http://data.nytimes.com/N74378810797427897533.rdf> a foaf:Document .

Alongside existing triples such as:
<http://data.nytimes.com/N74378810797427897533.rdf> foaf:primaryTopic
<http://data.nytimes.com/N74378810797427897533> .
<http://data.nytimes.com/N74378810797427897533> owl:sameAs
<http://rdf.freebase.com/ns/en.paul_hackett_1947> ;
owl:sameAs
<http://dbpedia.org/resource/Paul_Hackett_%28American_football%29> .
...

Then you hit the question: What is the entity / data item
<http://data.nytimes.com/hackett_paul_per>? A skos:Concept, foaf:Person,
or foaf:Document? Basically, that triple is missing right now.

Note, based on what exists currently:

If, <http://data.nytimes.com/hackett_paul_per> owl:sameAs
<http://data.nytimes.com/N74378810797427897533.rdf>

Then, I should get a 303 response for both via HTTP GET since the
entities in the owl:sameAs relation are deemed to be of compatible
entity types (i.e., not disjoint, as is the case between say a Person
Entity and a Document Entity).

At the current time though: <http://data.nytimes.com/hackett_paul_per>,
is a generic HTTP URI (just a Name albeit bound to its metadata doc via
303 re-direction when a representation of its metadata is requested via
HTTP), but its type is unknown.

While: <http://data.nytimes.com/N74378810797427897533.rdf> is a
traditional Web resource URL (report/document with its contents
structured in line with the RDF data model i.e., triples expressed in a
variety of data representation formats, with
<http://data.nytimes.com/N74378810797427897533> as its primaryTopic).

Once you resolve Entity Type for:
<http://data.nytimes.com/hackett_paul_per>, your Linked Data graph will
hang together more cohesively (even if there is no reasoning taking
place i.e., data exploration via a browser will work better).

Hope this helps etc. :-)

Evan Sandhaus

Nov 10, 2009, 6:18:37 PM
to The New York Times Linked Open Data Community
Kingsley,

You're quite right to note that http://data.nytimes.com/hackett_paul_per
redirects to http://data.nytimes.com/N74378810797427897533.html.

That is handled via a 303 redirect on the server side.

If you visit http://data.nytimes.com/hackett_paul_per.rdf (or
request the URI with an Accept header of "Accept: application/rdf+xml"),
however, you will see an RDF document asserting that

<http://data.nytimes.com/hackett_paul_per> owl:sameAs
<http://data.nytimes.com/N74378810797427897533>

This would be obvious...if we had good documentation :) (I assure you
that's coming).
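
The Accept-header request mentioned above can be sketched with Python's standard library. This only builds the request and does not send it, since the service may not be reachable; the helper name `build_rdf_request` is made up for illustration.

```python
import urllib.request


def build_rdf_request(uri):
    """Build (but do not send) a GET request asking for the RDF/XML representation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = build_rdf_request("http://data.nytimes.com/hackett_paul_per")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually fetch (assuming the server is still reachable):
#   with urllib.request.urlopen(req) as response:
#       print(response.geturl())  # final URL after any 303 redirect
#       body = response.read()
```

By default urlopen follows the 303 redirect on its own, so the final URL it reports is that of the document rather than the original name.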

Since I am asserting that:

<http://data.nytimes.com/hackett_paul_per> owl:sameAs
<http://data.nytimes.com/N74378810797427897533>

Then we can treat <http://data.nytimes.com/hackett_paul_per> as being
of type skos:Concept since it is owl:sameAs
<http://data.nytimes.com/N74378810797427897533>, which is declared to be
of this type.

Does this address your concerns?

Cheers,

Evan
Kingsley Idehen

Nov 10, 2009, 9:05:18 PM
to nyt_linked...@googlegroups.com
Evan Sandhaus wrote:
> Kingsley,
>
> You're quite right to note that http://data.nytimes.com/hackett_paul_per
> redirects to http://data.nytimes.com/N74378810797427897533.html.
>
> That is handled via a 303 redirect on the server side.
>
> If you visit http://data.nytimes.com/hackett_paul_per.rdf (or
> request the URI with an Accept header of "Accept: application/rdf+xml"),
> however, you will see an RDF document asserting that
>
> <http://data.nytimes.com/hackett_paul_per> owl:sameAs
> <http://data.nytimes.com/N74378810797427897533>
>
> This would be obvious...if we had good documentation :) (I assure you
> that's coming).
>
> Since I am asserting that:
>
> <http://data.nytimes.com/hackett_paul_per> owl:sameAs
> <http://data.nytimes.com/N74378810797427897533>
>
> Then we can treat <http://data.nytimes.com/hackett_paul_per> as being
> of type skos:Concept since it is owl:sameAs
> <http://data.nytimes.com/N74378810797427897533>, which is declared to be
> of this type.
>
> Does this address your concerns?
>
> Cheers,
>
> Evan
>
Evan,

Yes, it's better now. But there is still an unknown type in the graph
based on a missing triple that asserts rdf:type. It's best to assume that
most Linked Data user agents won't be endowed with OWL reasoning
capability; especially when the "human" agent type browses via a Web
page, along the lines demonstrated in the sequence that follows:

1. http://linkeddata.uriburner.com/about/html/http/data.nytimes.com/66209802438676211043
   -- An HTML+RDFa representation of the Description of Entity: Madonna,
   Type: skos:Concept
2. http://linkeddata.uriburner.com/about/html/http/data.nytimes.com/66209802438676211043
   -- The Description of Entity <http://data.nytimes.com/madonna_per>,
   Type: Unknown.

You just need:
<http://data.nytimes.com/madonna_per> a skos:Concept
added to the RDF doc :-)
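
To illustrate the difference, here is a minimal sketch comparing what an agent without OWL reasoning sees against what one that follows owl:sameAs sees. Triples are plain tuples, and the owl:sameAs link between the two Madonna URIs is assumed by analogy with the hackett_paul_per example earlier in the thread.

```python
RDF_TYPE = "rdf:type"
SAME_AS = "owl:sameAs"

NAME = "http://data.nytimes.com/madonna_per"
THING = "http://data.nytimes.com/66209802438676211043"


def asserted_types(triples, uri):
    """Types stated directly on the URI: all a non-reasoning agent can see."""
    return {o for s, p, o in triples if s == uri and p == RDF_TYPE}


def types_with_same_as(triples, uri):
    """Types reachable by following owl:sameAs links in both directions."""
    seen, todo = set(), [uri]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            if s == current:
                todo.append(o)
            elif o == current:
                todo.append(s)
    types = set()
    for equivalent in seen:
        types |= asserted_types(triples, equivalent)
    return types


current_doc = [
    (THING, RDF_TYPE, "skos:Concept"),
    (NAME, SAME_AS, THING),  # assumed link, see the note above
]
# The suggested fix: state the type explicitly on the named URI as well.
fixed_doc = current_doc + [(NAME, RDF_TYPE, "skos:Concept")]

print(asserted_types(current_doc, NAME))
print(types_with_same_as(current_doc, NAME))
print(asserted_types(fixed_doc, NAME))
```

Without the extra triple, only the agent that follows owl:sameAs recovers skos:Concept; with it, a plain lookup is enough.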

You can get the ODE tool I use at:

1. http://ode.openlinksw.com
2. http://uriburner.com

In either case, you can use the bookmarklet available on each page.

Kingsley
