Major Update to RDF Structure on http://data.nytimes.com


Evan Sandhaus

Nov 10, 2009, 12:28:11 PM
to The New York Times Linked Open Data Community
Twelve days ago, we took our first step into linked open data. Since
then we’ve received a great deal of helpful feedback on how best to
improve our Linked Data Service. Based on this feedback, we are making
several changes to the structure of our linked data documents.

The first change you’ll notice is that each document now contains two
resources. The reason for this is as follows.

Let’s pretend we have a resource with the URI http://data.nytimes.com/foo
that is served from a file named http://data.nytimes.com/foo.rdf. In
our original release this document contained a single resource,
http://data.nytimes.com/foo. Since we attached licensing information
to this resource and declared it to be owl:sameAs external resources,
an inference engine could conclude that The New York Times was
asserting ownership and license terms over data that didn’t belong to
us. owl:sameAs states that two URIs identify the same thing, so a
reasoner may apply every statement about one of them, including our
license terms, to the other.
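To make the problem concrete, here is a minimal Python sketch of that kind of inference. It assumes a deliberately simplified rule, not a complete OWL reasoner: whenever two URIs are linked by owl:sameAs, every statement whose subject is one of them is copied onto the other. The external URI http://dbpedia.org/resource/Foo and the license URI are hypothetical placeholders chosen for illustration.

```python
# Toy forward-chaining rule (an assumption, not a full OWL reasoner):
# if (a owl:sameAs b), every triple with subject a also holds with subject b.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CC_LICENSE = "http://creativecommons.org/ns#license"

FOO = "http://data.nytimes.com/foo"
EXTERNAL = "http://dbpedia.org/resource/Foo"                # hypothetical external resource
LICENSE = "http://creativecommons.org/licenses/by/3.0/us/"  # illustrative license URI

# Original layout: the license and the sameAs link hang off the same resource.
original_doc = {
    (FOO, CC_LICENSE, LICENSE),
    (FOO, OWL_SAME_AS, EXTERNAL),
}

def propagate_same_as(triples):
    """Copy triples across owl:sameAs links (both directions) until nothing new appears."""
    inferred = set(triples)
    while True:
        pairs = {(s, o) for s, p, o in inferred if p == OWL_SAME_AS}
        pairs |= {(o, s) for s, o in pairs}  # sameAs is symmetric
        new = {(b, p, o) for a, b in pairs for s, p, o in inferred if s == a}
        if new <= inferred:
            return inferred
        inferred |= new

closure = propagate_same_as(original_doc)
print((EXTERNAL, CC_LICENSE, LICENSE) in closure)  # True: the license leaked onto EXTERNAL
```

Even this crude rule attaches the Times license to a resource the Times does not own, which is exactly the unintended conclusion described above.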

Since it was never our intention to do anything of this sort, we have
revised our documents to contain two resources. The document
http://data.nytimes.com/foo.rdf now contains the resources
http://data.nytimes.com/foo and http://data.nytimes.com/foo.rdf.
Licensing information is now attached to the resource ending in “.rdf”,
and owl:sameAs assertions are made on the resource
http://data.nytimes.com/foo. To make clear the relation between these
two resources, the resource ending in “.rdf” is asserted to have
foaf:primaryTopic http://data.nytimes.com/foo. We believe that this
approach both avoids unwanted propagation of license terms and
preserves the clarity of the license information.
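A minimal sketch of the revised two-resource layout, written as plain Python tuples rather than with an RDF library. The external URI and license URI are hypothetical placeholders, and the helper names license_for_topic and to_ntriples are illustrative only. It shows that a consumer can still find the license for http://data.nytimes.com/foo by following foaf:primaryTopic back from the document resource, while foo itself carries no license statement.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CC_LICENSE = "http://creativecommons.org/ns#license"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"

FOO = "http://data.nytimes.com/foo"                         # the thing described
FOO_DOC = "http://data.nytimes.com/foo.rdf"                 # the document describing it
EXTERNAL = "http://dbpedia.org/resource/Foo"                # hypothetical external resource
LICENSE = "http://creativecommons.org/licenses/by/3.0/us/"  # illustrative license URI

revised_doc = [
    (FOO_DOC, CC_LICENSE, LICENSE),      # license terms describe the document
    (FOO_DOC, FOAF_PRIMARY_TOPIC, FOO),  # the document is about foo
    (FOO, OWL_SAME_AS, EXTERNAL),        # identity links describe the thing itself
]

def license_for_topic(triples, topic):
    """Return the licenses of every document whose foaf:primaryTopic is `topic`."""
    docs = {s for s, p, o in triples if p == FOAF_PRIMARY_TOPIC and o == topic}
    return sorted(o for s, p, o in triples if p == CC_LICENSE and s in docs)

def to_ntriples(triples):
    """Serialize triples whose terms are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(license_for_topic(revised_doc, FOO))
print(to_ntriples(revised_doc))
```

Because the license now sits on http://data.nytimes.com/foo.rdf, a rule that copies statements across owl:sameAs links from foo has no license statement to carry over to the external resource.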

So that’s the big change. We have also made several smaller updates.

1. The predicates 'time:start' and 'time:end' have been replaced with
'nyt:first_use' and 'nyt:last_use' respectively. The intent of the
'time:[start|end]' triples was to express the times a subject heading
was first and last used in the Times. Unfortunately, these triples
were ambiguous, so we have decided to extend the 'time:start' and
'time:end' predicates with our own predicates, which we will define to
have the above semantics.

2. The 'nyt:topicPage', 'cc:attributionURL', and 'cc:license' triples
now refer to resource URIs rather than literal URLs.

3. The incorrectly stated 'cc:Attribution' predicate has been replaced
with the correct 'cc:attributionURL' predicate.

4. The incorrectly stated 'cc:License' predicate has been replaced
with the correct 'cc:license' predicate (the difference is capitalization).

5. We have resolved issues with content negotiation on our server.

6. An XML declaration was added to the top of the RDF documents.

7. The Freebase namespace declaration
'xmlns:fb="http://rdf.freebase.com/ns/"' was removed from the RDF
declaration, as it is not used in any statements contained in our
documents.

8. Freebase resources are now linked using the URI structure
http://rdf.freebase.com/ns/foo rather than http://rdf.freebase.com/rdf/foo.
This URI structure permits Freebase’s servers to perform content
negotiation on the requested URI, leading to a better user experience
for human readers.

9. We have added a “dcterms:modified” triple to the http://data.nytimes.com/foo.rdf
resource that indicates the time at which the resource was last
updated.

10. Creative Commons branding has been added to the HTML renderings of
our resources.
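Items 5 and 8 both concern content negotiation. The standard-library Python sketch below rewrites an old-style Freebase link to the /ns/ form and builds, without sending, an HTTP request whose Accept header asks for RDF/XML. The helper names to_freebase_ns and rdf_request are hypothetical, and the exact media types each server honors are an assumption; a browser would typically send an Accept header preferring text/html instead.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

OLD_PREFIX = "/rdf/"
NEW_PREFIX = "/ns/"

def to_freebase_ns(uri: str) -> str:
    """Rewrite http://rdf.freebase.com/rdf/<id> to http://rdf.freebase.com/ns/<id>."""
    parts = urlsplit(uri)
    if parts.netloc == "rdf.freebase.com" and parts.path.startswith(OLD_PREFIX):
        path = NEW_PREFIX + parts.path[len(OLD_PREFIX):]
        return urlunsplit(parts._replace(path=path))
    return uri  # leave every other URI untouched

def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET asking the server for RDF/XML via content negotiation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})

link = to_freebase_ns("http://rdf.freebase.com/rdf/foo")
req = rdf_request(link)
print(link)                      # http://rdf.freebase.com/ns/foo
print(req.get_header("Accept"))  # application/rdf+xml
```

Keeping a single URI per resource and letting the Accept header choose the representation is what allows the same link to serve both machine clients and human readers.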

So these are today’s changes, but there are several more updates still
in the pipeline. These include:

1. New York Times namespace documentation.

2. More mappings from subject headings to DBpedia and Freebase.

3. Sample applications of the data.

Almost every change announced today is the result of community
feedback. We really mean it when we say that we appreciate and value
your comments, criticisms and suggestions. So please, keep them
coming.