Twelve days ago, we took our first step into linked open data. Since
then we’ve received a great deal of feedback on how best to improve our
Linked Data Service. Based on this feedback, we are making several
changes to the structure of our linked data documents.
The first change you’ll notice is that each document now contains two
resources. The reason for this is as follows.
Let’s pretend we have a resource with the URI http://data.nytimes.com/foo
that is served from a file named http://data.nytimes.com/foo.rdf. In
our original release this document contained a single resource
http://data.nytimes.com/foo. Since we attached licensing information
to this resource and declared it to be owl:sameAs external resources,
an inference engine could conclude that The New York Times was
asserting ownership and license terms over data that didn’t belong to
us.
Since it was never our intention to do anything of this sort, we have
revised our documents to contain two resources. The document
http://data.nytimes.com/foo.rdf now contains resources http://data.nytimes.com/foo
and http://data.nytimes.com/foo.rdf. Licensing information is now
attached to the resource ending in “.rdf” and owl:sameAs assertions
are made in the resource http://data.nytimes.com/foo. To make clear
the relation between these two resources, the resource ending in
“.rdf” is asserted to have foaf:primaryTopic http://data.nytimes.com/foo.
We believe that this approach avoids unwanted propagation of license
terms while preserving the clarity of the license information.
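To see why the split matters, here is a minimal sketch in plain Python (no RDF library) of a naive owl:sameAs inference step. Triples are modelled as simple tuples. The external URI http://example.org/external/foo and the helper names are assumptions made up for illustration; they are not part of our service or of any particular inference engine.

```python
SAME_AS = "owl:sameAs"
LICENSE = "cc:license"
PRIMARY_TOPIC = "foaf:primaryTopic"

FOO = "http://data.nytimes.com/foo"
FOO_RDF = "http://data.nytimes.com/foo.rdf"
EXTERNAL = "http://example.org/external/foo"  # hypothetical external resource
CC_BY = "http://creativecommons.org/licenses/by/3.0/us/"


def propagate_same_as(triples):
    """Naively copy every statement about a subject to its owl:sameAs peers."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in inferred if p == SAME_AS]
        pairs += [(o, s) for s, o in pairs]  # treat owl:sameAs as symmetric
        for a, b in pairs:
            for s, p, o in list(inferred):
                if s == a and p != SAME_AS and (b, p, o) not in inferred:
                    inferred.add((b, p, o))
                    changed = True
    return inferred


def licensed_subjects(triples):
    """Return every subject that ends up carrying a license statement."""
    return {s for s, p, o in propagate_same_as(triples) if p == LICENSE}


# Original release: license and owl:sameAs sit on the same resource.
old_layout = {
    (FOO, LICENSE, CC_BY),
    (FOO, SAME_AS, EXTERNAL),
}

# Revised release: license on the ".rdf" document, owl:sameAs on the topic.
new_layout = {
    (FOO_RDF, LICENSE, CC_BY),
    (FOO_RDF, PRIMARY_TOPIC, FOO),
    (FOO, SAME_AS, EXTERNAL),
}

print(sorted(licensed_subjects(old_layout)))
print(sorted(licensed_subjects(new_layout)))
```

In this toy model the old layout leaves the external resource carrying our license statement, while the new layout keeps the license on the “.rdf” document only.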
So that’s the big change. We have also made several smaller updates.
1. The predicates 'time:start' and 'time:end' have been replaced with
'nyt:first_use' and 'nyt:last_use' respectively. The intent of the
'time:[start|end]' triples was to express the time a subject heading
was first and last used in the Times. Unfortunately, these triples
were ambiguous, so we have decided to replace the 'time:[start|end]'
predicates with our own predicates, which we will define to have the
above semantics.
2. The 'nyt:topicPage', 'cc:attributionURL', and 'cc:license' triples
now refer to resource URIs, rather than literal URLs.
3. The incorrectly stated 'cc:Attribution' predicate has been replaced
with the correct 'cc:attributionURL' predicate.
4. The incorrectly stated 'cc:License' predicate has been replaced
with the correct 'cc:license' predicate (note the capitalization).
5. We have resolved issues with content negotiation on our server.
6. An XML declaration was added to the top of the RDF documents.
7. The Freebase namespace declaration
'xmlns:fb="http://rdf.freebase.com/ns/"' was removed from the RDF
declaration as it is not used in any statements contained in our
document.
8. Freebase resources are now linked using the URI structure
http://rdf.freebase.com/ns/foo rather than http://rdf.freebase.com/rdf/foo.
This URI structure permits Freebase’s servers to perform content
negotiation on the requested URI, leading to a better user experience
for human readers.
9. We have added a “dcterms:modified” triple to the http://data.nytimes.com/foo.rdf
resource that indicates the time at which the resource was last
updated.
10. Creative Commons branding has been added to the HTML renderings of
our resources.
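For anyone who cached triples from the original release, the predicate renames (items 1, 3 and 4) and the Freebase URI change (item 8) can be sketched as a small client-side rewrite. This is only an illustration under assumptions: triples are plain tuples of strings, prefixed names stand in for full predicate URIs, and `migrate_term` / `migrate_triple` are hypothetical helpers, not something we ship.

```python
# Old predicate -> new predicate, as listed in items 1, 3 and 4 above.
PREDICATE_RENAMES = {
    "time:start": "nyt:first_use",
    "time:end": "nyt:last_use",
    "cc:Attribution": "cc:attributionURL",
    "cc:License": "cc:license",
}

OLD_FREEBASE_PREFIX = "http://rdf.freebase.com/rdf/"
NEW_FREEBASE_PREFIX = "http://rdf.freebase.com/ns/"


def migrate_term(term):
    """Rewrite an old-style Freebase URI to the new /ns/ structure (item 8)."""
    if term.startswith(OLD_FREEBASE_PREFIX):
        return NEW_FREEBASE_PREFIX + term[len(OLD_FREEBASE_PREFIX):]
    return term


def migrate_triple(triple):
    """Apply the predicate renames and Freebase URI rewrite to one triple."""
    subject, predicate, obj = triple
    return (
        migrate_term(subject),
        PREDICATE_RENAMES.get(predicate, predicate),
        migrate_term(obj),
    )


old_triples = [
    ("http://data.nytimes.com/foo", "owl:sameAs", "http://rdf.freebase.com/rdf/foo"),
    ("http://data.nytimes.com/foo", "time:start", "PLACEHOLDER-DATE"),  # placeholder value
]

for triple in old_triples:
    print(migrate_triple(triple))
```

Note that this does not cover item 2 (literal URLs becoming resource URIs) or the move of license triples to the “.rdf” resource; re-fetching the documents is the safer route.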
So these are today’s changes, but there are several more updates still
in the pipeline. These include:
1. New York Times namespace documentation.
2. More mappings from subject headings to DBpedia and Freebase.
3. Sample applications of the data.
Almost every change announced today is the result of community
feedback. We really mean it when we say that we appreciate and value
your comments, criticisms and suggestions. So please, keep them
coming.
Thanks
Ivan
--
Ivan Herman
Bankrashof 108, 1183NW Amstelveen, The Netherlands
tel: +31-641044153;
URL: http://www.ivan-herman.net
- the class specification has to be on the top level and not as a child
of owl:Ontology (this led to syntax errors with parsers)
- the usage of rdf:ID was incorrect; indeed, the default namespace does
not apply to the value of rdf:ID
- the namespace used in nytd2.rdf was not the same as the ontology URI
But these are minor issues. Once these were handled (I attach the files
as I used them) it works. The nice thing is that OWL 2 RL (i.e., the rule
engine profile of OWL) also applies to this solution. That is, running the
two files through my OWLRL reasoner (there is an online service at [1]),
you do get triples like:
<http://data.nytimes.com/N24334380828843769853.rdf> a
    <http://creativecommons.org/ns#license> "The New York Times Company"^^xsd:string,
        <http://creativecommons.org/licenses/by/3.0/us/> ;
    <http://purl.org/dc/terms/creator> "The New York Times Company"^^xsd:string ;
    <http://purl.org/dc/terms/modified> "2009-11-11"^^xsd:date ;
    <http://purl.org/dc/terms/rightsHolder> "The New York Times Company"^^xsd:string ;
    ....
which is what you wanted. And I would expect such RL-based reasoners to
come to the fore more.
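For readers without a reasoner at hand, here is a rough sketch in plain Python of how a rule engine could arrive at such triples. It assumes the boilerplate is expressed as owl:hasValue restrictions that the description class is a subclass of; the attached files may well do it differently. It implements only two OWL 2 RL rules (cax-sco and cls-hv1) over tuples of strings, and the 'nyt:' prefix and blank node names are made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
ON_PROPERTY = "owl:onProperty"
HAS_VALUE = "owl:hasValue"

DOC = "http://data.nytimes.com/N24334380828843769853.rdf"
CC_BY = "http://creativecommons.org/licenses/by/3.0/us/"
DESCRIPTION = "nyt:NYTimesDescription"  # prefix is an assumption

# Assumed ontology: the class is a subclass of two hasValue restrictions.
ontology = {
    (DESCRIPTION, SUBCLASS_OF, "_:licenseRestriction"),
    ("_:licenseRestriction", ON_PROPERTY, "cc:license"),
    ("_:licenseRestriction", HAS_VALUE, CC_BY),
    (DESCRIPTION, SUBCLASS_OF, "_:holderRestriction"),
    ("_:holderRestriction", ON_PROPERTY, "dcterms:rightsHolder"),
    ("_:holderRestriction", HAS_VALUE, "The New York Times Company"),
}

# The document itself only needs to state its type.
data = {(DOC, RDF_TYPE, DESCRIPTION)}


def owl_rl_closure(triples):
    """Apply cax-sco and cls-hv1 repeatedly until nothing new is derived."""
    closure = set(triples)
    while True:
        new = set()
        # cax-sco: (c1 subClassOf c2), (x type c1) => (x type c2)
        for c1, p, c2 in closure:
            if p == SUBCLASS_OF:
                for x, p2, c in closure:
                    if p2 == RDF_TYPE and c == c1:
                        new.add((x, RDF_TYPE, c2))
        # cls-hv1: (r hasValue v), (r onProperty q), (u type r) => (u q v)
        for r, p, value in closure:
            if p == HAS_VALUE:
                for r2, p2, prop in closure:
                    if p2 == ON_PROPERTY and r2 == r:
                        for u, p3, c in closure:
                            if p3 == RDF_TYPE and c == r:
                                new.add((u, prop, value))
        if new <= closure:
            return closure
        closure |= new


inferred = owl_rl_closure(ontology | data)
for triple in sorted(t for t in inferred if t[0] == DOC):
    print(triple)
```

A tool that skips this step sees only the rdf:type triple for the document, which is the trade-off behind moving boilerplate triples into an ontology.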
That being said, I am not sure that the 'modified' date should be part
of the NYTimesDescription class. This looks like something much more
malleable than the rest. But I may be wrong.
Cheers
Ivan
[1] http://www.ivan-herman.net/Misc/2008/owlrl/
Richard Cyganiak wrote:
> Eric,
>
> On 18 Nov 2009, at 14:19, Eric Hellman wrote:
>> So on to the question for potential users of linked data- is it better for organizations like the NYT to move "boilerplate triples" into an ontology, or is it better the way it is?
>
> I use a lot of simple RDF-based tools that don't include an OWL reasoner. So if the license was encoded in this way, I would not have any way of seeing it. Requiring the use of an OWL reasoner to use your data is a bad idea IMO. Also, I don't see the problem with explicitly adding the licensing triples to each document on your site.

+1
As much as I like OWL, reasoning simply cannot be a pre-requisite for publishing Linked Data.
Reasoning is a subjective act, and there are other ways of injecting OWL into the mix, unobtrusively. This is ultimately what makes the RDF data model so powerful i.e. Schema comes last, and from a perspective of the Linked Data beholder (consumer). You cannot model data perfectly for cognitive beings, that's just the way it is -- we are all wired to see the same things differently :-)
Kingsley

> Best,
> Richard
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software Web: http://www.openlinksw.com
Eric,
On 18 Nov 2009, at 14:19, Eric Hellman wrote:
> So on to the question for potential users of linked data- is it better for organizations like the NYT to move "boilerplate triples" into an ontology, or is it better the way it is?
I use a lot of simple RDF-based tools that don't include an OWL reasoner. So if the license was encoded in this way, I would not have any way of seeing it. Requiring the use of an OWL reasoner to use your data is a bad idea IMO. Also, I don't see the problem with explicitly adding the licensing triples to each document on your site.
Best,
Richard
Eric Hellman wrote:
> Interesting perspective.
> In your scenario, is there any way that any end user or participant in the linked data distribution chain would see licensing or attribution data?

Yes, by de-referencing the HTTP URI associated with the License Data Item (which would be associated with the RDF doc via a triple), that's the beauty of Creative Commons Licenses, they have URIs [1] :-)
Links:
1. http://creativecommons.org/licenses/by/3.0/
2. http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/http/creativecommons.org/licenses/by/3.0/
Kingsley
On 18 Nov 2009, at 21:55, Eric Hellman wrote:
> But in the current form of the data, the licensing and attribution is "hidden" by the need to dereference the ".rdf" URI's; otherwise there is no way to know what triples are covered by the license and attribution.
I don't get your point. If you want to use the data, you have to dereference the .rdf URI anyway, because that's how you get the data. You get the license packaged along with the data. In what way is this “hiding” the license?
Richard