Link from HTML to RDF version (was Re: Major Update to RDF Structure on data.nytimes.com)


Stephane Corlosquet

Nov 11, 2009, 10:39:46 AM
to nyt_linked...@googlegroups.com
Hi Evan,

It seems the RDF link at the bottom of each HTML page describing a resource is pointing to the same URI. For example, when browsing to http://data.nytimes.com/85579320575627034213.html, I would expect the RDF link at the bottom of the page to point to the RDF version of the HTML page I'm looking at, i.e. in this particular case, I would expect the link to be http://data.nytimes.com/85579320575627034213.rdf, but instead I'm brought to http://data.nytimes.com/N74378810797427897533.rdf (RDF describing Hackett, Paul).

I've tried several other people subject heading pages, and it seems they all link to the same RDF version of Hackett, Paul. It would be nice if instead they were pointing to the RDF version of the same people.
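
The expected behaviour can be sketched in a few lines of Python. This is only an illustration of the URL convention described above (an `.html` page and its `.rdf` sibling share the same identifier); the function name `rdf_url_for` is hypothetical and says nothing about how the NYT site actually generates its links.

```python
from urllib.parse import urlsplit, urlunsplit


def rdf_url_for(html_url: str) -> str:
    """Return the .rdf sibling of an .html page URL (hypothetical helper)."""
    parts = urlsplit(html_url)
    if not parts.path.endswith(".html"):
        raise ValueError(f"not an .html page: {html_url}")
    # Swap only the file extension; scheme, host and identifier stay the same.
    rdf_path = parts.path[: -len(".html")] + ".rdf"
    return urlunsplit(parts._replace(path=rdf_path))


print(rdf_url_for("http://data.nytimes.com/85579320575627034213.html"))
# http://data.nytimes.com/85579320575627034213.rdf
```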

regards,
Stephane.

On Tue, Nov 10, 2009 at 12:26 PM, Evan Sandhaus <kan...@gmail.com> wrote:

Twelve days ago, we took our first step into linked open data.  Since
then we’ve received much great feedback on how best to improve our
Linked Data Service.  Based on this feedback, we are making several
changes to the structure of our linked data documents.

The first change you’ll notice is that each document now contains two
resources.  The reason for this is as follows.

Let's pretend we have a resource with the URI http://data.nytimes.com/foo
that is served from a file named http://data.nytimes.com/foo.rdf.   In
our original release this document contained a single resource
http://data.nytimes.com/foo.   Since we attached licensing information
to this resource and declared it to be owl:sameAs external resources,
an inference engine could conclude that The New York Times was
asserting ownership and license terms over data that didn’t belong to
us.

Since it was never our intention to do anything of this sort, we have
revised our documents to contain two resources. The document
http://data.nytimes.com/foo.rdf now contains resources http://data.nytimes.com/foo
and http://data.nytimes.com/foo.rdf. Licensing information is now
attached to the resource ending in “.rdf” and owl:sameAs assertions
are made in the resource  http://data.nytimes.com/foo.  To make clear
the relation between these two resources, the resource ending in
“.rdf” is asserted to have foaf:primaryTopic
“http://data.nytimes.com/foo.”  We believe that this approach both avoids
unwanted propagation of license terms and preserves the clarity of the
license information.
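
A rough Python sketch of why the split matters. Triples are plain tuples, `smush` is a deliberately naive one-step owl:sameAs propagation (not a real reasoner), and `EXTERNAL` is a made-up URI standing in for any external resource; all of these are assumptions for illustration only.

```python
FOO = "http://data.nytimes.com/foo"
DOC = "http://data.nytimes.com/foo.rdf"
EXTERNAL = "http://example.org/external/foo"  # hypothetical external resource
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LICENSE = "http://creativecommons.org/ns#license"
PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
CC_BY = "http://creativecommons.org/licenses/by/3.0/us/"


def smush(triples):
    """Copy each statement about a subject onto its owl:sameAs aliases (one step)."""
    aliases = {}
    for s, p, o in triples:
        if p == SAME_AS:
            aliases.setdefault(s, set()).add(o)
            aliases.setdefault(o, set()).add(s)
    inferred = set(triples)
    for s, p, o in triples:
        if p != SAME_AS:
            for alias in aliases.get(s, ()):
                inferred.add((alias, p, o))
    return inferred


# Original release: license and owl:sameAs hang off the same resource.
original = {(FOO, LICENSE, CC_BY), (FOO, SAME_AS, EXTERNAL)}

# Revised release: license on the document, owl:sameAs on the topic.
revised = {
    (DOC, LICENSE, CC_BY),
    (DOC, PRIMARY_TOPIC, FOO),
    (FOO, SAME_AS, EXTERNAL),
}

print((EXTERNAL, LICENSE, CC_BY) in smush(original))  # True: license leaks
print((EXTERNAL, LICENSE, CC_BY) in smush(revised))   # False: license stays on the document
```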

So that’s the big change.   We have also made several smaller updates.

1.      The predicates 'time:start' and 'time:end' have been replaced with
'nyt:first_use' and 'nyt:last_use' respectively.  The intent of the
'time:[start|end]' triples was to express the time a subject heading
was first and last used in the Times.  Unfortunately, these triples
were ambiguous, so we have decided to extend the 'time:[start|end]'
predicate with our own predicates which we will define to have the
above semantics.

2.      The 'nyt:topicPage', 'cc:attributionURL', and 'cc:license' triples
now refer to resource URIs, rather than literal URLs.

3.      The incorrectly stated 'cc:Attribution' predicate has been replaced
with the correct 'cc:attributionURL' predicate.

4.      The incorrectly stated 'cc:License' predicate has been replaced
with the correct 'cc:license' predicate. (capitalization)

5.      We have resolved issues with content negotiation on our server.

6.      An XML declaration was added to the top of the rdf documents.

7.      The freebase namespace declaration
'xmlns:fb="http://rdf.freebase.com/ns/"' was removed from the RDF declaration as it is
not used in any statements contained in our document.

8.      Freebase resources are now linked using the URI structure
http://rdf.freebase.com/ns/foo rather than http://rdf.freebase.com/rdf/foo.
This URI structure permits freebase’s servers to perform content
negotiation on the requested URI, leading to a better user experience
for human readers.

9.      We have added a “dcterms:modified” triple to the http://data.nytimes.com/foo.rdf
resource that indicates the time at which the resource was last
updated.

10.     Creative Commons branding has been added to the HTML renderings of
our resources.

So these are today’s changes, but there are several more updates still
in the pipeline. These include:

1.      New York Times namespace documentation

2.      More mappings from subject headings to dbpedia and freebase.

3.      Sample applications of data.

Almost every change announced today is the result of community
feedback.   We really mean it when we say that we appreciate and value
your comments, criticisms and suggestions.  So please, keep them
coming.

Evan Sandhaus

Nov 11, 2009, 11:27:24 AM
to The New York Times Linked Open Data Community
Fixed.

Thanks for the catch.

~Evan


Eric Hellman

Nov 17, 2009, 12:47:31 PM
to nyt_linked...@googlegroups.com
I've defined a class and property with an owl (1) document


Pending some checking, these should allow the attribution verbosity of the NYT linked data to be significantly reduced.

This (the doc, not the uri): 

could be replaced by this:

or this

I'd appreciate feedback as to whether this (with corrections if needed) should be recommended as a way for organizations to assert licensing and attribution.


Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA




Ivan Herman

Nov 17, 2009, 2:28:24 PM
to nyt_linked...@googlegroups.com
Eric,

are the URI-s o.k.? My browser reports a 'could not locate remote
server' error :-(

Ivan

Eric Hellman

Nov 17, 2009, 3:44:43 PM
to nyt_linked...@googlegroups.com

Ivan Herman

Nov 18, 2009, 3:42:48 AM
to nyt_linked...@googlegroups.com
Hm. Yes it works now... I should have checked. (Though I thought
browsers have some heuristics to try to find host names and this was one
of those...)

Thanks

Ivan

--

Ivan Herman
Bankrashof 108, 1183NW Amstelveen, The Netherlands
tel: +31-641044153;
URL: http://www.ivan-herman.net

Ivan Herman

Nov 18, 2009, 4:13:30 AM
to nyt_linked...@googlegroups.com
I think it works, modulo some RDF/XML encoding errors:-(

- the class specification has to be on the top level and not as a child
of owl:Ontology (this led to syntax errors with parsers)
- the usage of rdf:ID was incorrect; indeed, the default namespace does
not apply to the value of rdf:ID
- the namespace used in nytd2.rdf was not the same as the ontology URI

But these are mini issues. Once these were handled (I attach the files
as I used them) it works. The nice thing is that OWL 2 RL (i.e., the rule
engine profile of OWL) also applies to this solution. That is, running the
two files through my OWLRL reasoner (there is an online service at [1])
you do get triples like:

<http://data.nytimes.com/N24334380828843769853.rdf> a
<http://creativecommons.org/ns#license> "The New York Times
Company"^^xsd:string, <http://creativecommons.org/licenses/by/3.0/us/> ;
<http://purl.org/dc/terms/creator> "The New York Times
Company"^^xsd:string ;
<http://purl.org/dc/terms/modified> "2009-11-11"^^xsd:date ;
<http://purl.org/dc/terms/rightsHolder> "The New York Times
Company"^^xsd:string ;
....

which is what you wanted. And I would expect such RL based reasoners to
come to the fore more.
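
For readers without a reasoner at hand, the effect can be approximated with a small Python sketch. It hard-codes three of the owl:hasValue restrictions from the attached ontology as (property, value) pairs and expands them for every instance of the class; it is an illustration of the inference step only, not the OWLRL reasoner's implementation, and it does not distinguish literals from URIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NYT_DESCRIPTION = "http://www.gluejar.com/rdf/nytd#NYTimesDescription"

# Simplified view of the class definition: each restriction becomes a
# (property, value) pair that every instance of the class must carry.
HAS_VALUE = {
    NYT_DESCRIPTION: [
        ("http://purl.org/dc/terms/rightsHolder", "The New York Times Company"),
        ("http://purl.org/dc/terms/creator", "The New York Times Company"),
        ("http://creativecommons.org/ns#license",
         "http://creativecommons.org/licenses/by/3.0/us/"),
    ]
}


def expand_has_value(triples):
    """Add the boilerplate triples implied by membership in a hasValue class."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for prop, value in HAS_VALUE.get(o, ()):
                inferred.add((s, prop, value))
    return inferred


doc = "http://data.nytimes.com/N24334380828843769853.rdf"
result = expand_has_value({(doc, RDF_TYPE, NYT_DESCRIPTION)})
for triple in sorted(result):
    print(triple)
```

A single rdf:type statement in the published document is enough to recover the three boilerplate statements, but only for a consumer that runs this kind of expansion.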

That being said, I am not sure that the 'modified' date should be part
of NYTimesDescription class. This looks like something much more
malleable than the rest. But I may be wrong.

Cheers

Ivan


[1] http://www.ivan-herman.net/Misc/2008/owlrl/


nyt.owl
nytd2.rdf

Eric Hellman

Nov 18, 2009, 8:19:01 AM
to nyt_linked...@googlegroups.com
Thanks so much, Ivan!

I haven't authored OWL before, so there's a bit of a learning curve.

You're probably right about the mod date.

Somewhere else it's been noted that the object of dc:creator should be a resource, not a literal.

A comment on the "generator service". Obviously it can be used as a sort of validator even if it's not one per se, but because the word "validator" doesn't appear on its web page, I never found it searching for "owl validator" or "owl validation".

So on to the question for potential users of linked data- is it better for organizations like the NYT to move "boilerplate triples" into an ontology, or is it better the way it is?

Eric
> <?xml version="1.0" encoding="UTF-8"?>
> <rdf:RDF
> xmlns:owl="http://www.w3.org/2002/07/owl#"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
> xmlns="http://www.gluejar.com/rdf/nytd#">
> <owl:Ontology rdf:about="">
> <rdfs:comment>This ontology is meant to provide definitions to support NYT linked data</rdfs:comment>
> <rdfs:label>Helper ontology for NYT</rdfs:label>
> </owl:Ontology>
> <owl:Class rdf:about="http://www.gluejar.com/rdf/nytd#NYTimesDescription">
> <owl:intersectionOf rdf:parseType="Collection">
> <owl:Restriction>
> <owl:onProperty rdf:resource="http://purl.org/dc/terms/rightsHolder" />
> <owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The New York Times Company</owl:hasValue>
> </owl:Restriction>
> <owl:Restriction>
> <owl:onProperty rdf:resource="http://purl.org/dc/terms/modified" />
> <owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2009-11-11</owl:hasValue>
> </owl:Restriction>
> <owl:Restriction>
> <owl:onProperty rdf:resource="http://purl.org/dc/terms/creator" />
> <owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The New York Times Company</owl:hasValue>
> </owl:Restriction>
> <owl:Restriction>
> <owl:onProperty rdf:resource="http://creativecommons.org/ns#license" />
> <owl:hasValue rdf:resource="http://creativecommons.org/licenses/by/3.0/us/"/>
> </owl:Restriction>
> <owl:Restriction>
> <owl:onProperty rdf:resource="http://creativecommons.org/ns#license" />
> <owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The New York Times Company</owl:hasValue>
> </owl:Restriction>
> </owl:intersectionOf>
> </owl:Class>
> <rdf:Property rdf:about="http://www.gluejar.com/rdf/nytd#nytTopic">
> <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/primaryTopic"/>
> <rdfs:subPropertyOf rdf:resource="http://creativecommons.org/ns#attributionURL"/>
> </rdf:Property>
> </rdf:RDF>
> <nytd2.rdf>

Richard Cyganiak

Nov 18, 2009, 8:46:58 AM
to nyt_linked...@googlegroups.com
Eric,

On 18 Nov 2009, at 14:19, Eric Hellman wrote:
> So on to the question for potential users of linked data- is it
> better for organizations like the NYT to move "boilerplate triples"
> into an ontology, or is it better the way it is?

I use a lot of simple RDF-based tools that don't include an OWL
reasoner. So if the license was encoded in this way, I would not have
any way of seeing it. Requiring the use of an OWL reasoner to use your
data is a bad idea IMO. Also, I don't see the problem with explicitly
adding the licensing triples to each document on your site.

Best,
Richard

Kingsley Idehen

Nov 18, 2009, 9:09:56 AM
to nyt_linked...@googlegroups.com
Richard Cyganiak wrote:
> Eric,
>
> On 18 Nov 2009, at 14:19, Eric Hellman wrote:
>> So on to the question for potential users of linked data- is it
>> better for organizations like the NYT to move "boilerplate triples"
>> into an ontology, or is it better the way it is?
>
> I use a lot of simple RDF-based tools that don't include an OWL
> reasoner. So if the license was encoded in this way, I would not have
> any way of seeing it. Requiring the use of an OWL reasoner to use your
> data is a bad idea IMO. Also, I don't see the problem with explicitly
> adding the licensing triples to each document on your site.
+1

As much as I like OWL, reasoning simply cannot be a pre-requisite for
publishing Linked Data.

Reasoning is a subjective act, and there are other ways of injecting OWL
into the mix, unobtrusively. This is ultimately what makes the RDF data
model so powerful i.e. Schema comes last, and from a perspective of the
Linked Data beholder (consumer). You cannot model data perfectly for
cognitive beings, that's just the way it is -- we are all wired to see
the same things differently :-)


Kingsley
--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com




Eric Hellman

Nov 18, 2009, 9:16:37 AM
to nyt_linked...@googlegroups.com
Is rdf schema-defined vocabulary ok? Do your tools understand creative commons vocabularies; if the info was there in the triples, would any of your tools pay attention? Does OWL-lite versus DL matter?

Eric Hellman

Nov 18, 2009, 9:20:43 AM
to nyt_linked...@googlegroups.com
Interesting perspective.

In your scenario, is there any way that any end user or participant in the linked data distribution chain would see licensing or attribution data?

Eric



Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

Richard Cyganiak

Nov 18, 2009, 9:53:25 AM
to nyt_linked...@googlegroups.com
Hi Eric,

On 18 Nov 2009, at 15:16, Eric Hellman wrote:
> Is rdf schema-defined vocabulary ok?

Typically RDFS is not used to remove boilerplate triples from
documents in the way you did. RDFS is used to introduce redundant
triples that express the same thing in more generic terms (e.g. Eric
is not just a Person but also an Agent and a LivingThing and a Thing).
As such, not understanding RDFS is not such a big deal, I will still
see the most specific piece of information (Eric is a Person).
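
A toy Python sketch of that kind of RDFS inference, assuming the hypothetical class hierarchy from the example above. The stated type is always kept; inference only adds the more generic ones, which is why a tool that ignores RDFS still sees "Person".

```python
# Hypothetical rdfs:subClassOf chain, for illustration only.
SUBCLASS_OF = {
    "Person": "Agent",
    "Agent": "LivingThing",
    "LivingThing": "Thing",
}


def all_types(most_specific):
    """Return the stated type followed by every superclass it implies."""
    types = [most_specific]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


print(all_types("Person"))
# ['Person', 'Agent', 'LivingThing', 'Thing']
```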

> Do your tools understand creative commons vocabularies; if the info
> was there in the triples, would any of your tools pay attention?

My tools don't understand the CC vocabulary, but I do, so the triples
in the data tell me what I need to know about re-using the NYT data.
The fact that I *could* program my tools to discriminate based on
licenses is a nice plus, of course.

> Does OWL-lite versus DL matter?

No. Adding an OWL Lite reasoner to an RDF tool is no easier than for
OWL DL.

(Eric, we're drifting off-topic; this list is about the NYT's open
data, so maybe we should take this thread off-list.)

Best,
Richard

Kingsley Idehen

Nov 18, 2009, 10:01:06 AM
to nyt_linked...@googlegroups.com
Eric Hellman wrote:
> Interesting perspective.
>
> In your scenario, is there any way that any end user or participant in
> the linked data distribution chain would see licensing or attribution
> data?
Yes, by de-referencing the HTTP URI associated with the License Data
Item (which would be associated with the RDF doc via a triple), that's
the beauty of Creative Commons Licenses, they have URIs [1] :-)

Links:

1. http://creativecommons.org/licenses/by/3.0/
2. http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/http/creativecommons.org/licenses/by/3.0/

Kingsley

Evan Sandhaus

Nov 18, 2009, 10:04:04 AM
to The New York Times Linked Open Data Community
Just adding my $0.02 to this subject.

While the conceptual elegance of Eric's approach is appealing, I share
Richard's concern that there is a danger that this approach obscures
the licensing details.

Speaking of OWL and RDFS, we're currently authoring the official
specification for our namespace, hope to have that up soon.

All the best,

Evan


Tom Heath (Gmail)

Nov 18, 2009, 10:44:09 AM
to nyt_linked...@googlegroups.com
2009/11/18 Evan Sandhaus <kan...@gmail.com>:
> Just adding my $0.02 to this subject.

Likewise...

> While the conceptual elegance of Eric's approach is appealing, I share
> Richard's concern that there is a danger that this approach obscures
> the licensing details.

+1. After having gained some consensus that explicit licensing
statements are highly desirable, let's not adopt approaches that
obscure them in any way, however elegant the modelling :)

Tom.

Eric Hellman

Nov 18, 2009, 3:37:46 PM
to nyt_linked...@googlegroups.com
Richard,

Do your tools pay attention to owl:sameAs?
If not, are you also suggesting that linked data publishers should not use any owl vocabulary?
If so, how are linked data publishers to know which elements of the owl vocabulary are generally useful, and which ones should be avoided?

(I'm not arguing, just trying to understand the viewpoints.)

Eric






Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

Eric Hellman

Nov 18, 2009, 3:55:37 PM
to nyt_linked...@googlegroups.com
But in the current form of the data, the licensing and attribution are "hidden" by the need to dereference the ".rdf" URIs; otherwise there is no way to know what triples are covered by the license and attribution. If consumers of the linked data always dereferenced every URI, then this would not be a problem, of course.

Beauty requires a beholder, in other words.




Kingsley Idehen

Nov 18, 2009, 4:09:52 PM
to nyt_linked...@googlegroups.com
Eric Hellman wrote:
> But in the current form of the data, the licensing and attribution is
> "hidden" by the need to dereference the ".rdf" URI's; otherwise there
> is no way to know what triples are covered by the license and
> attribution. If consumers of the linked data always dereferenced every
> URI, then this is not a problem of course.
>
> Beauty requires a beholder, in other words.
Yes, and the current setup caters to the Human or Machine beholders :-)

Kingsley

Richard Cyganiak

Nov 18, 2009, 4:29:12 PM
to nyt_linked...@googlegroups.com
On 18 Nov 2009, at 21:37, Eric Hellman wrote:
> Do your tools pay attention to owl:sameAs?

> If not, are you also suggesting to that linked data publishers
> should not use any owl vocabulary?
> If so, how are linked data publishers to know which elements of the
> owl vocabulary are generally useful, and which ones should be avoided?

Every bit of OWL can be useful when publishing RDF. You just have to
be aware that many RDF consumers will be completely oblivious to all
or parts of the things stated in OWL.

Licenses are what enables others to legally use your intellectual
property. So if you want your IP to be used, then it's probably a good
idea to state your license as clearly, simply and unambiguously as
possible.

Hence my argument that when publishing RDF, you should express the
license as plainly as possible, in RDF, inside the published file,
using some established convention. The NYT data, as published, does
exactly this, which is great.

Best,
Richard

Richard Cyganiak

Nov 18, 2009, 4:45:49 PM
to nyt_linked...@googlegroups.com
On 18 Nov 2009, at 21:55, Eric Hellman wrote:
> But in the current form of the data, the licensing and attribution
> is "hidden" by the need to dereference the ".rdf" URI's; otherwise
> there is no way to know what triples are covered by the license and
> attribution.

I don't get your point. If you want to use the data, you have to
dereference the .rdf URI anyway, because that's how you get the data.
You get the license packaged along with the data. In what way is this
“hiding” the license?

Richard

Eric Hellman

Nov 18, 2009, 6:36:16 PM
to nyt_linked...@googlegroups.com
I'm thinking of the 2nd hand data consumer. Suppose you load the data into your tuple store (along with freebase and dbpedia data, for example), set up a SPARQL endpoint and make that available. Someone getting data from you would have to have some way to also get the attribution and licensing assertions, and associate them with the triples they've extracted. Since those assertions are made on the ".rdf" resource, the second hand consumers would have to actually dereference the document to see what's covered, but they don't need to dereference the document to get all the triples. So in that sense, the license assertions that are explicit on the document are implicit on the data.

You might think this is just an issue for the suits, but being able to track an assertion back to its source is a good thing. It's different for the New York Times to assert that Barack Obama was born in the US than it is for Richard Cyganiak to make that assertion. That's why I was exploring the concept of "hiding" attribution info in an ontology. The New York Times has to look at its business objectives to judge what's best for them. If all they're wanting is more traffic to topics pages, then certainly putting attribution in an ontology would be very silly. If, as Evan suggested in his June talk, the Times also aims to build its status as a "fact source of record", then a binding of the data with the attribution might not be so silly.

This is a terrible analogy, but a custom ontology could act as a sort of "shrink-wrap license", by forcing anyone wanting to make sense of the data to also ingest the license. Again, this might be a bad idea (as I think shrink-wrap licenses are) but I think it's worth understanding the implications.

Eric



Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

Richard Cyganiak

Nov 19, 2009, 3:41:17 AM
to nyt_linked...@googlegroups.com
On 19 Nov 2009, at 00:36, Eric Hellman wrote:
> I'm thinking of the 2nd hand data consumer. Suppose you load the
> data into your tuple store (along with freebase and dbpedia data,
> for example), set up a SPARQL endpoint and make that available.
> Someone getting data from you would have to have some way to also
> get the attribution and licensing assertions, and associate them
> with the triples they've extracted. Since those assertions are made
> on the ".rdf" resource, the second hand consumers would have to
> actually dereference the document to see what's covered, but they
> don't need to dereference the document to get all the triples.

All RDF stores support named graphs these days. If you ingest RDF from
anywhere into an RDF store, and you care about provenance, then you
load the stuff that comes from 1234.rdf into a named graph called
1234.rdf. Then you can do things like:

PREFIX cc: <http://creativecommons.org/ns#>

SELECT * WHERE {
  GRAPH ?graph {
    ?graph cc:attributionName ?source
  }
}

> So in that sense, the license assertions that are explicit on the
> document are implicit on the data.

Only if you don't manage your data properly, e.g., by throwing away
context information.
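
The same idea can be sketched without an RDF store. The class below is a hypothetical, in-memory stand-in for named graphs (a dict from source URL to a set of triples), not the API of any real store; the data.nytimes.com document URL and the topic triple are assumed purely for the example.

```python
from collections import defaultdict

ATTRIBUTION_NAME = "http://creativecommons.org/ns#attributionName"


class NamedGraphStore:
    """Keep every loaded triple filed under the URL it was fetched from."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load(self, source_url, triples):
        self.graphs[source_url].update(triples)

    def attribution(self):
        """Mirror the GRAPH query: attribution a graph states about itself."""
        return {
            graph: o
            for graph, triples in self.graphs.items()
            for s, p, o in triples
            if s == graph and p == ATTRIBUTION_NAME
        }

    def sources_of(self, triple):
        """Return the graphs (documents) in which a given triple was found."""
        return [g for g, triples in self.graphs.items() if triple in triples]


store = NamedGraphStore()
doc = "http://data.nytimes.com/1234.rdf"  # assumed document URL
fact = ("http://data.nytimes.com/1234",
        "http://www.w3.org/2004/02/skos/core#prefLabel", "Example topic")
store.load(doc, {(doc, ATTRIBUTION_NAME, "The New York Times Company"), fact})

print(store.attribution())
print(store.sources_of(fact))
```

Because the graph name is the document URL, a second-hand consumer can go from any triple to its source graph and from there to the attribution statement.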

Richard

Eric Hellman

Nov 19, 2009, 8:52:34 AM
to nyt_linked...@googlegroups.com
If named graph implementations are sufficiently standardized and the practice is explicit, then named graphs are an excellent solution. (I need to study up on this.)

Eric Hellman

Nov 20, 2009, 1:35:17 PM
to nyt_linked...@googlegroups.com
Two more questions on the named graph approach.

Individual assertions about ".rdf" URIs should get properly mapped to graph ids if the data is picked up topic by topic, but what happens if the full "people.rdf" file is consumed? Will clients typically assign the whole big graph a single "people.rdf" graph id? Shouldn't there be an additional attribution assertion on the URI for the big file?

Would it be correct, good or bad practice to use rdf:about="" when you want to make assertions about "this document"?

Eric

Eric Hellman

Nov 21, 2009, 12:20:23 PM
to nyt_linked...@googlegroups.com
Thanks to everyone who contributed to the discussion of my suggestion. I've blogged what I've learned for general consumption, including the conclusion that using owl is not the right thing to do here, at http://go-to-hellman.blogspot.com/2009/11/putting-linked-data-boilerplate-in-box.html

Richard Cyganiak

Nov 21, 2009, 5:17:52 PM
to nyt_linked...@googlegroups.com
On 20 Nov 2009, at 19:35, Eric Hellman wrote:
> Individual assertions about ".rdf" uri's should get properly mapped
> to graph ids if the data is picked up topic by topic, but what
> happens if the full "people.rdf" file is consumed. Will clients
> typically assign the whole big graph a single "people.rdf" graph id?

Yes. From a client's POV, the people.rdf file is no different than the
individual .rdf files, it's just bigger.

> Shouldn't there be an additional attribution assertion on uri for
> the big file?

Yes, that's perhaps a good idea, in the interest of stating the
license on every possible path that a client could take towards the
data. A machine client looking at people.rdf has no way of knowing
that it's the sum of all the individual .rdf files, so the assertions
about those files won't mean much to it.

> Would it be correct, good or bad practice to use rdf:about="" when
> you want to make assertions about "this document"?

It's correct and IMO a good idea.
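
As a quick illustration of why this works: an empty rdf:about is a relative reference that resolves against the document's base URI, so the subject becomes the document itself. The sketch below uses Python's generic URL resolution and assumes the file lives at http://data.nytimes.com/people.rdf with no xml:base override.

```python
from urllib.parse import urljoin

# Assumed retrieval URL of the big file; used as the base URI.
document_url = "http://data.nytimes.com/people.rdf"

# rdf:about="" is an empty relative reference, resolved against the base.
subject = urljoin(document_url, "")

print(subject)
# http://data.nytimes.com/people.rdf
```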

Best,
Richard

Evan Sandhaus

Nov 21, 2009, 5:58:59 PM
to nyt_linked...@googlegroups.com
Great post Eric, and good call Richard.

Look for a licensing information resource in the next version of the people.rdf file.

Cheers,

Evan

olyerickson

Dec 7, 2009, 2:55:14 PM
to The New York Times Linked Open Data Community
I'd like to test my understanding of the named graph approach, esp. by
checking whether what has been said is consistent with OAI-ORE
(http://www.openarchives.org/ore/) --- which of course is based on named
graph principles.

As I understand what has been proposed above, when the file
"people.rdf" is consumed, a URI naming that aggregation is assigned.
By the guidelines of linked data (and ORE...) yet another resource
must be created --- the resource map (ReM) for "people.rdf" --- which
in addition to possible other metadata would include any rights
assertions about that aggregation.

The ORE data model specifies the relationship between aggregations and
resource maps here:

http://www.openarchives.org/ore/1.0/datamodel#ReM-to-aggr

Thoughts?

John
John S. Erickson, Ph.D.
Bitwacker Associates
http://bitwacker.blogspot.com
olyer...@gmail.com

Evan Sandhaus

Dec 14, 2009, 12:12:15 PM
to The New York Times Linked Open Data Community
Thanks for the pointer to this great resource. I will circulate the
document around the office and see what people think.

Cheers,

Evan
