Fwd: Fwd: BCID updates

John Deck

Jul 19, 2013, 5:38:07 PM

to tdwg...@googlegroups.com

Hi TDWG-RDF'ers...

Here is a question i'm posting to the list that has come up in relation to saying things about identifiers, where the identifiers are representing physical objects. I'm copying a portion of a strand between Steve Baskauf and myself about this.

Essentially, i would like to say "this identifier is about this physical object" and "this identifier has a particular license associated with its use" (as opposed to being a license about the physical object). Current results of an RDF/XML metadata request for this identifier are:

curl -H "Accept: application/rdf+xml" http://biscicol.org/id/ark:/21547/R2_MBIO56

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bsc="http://biscicol.org/terms/index.html#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="ark:/21547/R2_MBIO56">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
    <dcterms:mediator rdf:resource="http://biscicol.org/id/metadata/ark:/21547/R2_MBIO56" />
    <dcterms:hasVersion rdf:resource="http://biocode.berkeley.edu/specimens/MBIO56" />
    <dcterms:isPartOf rdf:resource="http://dx.doi.org/10.7286/V1154F0D" />
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/3.0/" />
    <dc:title>Moorea Biocode Specimens</dc:title>
    <dc:creator>Biocode Project</dc:creator>
    <dc:date>2013-Jul-19 21:29:37UTC</dc:date>
    <dc:source>MBIO56</dc:source>
    <bsc:suffixPassthrough>true</bsc:suffixPassthrough>
  </rdf:Description>
</rdf:RDF>
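To make the problem concrete, here is a minimal Python sketch (standard library only) that reads an abbreviated excerpt of the response above and lists its triples. The excerpt and the helper name `triples` are illustrative assumptions, not part of the BCID service. The point it shows is that every property, including dcterms:rights and bsc:suffixPassthrough, hangs off the same subject, so RDF reads them all as statements about the thing the ARK identifies.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated excerpt of the RDF/XML response (assumption: trimmed for brevity).
RESPONSE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bsc="http://biscicol.org/terms/index.html#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="ark:/21547/R2_MBIO56">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/3.0/" />
    <bsc:suffixPassthrough>true</bsc:suffixPassthrough>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # Use the rdf:resource URI if present, otherwise the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            result.append((subject, prop.tag, obj))
    return result


for subject, predicate, obj in triples(RESPONSE):
    print(subject, predicate, obj)
```

Every printed line starts with the same subject, which is why the license cannot be read as applying to the identifier string alone.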

Essentially, i would like a clear way of expressing that dcterms:rights and bsc:suffixPassthrough are related to the identifier and not the object.

John Deck

On Thu, Jul 18, 2013 at 5:53 PM, Steve Baskauf <steve....@vanderbilt.edu> wrote:

John,
Cool! The content negotiation seems to work fine. That's really exciting that you are getting the BCID system functioning.

There was one thing that I was confused about. You assert the triple
<ark:/21547/R2_MBIO56> dcterms:rights <http://creativecommons.org/licenses/by/3.0/>
In the email you said that you wanted that triple to apply the license to the identifier. But from the way I think about URIs (which I suppose ark:/21547/R2_MBIO56 is one), when you apply a property to the URI, you are saying that it's a property of the thing that the URI identifies, not the URI itself. It seems like there should be a way to make statements about the identifier itself, but I'm not sure how you would do it in RDF.

This reminds me a little of some of the discussion at the iDigBio meeting when people were talking about specimens, specimen identifiers, and specimen records as if they were synonymous. I don't think they are, but it's not clear to me how one makes the distinction between them in RDF. I suppose somebody in the RDF group has a better idea than me.

Steve

John Deck wrote:

Hey Steve,
Just an FYI -- i went through your suggestions from an email a couple of weeks back and changed some things with respect to identifier response mechanisms... The following email describes what I changed and why.

John

---------- Forwarded message ----------
From: John Deck <jd...@berkeley.edu>
Date: Wed, Jul 17, 2013 at 10:38 AM
Subject: BCID updates
To: Nico Cellinese <ncell...@flmnh.ufl.edu>, Robert Guralnick <rob...@gmail.com>, "to...@cs.uoregon.edu" <to...@cs.uoregon.edu>

Another item for today's agenda -- I've done some house-cleaning with the BCID system, some of which you'll actually see on the interface (and a lot of it hidden under the hood):

http://biscicol.org/bcid/

Some new descriptive text on the homepage.

Essentially, the identifiers are shorter/cleaner and return proper RDF/XML if you're a machine:

curl -H "Accept: application/rdf+xml" http://biscicol.org/id/ark:/21547/R2_MBIO56

Also, if you look at the response from the above closely you'll see the following line:

<dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/3.0/" />

I've gone ahead and chosen a specific license to apply to the identifier itself which basically says that the ID itself needs attribution, or in other words, don't toss this in a pile and create a new one. Please maintain the existing identifier. Probably this license decision needs further discussion. At the moment, i need a way to say something about the license terms.

Also, i've created a URI for suffixPassthrough, because when we say whether a particular ID is the product of suffix passthrough (e.g. <bsc:suffixPassthrough>true</bsc:suffixPassthrough>) we need to know what suffixPassthrough actually means. So, bsc:suffixPassthrough resolves to:

http://biscicol.org/terms/index.html#suffixPassthrough

Finally, instead of waiting for folks to flail around, screw things up, and then have us rush in and offer an alternative system for identifiers, we want to instead offer these identifiers for projects from the get-go. Basically, providing a way to build them into data acquisition systems. Login using demo/demo and click on "Project Creator", the idea being we can create sets of group-level identifiers that can be tied to particular implementations. This is still very draft and ultimately may not even go with the BCID system (and instead be a new project) but i'm including it here since it is convenient and can easily use existing code/authentication for BCID.

John

--

John Wieczorek

Jul 19, 2013, 5:47:34 PM

to tdwg...@googlegroups.com

Interesting problem. Metadata about the primal property of an object. Really curious how one would do such a thing, or indeed if anyone ever has.

--
You received this message because you are subscribed to the Google Groups "TDWG RDF/OWL Task Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tdwg-rdf+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Bob Morris

Jul 19, 2013, 9:30:14 PM

to tdwg...@googlegroups.com

I am mightily confused. Are you licensing the use of the ID? If yes,
by the use of CC-BY license, you are asserting it is subject to
copyright or some other form of legal protection. I find it hard to
imagine that an ID is copyrightable in any jurisdiction, since making
it is not a creative act.

What am I missing?

Robert A. Morris

Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390

IT Staff
Filtered Push Project
Harvard University Herbaria
Harvard University

email: morri...@gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.

John Deck

Jul 19, 2013, 10:33:27 PM

to tdwg...@googlegroups.com

Well, we can argue if it is a creative act or not. The whole idea here is to try and get downstream consumers of these identifiers (e.g. VertNet, GBIF, iDigBio) to maintain the identifier itself as a representation of the specimen. Typically, aggregators do a lousy job of maintaining identifiers coming from the source (granted, the # of good identifiers minted at the source is small). This is meant to say to those aggregators, please keep this identifier attached to any metadata you consume related to this object.

John

--
John Deck
(541) 321-0689

Roderic D. M. Page

Jul 20, 2013, 2:20:35 AM

to tdwg...@googlegroups.com

Surely the way to encourage downstream use of an identifier is to make it useful, i.e.:

1. persistent (so that I trust it will be around next week, or next year)

2. resolvable on the web (so it gives me something useful to look at)

3. machine readable (so I can save myself some time and grab the data directly)

Aggregators such as GBIF have not reused identifiers because in most cases there haven't been any (which has led to massive duplication of data in GBIF as providers change local "identifiers" whenever they feel like it). They are very keen to do so, it's up to providers to get their act together on this.

As Bob points out Creative Commons is based on copyright, so you are asserting copyright on an identifier string. To my mind, trying to bludgeon people with a dubious license seems the wrong strategy, and is almost an admission that the identifier you are offering doesn't have enough intrinsic value to be adopted on its own merits.

Regards

Rod

Hilmar Lapp

Jul 20, 2013, 5:43:31 AM

to tdwg...@googlegroups.com

On Jul 20, 2013, at 8:20 AM, Roderic D. M. Page wrote:

As Bob points out Creative Commons is based on copyright, so you are asserting copyright on an identifier string. To my mind, trying to bludgeon people with a dubious license seems the wrong strategy, and is almost an admission that the identifier you are offering doesn't have enough intrinsic value to be adopted on its own merits.

Right. And surely, the semantics of dc:rights don't include "please keep this thing around in your aggregation".

It's bad enough if we use natural language in potentially confusing and misleading ways. Machine readable semantics ought to be about being less ambiguous, and less misleading, not about making misleading use of language machine readable.

-hilmar

--

===========================================================

: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :

===========================================================

Steve Baskauf

Jul 21, 2013, 8:24:54 AM

to tdwg...@googlegroups.com

This thread has reminded me of a discovery that I made a couple of months ago while investigating Creative Commons' guidelines for expressing licenses as RDF [1]. Creative Commons is heavily invested in RDFa [2] - I think because it can be related to the way that schema.org extracts metadata from web pages. According to the Creative Commons guidelines, the license for an image can be expressed as

<a about="/bar.jpg" href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>

which is a statement in RDFa in addition to a microformat statement. I am not very familiar with RDFa, but using

rel="license"

appears to be equivalent to asserting the triple

foo:/bar.jpg xhtml:license <http://creativecommons.org/licenses/by/2.0/>

where xhtml: is the abbreviation for "http://www.w3.org/1999/xhtml/vocab#". Audubon Core recommends using xmpRights:UsageTerms for the license expressed as a literal, e.g. the triple

foo:/bar.jpg xmpRights:UsageTerms "Available under a Creative Commons Attribution 2.0 (CC BY) license"

or something like that. But maybe xmpRights:UsageTerms shouldn't be used with a URI reference (the XMP spec isn't clear to me on this point). It would be perfectly valid to use dcterms:rights as a predicate whose object is a Creative Commons license URI since the range of dcterms:rights is dcterms:RightsStatement [3]. But it seems to me that if we are going to recommend a best practice on this, we should recommend that providers use xhtml:license rather than dcterms:rights to link a resource to its license. There are two reasons for this. One is that xhtml:license is a more specific predicate intended for use with licenses, whereas dcterms:rights could be used for any kind of rights statement, including a copyright statement. The other is that there are probably already millions or perhaps billions of media items which are linked in RDF to a CC license URI using the predicate xhtml:license because of microformat/RDFa markup of the XHTML in the web page that contains them.
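As a rough illustration of how the rel="license" markup above maps to a triple, here is a minimal Python sketch using only the standard library. It is not a real RDFa processor: the class name `LicenseLinkParser` is hypothetical, it only looks at `<a>` elements, and it leaves the relative subject `/bar.jpg` unresolved, whereas a conforming processor would resolve it against the page's base URI.

```python
from html.parser import HTMLParser

# Namespace for the XHTML vocabulary; the predicate is XHTML_VOCAB + "license".
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


class LicenseLinkParser(HTMLParser):
    """Collect (subject, predicate, object) triples from rel="license" links."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("rel") == "license":
            # "about" names the subject; "href" is the license URI (the object).
            subject = attributes.get("about", "")
            self.triples.append(
                (subject, XHTML_VOCAB + "license", attributes.get("href"))
            )


markup = (
    '<a about="/bar.jpg" href="http://creativecommons.org/licenses/by/2.0/" '
    'rel="license">cc by 2.0</a>'
)
parser = LicenseLinkParser()
parser.feed(markup)
print(parser.triples)
```

The single triple it collects has `/bar.jpg` as subject, xhtml:license as predicate, and the CC BY 2.0 URI as object, matching the triple written out above.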

Any thoughts on this?

Steve

[1] http://wiki.creativecommons.org/RelLicense
[2] http://wiki.creativecommons.org/RDFa
[3] defined as "A statement about the intellectual property rights (IPR) held in or over a Resource, a legal document giving official permission to do something with a resource, or a statement about access rights."

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu

greg whitbread

Jul 21, 2013, 8:55:31 PM

to tdwg...@googlegroups.com

As Rod says, persistent, resolvable and readable might help, but then, taking the additional step of adding content negotiation and re-using the identifier as the cc:attributionURL for the cc:license ( ref: http://wiki.creativecommons.org/XMP ) included with each record has not improved aggregator ( small & large ) take-up - yet. Even though this back-reference is more important to them than the content stripped away.

<tc:TaxonConcept rdf:about="http://biodiversity.org.au/apni.taxon/644466">
...
<cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
<cc:attributionURL rdf:resource="http://biodiversity.org.au/apni.taxon/644466"/>
...
</tc:TaxonConcept>

greg


--
Greg Whitbread
Australian National Botanic Gardens
Australian National Herbarium
+61 2 62509482
g...@anbg.gov.au

Paul Murray

Jul 21, 2013, 11:40:41 PM

to tdwg...@googlegroups.com

Drat - google doesn't like email certificates. Apologies for the repost.

On 20/07/2013, at 7:38 AM, John Deck wrote:

Essentially, i would like to say "this identifier is about this physical object" and "this identifier has a particular license associated with its use" (as opposed to being a license about the physical object).

Wouldn't you just add an rdf:Description for the identifier itself, and some dcterms or creative commons … ahh - I see the difficulty.

Four options suggest themselves:

* Use an annotation property

* Create a (potentially anonymous) object that is about the identifier, and add properties to that

* quote the url as a literal, and add some properties to it (this is valid RDF, but may make certain things choke)

* Reify the triple itself by way of a rdf:Statement and add some triples to that.

Annoyingly, the rdf reification vocabulary has a way to reify triples, but not the uris themselves.

I don't know of any standard way of doing this. Hang on: we have this - http://www.w3.org/2004/06/rei but it's not a standard or anything like it. http://www.w3.org/DesignIssues/Reify.html .

As far as I can see, you'd write something like:

<rei:Symbol xmlns:rei="http://www.w3.org/2004/06/rei#">
  <rei:uri>http://The/Identifier/in/question</rei:uri>
  <dcterms:license-info>license info goes here.</dcterms:license-info>
</rei:Symbol>

Incidentally - here at biodiversity.org.au we are using the creative commons vocabulary for a similar problem. We use cc:attributionURL to declare that some specific uri is the one that you should be quoting when you are talking about ip relating to our triples. To do what you want to do, I suppose you are saying that the URI itself is a "work", and that it has these IP encumbrances. This meshes perfectly well with the use of that experimental reification vocabulary. Make the "symbol" object above an instance of both "rei:Symbol" and "cc:Work", and away you go.
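The idea of describing the URI as a symbol, rather than the thing it names, can be sketched as plain tuples in Python. This is only an illustration under stated assumptions: the function name `describe_identifier`, the blank-node labels, and the example specimen URI are hypothetical, and the rei: vocabulary is the experimental one linked above, not a standard.

```python
from itertools import count

REI = "http://www.w3.org/2004/06/rei#"
CC = "http://creativecommons.org/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

_blank_ids = count(1)


def describe_identifier(uri, license_uri):
    """Return triples about the URI string itself, held by a blank node.

    The URI appears only as a literal value of rei:uri, never as a subject,
    so none of these statements are about the thing the URI identifies.
    """
    node = f"_:symbol{next(_blank_ids)}"
    return [
        (node, RDF_TYPE, REI + "Symbol"),
        (node, RDF_TYPE, CC + "Work"),
        (node, REI + "uri", ("literal", uri)),
        (node, CC + "license", license_uri),
    ]


# Hypothetical identifier and license, for illustration only.
statements = describe_identifier(
    "http://example.org/specimen/1",
    "http://creativecommons.org/licenses/by/3.0/",
)
for statement in statements:
    print(statement)
```

Because the specimen URI never appears in subject position, a consumer cannot mistake the license for a statement about the specimen.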

http://creativecommons.org/ns

RDF Schema


Paul Murray

Jul 22, 2013, 1:15:17 AM

to tdwg...@googlegroups.com

Sorry about the multiple posts, but I am a little rusty.

It seems to me that the rei:uri predicate does the same job as dcterms:identifier, and I would be inclined to use both.

You want to attach the info about the uri to the info about the object itself, but it's important to not do this in a way that indicates that the licensing information is about the object. This means that "rights" and "license" are not the correct predicates to use. Aside from the old standby seeAlso, perhaps dcterms:requires is relevant here.

(edit: oh, note that I have explicitly declared that the literal is of type xs:anyURI)

I have not syntax-checked this turtle: kindly forgive any solecisms:

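@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix rei: <http://www.w3.org/2004/06/rei#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/anObject>
    owl:sameAs <http://proprietaryUri> ;
    dcterms:requires [
        a rdf:Statement ;
        rdf:subject <http://example.org/anObject> ;
        rdf:predicate owl:sameAs ;
        rdf:object <http://proprietaryUri> ;
        dcterms:requires [
            a cc:Work , rei:Symbol , <http://example.org/copyrighted_URI> ;
            rei:uri "http://example.org/anObject"^^xs:anyURI ;
            dcterms:identifier "http://example.org/anObject"^^xs:anyURI ;
            cc:license [
                a cc:License
                # license info goes here
            ]
        ]
    ] .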

Of course, if you want to use both cc:license and dcterms:license, you'll have to pull that license object out and give it a uri.

This states that the declaration that anObject is the sameAs proprietaryUri requires an anonymous object - a reified URI which is a cc:Work. Then again: if we are attaching rights to the declaration itself, then is that triple itself a "cc:Work"? Hmm. In that case, perhaps the triple itself is both an rdf:Statement and a cc:Work. Actually, that works well because we don't have to use the nonstandard reification vocabulary. No, this doesn't work - the licence does not relate to the act of declaration of "sameAs", but to the URI itself. There's no way to get around the fact that the rdf core vocabulary has a hole in it.

Other uses for reifying the URIs themselves would be to do a job similar to a 302 - to declare that a URI has been replaced by another URI, which is not the same as saying that the object the URI refers to has been replaced. Another use would be for declaring that a given URI was an LSID as opposed to an HTTP URI, or that it is hosted by some particular institution, or that it became unavailable at some particular time. Interestingly, dcterms has a number of predicates relating to this - isReferencedBy, isReplacedBy, isVersionOf and so on. But it still misses the critical notion of being able to discuss a URI as a resource in itself, rather than being a name for a resource. Hmm. They have a type vocabulary - perhaps a URI is a Dataset. Kinda-sorta.

Hmm.

http://stackoverflow.com/questions/16366904/rdf-vocabulary-for-describing-uri-components

http://www.w3.org/wiki/URI

http://www.w3.org/TR/2013/NOTE-vocab-adms-20130528/#dcat-accessurl

This vocabulary also gets it wrong. In this vocabulary accessURL and downloadURL are both resource types, not literal types. Browsing through the other vocabularies at W3C, over and over people are getting it wrong, assuming "if it's a URI, then it's a resource".

Nope. RDF simply does not seem to have a vocabulary about URIs. That's what's missing. Sorry.

Paul Murray

Jul 22, 2013, 1:41:05 AM

to tdwg...@googlegroups.com

On 20/07/2013, at 12:33 PM, John Deck wrote:

> Well, we can argue if it is a creative act or not. The whole idea here is to try and get downstream consumers of these identifiers (e.g. VertNet, GBIF, iDigBio) to maintain the identifier itself as a representation of the specimen.

Oh - I see that I have completely misunderstood the issue. As Greg I think mentioned, we use Creative Commons cc:license and cc:attributionURL .

<cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>

<cc:attributionURL rdf:resource="http://biodiversity.org.au/apni.taxon/54321"/>

This comes through consistently in our JSON, XML, and RDF, which is important given the way we do content negotiation.

If downstream aggregators tend to strip these, then this merely goes to show that claiming IP is far easier than defending it.

John Deck

Jul 22, 2013, 8:56:44 PM

to tdwg...@googlegroups.com

OK, thanks for the comments all...

Interestingly, in my initial email I said: "Essentially, i would like to say "this identifier is about this physical object" and "this identifier has a particular license associated with its use" (as opposed to being a license about the physical object)." Now, folks have clearly debunked this statement -- we don't want to apply a license to the identifier itself. However, as Steve pointed out earlier on, the statements in the example RDF were actually properties of the thing the URI identifies, not about the URI itself. So, the syntax of the RDF was actually correct if we are asserting the license is about the metadata about the object.

I don't know about applying CC licenses to physical objects... that seems strange--- but i see plenty of examples of identifier metadata about physical objects giving them all kinds of properties that are not about the object itself (e.g. title, publisher, creator in Datacite metadata about specimens given DOIs). However, if we're referring to metadata about a physical object, it seems that the suggestions that Paul, Greg, and Steve gave are on target (e.g. http://wiki.creativecommons.org/XMP and http://www.w3.org/1999/xhtml/vocab/#license).

As far as the purpose here, it's not to "bludgeon" folks with a license... the focus is on promoting the interests of the agent who did the labor of collecting and/or describing something and ensuring that there is a mechanism for delivering credit downstream. There was a real example recently where a large data provider saw their data being used in a publication with credit given to the aggregator but not the data provider. Increasingly we see multi-tiered aggregations where data is taken from the field researcher, held at an institution, sucked up by an aggregation service, then to a national node, etc.... What mechanisms can we use to assure the field researcher that they can take the initial steps of putting their data up on the web and know they will get credit? As Greg pointed out, we can do all the good things (persistent, resolvable, machine readable) and it still hasn't improved aggregator uptake (yet).

Finally, while we're on the topic, there is another property that really is about the identifier: that is the property http://biscicol.org/terms/index.html#suffixPassthrough . This is an indication I would like to see to describe an identifier that employs a hierarchical scheme... that is, the root of the identifier is registered and we are passing a suffix that is some local specific identifier that belongs to a particular group.

John

Paul Murray

Jul 22, 2013, 11:21:34 PM

to tdwg...@googlegroups.com

On 23/07/2013, at 10:56 AM, John Deck wrote:

So, the syntax of the RDF was actually correct if we are asserting the license is about the metadata about the object.

One of the things RDF is missing is an easy way to talk about the provenance of a triple. Ideally, I'd like to assign a URI to the RDF document as a whole (or parts of it), and for the triples asserted by that document to have an implied fourth term. At that point, we can talk about licensing and "degree of confidence" and such things.

At present, the only way to do it would be to generate a rdf:Statement object for every single triple and stuff that in the file.
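The "implied fourth term" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the document URI, the `as_quads` helper, and the sample triple are hypothetical. It is similar in spirit to the graph label that quad-based RDF serializations carry, though nothing here depends on any particular serialization.

```python
def as_quads(triples, document_uri):
    """Tag every triple with the URI of the document that asserted it."""
    return [(s, p, o, document_uri) for (s, p, o) in triples]


# Hypothetical URI for the RDF document as a whole.
DOCUMENT = "http://example.org/metadata/specimen-1.rdf"

# A statement about the specimen, as asserted by that document.
specimen_triples = [
    ("http://example.org/specimen/1", "dcterms:title", "Moorea Biocode Specimens"),
]
quads = as_quads(specimen_triples, DOCUMENT)

# Licensing is then stated about the document, not about the specimen.
document_statements = [
    (DOCUMENT, "dcterms:license", "http://creativecommons.org/licenses/by/3.0/"),
]

for quad in quads:
    print(quad)
for statement in document_statements:
    print(statement)
```

With the fourth term in place, the license has a subject of its own, and no rdf:Statement object per triple is needed.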

To put it another way - RDF is missing a "this document" construct. The dcterms vocabulary is a case in point. Most of them only make sense if the thing identified by the URI is a document of some sort (accessRights, dateCopyrighted, isVersionOf), but it's pretty certain that people are using these terms to talk about the metadata itself.

The core of this issue is the distinction between an "information resource" and an "other resource", which parallels the old idea of a URI that is a "locator" as opposed to an "identifier", and the LSID notion of "data" and "metadata". They are all talking about the same thing. The dcterms vocabulary only makes sense when applied to things that are information resources/locators/metadata (except in those cases where the URI identifies a book or other real-world document, such as an ISBN).

Finally, while we're on the topic, there is another property that really is about the identifier: that is the property http://biscicol.org/terms/index.html#suffixPassthrough . This is an indication I would like to see to describe an identifier that employs a hierarchical scheme... that is, the root of the identifier is registered and we are passing a suffix that is some local specific identifier that belongs to a particular group.

And this gets back to the fact that there doesn't seem to be an ontology for uris in and of themselves. URI schemes in general have parts that it's meaningful to discuss ( http://tools.ietf.org/html/rfc3986#section-1.2.3 ), but there seems to be no defined standard way to talk about this in RDF.

What would be nice is if there were a URI scheme named "meta", whose format is meta:<any uri>. Thus, http://example.org/taxon/5 is the id of a certain taxon, and meta:http://example.org/taxon/5 is a legitimate identifier for that uri itself. Then we could unambiguously assert that

http://example.org/specimen/1?year=1997 hasLegs 4 .

http://example.org/specimen/1?year=1998 hasLegs 3 .

meta:http://example.org/specimen/1?year=1997 isVersionOf meta:http://example.org/specimen/1?year=1998 .

… actually, come to think of it, it's still ambiguous. This third assertion is not about the URI, but about the document that can be fetched at that URI. Maybe the world needs two new URI schemes: a 'meta:' scheme and a 'uri:' scheme.

uri:http://example.org/specimen/1?year=1998 yearPart "1998"^^xs:gYear .
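Outside RDF, the parts of a URI are easy enough to talk about. As a small illustration, this Python sketch uses the standard library's `urllib.parse` to pull the example URI apart; the helper name `uri_parts` and the dictionary keys are assumptions made for the example. It only shows that the "year" component is a property of the URI string, not of the specimen.

```python
from urllib.parse import parse_qs, urlsplit


def uri_parts(uri):
    """Split a URI string into its generic components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        # parse_qs maps each query key to a list of its values.
        "query": parse_qs(parts.query),
    }


parts = uri_parts("http://example.org/specimen/1?year=1998")
print(parts["scheme"], parts["authority"], parts["path"])
print(parts["query"]["year"][0])
```

What is missing is an agreed RDF vocabulary for asserting these same facts with the URI itself as the subject.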

Hilmar Lapp

Jul 23, 2013, 2:12:57 AM

to tdwg...@googlegroups.com

On Jul 23, 2013, at 2:56 AM, John Deck wrote:

i see plenty of examples of identifier metadata about physical objects giving them all kinds of properties that are not about the object itself (e.g. title, publisher, creator in Datacite metadata about specimens given DOIs).

Datacite DOIs are really for the metadata record, not the object we casually say received the DOI. The same is true for article DOIs, BTW. You can convince yourself of that by trying to dereference either kind of DOI. If you take a Datacite DOI for a digital dataset, for example, it won't dereference to the bitstream. And an article DOI will never give you the article itself - it either gives you a metadata record, or a landing page, depending on user-agent and Accept header.

Hilmar Lapp

Jul 23, 2013, 12:11:51 PM

to tdwg...@googlegroups.com

On Jul 23, 2013, at 5:21 AM, Paul Murray wrote:

Ideally, I'd like to assign a URI to the RDF document as a whole (or parts of it), and for the triples asserted by that document to have an implied fourth term. At that point, we can talk about licensing and "degree of confidence" and such things.

At present, the only way to do it would be to generate a rdf:Statement object for every single triple and stuff that in the file.

To put it another way - RDF is missing a "this document" construct.

Indeed, but why is this necessary? A document can be given an explicit URI if I want to say something about it, and then I can assert things about the document by using that URI as the subject. This is in fact what the RDF version of DwC does just after the header.

Similarly, OWL ontologies can be given an explicit URI using rdf:about for the owl:Ontology element.

What am I missing?

Hilmar Lapp

Jul 23, 2013, 12:14:51 PM

to tdwg...@googlegroups.com

On Jul 22, 2013, at 2:55 AM, greg whitbread wrote:

<tc:TaxonConcept rdf:about="http://biodiversity.org.au/apni.taxon/644466">
...
<cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
<cc:attributionURL rdf:resource="http://biodiversity.org.au/apni.taxon/644466"/>
...
</tc:TaxonConcept>

Are there cases where URI and attributionURL aren't the same? I.e., are there cases for which attribution is non-trivial?

Steve Baskauf

Jul 24, 2013, 9:26:28 AM

to tdwg...@googlegroups.com

Hmmm. This isn't the way I have been led to understand content negotiation as the advocates of Linked Data explain it (e.g. http://www.w3.org/TR/cooluris/ ). Dereferencing DOIs pretty much follows the classic content negotiation model a la Linked Data. I dereferenced http://dx.doi.org/10.1126/science.1157784 using http://linkeddata.informatik.hu-berlin.de/uridbg/ with an Accept header of application/rdf+xml . In accordance with the http://www.w3.org/TR/cooluris/#r303uri model, I get a 303 redirect to
http://data.crossref.org/10.1126%2Fscience.1157784
which is an RDF/XML document. The content of that page can be examined with http://www.w3.org/RDF/Validator/ . It shows that the RDF document asserts the triple:

<http://dx.doi.org/10.1126/science.1157784> rdf:type bibo:Article
and
<http://dx.doi.org/10.1126/science.1157784> owl:sameAs <doi:10.1126/science.1157784>

I interpret that to mean that the DOI URI identifies an abstract thing which is a bibo:Article, not a metadata document about the article. CrossRef doesn't actually describe <http://data.crossref.org/10.1126%2Fscience.1157784> as an RDF/XML formatted metadata document about the article, but they could. My interpretation of what the Linked Data people assert is that this RDF/XML document, a web page about the article, and a PDF version of the article itself would all be considered representations of the abstract thing. If I am correct about this, if a specimen were given a DOI, the DOI would be the identifier for the specimen itself, not the metadata record. The DOI for a digital dataset would represent the abstract thing and the bitstream would be a representation of it. But perhaps I'm misreading the cooluris document.
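For anyone who wants to repeat the dereferencing step without the online debugger, here is a minimal Python sketch of the request side, using only the standard library. The helper name `rdf_request` is hypothetical. The sketch builds the request but does not send it; sending it with `urllib.request.urlopen` would follow a 303 redirect automatically, so the response seen would be the final RDF/XML document rather than the redirect itself.

```python
import urllib.request


def rdf_request(uri):
    """Build a GET request that asks for an RDF/XML representation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


request = rdf_request("http://dx.doi.org/10.1126/science.1157784")
print(request.full_url)
print(request.get_header("Accept"))

# To actually dereference (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.geturl())   # URL after any redirects were followed
#     print(response.read()[:200])
```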

Steve

Hilmar Lapp wrote:


Steve Baskauf

Jul 24, 2013, 9:59:21 AM

to tdwg...@googlegroups.com

This is a followup to my earlier email
(https://groups.google.com/d/msg/tdwg-rdf/1qejTl_gRPA/QXURyH2DiR0J )
pondering whether xhtml:license (i.e.
http://www.w3.org/1999/xhtml/vocab#license ) would be the preferred
property for linking to a Creative Commons URI. I noticed that Paul
mentions cc:license in his email (below). I looked up the definition of
cc:license (i.e. http://creativecommons.org/ns#license ) at
http://creativecommons.org/schema.rdf and noticed that it declares

<http://creativecommons.org/ns#license> owl:sameAs
<http://www.w3.org/1999/xhtml/vocab#license>

and

<http://creativecommons.org/ns#license> rdfs:subPropertyOf
<http://purl.org/dc/terms/license>

So using xhtml:license would not necessarily imply a cc:license property
(although I'm not sure about this since I'm having trouble finding the
RDF definitions of the xhtml: terms - I think they are defined using
RDFa which I don't understand well). But using cc:license would imply
xhtml:license AND dcterms:license properties if reasoning were carried
out. Also, the cc:license definition asserts range cc:License and
domain cc:Work .

So which do we recommend people use? I have xhtml:license in a footnote
of the DwC RDF Guide. Should I change that???
Steve

Paul Murray wrote:
> ...

> Oh - I see that I have completely misunderstood the issue. As Greg I think mentioned, we use Creative Commons cc:license and cc:attributionURL .
>
> <cc:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
> <cc:attributionURL rdf:resource="http://biodiversity.org.au/apni.taxon/54321"/>
>
> This comes through consistently in our JSON, XML, and RDF, which is important given the way we do content negotiation.
>

> ...

Paul Murray

Jul 25, 2013, 12:12:51 AM7/25/13

to tdwg...@googlegroups.com

On 24/07/2013, at 11:26 PM, Steve Baskauf wrote:

> Hmmm. This isn't the way I have been led to understand content negotiation as the advocates of Linked Data explain it (e.g. http://www.w3.org/TR/cooluris/ ). Dereferencing DOIs pretty much follows the classic content negotiation model a la Linked Data.

As I understand it, Linked Data is quite specifically about the HTTP protocol. DOIs use a different protocol altogether, the "handle system"

http://www.rfc-editor.org/rfc/rfc3652

Although dx.doi.org functions as a bridge between the two systems, you cannot directly dereference a DOI in the linked data sense - which is why they have to explicitly assert a "sameAs".

> I interpret that to mean that the DOI URI identifies an abstract thing which is a bibo:Article, not a metadata document about the article.

Agreed. The DOI case does illustrate our difficulty. The declaration "<doi:10.1126/science.1157784> hasAuthor mylicense" is saying that that particular article has an author, not that the dataset you are looking at, which asserts this fact, itself has that author. But this is what we often need to do, as the RDF we distribute is data in its own right.

Paul Murray

Jul 25, 2013, 12:50:01 AM7/25/13

to tdwg...@googlegroups.com

On 24/07/2013, at 11:59 PM, Steve Baskauf wrote:

> This is a followup to my earlier email (https://groups.google.com/d/msg/tdwg-rdf/1qejTl_gRPA/QXURyH2DiR0J ) pondering whether xhtml:license (i.e. http://www.w3.org/1999/xhtml/vocab#license ) would be the preferred property for linking to a Creative Commons URI. I noticed that Paul mentions cc:license in his email (below). I looked up the definition of cc:license (i.e. http://creativecommons.org/ns#license ) at http://creativecommons.org/schema.rdf and noticed that it declares
>
> <http://creativecommons.org/ns#license> owl:sameAs <http://www.w3.org/1999/xhtml/vocab#license>
>
> and
>
> <http://creativecommons.org/ns#license> rdfs:subPropertyOf <http://purl.org/dc/terms/license>
>
> So using xhtml:license would not necessarily imply a cc:license property

From a machine reasoning POV, yes it would if you are reasoning over the cc ontology. There's a meta-question of "which ontology rulesets are you giving to your reasoner?".

> (although I'm not sure about this since I'm having trouble finding the RDF definitions of the xhtml: terms - I think they are defined using RDFa which I don't understand well).

Well, I can't see that there would be any difficu… gahh!

The capsule summary seems to be here: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/#sec_3.10.

So our question now is: where the heck is the RDF, RDFa, turtle, Owl functional syntax or *whatever* document describing <http://www.w3.org/1999/xhtml/vocab#license> in machine-readable terms? Oh, of course: it's buried in the HTML itself, *as RDFa*. Let's have a look at the HTML source for that page.

<dt id="license" about="#license" property='rdfa:term' lang='' xml:lang='' typeof="rdf:Property">license</dt>
<dd about="#license" property="rdfs:comment"
datatype="xsd:string"><span property='rdfa:uri' lang='' xml:lang='' content='http://www.w3.org/1999/xhtml/vocab#license'>license</span> refers to a resource that
defines the associated license. </dd>

So the answer seems to be: the only thing defined about <http://www.w3.org/1999/xhtml/vocab#license> in the xhtml vocabulary itself is

:license rdfa:term rdf:Property .
:license rdfs:comment " refers to a resource that defines the associated license."^^xsd:string .

IOW: it's a property. There are no OWL reasoning rules defined beyond that. (not entirely sure what that stuff in the span element is trying to say)

So if you give those rules to your reasoner, it won't infer anything about cc:license and xhtml:license. If you give the cc rules to your reasoner, it will infer both ways (because that's what sameAs means).
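
The effect of choosing a ruleset can be sketched in a few lines of Java. This is a toy closure over property-level axioms, not a real OWL reasoner, and it assumes that owl:sameAs between two properties can be treated as implication in both directions. The class and method names are made up for illustration; the three property URIs and the two CC axioms are the ones quoted in this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy property-level inference for illustration; not a real OWL reasoner. */
final class LicenseInference {
    static final String CC = "http://creativecommons.org/ns#license";
    static final String XHV = "http://www.w3.org/1999/xhtml/vocab#license";
    static final String DCT = "http://purl.org/dc/terms/license";

    /** Ruleset from the xhtml vocabulary alone: no axioms linking it to other properties. */
    static List<String[]> xhtmlRules() {
        return new ArrayList<>();
    }

    /** Ruleset from the CC schema as quoted in the thread; each pair reads "from implies to". */
    static List<String[]> ccRules() {
        List<String[]> rules = new ArrayList<>();
        rules.add(new String[] {CC, XHV});  // owl:sameAs, one direction
        rules.add(new String[] {XHV, CC});  // owl:sameAs, the other direction
        rules.add(new String[] {CC, DCT});  // rdfs:subPropertyOf
        return rules;
    }

    /** Properties that hold for a statement made with `used`, closed under `rules`. */
    static Set<String> impliedProperties(String used, List<String[]> rules) {
        Set<String> result = new HashSet<>();
        result.add(used);
        boolean changed = true;
        while (changed) {  // repeat until no rule adds a new property
            changed = false;
            for (String[] rule : rules) {
                if (result.contains(rule[0]) && result.add(rule[1])) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same statement (made with xhtml:license), two different rulesets.
        System.out.println(new TreeSet<>(impliedProperties(XHV, xhtmlRules())));
        System.out.println(new TreeSet<>(impliedProperties(XHV, ccRules())));
    }
}
```

With the empty ruleset a statement made with xhtml:license stays just that; with the CC ruleset it also yields cc:license and, through the subproperty axiom, dcterms:license.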

Paul Murray

Jul 25, 2013, 1:09:24 AM7/25/13

to tdwg...@googlegroups.com

On 24/07/2013, at 2:11 AM, Hilmar Lapp wrote:

> Indeed, but why is this necessary. A document can be given an explicit URI if I want to say something about it, and then I can assert things about the document by using that URI as the subject. This is in fact what the RDF version of DwC does just after the header.

I'm sure you'll be pleased to know that I wrote an extensive, rambling reply which I have deleted and will not be inflicting on anyone.

Currently, we do something like this. In our RDF,

http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a

refers to the taxon object for "Myzostoma attenuatum in AFD"

and
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#Instance

refers to an OWL class for "an instance of Myzostoma attenuatum in AFD". By using a hash URI, it all gets bundled up together when you ask for the document.

Quite possibly, we could create RDF identifiers
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#JSON
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#RDF
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#XML
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#HTML

and talk about those documents - give the URLs where they are located, assert that they are all versions of one another, that the subject (topic) of each of those documents is the base URI, and that they are encumbered with licenses. Perhaps it might even be meaningful to declare a
http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a#dataset-held-at-boa

which names the data that we hold, as distinct from the taxon concept itself or any particular representation of that data. It could declare that it is part of the afd dataset as a whole, as extracted at some particular date. But this is starting to get a mite over-engineered.
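
A small Java sketch of why the hash identifiers all travel together: an HTTP client drops the fragment before making the request, so every one of these identifiers leads to the same document. The class and the representationId helper are hypothetical names used only for this illustration; the base URI and the fragment names are the ones listed above.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HashUris {
    /** The URI a client actually requests over HTTP: the identifier without its fragment. */
    static URI documentUri(URI identifier) {
        try {
            return new URI(identifier.getScheme(), identifier.getAuthority(),
                    identifier.getPath(), identifier.getQuery(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Hypothetical helper: names one representation of the base resource, e.g. #RDF. */
    static URI representationId(URI base, String format) {
        return URI.create(documentUri(base) + "#" + format);
    }

    public static void main(String[] args) {
        URI taxon = URI.create(
                "http://biodiversity.org.au/afd.taxon/4f013106-8df2-4f17-8370-f15af183c25a");
        for (String format : new String[] {"JSON", "RDF", "XML", "HTML"}) {
            URI id = representationId(taxon, format);
            // Each identifier is distinct, but the document fetched is the same.
            System.out.println(id + " is fetched from " + documentUri(id));
        }
    }
}
```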

But there is no convenient syntax to declare that "The fact that Myzostoma attenuatum is ectoparasitic is declared by …#dataset-held-at-boa". It might be nice to declare that one part of our document is sourced from one place, and one part from another, and that these declarations apply to every triple declared in those parts. Particularly for aggregated data.

Having said that, I was under the impression that there are standards coming out to address these areas, although I haven't found them with a casual search.

Hilmar Lapp

Jul 25, 2013, 7:30:35 AM7/25/13

to tdwg...@googlegroups.com

Hi Paul - it sounds to me that the way you're expressing this is well aligned with LOD and http-range-14 [1] recommendations. In terms of more advanced advice than amateurs like me can offer, I'd recommend you post your case and questions to the public-lod list at W3C if you haven't already. Lots of people there who wrestle with these issues daily (and there've been epic threads on http-range-14).

-hilmar

[1] http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14


Hilmar Lapp

Jul 25, 2013, 7:47:31 AM7/25/13

to tdwg...@googlegroups.com

On Jul 25, 2013, at 6:12 AM, Paul Murray wrote:

> Although dx.doi.org functions as a bridge between the two systems, you cannot directly dereference a DOI in the linked data sense - which is why they have to explicitly assert a "sameAs".

Hmm - the DOI resolver at Crossref - and meanwhile also Datacite - are LD compliant, so I'm not sure what you mean by "cannot directly dereference a DOI in the linked data sense":

http://crosstech.crossref.org/2011/04/content_negotiation_for_crossr.html

http://www.crossref.org/crweblog/2011/04/crossref_and_international_doi.html

>> I interpret that to mean that the DOI URI identifies an abstract thing which is a bibo:Article, not a metadata document about the article.

> Agreed. The DOI case does illustrate our difficulty.

Yes, sorry for my sloppiness. It is not, however, for the PDF, and the metadata for the abstract object don't necessarily have a link to the PDF. Correspondingly for data DOIs, for which downloading the actual data will be a common use case upon dereferencing the data set's identifier, there is no standard convention for how to expose this information. We've been struggling with this for Dryad, and there'll be a better API in the near future to do this sort of thing, but there's no obvious convention to simply follow and implement as far as we're aware.

The same will likely apply to digital specimens. (Physical ones can't be downloaded yet through the wire at the current state of technology, so the issue can just be punted for those. Though, think about hooking a 3D printer to your laptop, and we're not so far off.)

Bob Morris

Jul 25, 2013, 10:30:42 AM7/25/13

to tdwg...@googlegroups.com

On Jul 25, 2013 7:47 AM, "Hilmar Lapp" <hl...@nescent.org> wrote:

> ...

> The same will likely apply to digital specimens. (Physical ones can't be downloaded yet through the wire at the current state of technology, so the issue can just be punted for those. Though, think about hooking a 3D printer to your laptop, and we're not so far off.)
>
> -hilmar
>

I'm waiting for pricing to drop on quantum entanglement hardware. Then we'll really be not so far off, while being as far off as we like.

- Bob

> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================
>
>
>

Paul Murray

Jul 26, 2013, 2:40:26 AM7/26/13

to tdwg...@googlegroups.com

On 25/07/2013, at 9:47 PM, Hilmar Lapp wrote:

> On Jul 25, 2013, at 6:12 AM, Paul Murray wrote:

>> Although dx.doi.org functions as a bridge between the two systems, you cannot directly dereference a DOI in the linked data sense - which is why they have to explicitly assert a "sameAs".

> Hmm - the DOI resolver at Crossref - and meanwhile also Datacite - are LD compliant, so I'm not sure what you mean by "cannot directly dereference a DOI in the linked data sense":

Well -- with the caveat that I *may be wrong about this* -- what I meant was that this:

doi:10.1126/science.1157784

is a DOI. It is a URI whose scheme part is 'doi'. That - by definition - is what a DOI is. This:

http://dx.doi.org/10.1126/science.1157784

is not a DOI. It is an http uri.

The clever thing about Linked Data is to have a convention whereby a URI used in an RDF document directly (well, via a 303 redirect) takes you to the relevant RDF. The URIs, rather than being opaque identifiers, are functionally useful: they are both identifiers and also locators. The magic is that when a piece of software like JENA does this internally:

rdf_document = new URL("http://biodiversity.org.au/apni.taxon/54321").getContent();

Then it just works. The linked data conventions work together with the http specification to serve up the RDF via a URI, rather than a URL.

But the conventions are tied to the http transport protocol, to "the world-wide-web" which by definition is the collection of hypertext pages and other stuff that you can get over http. I suppose the question is, "does dereferencing 'in a linked data sense' imply the use of http?" Looking at the linkeddata.org page, it seems so. The linkeddata "thing" (I'd use a German word if I knew the right one) is entirely about turning the world-wide-web into a semantic web.

There's more to the internet than the web - for instance, email. DOI resolution sits outside http in much the same way as email does. A DOI data packet is not an HTTP request/response. It doesn't "do" 303 - it has other mechanisms named different things. Inside a machine reasoner which is attempting to understand "author is_author_of doi:10.1126/science.1157784", when it tries to do this:

rdf_document = new URL("doi:10.1126/science.1157784").getContent();

we get an "unknown protocol: doi". Now the web being as ubiquitous as it is, the DOI standards have ways of interfacing with http:

http://www.doi.org/doi_handbook/3_Resolution.html#3.7 .

You can use a proxy. You can use a browser plugin. I could write a Java protocol handler that simply puts "http://dx.doi.org/" on the front and hands off the work to the http handler. But you cannot directly do an http GET request for the URI doi:10.1126/science.1157784 and retrieve the RDF. DOIs are not - very strictly speaking - part of the world wide web.

Maybe I'm being a bit strict with wording, but the impact is that to make Pebble work over DOIs, I'd have to write code beyond what already comes "for free" as part of Java: it's extra work for me to make it go, and that goes for anything that only understands http.
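
The "extra work" can be sketched roughly as follows. This is not a real java.net protocol handler, only the rewriting step such a handler would perform, and it assumes the dx.doi.org proxy mentioned above; the class and method names are invented for the example. It also shows that a stock JVM has no handler for the doi scheme, which is where the "unknown protocol: doi" error comes from.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class DoiProxy {
    /** HTTP proxy named in this thread; treated here as an assumption, not a fixed standard. */
    static final String PROXY = "http://dx.doi.org/";

    /** Rewrites doi:10.x/y to the proxy's http URI; any other URI is returned unchanged. */
    static String toHttpUri(String uri) {
        URI parsed = URI.create(uri);
        if ("doi".equalsIgnoreCase(parsed.getScheme())) {
            return PROXY + parsed.getRawSchemeSpecificPart();
        }
        return uri;
    }

    /** True if the JVM has a protocol handler registered for this URI's scheme. */
    static boolean hasUrlHandler(String uri) {
        try {
            URI.create(uri).toURL();  // throws MalformedURLException for an unknown protocol
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String doi = "doi:10.1126/science.1157784";
        System.out.println(doi + " has a handler: " + hasUrlHandler(doi));
        String http = toHttpUri(doi);
        System.out.println(http + " has a handler: " + hasUrlHandler(http));
    }
}
```

Nothing here touches the network; actually fetching RDF would still need an http request with a suitable Accept header against the rewritten URI.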

Hilmar Lapp

Jul 26, 2013, 4:40:27 AM7/26/13

to tdwg...@googlegroups.com

On Jul 26, 2013, at 8:40 AM, Paul Murray wrote:

> Well -- with the caveat that I *may be wrong about this* -- what I meant was that this:
>
> doi:10.1126/science.1157784
>
> is a DOI. It is a URI whose scheme part is 'doi'. That - by definition - is what a DOI is. This:
>
> http://dx.doi.org/10.1126/science.1157784
>
> is not a DOI. It is an http uri.

No, they are both DOIs; unlike the first one, the second is displayed according to the recommended guidelines:

http://www.crossref.org/02publishers/doi_display_guidelines.html

I know publishers are slow to catch up with this, and I don't know where Datacite has positioned itself in this regard or whether they've taken any position, but that shouldn't distract from the fact that DOI and its primary registration agency are committed to LD compliance, and that hence we should be too.

Steve Baskauf

Jul 28, 2013, 9:26:10 AM7/28/13

to tdwg...@googlegroups.com

Well, this pretty much confirms what I was able to glean from the RDFa.

The issue that I see here is predicting the extent to which reasoning will be performed on owl:sameAs statements. Let's say that half of the images we are interested in have their license specified using RDFa/microformat and a web scraper assembles triples from that. Those images will all have a license property expressed using xhtml:license. Imagine that the other half of the images we are interested in have their license expressed in RDF/XML using a cc:license property. If no reasoning involving owl:sameAs is done, then a SPARQL query that searches for resources that have a particular cc:license value will miss all of the images whose license is expressed using xhtml:license (and vice-versa). On the other hand, if we allow a reasoner to infer triples based on the owl:sameAs relationship expressed in the CC vocabulary, then all images having the particular license we are interested in will come up. I have no idea how common it will be for users to do such inferencing. I doubt that anybody will just turn their application/reasoner loose and let it make inferences about every owl:sameAs triple it finds - that would be too dangerous.

It seems to me that the safest thing from the standpoint of recommending a best practice for our community would be to either tell people they should use one of the properties or the other (and hope that people pay attention) or recommend that they provide both (probably the safest course of action). But it's annoying - why did CC find it necessary to mint cc:license when xhtml:license was already there? I suppose there was some reluctance to declare range and domain properties for a term that wasn't defined in their own namespace. But still, it presents us with this kind of problem.

Steve

Bob Morris

Jul 28, 2013, 10:33:03 AM7/28/13

to tdwg...@googlegroups.com

Or make your SPARQL query see if either one is bound and test the
values for it. You have to be willing to deal with multiple licenses
on the same resource, but that is hardly limited to this particular
way of looking for licenses. You can also give one of them preference
over the other, and only look for the other if the preferred one isn't
bound.
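
A rough sketch of both variants, written in Java over an in-memory list of triples rather than against a real SPARQL endpoint. The UNION query string shows what the "either one is bound" query might look like and is only printed, not executed. The class name, the example image URIs, and the choice of cc:license as the preferred property are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LicenseLookup {
    static final String CC = "http://creativecommons.org/ns#license";
    static final String XHV = "http://www.w3.org/1999/xhtml/vocab#license";

    /** What a query matching either property might look like; not executed here. */
    static final String UNION_QUERY = String.join("\n",
            "PREFIX cc: <http://creativecommons.org/ns#>",
            "PREFIX xhv: <http://www.w3.org/1999/xhtml/vocab#>",
            "SELECT DISTINCT ?resource ?license WHERE {",
            "  { ?resource cc:license ?license }",
            "  UNION",
            "  { ?resource xhv:license ?license }",
            "}");

    /** Objects of all {subject, predicate, object} triples with the given subject and predicate. */
    static List<String> values(List<String[]> triples, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                out.add(t[2]);
            }
        }
        return out;
    }

    /** Preferred property first; fall back to the other only if the preferred one is unbound. */
    static List<String> licenses(List<String[]> triples, String subject) {
        List<String> preferred = values(triples, subject, CC);
        return preferred.isEmpty() ? values(triples, subject, XHV) : preferred;
    }

    public static void main(String[] args) {
        String by = "http://creativecommons.org/licenses/by/3.0/";
        List<String[]> triples = Arrays.asList(
                new String[] {"http://example.org/img/1", CC, by},   // from RDF/XML
                new String[] {"http://example.org/img/2", XHV, by}); // scraped from RDFa
        System.out.println(UNION_QUERY);
        System.out.println(licenses(triples, "http://example.org/img/1"));
        System.out.println(licenses(triples, "http://example.org/img/2"));
    }
}
```

Both images are found without any owl:sameAs reasoning, at the cost of writing the alternative into every query. The method returns a list because a resource may carry more than one license.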

Robert A. Morris

Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390

IT Staff
Filtered Push Project
Harvard University Herbaria
Harvard University

email: morri...@gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.

Paul Murray

Jul 28, 2013, 11:27:33 PM7/28/13

to tdwg...@googlegroups.com

On 28/07/2013, at 11:26 PM, Steve Baskauf wrote:

> On the other hand, if we allow a reasoner to infer triples based on the owl:sameAs relationship expressed in the CC vocabulary, then all images having the particular license we are interested in will come up. I have no idea how common it will be for users to do such inferencing. I doubt that anybody will just turn their application/reasoner loose and let it make inferences about every owl:sameAs triple it finds - that would be too dangerous.

My only experience is with the Pellet reasoner running in Jena. Pellet must be configured with which graphs have inference rules that it is to use, and which graphs it is to reason over. That is - it's not the case that it simply adds every rule that it happens to browse into its ruleset. You're quite right: doing that would mean that you will wind up at some point with an inconsistency, and the whole thing stops working.

As for how common it is, I doubt it's common. I attempted it with the biodiversity.org.au data: my goal was to get the reasoner to infer that 'Echidna' was a 'Mammal' by way of transitive 'is part of taxon' rules. As I recall: it worked, but it ran too slowly to be useful.

It occurs to me that the big problem with reasoning rules is this issue that the moment there's any sort of inconsistency, the whole lot grinds to a halt. This is because to a reasoner, the graph you are working on is a single big graph of triples floating in triple space. With provenance, however, it perhaps becomes possible to say "this triple is the result of inference over graphs X, Y and Z". Rather than asking "is it true that :theSky :hasColour :blue", you'd ask "which combinations of graphs assert that the sky is blue?", which then allows you to apply some sort of confidence metric.
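
The "which graphs assert this?" question can be sketched in Java by storing quads instead of triples. This is a simplification: it only covers directly asserted triples and does not track which graphs an inferred triple was derived from. The class name and the graph names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class GraphProvenance {
    /** Names of the graphs asserting a triple. Each quad is {graph, subject, predicate, object}. */
    static Set<String> graphsAsserting(List<String[]> quads, String s, String p, String o) {
        Set<String> graphs = new TreeSet<>();
        for (String[] q : quads) {
            if (q[1].equals(s) && q[2].equals(p) && q[3].equals(o)) {
                graphs.add(q[0]);
            }
        }
        return graphs;
    }

    public static void main(String[] args) {
        List<String[]> quads = Arrays.asList(
                new String[] {"graphX", ":theSky", ":hasColour", ":blue"},
                new String[] {"graphY", ":theSky", ":hasColour", ":grey"},
                new String[] {"graphZ", ":theSky", ":hasColour", ":blue"});
        // Instead of a yes/no answer, report who says the sky is blue.
        System.out.println(graphsAsserting(quads, ":theSky", ":hasColour", ":blue"));
    }
}
```

A conflicting statement in graphY then no longer breaks anything; it just means fewer graphs back the claim, which is where a confidence metric could plug in.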

>
> It seems to me that the safest thing from the standpoint of recommending a best practice for our community would be to either tell people they should use one of the properties or the other (and hope that people pay attention) or recommend that they provide both (probably the safest course of action). But it's annoying - why did CC find it necessary to mint cc:license when xhtml:license was already there. I suppose there was some reluctance to declare range and domain properties for a term that wasn't defined in their own namespace. But still, it presents us with this kind of problem.

I think that part of the activity of the group should involve defining not only vocabularies, but sets of inference rules to be used for TDWG-compliant data. Perhaps the vocabularies should come in "minimal" (just the terms), "scoped" (terms and domain/range), and "full" (terms and full OWL-DL rules - including "is-different-from" rules for enumerations etc). An annotation property named "TDWGCompliance" or "inferenceRuleset" might be attached to Ontology objects to indicate how the author intends for them to be used.

Steve Baskauf

Aug 24, 2013, 6:26:03 PM8/24/13

to tdwg...@googlegroups.com

I have created a document
http://code.google.com/p/tdwg-rdf/wiki/LicenseProperties
in which I have attempted to encapsulate some of the information that was covered in this thread about cc:/xhtml:license. I have also modified the Darwin Core RDF Guide to change footnote 3 from a description of terms that can be used for expressing a license to a reference to this document.

Any feedback on the document is welcome.
Steve
