RDF Guide revisions

Steve Baskauf

Nov 7, 2014, 7:31:31 AM
to tdwg...@googlegroups.com, James Macklin, csp...@gmail.com
Now that the Executive has approved the new definitions of Darwin Core
classes, I have received the green light to revise the DwC RDF guide in
the light of those changes and then subject it to a 30 day public
comment period. Since the proposal has already been recommended by this
group (the RDF TG), I do not plan to make major substantive changes to
the document. However, given that the document has vegetated for about
a year and a half, there are several aspects of it that I have realized
probably should be changed. I have made a laundry list of change items,
which you can view at:
https://docs.google.com/document/d/153h1_kyfMllKGgfehVLB7ubzRDDaWtnaGHDDwJ6nkwE/edit?usp=sharing
However, I don't really expect people to look at that list carefully.
Rather, I'm going to send several emails about the general kinds of
changes I am planning to make. If you have a problem with those
changes, please respond to the email on this list. If I don't get
comments, I will assume that no one objects to me making that category
of changes.

I will try to create two documents on the RDF TG wiki: a "clean" version
for the public comment and a version using highlighting and
strikethrough text so that you can review the changes I made. When they
are operational, I'll send links.

Since the Executive is actually paying attention to this now, I am going
to make this happen on the timescale of days rather than weeks. So if
you care about this issue, please try to devote enough time to read the
emails. I will try to keep them brief.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
http://vanderbilt.edu/trees


Steve Baskauf

Nov 7, 2014, 9:46:32 AM
to tdwg...@googlegroups.com
1. Since the RDF Guide was originally recommended by the TG, the RDF 1.1
specification became a recommendation (on 2014-02-25). So there is a
new suite of documents, such as http://www.w3.org/TR/rdf11-concepts/
that supersedes the original spec. Although there don't seem to be
radical changes, there are several things that potentially impact the
guide. One is that the term IRI (Internationalized Resource
Identifiers) is now used routinely instead of URI (URIs are a subset of
IRIs). I think that everything that we say about URIs in the guide also
applies to IRIs, so I plan to replace "URI" with "IRI" throughout the
document. This may introduce some confusion, given that people are
already confused about URIs and URLs. But I consider the RDF Guide to
be a technical document for those who wish to expose RDF, so if anybody
is going down that road and is confused about URIs, URLs, and IRIs, they
need to read up anyway. The general public is not going to be using
this document.

A major implication of this is also that I would change the namespace
and its abbreviation from "http://rs.tdwg.org/dwc/uri/" (dwcuri:) to
"http://rs.tdwg.org/dwc/iri/" (dwciri:). If this had already been
implemented, I would leave it, but if the W3C is going to use IRI in all
of its new specs, we might as well make this change from the start.
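
For anyone who has already generated test data against the old namespace, the rename can be sketched roughly as below. This is only an illustration, not part of the guide: the helper names `migrate_iri` and `to_curie` are hypothetical, and the only facts taken from this thread are the two namespace IRIs and their abbreviations.

```python
# Hypothetical helpers (not from the RDF Guide): rewrite term IRIs from the
# old dwcuri: namespace to the proposed dwciri: namespace.
OLD_NS = "http://rs.tdwg.org/dwc/uri/"   # formerly abbreviated dwcuri:
NEW_NS = "http://rs.tdwg.org/dwc/iri/"   # proposed, abbreviated dwciri:


def migrate_iri(iri: str) -> str:
    """Return the dwciri: equivalent of a dwcuri: IRI; leave other IRIs alone."""
    if iri.startswith(OLD_NS):
        return NEW_NS + iri[len(OLD_NS):]
    return iri


def to_curie(iri: str) -> str:
    """Abbreviate an IRI in the new namespace with the dwciri: prefix."""
    if iri.startswith(NEW_NS):
        return "dwciri:" + iri[len(NEW_NS):]
    return iri


old = "http://rs.tdwg.org/dwc/uri/latestGeochronologicalEra"
print(to_curie(migrate_iri(old)))
```

Only the namespace part changes; the local name of each term is carried over untouched.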

Steve

Steve Baskauf

Nov 7, 2014, 12:28:16 PM
to tdwg...@googlegroups.com
In the RDF 1.1 semantics document
http://www.w3.org/TR/2014/REC-rdf11-mt-20140225/#literals-and-datatypes
it is noted that datatype D-entailment was formerly a semantic extension
of RDFS-entailment. However in the 1.1 specification, it is now a
direct extension to basic RDF. This has led me to rethink some of the
wording in the DwC RDF Guide, which in the original version implied that
including datatypes with literals was more optional than it probably
should be. I plan to re-write several sentences to indicate that
datatypes should be used when they are appropriate. Similarly, the 1.1
document makes it clear that providing a language tag to a string
implies that the literal has the datatype rdf:langString. So language
tags should be provided whenever possible for literals that represent
strings that have meaning in a particular language. The tables at the
end of the Guide divide terms that are expected to have literal values
into groups based on whether their literal values should have datatypes,
language tags, or be plain literals. The comments associated with
these tables should state more clearly that including these
datatypes/tags is strongly recommended.
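
The three groups of literals can be sketched as follows. This is a minimal illustration that assumes N-Triples syntax for the output; the function names `escape` and `literal` are hypothetical, and the example values are made up rather than taken from the Guide's tables.

```python
# Hypothetical sketch: serialize a literal in N-Triples syntax as a
# datatyped literal, a language-tagged string, or a plain literal.
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape(text: str) -> str:
    """Escape the characters that cannot appear raw inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def literal(lexical: str, datatype: str = None, lang: str = None) -> str:
    """Return the N-Triples form of a literal."""
    if datatype and lang:
        # A language-tagged string already has the datatype rdf:langString.
        raise ValueError("give a datatype or a language tag, not both")
    quoted = '"' + escape(lexical) + '"'
    if lang:
        return quoted + "@" + lang
    if datatype:
        return quoted + "^^<" + datatype + ">"
    return quoted


print(literal("2014-11-07", datatype=XSD + "date"))
print(literal("deciduous forest", lang="en"))
print(literal("LivingSpecimen"))
```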

Steve

Steve Baskauf

Nov 7, 2014, 12:41:49 PM
to tdwg...@googlegroups.com
There are a number of rather straightforward changes that are required
by the deprecation of the class terms in the dwctype: namespace and the
creation of new class terms in the dwc: namespace.

One issue related to this is the set of tables at the end of the
document that categorize every DwC term by the way they are to be used
in RDF. I have mixed feelings about these tables. Our original
intention was for this document to be a real "how-to" document for
implementers and in that spirit, including every term in a table that
explains how it should be used is probably a good thing. However, as a
practical matter, this would mean that the RDF Guide would have to be
modified every time a term is added to DwC. None of the other guides
(Text, XML) require this kind of change when individual terms change.
So I'm wondering if the set of tables should be in a separate document
from the guide itself so that the guide itself would change little over
time, whereas the tables would change regularly in the same way the
"Quick Reference Guide" (http://rs.tdwg.org/dwc/terms/ ) is changed with
each term addition. To some extent, this is a question of best practices
for managing Type 2 documents and there probably isn't just one "right"
answer to it. But I'm interested to know if there are strong opinions
about it.

Steve

John Wieczorek

Nov 7, 2014, 12:52:14 PM
to tdwg...@googlegroups.com, James Macklin, Cynthia Parr
Hi Steve,

Could you point people to a copy of the RDF Guide for reference here and in the linked laundry list?

In the document you posed the question, "Should Table 1 and the rest of the document conform to DCMI’s new practice of using dc: to refer to the “terms” namespace rather than the legacy namespace?" I think that as long as the namespace alias is defined, and you make it clear if you need to distinguish between what were dc: and dcterms:, it would be fine - recommended even. :-)

You also posed the question, "Remove references to TDWG Ontology??? e.g. 2.4.1.2". I support this idea.

Looks like this one will require some explanation and discussion, "Change term name from dwcuri:toTaxonConcept to dwciri:toTaxon (?)."

You called out, "change “from the Darwin Core Type vocabulary” to “from the list of recommended controlled values” (which is…?)". The basisOfRecord definition must change to accommodate the deprecation of the Type Vocabulary. Since the Type Vocabulary became classes in the dwc namespace instead, the new definition should be just the first sentence of the old definition, "The specific nature of the data record." The comment can then be, 'Recommended best practice is to use a controlled vocabulary such as the list of Darwin Core classes. Examples: "LivingSpecimen", "PreservedSpecimen", "FossilSpecimen", "HumanObservation", "MachineObservation".'

You mentioned, "This isn’t worked out, but if there will be tables at the end such as now exist, there needs to be an understanding that they will be modified each time term changes happen that affect the ones listed there.  For this reason, should the tables be in a separate document?  Will there be a page equivalent to the Quick Reference guide http://rs.tdwg.org/dwc/terms/index.htm or are these tables it?" I think I prefer everything in a single document, but laid out for easy updating. It will indeed have to be updated with every change.

Related to, "Make sure that new Organism-related terms are represented in the table and that deprecated terms are removed." it is also worth checking for consistency for any term that has been changed in this round.

That's it for my comments on the laundry list. Thanks for pushing this forward.

Cheers,

John




--
You received this message because you are subscribed to the Google Groups "TDWG RDF/OWL Task Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tdwg-rdf+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steve Baskauf

Nov 7, 2014, 1:00:26 PM
to tdwg...@googlegroups.com
In the original RDF guide, dwc:Taxon was singled out for special abuse
because its definition was considered to be hopelessly ambiguous. With
the adoption of the class revisions proposal, the definition of
dwc:Taxon is probably about as clear as the definitions of all of the
other DwC classes. All of the classes now have human-readable
definitions that are (in my opinion) reasonably understandable by a
human. All are equally unclear to machines since they have no real
machine-interpretable semantics. So I plan to remove most of the
negative language in the guide related to the dwc:Taxon class and let
its use be dealt with in a similar manner to other classes that exist or
will be added in the future.

The one area that I'm still pondering is section 2.7.4
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal#2.7.4_Description_of_a_taxonomic_entity
That section is written under the assumption that dwc:Taxon would not be
fixed and that some vaporware TCS 2.0 would ride in like a white knight
to save the day by creating a class for taxon concepts, TNUs, or some
similar entity that we could use instead. Hence the term that was
designed to link to some externally defined taxon concept thing was
called "dwcuri:toTaxonConcept". Given that we now have a better
definition for dwc:Taxon and given that there is no motion whatsoever
that I can detect towards work on TCS 2.0, I am inclined to change the
name of this term to "dwciri:toTaxon" and assume that people will mint
object properties to link dwc:Taxon instances to other related things
such as name entities, references, nomen, protonyms, or whatever just
like they are going to have to mint object properties to connect many of
the other dwc: classes to other kinds of resources.

In making these changes I also intend to get rid of any reference to the
TDWG Ontologies (e.g. the TaxonConcept Ontology).

Steve

Steve Baskauf

Nov 7, 2014, 1:10:06 PM
to tdwg...@googlegroups.com
I think this is the last category of changes if you are getting fatigued.

In the interest of having a comprehensive guide, we included
recommendations for standard IRIs to serve as objects for particular
predicates. For example, we recommend the MARC ISO 639-2 language IRIs
(http://id.loc.gov/vocabulary/iso639-2.html ) to be used as values for
dcterms:language . However, in the year and a half since the guide was
written, the document describing the International Commission on
Stratigraphy's URIs for geological time periods
(http://resource.geosciml.org/vocabulary/timescale/isc-2012.rdf ) has
already disappeared and is producing a "Not Found" error. Similarly,
VIAF IRIs for people were recommended (still a good idea, especially for
dead people), but ORCID IDs are now widely used (but not mentioned in
the document).

This is causing me to think that it would be better to link to some
other Type 3 (ancillary document not part of the standard) that would
contain the recommendations for controlled value IRIs. This document
could be changed without invoking a change to the Darwin Core standard
itself (such as would be required to change the RDF Guide, a Type 2
document that is a non-normative part of the standard).

Steve

Steve Baskauf

Nov 7, 2014, 1:13:52 PM
to tdwg...@googlegroups.com, James Macklin, Cynthia Parr
Thanks for the comments, John.  Sorry, I should have provided a link for the proposed guide.  The cover page is at:
https://code.google.com/p/tdwg-rdf/wiki/DwcRdf

and the actual proposed guide as recommended by the TG is at:
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal

I'll make one response in a separate email.
Steve


Steve Baskauf

Nov 7, 2014, 1:38:45 PM
to tdwg...@googlegroups.com, James Macklin, Cynthia Parr
John Wieczorek wrote:


> You called out, "change “from the Darwin Core Type vocabulary” to “from the list of recommended controlled values” (which is…?)". The basisOfRecord definition must change to accommodate the deprecation of the Type Vocabulary. Since the Type Vocabulary became classes in the dwc namespace instead, the new definition should be just the first sentence of the old definition, "The specific nature of the data record." The comment can then be, 'Recommended best practice is to use a controlled vocabulary such as the list of Darwin Core classes. Examples: "LivingSpecimen", "PreservedSpecimen", "FossilSpecimen", "HumanObservation", "MachineObservation".'

I think the change to the comment in the changed definition of dwc:basisOfRecord must be made carefully.  For non-RDF users, the comment as you have it probably makes perfect sense.  However, for RDF users, the comment blurs the distinction that we are trying to make in the guide between literal values (strings) and IRIs.  The ambiguity is in "list of Darwin Core classes".  The IRI for a class like living specimen is now:

http://rs.tdwg.org/dwc/terms/LivingSpecimen   a.k.a. dwc:LivingSpecimen

However, the corresponding literal that should be used as a value for dwc:basisOfRecord should be "LivingSpecimen".  Spreadsheet users won't care, but an RDF implementer should know that

<http://bioimages.vanderbilt.edu/vanderbilt/7-314> dwc:basisOfRecord "LivingSpecimen".

is correct, but not

<http://bioimages.vanderbilt.edu/vanderbilt/7-314> dwc:basisOfRecord "dwc:LivingSpecimen".

or

<http://bioimages.vanderbilt.edu/vanderbilt/7-314> dwc:basisOfRecord "http://rs.tdwg.org/dwc/terms/LivingSpecimen".

I suppose the precise way of stating it would be to say something like 'Recommended best practice is to use a controlled vocabulary such as the local name component of Darwin Core class IRIs. Examples: "LivingSpecimen", "PreservedSpecimen", "FossilSpecimen", "HumanObservation", "MachineObservation" '

I don't know whether this more complicated terminology in the definition would just confuse non-RDF users.  That's why I thought it might be preferable to just have a list of controlled vocabulary strings somewhere, rather than referring to the DwC classes themselves.
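
The "local name component" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `basis_of_record_value` is hypothetical, and the dwc: terms namespace is the one shown in the class IRI above.

```python
# Hypothetical sketch: derive the literal to use with dwc:basisOfRecord
# from a Darwin Core class IRI by taking its local name.
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def basis_of_record_value(class_iri: str) -> str:
    """Return the local name of a dwc: class IRI, e.g. "LivingSpecimen"."""
    if not class_iri.startswith(DWC_TERMS):
        raise ValueError("not an IRI in the dwc: terms namespace: " + class_iri)
    local_name = class_iri[len(DWC_TERMS):]
    if not local_name:
        raise ValueError("IRI has no local name: " + class_iri)
    return local_name


value = basis_of_record_value("http://rs.tdwg.org/dwc/terms/LivingSpecimen")
# The literal holds the bare local name, not the abbreviated or full IRI.
print('dwc:basisOfRecord "' + value + '" .')
```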

Steve


Paul J. Morris

Nov 7, 2014, 2:00:30 PM
to tdwg...@googlegroups.com
On Fri, 7 Nov 2014 12:09:40 -0600
Steve Baskauf <steve....@vanderbilt.edu> wrote:
> However, in the year and a half since the guide was
> written, the document describing the International Commission on
> Stratigraphy's URIs for geological time periods
> (http://resource.geosciml.org/vocabulary/timescale/isc-2012.rdf ) has
> already disappeared and is producing a "Not Found" error. Similarly,
> VIAF IRIs for people were recommended (still a good idea, especially
> for dead people), but ORCID IDs are now widely used (but not
> mentioned in the document).

Can we tell if this is a change to that file name, or a typo in the RDF guide draft?

That document is available (with no - in the filename) at:

http://resource.geosciml.org/vocabulary/timescale/isc2012.rdf

Broader point that external resources may change rapidly is made by the presence of two more recent timescales:

http://resource.geosciml.org/vocabulary/timescale/isc2013.rdf
http://resource.geosciml.org/vocabulary/timescale/isc2014.rdf

(Though Anthropocene hasn't made it into them yet).

I think I'd like to see assertions about best practice vocabularies to use in the normative document, with a pointer to a more rapidly changeable document that contains the latest current locations. Something like the current text in 3.6:

"Recommended best practice is to use URIs defined by the International Commission on Stratigraphy (http://www.stratigraphy.org/ ) in the http://resource.geosciml.org/vocabulary/timescale/isc-2012.rdf ontology."

Changing to:

"Recommended best practice is to use URIs defined by the International Commission on Stratigraphy (http://www.stratigraphy.org/ ) see {informative ancillary document}."

https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal#3.6_dwcuri:_terms_having_local_names_that_don%E2%80%99t_correspond_to

And retaining examples such as the dwcuri:latestGeochronologicalEra reference in Example 27

https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal#2.7.6_Chronostratographic_%28geological_timescale%29_descriptors

-Paul
--
Paul J. Morris
Biodiversity Informatics Manager
Harvard University Herbaria/Museum of Comparative Zoölogy
mo...@morris.net AA3SD PGP public key available

Steve Baskauf

Nov 7, 2014, 2:05:44 PM
to tdwg...@googlegroups.com, James Macklin, Cynthia Parr
If you don't like the multiple email format for discussing the categories of changes, I've pasted them into the revisions document and you can make comments directly on that document if you would prefer.

https://docs.google.com/document/d/153h1_kyfMllKGgfehVLB7ubzRDDaWtnaGHDDwJ6nkwE/edit?usp=sharing

Steve

Steve Baskauf

Nov 8, 2014, 9:19:53 AM
to tdwg...@googlegroups.com
It's possible that it was just a typo, although I thought that I clicked on all of the links to make sure that they worked.  Maybe not.

I like your suggestion:

"Recommended best practice is to use URIs defined by the International Commission on Stratigraphy (http://www.stratigraphy.org/ ) see {informative ancillary document}."

and would follow it unless there is objection.

Steve

Steve Baskauf

Nov 13, 2014, 2:07:52 PM
to tdwg...@googlegroups.com
I have been struggling with re-writing section 2.4.1.1 of the RDF Guide:
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposalChanges#2.4.1.1_Typed_literals
The version I've linked to above has the changes indicated with
strikeout text for what I've deleted and bold text for what I've
inserted. In particular, the paragraph that begins "In the RDF 1.1
specification, ..." has been a slog because it required me to try to
work through the Literals and Datatypes section of the new RDF 1.1 spec:
http://www.w3.org/TR/rdf11-mt/#literals-and-datatypes
I think that what I have written is correct, but it would be valuable if
someone who is more knowledgeable about the intricacies of RDF could
give me some feedback. The point I'm trying to make is that literals
that are not explicitly typed and lack language tags will be interpreted
by a client as strings and not as some other abstract or non-information
resource (i.e. numbers, physical things) that the provider might intend.
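
As a rough check of that reading, the rule can be written out as below. This is a sketch of my understanding of RDF 1.1 Concepts (a language-tagged literal has the datatype rdf:langString, and a literal with neither tag nor datatype is treated as xsd:string); the function name `effective_datatype` is hypothetical.

```python
# Hypothetical sketch: the datatype IRI a client would assume for a literal
# under RDF 1.1, given what the provider did or did not state.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def effective_datatype(datatype: str = None, lang: str = None) -> str:
    """Return the datatype IRI that applies to a literal."""
    if lang:
        return RDF_LANGSTRING      # language-tagged string
    if datatype:
        return datatype            # explicitly typed literal
    return XSD_STRING              # untyped, untagged: just a string


# "42" with no datatype is the string "42", not the number 42.
print(effective_datatype())
print(effective_datatype(datatype=XSD_INTEGER))
print(effective_datatype(lang="en"))
```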

Steve

Steve Baskauf

Nov 14, 2014, 12:21:25 AM
to tdwg...@googlegroups.com, James Macklin, csp...@gmail.com, John Wieczorek
I have completed the first pass of revisions on the RDF Guide and have
checked off all of the items on my "laundry list" in the document
https://docs.google.com/document/d/153h1_kyfMllKGgfehVLB7ubzRDDaWtnaGHDDwJ6nkwE/edit?usp=sharing
In my edits, I incorporated the comments and suggestions that I've
gotten up to this point. I'm going to give it a rest for at least a day
and then do a very severe proofreading this weekend. I'm assuming
(based on the comments I've gotten) that at least several people have
looked at the list of changes and possibly also the documents
themselves. If you still plan to look at the documents but haven't
already, please do so ASAP and either email me comments or make them
somewhere on the Google Doc. Because the document was already
recommended by the RDF Task Group in its earlier form in July 2013, I
don't think we need to go through another round of consensus-checking at
the TG level as long as anybody who cares takes a careful look at the
list of changes I've made and expresses any concerns that they have.

If you want to look at a version of the Guide that's marked up with
strikethrough for deletions and bold for additions, go to:
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposalChanges
If you want to proofread a clean copy, go to:
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposalRevised

Unless there is something really bad that I've missed, I hope that we
can initiate the 30 day public comment period as early as next week. We still
have that time interval with many possible additional sets of eyes to
spot problems we've missed.

Steve