Because the 4sr reasoner claimed to reason on subProperty relationships,
I tried investigating whether there were any inferred triples generated
based on the declaration that dcterms:creator is a subproperty of
dc:creator. However, nothing happened. So either 4sr doesn't really
make use of subProperty relationships or I don't know what I'm doing
(always a distinct possibility!). I tried looking at the instructions,
but they apparently haven't been written yet. :-(
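For reference, the entailment that a subproperty-aware store should perform can be mimicked in a few lines. This is a toy Python sketch of the RDFS subPropertyOf rule (rdfs7); the example resources (ex:book1, ex:alice) are invented for illustration and are not tied to 4sr or the actual file:

```python
# Toy illustration of the RDFS entailment rule rdfs7:
# if P rdfs:subPropertyOf Q and (S, P, O) is asserted,
# then (S, Q, O) should be inferred. Terms are plain prefixed strings.

SUBPROP = "rdfs:subPropertyOf"

def rdfs7_closure(triples):
    """Return the input triples plus all rdfs7 entailments."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subprops = {(s, o) for s, p, o in inferred if p == SUBPROP}
        for s, p, o in list(inferred):
            for sub, sup in subprops:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred

data = {
    ("dcterms:creator", SUBPROP, "dc:creator"),
    ("ex:book1", "dcterms:creator", "ex:alice"),
}
closure = rdfs7_closure(data)
# A subproperty-aware endpoint queried for dc:creator should also
# return the ("ex:book1", "dc:creator", "ex:alice") statement.
```

If a store that advertises subproperty reasoning returns no such triple, either entailment is switched off by default or it is not applied at query time.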
If anybody has access to a different SPARQL endpoint that does
inferencing, they can try repeating the experiments or doing others.
The file is at
. For some reason, I can never get my files to load into triplestore 3
(maybe because the server does not correctly identify them as RDF).
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
Although I didn't look extensively, I found nothing in dcterms that
would prevent something from being both an object property and a
datatype property. More particularly, I find nothing in either
dcterms or rdfs that requires that the class dcterms:Agent and the
class rdfs:Literal cannot intersect. Assuming that is correct, you
need a reasoner that finds a contradiction in something more
restrictive than the formal semantics of rdfs.
You will need something that asserts an axiom that either asserts or
infers such disjointness. Some species of OWL will have
that--principally OWL DL and some lower ones, but from
http://www.w3.org/TR/owl-ref/ Sec 4:
"NOTE: In OWL Full, object properties and datatype properties are not
disjoint. Because data values can be treated as individuals, datatype
properties are effectively subclasses of object properties. In OWL
Full owl:ObjectProperty is equivalent to rdf:Property. In practice,
this mainly has consequences for the use of
owl:InverseFunctionalProperty. See also the OWL Full characterization
in Sec. 8.1."
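To make concrete what an added disjointness axiom buys a reasoner, here is a toy Python check. The class names mirror the dcterms:Agent / rdfs:Literal case discussed above; the data structures are improvised for illustration, not any real OWL API:

```python
# Toy consistency check: given disjointness axioms (owl:disjointWith-style),
# flag any individual asserted to belong to two classes declared disjoint.

def disjointness_violations(types, disjoint_pairs):
    """types: {individual: set of classes}; disjoint_pairs: frozenset pairs."""
    violations = []
    for individual, classes in types.items():
        for pair in disjoint_pairs:
            if pair <= classes:  # both disjoint classes asserted
                violations.append((individual, tuple(sorted(pair))))
    return violations

# Without the axiom below, rdfs alone finds nothing wrong with this:
types = {"ex:name1": {"dcterms:Agent", "rdfs:Literal"}}
axioms = {frozenset({"dcterms:Agent", "rdfs:Literal"})}
violations = disjointness_violations(types, axioms)
```

With the axiom present the contradiction is detectable; without it, the data is simply (and legally) consistent under rdfs semantics.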
But also, http://dublincore.org/documents/dcmi-terms doesn't declare
itself an owl ontology in the first place, so one would not expect to
do OWL reasoning.
I am pretty sure that to reason entirely within rdfs and the things
within dwc and those rdf vocabularies dwc uses, you will have to
promulgate a "best practice" in the form of the addition of an axiom
that asserts this disjointness (or accept the not-formally
contradictory situation that you found). Furthermore, there may be
other such conundrums besides this particular disjointness. That is
why there are more OWL species in OWL2 than in OWL1.... the different
OWL 2 "profiles" make different compromises in support of different use cases.
For things that \are/ declared as OWL ontologies, the diagnoses of
http://owl.cs.manchester.ac.uk/validator/ are often fairly
informative about what goes with something that is declared as an OWL ontology.
It's disarmingly simple to fall into OWL FULL when designing an
ontology. http://purl.org/dsw/ and
http://lod.taxonconcept.org/ontology/txn.owl both do, and probably
neither can enforce a distinction between object properties and datatype properties.
Robert A. Morris
Emeritus Professor of Computer Science
100 Morrissey Blvd
Boston, MA 02125-3390
Filtered Push Project
Harvard University Herbaria
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or Harvard University.
> For things that \are/ declared as OWL ontologies, the diagnoses of
> http://owl.cs.manchester.ac.uk/validator/ are often fairly
> informative about what goes with something that is declared as an OWL ontology.
But it should say
For things that \are/ declared as OWL ontologies, ... informative
about what goes wrong with...
> from http://www.w3.org/TR/owl-ref/ Sec 4:
> "NOTE: In OWL Full, object properties and datatype properties are not
> disjoint. Because data values can be treated as individuals, datatype
> properties are effectively subclasses of object properties. In OWL
> Full owl:ObjectProperty is equivalent to rdf:Property. In practice,
> this mainly has consequences for the use of owl:InverseFunctionalProperty. See also the OWL Full characterization in Sec. 8.1."
Thanks for pointing that out, Bob, and indeed I was suspecting this to be the case. Never hurts to read and remind oneself of the spec.
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
... I believe that there is a reason why Darwin Core says things like "A list (concatenated and separated) of names.." in the definition of dwc:recordedBy. It sets the expectation that users of dwc:recordedBy will provide a string value for that term. If I write code which substitutes the value of dwc:recordedBy for x in the following:
document.getElementById('copy').innerHTML='This occurrence was recorded by '+x+'.';
I want to know that I will get something that looks like:
This occurrence was recorded by Steve Baskauf.
If a data provider goes against the definition given in the DwC standard and uses a URI with dwc:recordedBy because they can "get away with it", then I'm going to get things like:
This occurrence was recorded by http://bioimages.vanderbilt.edu/contact/baskauf.
Is that "bad"? I would say "yes"!
That depends on how you're looking at it. For one, it is bad because your code is bad - simply assuming that you don't have to dereference the URI to get a label is brittle coding.
It's also bad because it is indeed counter to the expectation given in the DwC textual definition of the property. It would, however, be quite good if I were a Linked Data client trying to aggregate things for the person named "Steve Baskauf".
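A sketch of the less brittle client behavior suggested here: if the value is a URI, resolve it to a label instead of printing it raw. (Python; the lookup dict stands in for actually dereferencing the URI and parsing the returned RDF, and the helper names are invented.)

```python
def looks_like_uri(value):
    """Crude URI test, good enough for this illustration."""
    return value.startswith(("http://", "https://"))

def display_name(value, fetch_label):
    """Render a dwc:recordedBy value: literals as-is, URIs via a label."""
    if looks_like_uri(value):
        label = fetch_label(value)        # real code: HTTP GET + parse RDF
        return label if label else value  # fall back to the bare URI
    return value

# Stand-in for dereferencing http://bioimages.vanderbilt.edu/contact/baskauf
labels = {"http://bioimages.vanderbilt.edu/contact/baskauf": "Steve Baskauf"}

msg = ("This occurrence was recorded by "
       + display_name("http://bioimages.vanderbilt.edu/contact/baskauf",
                      labels.get) + ".")
```

Such a client produces the desired sentence whether the provider supplied a name string or a dereferenceable URI.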
BTW according to the DwC documentation dwc:recordedBy refines dwc:accordingTo, which doesn't seem to exist (anymore?). At least I can't find it in the documentation. dwc:recordedBy also doesn't define a range, so as far as a machine is concerned, they really need to be prepared to find anything as the object. (Machines can't interpret definitions written for humans.)
I forgot to comment on this for an earlier post in this thread. I think the mindset that we need properties to carry all the semantics about "what to expect", or what kind of thing the property value denotes, is a remnant from our relational modeling days. There is very little place for that in an RDF world. Objects can, and should, speak for themselves - if we use dereferenceable URIs wherever possible, and if we write software clients that don't make unwarranted assumptions, then we don't need a gazillion different properties just to tell us certain nuances about the expected property value.
RDF is really different from relational modeling. In a relational database, the combination of table and column, and the definition of column type, tell us mostly what we need to know about dealing with a column's value, and so we obsess about those things. We need to fully let go of this paradigm in an RDF world, or we are not gaining its benefits and might as well continue doing relational data and XML.
On Mar 8, 2012, at 10:39 AM, Steve Baskauf wrote:
Well, the web as we know it took off because browser software was written with the ability to figure out any kind of slop that people create and call HTML.
I think people who can write software can also get themselves to write clever software. But it's a bigger challenge to get people who don't understand metadata let alone RDF to write metadata fully compliant with specifications that can't be validated because the specifications are a collection of conventions rather than database-enforced integrity constraints.
Data sharing is a messy business.
On Mar 8, 2012, at 12:03 PM, Steve Baskauf wrote:
We should be careful in this group to not confuse technologies. Publishing RDF by itself isn't the semantic web, nor is publishing Linked Data. It's perhaps a great (because low-barrier) step towards it, but a semantic web also needs ontologies that support reasoning to the extent of supporting agent-based decision making and knowledge discovery. Building ontologies (in OWL, for example) that allow rich and meaningful reasoning and are commonly reused is a *hard* problem.
And Linked Data *has* taken off, BTW.
So I think there's lots of room in pointing out which practices have which advantages or downsides, or consequences in ways that people may not be aware of. In this context, I'd strongly recommend to read the draft W3C Interest Group note on "Mapping and linking life science data using RDF" that I posted earlier:
Quoting from the abstract: "This W3C Note summarizes emerging practices for creating and publishing healthcare and life sciences data as Linked Data in such a way that they are discoverable and useable by users, Semantic Web agents, and applications." Isn't that much of what we want to accomplish here for biodiversity data?
Some maybe good news:
The W3 RDB2RDF Working Group http://www.w3.org/2001/sw/rdb2rdf/ two
weeks ago released a Candidate Recommendation R2RML: RDB to RDF
Mapping Language, http://www.w3.org/TR/r2rml/
It, or at least the documents of the Working Group that led to it,
might well inform some of the ways tdwg-rdf should consider dealing
with "bad" data, at least as emitted from legacy RDBs. Perhaps
especially http://www.w3.org/TR/2010/WD-rdb2rdf-ucr-20100608/#uc "Use
Cases" has sections worth reading. FWIW, the W3C LODD IG Note
"Mapping and linking life science data using RDF",
that Hilmar mentioned cites a 3-year old paper of the rdb2rdf group.
Probably it is appropriate to verify that the Candidate Recommendation
and the LODD Note remain consistent, at least as far as tdwg-rdf
concerns may emerge.
Some OK news:
"Bad" RDF is probably perfectly useful for
-- discovery applications such as provided by Linked Open Data protocols.
-- purely syntactic data integration (i.e. RDF graph aggregation)
-- largely human-centric applications requiring no machine reasoning
Some further opinions interspersed below.
On Thu, Mar 8, 2012 at 12:03 PM, Steve Baskauf wrote:
> I see your point about HTML. But I'm still left with several questions. If
> it is so easy to write clever software to interpret RDF slop, then why
> hasn't the amazing "Semantic Web" come into fruition by now after more than
> a decade of talking about it?
In part because the emphasis has been on the "Web" at least as much as
on the "Semantic". By contrast, the biomedical informatics community
and the military information retrieval community have made vastly more
progress by focusing on knowledge representation more than data
discovery and dissemination. In fairness, both these communities have
always been substantially better funded than almost any other group
interested in Semantic <X>, for any value of <X>.
> Also, what is the purpose of making the
> distinction in OWL between datatype properties and object properties if it
> doesn't really matter whether metadata providers use strings or URIs with
> particular properties?
Sometimes it <does> matter, depending on what \other/ restrictions
providers wish to place on their knowledge representation.
has an example. But in OWL, or many other modeling languages, there
often is not a unique way to "preserve" any particular single kind of
restriction. This point is made very well in the brief
> I realize it is difficult in an email to convey
> one's tone, but I'm not trying to be cheeky here. I actually don't know or
> understand the answers to these questions in the light of what Hilmar has
> just said. If it doesn't really matter how people write their RDF because
> the software will just fix it all, then I'm wondering whether we actually
> need this group or not. We could just tell people to do whatever they want
> rather than trying to define best-practices.
Cheer up. The software will "just fix it all" only in the simplest of
cases, useful though they are. There's still plenty of work for
tdwg-rdf to do. It's just not clear what that is yet. :-)
In the examples in section 2 ( http://www.w3.org/TR/r2rml/#overview ),
the IRIs (i.e. broader term for URIs) are generated using the primary
keys of the database. See "triples map" rule 1 in that section. I was
under the impression that this was not considered a good practice, at
least when the intent is that the identifiers be persistent. I looked
through the TDWG GUID Applicability Statement standard and actually it
doesn't mention the issue of using primary database keys in GUIDs, but
the LSID Applicability Statement (standard at:
http://www.tdwg.org/standards/150/ , pdf viewable in browser at
http://bioimages.vanderbilt.edu/pages/LSID%20AS_2011_01_final.pdf ) does
talk about it in Recommendation 11: "LSID Authorities should not use the
primary key of relational database tables as object identifications.
Providers should create an extra column in the table (or a separate
table) to manage the LSID independently of the primary key." Although this advice is given
specifically in the context of LSIDs, I think that the general principle
holds for any identifier that is intended to be stable and persistent.
Perhaps the intention of the creators of the
Recommendation was to facilitate "quick and dirty" conversion of
metadata from relational databases into RDF triples. But it seems to me
to be a really bad idea to suggest this as a general practice. The
system that is described in the Recommendation facilitates the mapping
of database column headings to well-known predicates, so I don't see why
they can't give examples where the IRIs are created from a mapping of a
column which contains a stable local identifier which is not the primary key.
I may just be misunderstanding what they are talking about. If so, I
would welcome clarification from somebody who knows more about it.
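In code, Recommendation 11 amounts to minting subject IRIs from a separately managed identifier column rather than the auto-generated primary key. A minimal sketch (the base URI, table layout, and column names here are hypothetical):

```python
from urllib.parse import quote

def subject_iri(row, base="http://example.org/specimen/"):
    """Mint a subject IRI from the stable 'guid' column, not the pk."""
    return base + quote(row["guid"], safe="")

# If the table were reloaded and rows renumbered, the pk would change
# but the minted IRI would not.
row = {"pk": 42, "guid": "ind-baskauf-12345"}
iri = subject_iri(row)
```

An R2RML triples map can express the same choice simply by pointing its subject template at the managed column instead of the key column.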
Although I didn't look extensively, I found nothing in dcterms that would prevent something from being both an object property and a datatype property. More particularly, I find nothing in either dcterms or rdfs that requires that the class dcterms:Agent and the class rdfs:Literal cannot intersect.
I think that DCMI has already promulgated a best practice for us. It is expressed in human-readable form as opposed to machine-readable semantics, but it is a best practice nonetheless. It is intended to help get rid of the confusion between a person and that person's name, "a pain in the butt for implementors"(1)... I am pretty sure that to reason entirely within rdfs and the things within dwc and those rdf vocabularies dwc uses, you will have to promulgate a "best practice" in the form of the addition of an axiom that asserts this disjointness (or accept the not-formally contradictory situation that you found). ...
On 3/7/2012 10:37 PM, Bob Morris wrote:
> It's disarmingly simple to fall into OWL FULL when designing an
> ontology. http://purl.org/dsw/ and
> http://lod.taxonconcept.org/ontology/txn.owl both do, and probably
> neither can enforce a distinction between object properties and
> datatype properties.
Good news. Now get rid of the unnecessary Functional and inverseFunctional requirements and I'll be an almost perfectly happy camper when my applications are trying to define mappings into relational databases that happen to be emitting some form of DwC, perhaps DSW.
A good candidate for a best practices topic would be something like "what are the tdwg community use cases requiring InverseFunctional properties?" (My guess is none.) More generally, one could consider if there are use cases for the other three presently defined OWL2 profiles. (My guess is maybe.)
The bottom line was that the assembled group felt that the existing DwC
terms (e.g. dwc:recordedBy) should be used in accordance with the
current term definition (i.e. not repeated, with a single string literal
value which is a concatenated list) and that it would be best to have a
new term which would be repeatable and designated specifically for use
with a URI (as opposed to literal) object. There were two suggestions
for how to accomplish this:
1. Create a new term in the current DwC namespace
(http://rs.tdwg.org/dwc/terms/) which is a modification of the current
term, e.g. dwc:recordedByURI .
2. Create a term in a new namespace which is understood to contain terms
that are intended for use with URI objects, e.g. dwcuri:recordedBy .
If option 1 were chosen, it would require making the changes through the
existing DwC namespace policy
(http://rs.tdwg.org/dwc/terms/namespace/index.htm) which could be a
lengthy process before the terms were available for use. However, it
would avoid a proliferation of namespaces.
If option 2 were chosen, it would not necessarily require a change to
Darwin Core itself (although maybe it would if the namespace were under
http://rs.tdwg.org/dwc/ e.g. http://rs.tdwg.org/dwc/dwcuri/ ). Using
the Darwin-sw namespace (http://purl.org/dsw/ ) was suggested as a
possibility. (I did not take a position on that suggestion since I've
somewhat recused myself from promotion of DSW in this context. Cam may
want to respond to that suggestion.) An advantage of using a different
namespace is that we could avoid confusing people who are not interested
in RDF by steering them away from the documentation about the use of the
terms in the new namespace (vs. having terms added to the existing quick
reference guide for the regular DwC namespace).
I would appreciate feedback about these options or the issue in
general. You can make them as replies to this email or as comments
under Issue 9 in the Issue Tracker
A couple of things:
i. There's no way to prevent terms from being repeated on the web. This
maybe isn't clear with dwc:recordedBy, but consider dwc:associatedMedia.
Many people may contribute, for example, pictures of a specimen. Each
contributor will create a triple of the form:
_:foo dwc:associatedMedia X
So whether the Xs are literals or URIs, there will be repeated elements.
ii. Of the 4 main DwC representations that we talk about - spreadsheets,
rdf, xml, and rdbms - only spreadsheets do not easily permit repeated
elements. Of course, spreadsheets are the most common of the 4, so I
understand why the standard must accommodate them. The current definition is:
"A list (concatenated and separated) of identifiers (publication, global
unique identifier, URI) of media associated with the Occurrence."
Lists are allowed to have a single element, so, as I see it, the current
definition should suffice. But maybe there are cases I'm overlooking.
Basically, I'm wondering: what were the arguments in favor of introducing new terms?
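For what it's worth, the concatenated-list convention and repeated properties are mechanically interconvertible. Here is a sketch of a consumer expanding a list value into repeated statements; the " | " separator is an assumption on my part, since the DwC definitions do not fix one:

```python
def split_list_value(value, sep=" | "):
    """Split a DwC concatenated-list value into its individual items."""
    return [item.strip() for item in value.split(sep) if item.strip()]

# One concatenated literal becomes one statement per person:
triples = [("_:occ", "dwc:recordedBy", name)
           for name in split_list_value("Steve Baskauf | Bob Morris")]
```

The conversion is only reliable, of course, if providers actually agree on a separator, which is part of what this issue is about.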
On Wed, 4 Apr 2012, Steve Baskauf wrote:
> i. There's no way to prevent terms from being repeated on the web. This maybe isn't clear with dwc:recordedBy, but consider dwc:associatedMedia. Many people may contribute, for example, pictures of a specimen. Each contributor will create a triple of the form: _:foo dwc:associatedMedia X So whether the Xs are literals or URIs, there will be repeated elements.
> ii. Of the 4 main DwC representations that we talk about - spreadsheets, rdf, xml, and rdbms - only spreadsheets do not easily permit repeated elements. Of course, spreadsheets are the most common of the 4, so I understand why the standard must accommodate them. The current definition is "A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of media associated with the Occurrence." Lists are allowed to have a single element, so, as I see it, the current definition should suffice. But maybe there are cases I'm overlooking. Basically, I'm wondering: what were the arguments in favor of introducing new terms?
Subject: RE: [tdwg-rdf: 44] Issue 9 Repeatable properties in lieu of properties that specify concatenated lists (in DwC)
Date: Thu, 19 Apr 2012 10:39:12 -0500
From: Kennedy, Jessie <J.Ke...@napier.ac.uk>
To: Baskauf, Steven James <steve....@Vanderbilt.Edu>, tdwg...@googlegroups.com <tdwg...@googlegroups.com>
CC: roge...@mac.com <roge...@mac.com>, John Wieczorek <tu...@berkeley.edu>
Regarding your comment on the intent of TCS – you are correct – TCS was not meant as a definition where everything must be complete, but rather to allow people to record what they did know for legacy data and, if they had good new data, to capture that too.
TCS was developed to serve the wide ranging interpretations of how one might describe a Taxon.
We started with the quite strict interpretation from our own work on the Prometheus model (published in Taxon), and then when I worked on the SEEK project and looked at how ecologists described taxa – it was very different. I then worked with TDWG to understand the differing perspectives of taxa across the community. In order to deal with new and legacy data it was almost impossible to specify what a taxon must have, but we wanted to allow people to capture what they could provide, believing that if GUIDs were created the concepts could be improved over time as they were required for research, and thereby it was worth investing the effort. This didn't necessarily imply that we would have one GUID for a given taxon concept which others would come along and edit, but it would allow people to capture their meaning, and part of defining any new concept would be to relate their concept to other existing ones – the thought being that the author of the concept could decide whether what he/she meant was congruent with, included, overlapped etc. with other described concepts. Slowly but surely the network of GUIDs and concepts would grow, with the more important ones being sorted out first.
In the ideal world we would have the full description of all specimens, all characteristics of those specimens, the defining characteristics, the basionym etc all defined but we need to be realistic to start, so tying the taxon to at least a physical description would be something to determine meaning from.
It seems we haven’t really committed to this approach enough yet – but I still think it’s the way to go.
Hope this helps,
In RDF/XML, is it possible to generate a triple that has a literal as the subject? I'd like to do this to support indexed case-insensitive searching in SPARQL:
:someTaxonName :genusPart "Doodia" .
"Doodia" :toUpper "DOODIA" .
The same thing can be done just by having another property on the taxon name object, but that would mean duplicate properties for hasEpithet, hasAuthor, and so on - quite a lot of duplicate properties, as I'd like to do this to support Tony Rees' taxamatch algorithm (which involves a static transformation of scientific names), and to support stripping diacritics from author names to make searching easier there.
The other reason is a gut-feel design decision: the transformation to uppercase is not something we are saying about the taxon name object, it is about the string "Doodia" regardless of where it might be used.
Having looked into it a little further, I suspect that it just isn't valid, although at a graph level it would be perfectly ok. Protege won't accept the triples above, let alone any RDF/XML. I think I'm going to have to go with a "MatchingStrings" object and a "genusPartStrings" property … no, that won't do. There will be an enormous number of duplicates because each blank node has a distinct identity.
A solution that would work is a MatchingStrings object that has a URI that is a hashed version of the string.
:someTaxonName :genusPartStrings strings:8787785 .
strings:8787785 rdf:value "Doodia" .
strings:8787785 :toUpper "DOODIA" .
strings:8787785 :toTaxamatch "DDA" .
and so on. The generated RDF might contain duplicates, but there won't be redundant anonymous blank nodes created - just repeated property assertions which are not stored in the graph as repeats. At least, they oughtn't be.
An alternative to using a hash is simply to use the (URL-encoded) string itself as the URI. You could even get away with making up a URI scheme "literal", to discourage RDF engines from fetching them.
:someTaxonName :genusPart "Doodia" .
:someTaxonName :genusPartStrings <literal:Doodia> .
<literal:Doodia> rdf:value "Doodia" .
<literal:Doodia> :toAscii "Doodia" .
<literal:Doodia> :toUpperAscii "DOODIA" .
<literal:Doodia> :toUpper "DOODIA" .
<literal:Doodia> :toTaxamatch "DDA" .
This has a number of advantages, transparency being the big one. The disadvantage is that you get long URIs, but not if you are only doing this for epithets and authority strings.
I'm not even sure that discouraging RDF engines from snarfing them is a useful goal - why not simply use http://mydomain/strings/ as the root for all of these objects? The webserver there would not even need to store the strings, it could simply generate the RDF based on the requested url. The SPARQL server needs to have the strings in its dataset, but it only needs the ones we actually want to search on - scientific name parts and authority strings.
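Both minting strategies are easy to sketch. In this Python fragment the property names, the base prefix, and the truncated hash length are all improvised from the discussion above, not a settled design:

```python
import hashlib
from urllib.parse import quote

def hash_uri(s, base="http://mydomain/strings/"):
    """Hash-based string-node URI (truncated digest, purely illustrative)."""
    return base + hashlib.sha1(s.encode("utf-8")).hexdigest()[:10]

def literal_uri(s):
    """URL-encoded 'literal:' string-node URI."""
    return "literal:" + quote(s, safe="")

def string_node_triples(s):
    """Emit the derived-form triples for one string node."""
    node = literal_uri(s)
    return [
        (node, "rdf:value", s),
        (node, ":toUpper", s.upper()),
    ]

triples = string_node_triples("Doodia")
```

Because both functions are deterministic, regenerating the RDF yields the same node URIs and only repeated assertions, never fresh blank nodes.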
So, I would define an RDF class Strings, or Mappings, or Conversions, to be used in this manner, and predicates for the various conversions (rdf:value for the plain text). Importantly, I *can* make use of this without having to define a :genusPartStrings predicate. In SPARQL, the query is simply
?strings :toUpper "DOODIA". # the search
?strings rdf:value ?verbatim . # the string that matched the search
?taxonName :genusPart ?verbatim # the taxon name object whose genus part is "Doodia"
*that* works just fine, and means we don't have to clutter up the space with a swag of duplicate properties. The key is using a URL rather than blank nodes so that we don't have redundant objects in the dataset, and the fact that IRIs will accommodate pretty much anything if it's URL encoded.
Subject: [tdwg-rdf: 43] Issue 9 Repeatable properties in lieu of properties that specify concatenated lists (in DwC)
Date: Wed, 4 Apr 2012 12:46:03 -0500
From: Steve Baskauf <steve....@vanderbilt.edu>
To: tdwg...@googlegroups.com <tdwg...@googlegroups.com>, gsa...@unb.ca <gsa...@unb.ca>, John Wieczorek <tu...@berkeley.edu>
References: <4F58267F...@vanderbilt.edu> <CADUi7O4KwXh0k7AoGtw197cudVBHHnf=L611GyiLA...@mail.gmail.com> <4F68BD3A...@vanderbilt.edu> <CADUi7O6W08MabvoAe3Rwqo+BHhYHhVWECnWTF=-J0_yR...@mail.gmail.com> <8530342F-133E-4F37...@nescent.org>