Is it "bad" for an owl:Ontology or skos:ConceptScheme to also be a dcat:Dataset and dctype:Dataset?

Steve Baskauf

Jan 8, 2017, 3:08:04 PM

to tdwg...@googlegroups.com, gtuco...@gmail.com, Stanley Blum, Bob Morris, Jonathan A Rees, Joel Sachs, greg whitbread

I'm sending this email to the VOCAB Task Group contributors and also to
the RDF Task Group email list (if it still works) in order to solicit
advice.

During the expert review of the draft Standards Documentation Specification
https://github.com/tdwg/vocab/blob/master/documentation-specification.md
one of the reviewers made this comment:

"Regarding the use of dcat:Dataset in several of your examples, I do not
think that this is semantically correct to type a resource as being both
a dcat:Dataset and a skos:ConceptScheme or an owl:Ontology (e.g.
examples 4.5.4.1, 4.4.2.3, 4.4.1.1)"

In the examples cited by the reviewer, it is "term lists" that are typed
as dcat:Dataset. The reason was to enable the use of the property
dcat:distribution to link an abstract term list to the various forms in
which it might be distributed (e.g. html, turtle, json) as shown in Fig.
4 (Section 2.2.4). Because dcat:distribution has the domain
dcat:Dataset, making the link to the distributions using
dcat:distribution entails that the term list is a dcat:Dataset whether
we like it or not, so I stated that fact explicitly in the examples. I
should also note that because dcat:Dataset is rdfs:subClassOf
dctype:Dataset, it is also entailed that the term lists are dctype:Dataset.
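
The entailment chain described above can be sketched in a few lines of Python. This is only an illustration, not a real RDFS reasoner: it applies just the rdfs:domain and rdfs:subClassOf rules, and the `ex:` identifiers are hypothetical names invented for the example.

```python
# Minimal sketch of two RDFS entailment rules over (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply the rdfs:domain and rdfs:subClassOf rules until nothing new is inferred."""
    inferred = set(triples)
    while True:
        domains = {(s, o) for s, p, o in inferred if p == RDFS_DOMAIN}
        subclasses = {(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS}
        new = set()
        for s, p, o in inferred:
            # Domain rule: using a property types the subject as the property's domain.
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            # Subclass rule: an instance of a class is an instance of its superclasses.
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


schema = {
    ("dcat:distribution", RDFS_DOMAIN, "dcat:Dataset"),
    ("dcat:Dataset", RDFS_SUBCLASS, "dctype:Dataset"),
}
# Hypothetical term list linked to one hypothetical distribution.
data = {("ex:termList", "dcat:distribution", "ex:termListTurtle")}

closure = rdfs_closure(schema | data)
for triple in sorted(closure - schema - data):
    print(triple)
```

Nothing in `data` states a type for `ex:termList`; both types appear only because of the single dcat:distribution link.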

The question is whether it is a problem when a term list (a
dcat:Dataset) might also be declared to be an owl:Ontology as is
recommended in section 4.4.2.2 (ontology) or if a vocabulary (a
dctype:Dataset) might also be a skos:ConceptScheme (suggested as a
possibility in 4.5.4). With respect to generating inconsistencies, none
of these classes are declared to be disjoint with each other in their
defining RDF. (skos:Concept is disjoint with skos:ConceptScheme but
that isn't an issue here.) So it seems to me that the only problem
would be if there were something "incompatible" in the human-readable
definitions.

I've compiled the various definitions in this document:
https://github.com/tdwg/vocab/blob/master/dataset-related-definitions.md
The main requirement in the human readable definitions seems to be that
datasets contain "data", and it seems to me that the definition of
"data" would be fuzzy enough to include vocabulary terms regardless of
whether those terms were also part of an ontology or a concept scheme.
In fact, the definition of dctype:Dataset gives "list" as an example of
a dataset, and "subject heading lists" are listed as an example of a
skos:ConceptScheme. So there seems to be at least one clear example
(subject heading list) that could be considered both a dctype:Dataset
and a skos:ConceptScheme.

Anyway, I'm inclined to just disagree with the reviewer that the
multiple typing to which he/she objects is a problem - I can't see a
reason why it would be a problem. But I'd like to have some feedback
from some experts (you) before I take this position. Agreeing with the
reviewer's objection would probably require using some term other than
dcat:distribution to make the link from term lists to distributions, and
it seemed to me that dcat:distribution was just the right well-known
term for that job. I can't think of a better alternative.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
http://vanderbilt.edu/trees

Paul J. Morris

Jan 8, 2017, 7:25:47 PM

to tdwg...@googlegroups.com, steve....@vanderbilt.edu, gtuco...@gmail.com, Stanley Blum, Bob Morris, Jonathan A Rees, Joel Sachs, greg whitbread

Steve,

Your position seems reasonable.

You can express a competency question to support including
dcat:distribution. I don't see a technical problem with the use of
dcat:Dataset in 4.4.3 or the examples. I concur that the concern
appears to revolve around the meaning of "data". For guidance here, I'd
also consider this quote from SKOS: "Using SKOS, a knowledge
organization system can be expressed as machine-readable data."
https://www.w3.org/TR/skos-reference/#L895

-Paul

Paul J. Morris
Biodiversity Informatics Manager
Museum of Comparative Zoölogy, Harvard University
mo...@morris.net AA3SD PGP public key available

Jonathan A Rees

Jan 9, 2017, 6:40:00 AM

to Paul J. Morris, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

I don't think you'd be doing anything logically inconsistent or out of spec by punning in this way, but it would be a serious violation of the intent of both RDF and URIs.

Webarch is not normative, but here is what it says about the punning: https://www.w3.org/TR/webarch/#URI-collision

It's all about interoperability. You may not have any inconsistencies initially, but there may be a later ontology that does something reasonable, but will fail to be consistent not because it has done something wrong but because you have a model in mind that would be totally surprising to the author of the later ontology. See https://en.wikipedia.org/wiki/Principle_of_least_astonishment .

The English comments tell you what the (or a) natural model is of the axioms. (Yes, I'm talking about OWL model theory.) If you say there exists an x such that x is both a term list and an ontology, or there exists an x such that x is both a vocabulary and a concept scheme, that is just sophistry. It makes absolutely no sense; there is no natural model that has such x's. Ignoring the prose is fine in a closed world where you have total control and don't care much about interoperability - then you can at least make puns without creating inconsistencies. But OWL is supposed to be best-effort open world. There is no way to guarantee consistency in an open world, but following the spirit of the prose definitions takes you a long way.

I'd say if you don't care about confusing people and it's really important to have the puns, the specification police can't stop you. But if you want something comprehensible to the uninitiated, you're much better off acting as if these classes are disjoint.

If you have axioms that entail that dcat:Dataset and owl:Ontology are to be interpreted in a special way that you define, that significantly goes beyond what the prose definition and original axioms license, that is a kind of 'squatting' and is frowned upon at the webarch level (compare https://www.w3.org/wiki/UriSpaceSquatting ).

Consider how you would react to an ontology in which there were individuals x that were both cars and apples, and were told that's OK because they don't have any properties in common, so there can be no inconsistencies. What if you wanted to define, in your extending ontology, a number-of-doors property that would be zero for apples but 4 for cars? I think the latter is perfectly sensible, since apples do not have doors (in a natural model), but the punning, with which it is inconsistent, is not.
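
A toy sketch of that clash, assuming a later extension gives each class a single-valued number-of-doors property. The class names, the property values, and the individual `ex:x` are hypothetical, and this is plain Python rather than an OWL reasoner.

```python
# Hypothetical extension: each class fixes one number-of-doors value for its members.
DOORS_BY_CLASS = {"ex:Apple": 0, "ex:Car": 4}


def door_conflicts(types_by_individual):
    """Return the individuals whose classes demand more than one number of doors."""
    conflicts = {}
    for individual, types in types_by_individual.items():
        values = {DOORS_BY_CLASS[t] for t in types if t in DOORS_BY_CLASS}
        if len(values) > 1:
            conflicts[individual] = sorted(values)
    return conflicts


individuals = {
    "ex:grannySmith": {"ex:Apple"},
    "ex:sedan": {"ex:Car"},
    "ex:x": {"ex:Apple", "ex:Car"},  # the punned individual
}

conflicts = door_conflicts(individuals)
print(conflicts)
```

The extension is sensible on its own; only the individual that was typed both ways ends up with two incompatible values.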

But you know by now that I have a particular attitude toward OWL - that it is a way for expressing yourself clearly, that just happens to have computable logics - that many people don't share.

Jonathan

Paul J. Morris

Jan 9, 2017, 9:34:55 AM

to Jonathan A Rees, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

On Sun, 8 Jan 2017 21:50:03 -0500
Jonathan A Rees <re...@mumble.net> wrote:
> Consider how you would react to an ontology in which there were
> individuals x that were both cars and apples,

http://www.macrumors.com/roundup/apple-car/
https://en.wikipedia.org/wiki/Apple_electric_car_project

http://www.american-automobiles.com/Apple.html

http://babyccinokids.com/blog/2012/02/09/apple-cars-a-fun-little-snack/
http://www.crystalandcomp.com/wp-content/uploads/2013/05/carsnack_thumb.jpg

http://www.cheskydom.com/site/?tag=%e3%83%aa%e3%83%b3%e3%82%b4

https://www.flickr.com/photos/55763854@N00/4945330905/

https://www.flickr.com/photos/sovaira/5082413185

It is an open world.... :)

-Paul

Hilmar Lapp

Jan 9, 2017, 11:15:44 AM

to tdwg...@googlegroups.com, Paul J. Morris, Steve Baskauf, John Wieczorek, Stan Blum, Bob Morris, Joel Sachs, greg whitbread

I agree with all your general statements below. However, I think it’s also a reality, and arguably one on the rise, that ontologies can be, and sometimes are, used as data. For example, I don’t know how else I would categorize the role of ontologies in the Phenoscape project that are fed into the Knowledgebase. For all intents and purposes (including considerations for archiving, sharing, and reuse) they serve the role of data sets.

So if you archived such an ontology in a data repository, what other than dcat:Dataset would be a proper metadata attribute for the record? Perhaps you’d argue that the Dataset categorization would then be applied to a container object of which the ontology would be declared a part?

-hilmar

--

Hilmar Lapp -:- genome.duke.edu -:- lappland.io

Paul J. Morris

Jan 9, 2017, 2:02:09 PM

to Jonathan A Rees, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

On Mon, 9 Jan 2017 12:48:16 -0500

Jonathan A Rees <re...@mumble.net> wrote:

> Not helpful, that is just sophistry.

Assuming that you are responding to the apple/car examples I sent:

Balderdash. They go exactly to the point: What is Data?

> Oscar Wilde demonstrated that
> you can make a pun on any subject. Are you requesting that I come up
> with a more compelling example, or just trying to have fun?

Those are all cases where the object in question is both an apple and a
car. Some are cases where perhaps what you meant by apple is "the
fruit" and the apple in question is "the corporation", thus potentially
puns. Others are cases where the object in question is clearly both an
apple "the fruit" and a car. Another case I didn't include, one that is
even more unambiguous, is a child's drawing of a car which is an
apple. It does not take going very far to assert the axiom x is a car
and x is an apple in an unsurprising way.

The discussion is not about an identifier for a resource for "Data",
therefore puns do not apply, and the discussion is about
interpretation of the English words Data and DataSet.

Can a skos:ConceptScheme also be a dcat:Dataset?

Can an owl:Ontology also be a dcat:Dataset?

For either case, is there a technical reason why not? I'm not seeing
one yet. If there isn't, then we are left with the interpretation of
the human readable concept of Data.

It feels like there is a very reasonable and unsurprising meaning of
Data for which some set of electronic structured information may be both
a skos:ConceptScheme and a dcat:Dataset, and likewise some set of
electronic structured information may be both an owl:Ontology and a
dcat:Dataset.

I'm not reading things in the human readable prose that tell me that
these are unreasonable axioms. Thus, clearly, how I'm thinking
about data differs from how you are thinking about data, and the set of
apple/car examples is right to the point.

Jonathan A Rees

Jan 9, 2017, 5:11:32 PM

to Paul J. Morris, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

Not helpful, that is just sophistry. Oscar Wilde demonstrated that you can make a pun on any subject. Are you requesting that I come up with a more compelling example, or just trying to have fun?

Jonathan A Rees

Jan 9, 2017, 5:11:57 PM

to Paul J. Morris, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

You are in effect saying that any statement can be true, and any statement can be false. If that is your starting point then we have no basis for communication.

There is a population of speakers of English, and the question of how a word like 'data' is used or not used by it is an empirical one - if we're looking at comment properties (such as those given in https://github.com/tdwg/vocab/blob/master/dataset-related-definitions.md ). If we get into an argument over meaning, which should only happen if somebody is being unpleasant or legalistic, the question will have to be settled through empirical means, such as a corpus analysis or a human subject survey.

If you are rejecting in advance any attempt I might make to clarify, then there is no point in my answering.

Jonathan A Rees

Jan 9, 2017, 8:54:52 PM

to Paul J. Morris, tdwg...@googlegroups.com, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

For Steve's sake I'll take a closer look at the wording and intent of dcat and the other ontologies. I just read over the axioms for dcat and it's the usual mush created by confusing what is true with what is written down, confusing proposition with expression, etc., so there are a lot of choices to make in interpreting it. But I can try to say something useful with respect to what's particularly written. Give me a day or two...

Of course technical people are accustomed to having ordinary words be defined or used in weird ways. 'Data' is one of the most awfully abused words around so you can easily convince the poor programmer that there is no x such that x is not data. They can be taught to believe that every ontology is a data set, that due to lack of functionalProperty declarations in dcat not every data set (as intended by dcat) needs to have a byteSize, that the Washington Monument is 6 inches tall, and so on.

But this is why I brought up the principle of least astonishment. The question for me is not what can you get away with imposing on the poor soul trying to understand what's going on. It's how can you write definitions and axioms that are the least possible effort for the uninitiated to absorb. If you show them an ontology (in the owl sense) and then tell them it's a data set (in the dcat sense), I promise they will be at least a little bit astonished, even if you can make a very good case for what you say.

pmurray

Jan 10, 2017, 12:26:57 AM

to tdwg...@googlegroups.com

On 10/1/17 3:15 am, Hilmar Lapp wrote:
> … ontologies can and sometimes are used as data. For example, I don’t
> know how else I would categorize the role of ontologies in the
> Phenoscape project that are fed into the Knowledgebase. For all
> intents and purposes (including considerations for archiving, sharing,
> and reuse) they serve the role of data sets.

I'm reminded of how the paradox "this statement cannot be proved" goes
away when you add "this statement cannot be proved in system S".

Steve Baskauf

Jan 10, 2017, 7:46:12 AM

to Jonathan A Rees, Paul J. Morris, tdwg...@googlegroups.com, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

The problem that we are facing is that well-known terms that we want to use are too narrowly defined. It's not that I really want to type a vocabulary as dctype:Dataset or dcat:Dataset. The problem is that I want to use dcat:distribution to connect a thing to the concrete forms of the thing (files in various formats, SPARQL endpoint) that can be used to acquire information about the thing. Doing that entails that the thing is both a dcat:Dataset and a dctype:Dataset. There is a similar problem with the VoID vocabulary (https://www.w3.org/TR/void/), which has a number of potentially useful properties such as void:feature, which serves a similar purpose to dcat:distribution, but which due to a domain declaration entails that the subject is a void:Dataset, "a set of RDF triples that are published, maintained or aggregated by a single provider". Our vocabularies are clearly different or at least broader than that.

One might wish to declare directly the type of a term list or a vocabulary. But there don't seem to be any well-known class terms that are broad enough for our use. I looked at http://lov.okfn.org/dataset/lov/terms?q=Vocabulary to see what was there. sio:SIO_001080 (http://semanticscience.org/resource/SIO_001080) might serve the purpose. It's defined as "a vocabulary is a collection of terms". However, I don't know how "well-known" SIO is, and the ontology from which the term is taken is mucked up with all kinds of entailments, so it isn't (yet) clear to me what would be the implications of using that term. There is also voaf:Vocabulary (http://purl.org/vocommons/voaf#Vocabulary), but it is "A vocabulary used in the linked data cloud. An instance of voaf:Vocabulary relies on or is used by at least another instance of voaf:Vocabulary", which seems too narrow. Interestingly, voaf:Vocabulary is subClassOf void:Dataset, for what that's worth.

I'm wondering if the best path here is to just mint several terms that do exactly what we want:

- a property to link things to their concrete forms that can be used to acquire information about those things.

- a class for vocabularies

- maybe a class for term lists (perhaps unnecessary depending on how the class for vocabularies is defined).

We could put them in the dwcattributes: namespace, which already has other "housekeeping" terms dwcattributes:organizedInClass, dwcattributes:abcdEquivalence, dwcattributes:decision, and dwcattributes:status. These terms should probably be elevated to TDWG-wide use rather than being limited specifically to DwC, and under that scenario we could add other terms necessary to get the job done.

Thoughts?
Steve

Matt Yoder

Jan 10, 2017, 9:47:11 AM

to tdwg...@googlegroups.com, Jonathan A Rees, Paul J. Morris, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

"I'm wondering if the best path here is to just mint several terms
that do exactly what we want:" -> There is nothing wrong with this in
my opinion. I think it is better to express what you mean/want to
convey, than, as Jonathan alludes to, provide a confusing path
forward. You can always deprecate your concept if it doesn't get
adopted or if it is replaced by a more refined standard. If you do
express something useful/meaningful then it will get adopted, and time
will tell that you picked the right path forward.

Matt

greg whitbread

Jan 10, 2017, 8:17:16 PM

to Jonathan A Rees, tdwg...@googlegroups.com, Bob Morris, Joel Sachs, Paul J. Morris, Stanley Blum, gtuco...@gmail.com

Steve,

Given that we are talking about "...available forms of a [term list] resource ...", is there an argument against simply referencing them as formats using dct:hasFormat and having these "formats" characterized using dct:format (text/turtle, application/sparql-query, application/json, etc.), with a link back to the term list document using dct:isFormatOf?
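
A rough sketch of what that linking pattern could look like as plain triples. The example.org IRIs are hypothetical placeholders, not IRIs from the draft specification, and the media types are given as simple literal strings.

```python
# Hypothetical IRIs for illustration only.
TERM_LIST = "http://example.org/termlist/ac"

FORMS = {
    "http://example.org/termlist/ac.ttl": "text/turtle",
    "http://example.org/termlist/ac.json": "application/json",
    "http://example.org/termlist/ac.html": "text/html",
}


def format_links(term_list, forms):
    """Return triples linking a term list to its concrete forms and back again."""
    triples = []
    for form_iri, media_type in sorted(forms.items()):
        triples.append((term_list, "dct:hasFormat", form_iri))
        triples.append((form_iri, "dct:isFormatOf", term_list))
        triples.append((form_iri, "dct:format", media_type))
    return triples


triples = format_links(TERM_LIST, FORMS)
for s, p, o in triples:
    print(s, p, o)
```

None of these triples uses dcat:distribution, so nothing here asserts that the term list is a dcat:Dataset.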

It took a while for me to figure it out :) but I think I'm now with Jonathan on this issue.

Greg

pmurray

Jan 10, 2017, 9:25:51 PM

to tdwg...@googlegroups.com

On 9/1/17 7:07 am, Steve Baskauf wrote:
>
> "Regarding the use of dcat:Dataset in several of your examples, I do
> not think that this is semantically correct to type a resource as
> being both a dcat:Dataset and a skos:ConceptScheme or an owl:Ontology
> (e.g. examples 4.5.4.1, 4.4.2.3, 4.4.1.1)"
>
> In the examples cited by the reviewer, it is "term lists" that are
> typed as dcat:Dataset. The reason was to enable the use of the
> property dcat:distribution to link an abstract term list to the
> various forms in which it might be distributed (e.g. html, turtle,
> json) as shown in Fig. 4 (Section 2.2.4). Because dcat:distribution
> has the domain dcat:Dataset, making the link to the distributions
> using dcat:distribution entails that the term list is a dcat:Dataset
> whether we like it or not, so I stated that fact explicitly in the
> examples. I should also note that because dcat:Dataset is
> rdfs:subClassOf dctype:Dataset, it is also entailed that the term
> lists are dctype:Dataset.

If I recall correctly (it's been a while, and I haven't been keeping up):

In OWL, vocabulary terms Ontology, Class, Property cannot themselves be
reasoned over. They have properties, of course, but those properties are
always annotation properties. They are used *by* the reasoning engine,
not as subject matter *within* the reasoning engine. Anything that is
the subject of a Property or the object of an ObjectProperty is
implicitly an owl:Individual. This strong distinction between vocabulary
and content is drawn, I believe, to deal with some rather nasty logical
problems that arise if you don't draw it.

The dcat:dataset predicate has a domain of dcat:Catalog and a range of
dcat:Dataset - a Dataset is partOf a Catalog. Both datasets and catalogs
must be owl:Individual(s), otherwise the range and domain declarations
wouldn't mean anything.

So if any resource is declared as being both "a dcat:Dataset" and "an
owl:Ontology", for purposes of OWL reasoning this is treated as two
different things: a 'punned' URI.

Steve Baskauf

Jan 10, 2017, 10:00:33 PM

to tdwg...@googlegroups.com, Jonathan A Rees, Bob Morris, Joel Sachs, Paul J. Morris, Stanley Blum, gtuco...@gmail.com

OK, I tried rewriting example 4.4.3.1 to (mostly) follow Greg's suggestion. You can see the result at
https://gist.github.com/baskaufs/47c780931424089d151b3eecfb3cf880
I went ahead and continued to refer to the various forms as "distributions" in the human-readable text, but that could be changed. I didn't include the dcterms:isFormatOf for backlinks, but they would be easy to add.

I used dc:format (http://purl.org/dc/elements/1.1/format) rather than dcterms:format (http://purl.org/dc/terms/format) since the object values were literals rather than IRI references. The URIs at https://www.w3.org/ns/formats/ could be used with dcterms:format, but not every MIME type is represented in that list (e.g. no URI for text/html).

As a substitute for dcat:downloadURL in the example, I used ac:accessURI from Audubon Core (https://terms.tdwg.org/wiki/Audubon_Core_Term_List), which serves a similar purpose to dcat:downloadURL, but is suitably vague about "the underlying resource" that's the subject of the triple. That brings to mind that ac:hasServiceAccessPoint might be another alternative to dcterms:hasFormat for linking the term lists to their various "forms".

I'm not sure about the appropriate types for vocabularies, term lists, and "distributions" if we don't use dctype:Dataset, dcat:Dataset, and dcat:Distribution. In the example, I made up the class "dwcattributes:TermList". For the distributions, I used foaf:Document, which I don't particularly like. I'm open to suggestion there...

Steve

Steve Baskauf

Jan 10, 2017, 10:29:52 PM

to tdwg...@googlegroups.com

Hmmm. The issue Paul raises is a really serious problem with the draft
documentation specification as it is currently written.

One objective of the specification is to allow machines to discover
things about TDWG vocabularies. For example, in example 4.4.2.3,
assertions are made about the Audubon Core term list, such as that its
preferred namespace prefix is "ac", etc. The AC term list is the
subject of a property, so by Paul's description below, the AC term list
is an owl:Individual.

Another objective of the specification (also reflected in the Vocabulary
Maintenance Specification) is to allow ontology building by those who
are interested in doing that, while keeping that activity separate from
the minting of basic terms having human-readable definitions and little
more (a "bag of terms").

This "layered approach" is described in section 4.4.2.2 (Vocabulary
extension term lists). In that section, and in the last example of
4.4.2.3, the "enhanced" vocabulary is constructed by combining the basic
(vanilla) term list with the axioms of the ontology by using
owl:imports. Using owl:imports entails that the imported Audubon
Core-defined term list is an owl:Ontology. Doing so creates the
'punning' to which Paul refers below, since it makes the Audubon Core
term list both an owl:Individual and an owl:Ontology. Aaack!

If this is truly a problem, then how does one implement the layered
approach? Can we have our cake and eat it, too? There seemed to have
been strong sentiment expressed on TDWG Content in the past to keep the
extensions of vocabularies into ontologies separate from the creation of
basic "bag of terms" themselves, but if it can't be done as in the
specification, I'm out of ideas.

Also, it seems like Vocabulary of a Friend
(http://lov.okfn.org/vocommons/voaf/v2.3/) would introduce this problem
in a big way. Or do people just not care about it???

Steve

Sachs, Joel

Jan 11, 2017, 9:43:04 AM

to tdwg...@googlegroups.com, Paul J. Morris, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

Empirically, ontologies ARE considered to be data by speakers of English. Hilmar pointed to his own experience. And here’s Markus Kroetzsch, last week, talking about the Wikidata ontology: "The ontology is not separated from the data. Schematic information is mostly managed by encoding it in data as well." [1]. The general notion that ontologies are data is especially true in a language like RDF, where the line between schema and data ranges from blurry to non-existent.

Joel.

From: <jonath...@gmail.com> on behalf of Jonathan A Rees <re...@mumble.net>
Reply-To: tdwg...@googlegroups.com
Date: Monday, January 9, 2017 at 2:42 PM
To: "Paul J. Morris" <mo...@morris.net>
Cc: tdwg...@googlegroups.com, Steve Baskauf <steve....@vanderbilt.edu>, John Wieczorek <gtuco...@gmail.com>, Stanley Blum <stan...@gmail.com>, Bob Morris <morri...@gmail.com>, Joel Sachs <jsa...@csee.umbc.edu>, greg whitbread <whitbre...@gmail.com>
Subject: Re: [tdwg-rdf: 385] Is it "bad" for an owl:Ontology or skos:ConceptScheme to also be a dcat:Dataset and dctype:Dataset?

You are in effect saying that any statement can be true, and any statement can be false. If that is your starting point then we have no basis for communication.

There is a population of speakers of English, and the question of how a word like 'data' is used or not used by it is an empirical one - if we're looking at comment properties (such as those given in https://github.com/tdwg/vocab/blob/master/dataset-related-definitions.md ). If we get into an argument over meaning, which should only happen if somebody is being unpleasant or legalistic, the question will have to be settled through empirical means, such as a corpus analysis or a human subject survey.

If you are rejecting in advance any attempt I might make to clarify, then there is no point in my answering.


Sachs, Joel

Jan 11, 2017, 9:43:51 AM

to tdwg...@googlegroups.com, Paul J. Morris, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

1. https://lists.wikimedia.org/pipermail/wikidata/2017-January/010162.html

Jonathan A Rees

Jan 11, 2017, 9:58:39 AM

to Matt Yoder, tdwg...@googlegroups.com, Paul J. Morris, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

Three things.

(1) I think this problem stems from wanting to follow the linked data ideology that you should reuse ontology terms whenever possible. Personally I have always considered this to be bad advice, for two reasons:

1. The reused term may be more or less specific than what is needed for the immediate situation, so reuse tends to either lose information about your data set, or impute properties to it that it doesn't have. Or worse it can cause you to say things that are just wrong when applied to your data.

2. A downstream consumer can always make a synonymy between any term you mint and any other term, if they like, as suits their purpose. So there is little cost in creating new terms. Folding terms together at the end is much easier than tearing them apart.
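
The second point can be illustrated with a small sketch: a consumer who decides that a newly minted term means the same as an existing one just rewrites it on their side. The term `tdwgx:hasForm` and the `ex:` resources are hypothetical names made up for this example; no such terms have been minted.

```python
def apply_synonyms(triples, synonyms):
    """Rewrite every term that has an entry in `synonyms`; leave the rest alone."""
    return {
        tuple(synonyms.get(term, term) for term in triple)
        for triple in triples
    }


# What the publisher states, using its own narrowly defined (hypothetical) property.
published = {("ex:termList", "tdwgx:hasForm", "ex:termListTurtle")}

# A consumer's private decision to treat the minted term as dcat:distribution.
consumer_mapping = {"tdwgx:hasForm": "dcat:distribution"}

merged = apply_synonyms(published, consumer_mapping)
print(merged)
```

Going the other way, recovering which dcat:distribution statements were really meant in the narrower sense, has no such one-line solution, which is the asymmetry being described.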

The goal is to find 'le mot juste' that expresses the meaning you have in mind. In happy circumstances this will be a reused term. But reuse just for the sake of reuse can lead to misery.

(2) Re normative definitions of terms, the term often is part of a specification, in which case the entire specification is normative for the term. E.g. for owl:Ontology the entire OWL 2 spec is normative. Sometimes there is no follow-your-nose link to the spec (rdfs:isDefinedBy); that doesn't matter.

(3) I looked into the question, raised by the draft, of whether every owl:Ontology is a dcat:Dataset (4.4.1.1).

I couldn't find any plausible way to disprove this, even given a (reasonable) third party extension, because the Dataset definition is so weak. In philosophical jargon, both things are 'propositions' (n.b. Tim Berners-Lee had a Proposition class in his early semweb work) more or less, so we have no obvious type clash. (Long story; ontologies might better be seen as sentences, not propositions.) I checked with an author of the DCAT spec and he concurred: the class is very inclusive by intent. Any information-resource-like thing qualifies as a Dataset, including this email message.

Someone is going to have to say *which* set of data any given owl:Ontology is. I think the answer has to be that the set of data is the set of axioms (plus peripheral structure such as the UML). This is under the assumption that data are propositions, not sentences, which is how I would classify them.

So while it is not OK with *me* personally, saying what the vocabulary draft says about owl:Ontology and dcat:Dataset is consistent with the specs and with the intent of the DCAT spec, and I don't see how it could go wrong practically. So I will say your reviewer is wrong, and the draft is OK, if you like it that way. (The last qualifier added in recognition of Greg's comment.)

Same goes for every skos:ConceptScheme being a dcat:Dataset.

The inclusion in the other direction is similar: You could make a case for every dcat:Dataset being a skos:ConceptScheme, since both specs are so weak. In doing so you take on responsibility for saying *which* aggregation of skos:Concepts any given dcat:Dataset is. It didn't look to me like you needed this, but I haven't studied the draft enough to know exactly what's required.

Maybe the above will help in understanding the example (4.5.4.1), which I have not studied very well.

You didn't ask about an owl:Ontology being a skos:ConceptScheme and I can't tell whether that's required, but it appears to be dangerously close to being implied, i.e. you have dcat:Datasets that are owl:Ontologies, and you have dcat:Datasets that are skos:ConceptSchemes, so I can't tell whether the intersection is occupied or empty. I've already written the following text, so will include it, but it might be irrelevant.

There is a definite requirement on what can be a skos:ConceptScheme: it must be an aggregation of one or more SKOS concepts (i.e. "units of thoughts or ideas"). Is every ontology an aggregation of SKOS concepts? If so, which concepts? Here is what my consultant (an OWL 2 editor) said:

'The natural tendency will be to equate classes with skos:Concept. It will be more of a stretch to call the other elements, such as properties, skos:Concepts. I think OWL Spec people will find this offensive. Who knows what SKOS people will think - they seem to be of the sort "let a thousand flowers bloom". However if it were clear-cut that OWL classes were instances of SKOS concepts then I think their spec would have definitively said so. That they didn't, encouragement notwithstanding, means they aren't and I don't think TDWG should say otherwise.'

An owl:Ontology is primarily an 'aggregation' of axioms (the spec says this). I think you'd be forced to say that the OWL axioms, not the classes, are the skos:Concepts in the aggregation that is the skos:ConceptScheme. I doubt anyone would find that natural.

The case for an owl:Ontology being a skos:ConceptScheme is very weak and I would judge this "not OK" per spec even after attempting to ignore my own taste.

There definitely exist skos:ConceptSchemes and dcat:Datasets that are not owl:Ontologies, since the requirements on being an owl:Ontology are quite strict. So you should be very careful about what you say is an owl:Ontology.

Jonathan

Nico Franz

Jan 11, 2017, 10:00:39 AM

to tdwg...@googlegroups.com, Paul J. Morris, Steve Baskauf, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

I like this view by Leonelli that ontologies are classification theories. I think she means (and I'd agree) that it is good to not downplay the theoretical commitments that go into building them in one manner, over plausible alternatives. If we think of ontologies as reflecting profound theoretical commitments, we can still handle them "like data". Especially if we don't insist on some old school empiricist theory vs. observation dichotomy. I sense a bit of that in Markus K.'s comment (referring to "human intuition" - pretty sure that intuition reflects a lot of theoretical commitments).

http://link.springer.com/article/10.1007/s13752-012-0049-z

Cheers, Nico

On Wed, Jan 11, 2017 at 7:43 AM, Sachs, Joel <Joel....@agr.gc.ca> wrote:

Empirically, ontologies ARE considered to be data by speakers of English. Hilmar pointed to his own experience. And here's Markus Kroetzsch, last week, talking about the wikidata ontology: "The ontology is not separated from the data. Schematic information is mostly managed by encoding it in data as well." [1]. The general notion that ontologies are data is especially true in a language like RDF, where the line between schema and data ranges from blurry to non-existent.

Joel.

From: <jonath...@gmail.com> on behalf of Jonathan A Rees <re...@mumble.net>
Reply-To: "tdwg...@googlegroups.com" <tdwg...@googlegroups.com>
Date: Monday, January 9, 2017 at 2:42 PM
To: "Paul J. Morris" <mo...@morris.net>
Cc: "tdwg...@googlegroups.com" <tdwg...@googlegroups.com>, Steve Baskauf <steve....@vanderbilt.edu>, John Wieczorek <gtuco...@gmail.com>, Stanley Blum <stan...@gmail.com>, Bob Morris <morri...@gmail.com>, Joel Sachs <jsa...@csee.umbc.edu>, greg whitbread <whitbre...@gmail.com>
Subject: Re: [tdwg-rdf: 385] Is it "bad" for an owl:Ontology or skos:ConceptScheme to also be a dcat:Dataset and dctype:Dataset?

You are in effect saying that any statement can be true, and any statement can be false. If that is your starting point then we have no basis for communication.

There is a population of speakers of English, and the question of how a word like 'data' is used or not used by it is an empirical one - if we're looking at comment properties (such as those given in https://github.com/tdwg/vocab/blob/master/dataset-related-definitions.md ). If we get into an argument over meaning, which should only happen if somebody is being unpleasant or legalistic, the question will have to be settled through empirical means, such as a corpus analysis or a human subject survey.

If you are rejecting in advance any attempt I might make to clarify, then there is no point in my answering.

mo...@morris.net AA3SD PGP public key available

--
You received this message because you are subscribed to the Google Groups "TDWG RDF/OWL Task Group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to tdwg-rdf+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Steve Baskauf

Jan 11, 2017, 10:55:30 AM

to tdwg...@googlegroups.com, Matt Yoder, Paul J. Morris, gtuco...@gmail.com, Stanley Blum, Bob Morris, Joel Sachs, greg whitbread

At this point, I feel that it would be a better direction if we can accomplish the goal (linking term lists to their various forms or formats) in a way that does not have the potential to confuse future users. It seems like that can be done in a straightforward manner by following Greg's suggestion to make the link with dcterms:hasFormat instead of dcat:distribution, as long as the object "format" (a.k.a. "distribution") can be linked to a download URI using Audubon Core's ac:accessURI property.
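A minimal sketch of why the choice of linking property matters, using plain Python tuples rather than an RDF library. The schema holds only the two declarations mentioned in this thread (the domain of dcat:distribution and the subclass link from dcat:Dataset to dctype:Dataset). The `ex:` names and the example.org URL are hypothetical, and the sketch assumes that dcterms:hasFormat and ac:accessURI carry no domain declaration, which should be checked against their defining RDF.

```python
# Toy RDFS-style inference over triples held as plain tuples.
DOMAIN = "rdfs:domain"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Only the two declarations cited in the thread; everything else is assumed absent.
schema = {
    ("dcat:distribution", DOMAIN, "dcat:Dataset"),
    ("dcat:Dataset", SUBCLASS, "dctype:Dataset"),
}


def entailed_types(data, schema):
    """Return the rdf:type triples entailed by the domain and subclass rules."""
    known = set(data)
    while True:
        new = set()
        for s, p, o in known:
            for a, rel, b in schema:
                if rel == DOMAIN and a == p:
                    # rdfs2: the subject of a property is typed with its domain
                    new.add((s, TYPE, b))
                elif rel == SUBCLASS and p == TYPE and a == o:
                    # rdfs9: an instance of a class is an instance of its superclass
                    new.add((s, TYPE, b))
        if new <= known:
            break  # fixed point reached
        known |= new
    return known - set(data)


# Hypothetical term list linked with dcat:distribution.
with_dcat = {("ex:termList", "dcat:distribution", "ex:termList.ttl")}

# The same link made with dcterms:hasFormat plus ac:accessURI.
with_dcterms = {
    ("ex:termList", "dcterms:hasFormat", "ex:termList.ttl"),
    ("ex:termList.ttl", "ac:accessURI", "http://example.org/termList.ttl"),
}

print(sorted(entailed_types(with_dcat, schema)))
print(sorted(entailed_types(with_dcterms, schema)))
```

Under those assumptions the first graph types the term list as both dcat:Dataset and dctype:Dataset, while the second entails no types at all.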

The point of section 4.1.2 on typing of resources is to increase, not decrease the clarity about the nature of URI-identified components of TDWG standards. If typing as dctype:Dataset, dcat:Dataset, and dcat:Distribution muddies the waters, then we shouldn't do it. But I think it would be good to make it possible to unambiguously type any resources for which TDWG has minted URIs. I haven't had time to look into an unencumbered class for vocabularies. We could certainly mint a class term for "term lists".

I'm not sure what the type should be for the "formats" or "distributions" or whatever we want to call them. I think letting the user of the specification decide about that in the same way we let them decide how to type documents is probably OK, particularly since we will recommend using a dc:format property to provide the MIME type of the "format". I think the definition of ac:accessURI is vague enough that IRIs for SPARQL endpoints could serve as objects of triples for which ac:accessURI is the predicate. I'm not sure how one would communicate to a machine that access is through an endpoint or API. There might be something in VoID to communicate that, but it might come with unwanted entailments.

With respect to maintaining a distinction between owl:Ontologies and skos:ConceptSchemes, I don't think that there is anything in the spec that suggests or recommends that a resource ever be typed as both. If we create a TDWG-defined class for term lists that has only a human-readable definition, then those term lists could be part of an ontology, a concept scheme, or a vanilla vocabulary without mischief.

In the example of 4.4.1.1, a Darwin Core basic vocabulary is formed by the union of two vanilla term lists, while the Darwin Core "enhanced" vocabulary (i.e. turning Darwin Core into an ontology) is accomplished by the union of the two vanilla term lists and an additional "list" of semantic relationships (axioms). See https://github.com/tdwg/vocab/blob/master/hierarchy-model.md for a diagram of this kind of thing. The vanilla lists aren't typed as ontologies - only the overarching "enhanced vocabulary" and the list of axioms are typed as owl:Ontology. The problem of having a term list being both a skos:ConceptScheme and an owl:Ontology would probably only arise if owl:imports were used to create the "enhanced vocabulary" by importing a term list that was explicitly typed as a skos:ConceptScheme (since the range of owl:imports is owl:Ontology). Similarly, the punning issue that has been raised could also arise through the use of owl:imports to import a term list that is the subject of triples that imply that the term list is an owl:Individual. I don't know how we get around that problem.
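The owl:imports case can be sketched the same way. This is only an illustration in plain Python, not an OWL reasoner: it applies the range rule to the single declaration mentioned above (the range of owl:imports is owl:Ontology). The `ex:` resource names are hypothetical.

```python
RANGE = "rdfs:range"
TYPE = "rdf:type"

# The one declaration relied on in the paragraph above.
schema = {("owl:imports", RANGE, "owl:Ontology")}


def types_of(resource, data, schema):
    """Collect asserted types plus types entailed by rdfs:range (rule rdfs3)."""
    types = {o for s, p, o in data if s == resource and p == TYPE}
    for s, p, o in data:
        for prop, rel, cls in schema:
            if rel == RANGE and prop == p and o == resource:
                # the object of a property is typed with the property's range
                types.add(cls)
    return types


# Hypothetical term list explicitly typed as a concept scheme.
term_list_only = {("ex:termList", TYPE, "skos:ConceptScheme")}

# The same term list once a hypothetical enhanced vocabulary imports it.
with_import = term_list_only | {
    ("ex:enhancedVocabulary", TYPE, "owl:Ontology"),
    ("ex:enhancedVocabulary", "owl:imports", "ex:termList"),
}

print(sorted(types_of("ex:termList", term_list_only, schema)))
print(sorted(types_of("ex:termList", with_import, schema)))
```

Before the import the term list is only a skos:ConceptScheme; after it, the range of owl:imports makes it an owl:Ontology as well, which is the double typing described above.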

Sachs, Joel

Jan 11, 2017, 11:14:53 AM

to tdwg...@googlegroups.com

On 2017-01-10, 9:24 PM, "tdwg...@googlegroups.com on behalf of pmurray" <tdwg...@googlegroups.com on behalf of pmu...@anbg.gov.au> wrote:

>
>
>On 9/1/17 7:07 am, Steve Baskauf wrote:
>>
>> "Regarding the use of dcat:Dataset in several of your examples, I do
>> not think that this is semantically correct to type a resource as
>> being both a dcat:Dataset and a skos:ConceptScheme or an owl:Ontology
>> (e.g. examples 4.5.4.1, 4.4.2.3, 4.4.1.1)"
>>
>> In the examples cited by the reviewer, it is "term lists" that are
>> typed as dcat:Dataset. The reason was to enable the use of the
>> property dcat:distribution to link an abstract term list to the
>> various forms in which it might be distributed (e.g. html, turtle,
>> json) as shown in Fig. 4 (Section 2.2.4). Because dcat:distribution
>> has the domain dcat:Dataset, making the link to the distributions
>> using dcat:distribution entails that the term list is a dcat:Dataset
>> whether we like it or not, so I stated that fact explicitly in the
>> examples. I should also note that because dcat:Dataset is
>> rdfs:subClassOf dctype:Dataset, it is also entailed that the term
>> lists are dctype:Dataset.
>If I recall correctly (it's been a while, and I haven't been keeping up):
>
>In OWL, vocabulary terms Ontology, Class, Property cannot themselves be
>reasoned over. They have properties, of course, but those properties are
>always annotation properties. They are used *by* the reasoning engine,
>not as subject matter *within* the reasoning engine. Anything that is
>the subject of a Property or the object of an ObjectProperty is
>implicitly an owl:Individual.

That isn't quite right, since there *are* properties that are used to
reason over classes, most famously subClassOf and equivalentClass. But I
don't see the bigger problem - what's wrong with inferring that term
lists, datasets, concept schemes, etc. are individuals? Aren't they?
According to the spec [1], the entities of OWL are classes, properties,
datatypes, and individuals. So if something isn't a class, property, or
datatype, is it not an individual?

Joel.

[1] https://www.w3.org/TR/2012/REC-owl2-syntax-20121211/#Entities.2C_Literals.2C_and_Anonymous_Individuals

>This strong distinction between vocabulary
>and content is drawn, I believe, to deal with some rather nasty logical
>problems that arise if you don't draw it.
>
>The dcat:dataset predicate has a domain of dcat:Catalog and a range of
>dcat:Dataset - a Dataset is partOf a Catalog. Both datasets and catalogs
>must be owl:Individual(s), otherwise the range and domain declarations
>wouldn't mean anything.
>
>So if any resource is declared as being both "a dcat:Dataset" and "an
>owl:Ontology", for purposes of OWL reasoning this is treated as two
>different things: a 'punned' URI.
>
