What is the meaning of the object of a triple?

Steve Baskauf

Feb 1, 2014, 4:22:17 PM
to TDWG-RDF TG
The basic question is:
What exactly is the thing to which we refer as the object of an RDF triple?
 
Why does this matter? 
1. It matters because it is key to the purpose behind the DwC RDF guide's creation of the dwcuri: namespace [1].
2. It matters because it is integral to understanding the meaning of the proposal to redefine the Darwin Core classes [2].
3. It matters because Joel has proposed re-writing/clarifying the Darwin Core abstract model [3] which in its present form seems to blur the distinction between things and records about things.
4. It matters because TDWG and the biodiversity informatics community in general are (in my opinion) blurring the distinction between things and the records about those things, and this is impeding progress on the creation of persistent identifiers.  For example, some people have said that the identifier for a specimen should change with every version of a database containing a description of it, whereas I believe it is important that the identifier for the specimen should never change, while the identifier for the record of the specimen could change with every version of the database. 
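The identifier-stability point in item 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a TDWG recommendation: the museum-x.org URIs, the `/vN` version suffix, and the `publish_version` helper are all hypothetical. It shows a new database version minting a new record identifier while the specimen identifier is passed through unchanged.

```python
from dataclasses import dataclass

# Hypothetical identifiers; museum-x.org is a placeholder domain.
SPECIMEN = "http://museum-x.org/specimen/12345"
RECORD_BASE = "http://museum-x.org/records/12345"


@dataclass(frozen=True)
class RecordVersion:
    record_uri: str  # identifies one version of the database record
    about: str       # identifies the specimen the record describes
    version: int


def publish_version(specimen_uri: str, record_base: str, version: int) -> RecordVersion:
    """Mint a record identifier for a new database version.

    The specimen URI is copied through unchanged; only the record URI
    depends on the version number (a hypothetical /vN suffix).
    """
    return RecordVersion(f"{record_base}/v{version}", specimen_uri, version)


v1 = publish_version(SPECIMEN, RECORD_BASE, 1)
v2 = publish_version(SPECIMEN, RECORD_BASE, 2)
print(v1.record_uri)                   # http://museum-x.org/records/12345/v1
print(v1.about == v2.about)            # True: the specimen identifier never changes
print(v1.record_uri == v2.record_uri)  # False: each version is a distinct record
```

Any scheme would do here; the only property that matters is that the specimen identifier does not depend on the version.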
 
I hope I have made a good enough case for its importance that people who know a lot more about this than I do will weigh in with responses (that means you, Hilmar, Bob, Joel, etc.).  What I want to know is whether what I'm saying below is correct.  If it is not, I would appreciate having my misunderstanding corrected.
 
Context:
In the context of writing a paper about the Darwin Core RDF guide for the special issue of the Semantic Web Journal, the question came up about how we talk about the objects of triples.  This snippet of RDF/Turtle (from the DwC RDF guide) was presented in the paper:
 
<http://arctos.database.museum/guid/MVZ:Mamm:115956>
     dwc:recordedBy "Oliver P. Pearson; Anita K. Pearson";
     dwcuri:recordedBy <http://viaf.org/viaf/263074474>,
                       <http://museum-x.org/personnel/akp>.

 
I made the statement: "For example, the existing Darwin Core term dwc:recordedBy would continue to be used with a value that consisted of a name string for agents who recorded an occurrence, whereas the new term dwcuri:recordedBy would refer to a non-literal object (represented by a URI reference or blank node) that was the agent itself." 

This comment was made about what I wrote: "This isn’t strictly correct, no? The agent would be Anita. The URI is a reference identifying some information about Anita."
 
My understanding about this issue (described below) comes from trying to wade through some of the RDF documents, notably the RDF Schema [4], RDF Concepts and Abstract Syntax [5], and XML Schema Part 2: Datatypes [6] documents.  I'm not completely sure if what I'm saying is true for the rawest level of RDF, but I think it's probably at least true under an RDFS semantic extension, which I think we at least assume at some level in Darwin Core since DwC is based loosely on the Dublin Core Abstract model (which is all about literal and non-literal values, includes the notions of classes, and whose formal syntax is intended to be defined by RDF and RDFS semantics). 
 
My understanding of the answer to the question (short version):
Objects of triples that consist of URI references and typed literals can and often do represent the actual abstract things that are related to the subject by the predicate.  They do NOT represent strings that identify the object (like names).  They do NOT represent records about the object.  They represent the thing that IS the object. 
 
My understanding of the answer to the question (extended version):
1. If a URI reference is presented as the object of the triple, the URI serves as an identifier for the entity that is the actual object of the statement that is being made.  For example, if I assert the triple:
 
<http://bioimages.vanderbilt.edu/baskauf/29118> dcterms:creator <http://viaf.org/viaf/63557389>
 
where http://bioimages.vanderbilt.edu/baskauf/29118 is the URI for a still image and dcterms: is the namespace http://purl.org/dc/terms/, I am saying that the creator of the still image is me, a foaf:Person and a non-literal resource.  I am NOT saying that the object of the triple is my record in the VIAF database.  I am NOT saying that the object of the triple is a set of metadata about me.  If I wanted to talk about the VIAF record about me, I would use the URI reference <http://viaf.org/viaf/63557389/>, which identifies a foaf:Document and differs from my URI by having a "/" character on the end. 
 
2. Based on my reading of [6], if a literal is presented as the object of a triple, and if that literal is a typed literal using one of the xsd: (http://www.w3.org/2001/XMLSchema) datatypes, then the actual value of the object is an abstract thing in the value space that is defined by the datatype URI.  The string that is presented in the triple is a literal from the lexical space defined by the datatype URI, but the string isn't itself actually the object of the triple.  So for example, if I assert the triple:
 
<http://bioimages.vanderbilt.edu/baskauf/29118#loc> dwc:decimalLatitude "36.31236"^^xsd:decimal
 
I am saying that the object of the triple is the decimal number 36.31236 in its abstract mathematical sense and NOT a sequence of eight characters: "3", "6", ".", "3", etc.
 
3. Since the RDFS specification does not define the class of plain literals [9], if I present the object of a triple as an untyped literal without a language tag, a client cannot infer any particular meaning directly from that literal.  It is essentially a string of characters that is only known to be an instance of the class rdfs:Literal.  Thus if I assert the triple:
 
<http://bioimages.vanderbilt.edu/baskauf/29118> dc:creator "Steven J. Baskauf"
 
where dc: is the namespace http://purl.org/dc/elements/1.1/, a client cannot infer anything about the object of the triple other than that it is a plain literal, i.e. the string of characters "S","t","e","v", etc.  The programmer of the client may impart some meaning to that literal since dc:creator is typically used with values that are the names of agents that create things, but the RDFS specification itself doesn't give any guidance about how to interpret that string.  Since literals can't presently be used as the subjects of triples, there isn't any direct way to assign additional properties to "Steven J. Baskauf" to describe what it is.  If I use a plain literal with a language tag, I know that it is a string of characters in some particular language, but I don't have any understanding directly from the RDF(S) specs of the meaning of that literal.
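To make the three cases above concrete, here is a toy model in Python using only the standard library; it is not rdflib or any other RDF library's API. `URIRef`, `Literal`, `VALUE_SPACE`, and `denotation` are hypothetical names introduced for this sketch, only two xsd: datatypes are handled, and Python's `Decimal` accepts some strings (such as exponent notation) that are not valid xsd:decimal lexical forms, so it does not validate the lexical space. It shows that two URIs differing only by a trailing slash are different terms, that two lexical forms of the same xsd:decimal denote one value, and that a plain literal yields nothing beyond its characters.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the lexical form as written in the triple
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None      # optional language tag on a plain literal


# Hypothetical lexical-to-value mapping covering two xsd: datatypes only.
VALUE_SPACE = {XSD + "decimal": Decimal, XSD + "integer": int}


def denotation(term):
    """Return (kind, thing): what a client can say the term stands for."""
    if isinstance(term, URIRef):
        # The URI identifies some resource; the string is only its name.
        return ("resource", term.value)
    if term.datatype is None:
        # Plain literal: only the characters are known (a language tag,
        # if present, says which language they are in, nothing more).
        return ("string", term.lexical)
    if term.datatype in VALUE_SPACE:
        # Typed literal: map the lexical form into the datatype's value space.
        return ("value", VALUE_SPACE[term.datatype](term.lexical))
    raise ValueError("datatype not handled in this sketch: " + term.datatype)


person = URIRef("http://viaf.org/viaf/63557389")
record = URIRef("http://viaf.org/viaf/63557389/")
print(person == record)  # False: different identifiers

lat_a = Literal("36.31236", XSD + "decimal")
lat_b = Literal("36.312360", XSD + "decimal")
print(lat_a == lat_b)                            # False: different lexical forms
print(denotation(lat_a) == denotation(lat_b))    # True: same decimal number
print(denotation(Literal("Steven J. Baskauf")))  # ('string', 'Steven J. Baskauf')
```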
 
Conclusion:
I think it's critical to make sure that we get this straight.  If we accept that the URI http://museum-x.org/personnel/akp is a reference to some information about Anita Pearson rather than a reference to Anita Pearson herself, how can we clearly make statements where Anita Pearson herself is the subject of a triple?  We need to be able to describe Anita Pearson and data records about her separately because Anita Pearson will have properties such as an email address, institutional affiliations, etc. while data records about Anita Pearson will have properties such as date issued, format, etc.  Specimens will have properties like the institution that owns them and when they were created, while specimen records will have properties like the institution that created them (that is, the institution that spawned the new version of the database), whether they are licensed CC0, etc.  In my opinion, it is bad to mix up these two different kinds of things.
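The separation argued for in this conclusion can be sketched with a toy triple set in Python. Assumptions: triples are plain tuples of prefixed names (not a real RDF store); http://museum-x.org/personnel/akp is taken to identify Anita Pearson herself, while the `.ttl` record URI and the property values are hypothetical; foaf:primaryTopic links the record to the person; and the rule that nothing should be typed both foaf:Person and foaf:Document is this sketch's own check. The hypothetical helper `conflated_nodes` reports any node that ends up typed as both, which is what happens when one URI is reused for the person and the record.

```python
from collections import defaultdict

AKP = "<http://museum-x.org/personnel/akp>"             # Anita Pearson herself
AKP_RECORD = "<http://museum-x.org/personnel/akp.ttl>"  # hypothetical record about her

# Toy graph: a set of (subject, predicate, object) tuples using prefixed names.
separate = {
    (AKP, "rdf:type", "foaf:Person"),
    (AKP, "foaf:name", '"Anita K. Pearson"'),
    (AKP_RECORD, "rdf:type", "foaf:Document"),
    (AKP_RECORD, "foaf:primaryTopic", AKP),
    (AKP_RECORD, "dcterms:format", '"text/turtle"'),
}


def conflated_nodes(triples, class_a="foaf:Person", class_b="foaf:Document"):
    """Return subjects typed as both class_a and class_b (assumed incompatible here)."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    return {s for s, ts in types.items() if class_a in ts and class_b in ts}


# Conflating the two: reuse the person URI wherever the record URI appeared.
merged = {(AKP if s == AKP_RECORD else s, p, AKP if o == AKP_RECORD else o)
          for s, p, o in separate}

print(conflated_nodes(separate))  # set()
print(conflated_nodes(merged))    # {'<http://museum-x.org/personnel/akp>'}
```

In the merged graph Anita also becomes her own foaf:primaryTopic and acquires a dcterms:format, which is the kind of mixing the paragraph above warns against.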
 
[1] http://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal#2.5_Terms_in_the_dwcuri:_namespace
[2] http://code.google.com/p/darwincore/issues/detail?id=204
[3] http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#abstractmodel
[4] http://www.w3.org/TR/rdf-schema/
[5] http://www.w3.org/TR/rdf-concepts/
[6] http://www.w3.org/TR/xmlschema-2/
[7] http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal
[8] http://www.w3.org/TR/xmlschema-2/#typesystem
[9] http://www.w3.org/TR/rdf-schema/#ch_literal
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu

John Wieczorek

Feb 1, 2014, 6:51:42 PM
to tdwg...@googlegroups.com
I believe that <http://viaf.org/viaf/63557389> is a reference to you,
but it is not you.

Hence, in the text in question, I would state it this way:

"For example, the existing Darwin Core term dwc:recordedBy would
continue to be used with a value that consisted of a name string for
agents who recorded an occurrence, whereas the new term
dwcuri:recordedBy would be a reference to the actual agent, a
non-literal object (represented by a URI reference or blank node)."

Steve Baskauf

Feb 1, 2014, 8:53:33 PM
to tdwg...@googlegroups.com
John,

If the statement is phrased as you stated it in your response below, then I basically don't disagree with it.  But saying "<http://viaf.org/viaf/63557389> is a reference to you" has a different meaning than saying "<http://viaf.org/viaf/63557389> is a reference to information about you".  The former is true and the latter is not true.  If you said "<http://viaf.org/viaf/63557389/> is a reference to information about you" (note the trailing slash in the URI) then the statement would be true.  The distinction between resources and metadata about resources is what I want to be careful to maintain.

After reviewing [2], I think I may have not been precise enough in my use of the word "object".  I probably should have asked this question: "What exactly is the thing denoted by the object of an RDF triple?"

I'm also thinking that based on the language of [2] the most correct way to phrase the statement in the paper would be:


"For example, the existing Darwin Core term dwc:recordedBy would continue to be used with a value that consisted of a name string for
agents who recorded an occurrence, whereas the new term dwcuri:recordedBy would refer to a non-literal object (URI reference or blank node) that denotes the actual agent."

Thanks for the feedback.
Steve

[1] http://viaf.org/viaf/63557389/rdf.xml
[2] http://www.w3.org/TR/rdf-concepts/#section-data-model sections 3.1 and 3.2

Luca Matteis

Feb 1, 2014, 9:08:43 PM
to tdwg...@googlegroups.com
> If we accept that the URI http://museum-x.org/personnel/akp is a reference to some information about Anita Pearson rather than a reference to Anita Pearson herself, how can we clearly make statements where Anita Pearson herself is the subject of a triple?

It’s not about accepting, but about what the URI is. If the URI above is the URI for Anita the person, then it should be treated as such. In the same way, the URI for you (Steve) has no trailing slash, so if I want to talk about you in my RDF I know that I will be using that URI instead of the one with the trailing slash.

> I believe that <http://viaf.org/viaf/63557389> is a reference to you,
> but it is not you.

Actually there's no concept of "reference" in RDF. We've built that ourselves with Linked Data, which of course makes perfect sense and is very powerful. But for a reasoner, that is simply a URI that identifies a resource, and that resource is Steve himself, *not* the document about Steve. The document about Steve is clearly the one identified by the URI with the trailing slash.

John Wieczorek

Feb 2, 2014, 8:20:50 AM
to tdwg...@googlegroups.com
I believe that <http://viaf.org/viaf/63557389> is a reference to you,
but it is not you.

Hence, in the text in question, I would state it this way:

"For example, the existing Darwin Core term dwc:recordedBy would
continue to be used with a value that consisted of a name string for
agents who recorded an occurrence, whereas the new term
dwcuri:recordedBy would be a reference to the actual agent, a
non-literal object (represented by a URI reference or blank node)."


Bob Morris

Feb 2, 2014, 3:23:26 PM
to tdwg...@googlegroups.com
I generally concur with the entire thread so far. But as below, I
think we should use the wording "identifies X" not "refers to X" or
its grammatical variants. I specifically agree very strongly with
Steve's paragraph labeled "Conclusion," though not on RDF grounds, but
rather on engineering grounds. On RDF grounds, as Luca remarks, it
makes no difference. Such matters are easier to see when the entire
discussion is about graphs, which is all that RDF actually is about.
But the document at issue is not intended for a graph-centric
readership, so (1) more verbiage is needed; (2) interpretation of
controlling W3 documents may be a matter of opinion when one is
interpreting a graph-centric W3 document in a non-graph-centric rubric
and (3) the issues at hand have been the subject of expression in
philosophy, art, and neuroscience throughout the history of those
disciplines. Taken together, it seems to me that we court less pain
and argument if we use "identifies" rather than "refers to."

One indisputable thing is that the URI of a thing---whether
abstract or concrete---and the URI of a description of the thing,
e.g. of a physical specimen and a database record about that specimen,
must be different. A rock is never an electronic database record.
Hence the rock and any electronic database record thereof require
different identifiers. From an RDF point of view, it can't make any
difference which of the two identifies the rock and which the
description. But there must be two unless one believes a rock can be
an electronic database record. If one does believe that a rock can be
an electronic database record, what should we make of a statement like
"Bob threw the rock in the lake"?

IMO, part of the problem is that people conflate the usage of the root
"refer" in several different contexts, all of which are useful in
discussion or implementation of RDF applications, except when you
can't tell which context is under discussion, or worse, when one is
under discussion in one sentence in a paragraph, and the context
changes in the next, without any warning. These include:

(a) The English usage of "refer to" and its grammatical forms. But
consider RFC 3986 Sec. 1.2.2, "Separating Identification from
Interaction" [0], where it is said:
"A common misunderstanding of URIs is that they are only used to
refer to accessible resources. The URI itself only provides
identification; access to the resource is neither guaranteed nor
implied by the presence of a URI."

I find the first clause of the second sentence unambiguously to
declare that a URI does not refer to anything. Nor do I find that the
rest of RFC 3986 contradicts this, but I'm open to argument
that this is not the case for every URI or URI scheme. My point is
similar to Luca's, but it is not restricted to RDF. It's true about
any URI. A URI doesn't refer to something---abstract or concrete.

(b) The abuse of the root "refer" as used in discussion of URI
resolution and dereferencing. It's regrettable that this is so often
called simply "resolution," because the thing that is being
dereferenced is indeed a reference to something. But what is being
dereferenced is the result of the URI resolution, not the URI.
Unfortunately, the (standard http URL(sic)) resolution of http URIs
provides the same string for the resolution as it provides for the
URI. For prescient and humourous insight into the problem, see Woody
Allen on the Great Roe [1] or go visit the physical painting by René
Magritte that is identified by the identifier "La trahison des images"
and described at [4-6] and about a dozen other wikipedia entries
linked on each(?) of those. By the way [1] is cited in the section on
name spaces in the incredibly good Lisp book "Common LISPcraft" by the
late Robert Wilensky [3].

(c) "URI Reference" in W3C Recommendations. The definition of a URI
reference is at [2]. Nowhere there can I find anything about
"referring to" something. The usage there also seems to be that a URI
"identifies" something.

Really, we should banish the usage "refers to" and instead use
"identifies" as in "<http://viaf.org/viaf/63557389> identifies the
human who originated this thread." The fact that this URI has a
resolution and dereference, if it does, is pretty much irrelevant from
a purely RDF point of view. For example, the fact that there might be
a resolution and dereference that returns a foaf graph is of itself
hardly evidence that this URI is or is not a URI for the physical
person. But documenting it as such in some authoritative document
would be helpful for, if not enforceable upon, application developers.


[0] http://tools.ietf.org/html/rfc3986#section-1.2.2
[1] http://www.newrepublic.com/article/113901/fabulous-tales-and-mythical-beasts-woody-allen
[2] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[3] Robert Wilensky, Common LISPcraft. W. W. Norton & Company
(September 17, 1986).
[4] http://en.wikipedia.org/wiki/The_Treachery_of_Images
[5] http://fr.wikipedia.org/wiki/La_Trahison_des_images
[6] http://bit.ly/Lp0R9W


Bob Morris

--
Robert A. Morris

Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390


Filtered Push Project
Harvard University Herbaria
Harvard University

email: morri...@gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.

Luca Matteis

Feb 2, 2014, 3:48:18 PM
to tdwg...@googlegroups.com
Yes exactly. URIs in RDF are just identifiers. But we should resolve
those identifiers so people can look up information about those
things. This is Linked Data and not RDF: "Use HTTP URIs so that people
can look up those names" [1]. And of course we should use Linked Data
since it's very useful.

Going back to Anita's example, as for whether
<http://museum-x.org/personnel/akp> is the URI for Anita, perhaps we
should ask her. But let's not be too strict about this. If you don't
have Anita's identifier (URI) and you just have the URL of a document on a
website that returns RDF, you can use rdfs:seeAlso, whose range is
any RDF resource (rdfs:Resource). That is still better than nothing.

And don't forget that Anita's URI above, even if it is not strictly her
identifier, could return information about her actual identifier. So
you could be making a very useful statement there, even if using it
with dwcuri:recordedBy is not semantically correct.

[1] http://www.w3.org/DesignIssues/LinkedData.html

John Wieczorek

Feb 3, 2014, 9:00:02 AM
to tdwg...@googlegroups.com
I'm thoroughly convinced - amending my position.

I believe that <http://viaf.org/viaf/63557389> identifies you (Steve),
but it is not you.

Hence, in the text in question, I would state it this way:

"For example, the existing Darwin Core term dwc:recordedBy would
continue to be used for values that consist of a name string for
agents who recorded an occurrence, whereas the new term
dwcuri:recordedBy would identify the actual agent using a
non-literal object (represented by a URI reference or blank node)."

Luca Matteis

Feb 3, 2014, 9:09:08 AM
to tdwg...@googlegroups.com
On Mon, Feb 3, 2014 at 3:00 PM, John Wieczorek <tu...@berkeley.edu> wrote:
> I believe that <http://viaf.org/viaf/63557389> identifies you (Steve),
> but it is not you.

How can an identifier actually *be* someone?

John Wieczorek

Feb 3, 2014, 9:38:00 AM
to tdwg...@googlegroups.com
Exactly my point.

Steve Baskauf

Feb 3, 2014, 9:44:40 AM
to tdwg...@googlegroups.com
Bob Morris wrote:
> ... Taken together, it seems to me that we court less pain
> and argument if we use "identifies" rather than "refers to."
>
I actually don't agree with this for the simple reason that TDWG seems
to consist of approximately half information scientists and about half
taxonomists. The half who are information scientists (like Bob) will be
less confused if we use "identifies". The half that are taxonomists
will be terribly confused because of the unfortunate choice of the term
"dwc:Identification" which in the context of Darwin Core means taxonomic
determination. It probably would have been much better to have called
the class "dwc:Determination", but that isn't what happened and it is
probably too late to change it. When writing in English phrases, I try
to use the words "determination" and "determiner" rather than
"identification" and "identifier" when I'm talking about taxonomic
determinations. But not everybody does. So routine use of "identifies"
in the TDWG community is likely to result in more, not less confusion.
I understand Bob's concern about the widespread misbelief that one can
expect an HTTP URI to dereference to something. But using "refer to" in
my mind doesn't necessarily imply dereferenceability. I used the term
"refer to" because the technical term for nodes that are identified by
URIs is "URI reference" or "URIref".

I think there is no disagreement that the URI
<http://viaf.org/viaf/63557389> itself is not a person. I also think
that there should be no disagreement that the URI identifies a node. I
think that is probably true in any interpretation of RDF. What is
useful to us in the context of the DwC RDF guide is the ability to let
that node "represent" some real-life thing that we are interested in
making statements about [1]. I put the word "represent" in quotes
because using that term is probably about as explosive of a can of worms
as "refers to" because it could be considered to imply "a
representation" sensu content negotiation, and we don't have to
introduce any expectation that content negotiation will be happening
just because we use a URI to stand for something. Perhaps the word
"denotes" is safest (from [2]). I can't think of any way that its use
would imply something incorrect. So technically:

"a URI identifies a node which denotes some real-world thing"

If one does not want to confuse people who don't care about the graph
data model, would it be bad to shorten this by saying:

"a URI denotes some real-world thing"

?

> One indisputable thing is that the URI of a thing---whether
> abstract or concrete---and the URI of a description of the thing,
> e.g. of a physical specimen and a database record about that specimen,
> must be different. A rock is never an electronic database record.
> Hence the rock and any electronic database record thereof require
> different identifiers. From an RDF point of view, it can't make any
> difference which of the two identifies the rock and which the
> description. But there must be two unless one believes a rock can be
> an electronic database record. If one does believe that a rock can be
> an electronic database record, what should we make of a statement like
> "Bob threw the rock in the lake"?
>
This is actually the real point of my question, although clear use of
terminology was another aspect.
Steve

[1] "represents" is used in this way in
http://www.w3.org/TR/rdf-concepts/#section-URI-Vocabulary
[2] http://www.w3.org/TR/rdf-concepts/#section-data-model

Matt Yoder

Feb 3, 2014, 10:05:36 AM
to tdwg...@googlegroups.com
Just my two cents here-

Re "determination" and "identification"- we had this exact
conversation and decided to use Steve's proposal in the code/models we
write. It's never too late to change IMO. More importantly, if there
was a URI for the concept that didn't include a label that just
happened to resolve to some human's (oft ambiguous) interpretation,
then we could use whatever labels we wanted for the concept, and TDWG
could support a "preferred" label if they felt they really had to
(though I personally don't like "preferred" labels I begrudgingly
conceded that they have their use).

Keep in mind that the use of this ontology (at least in my mind)
will be *by machines*. Scientists had better not be editing their own
files and typing "dwc:Identification", or we (application
developers, standards makers, etc.) are not doing it right. Provide
solid definitions for concepts, provide a means to understand those
concepts, then let the application layer label them up.

Cheers,
Matt

p.s. Rod- your question re NOMEN has not been forgotten, I just
haven't got back to it. Someday, soon, ... maybe.

Luca Matteis

Feb 3, 2014, 10:10:45 AM
to tdwg...@googlegroups.com
On Mon, Feb 3, 2014 at 4:05 PM, Matt Yoder <diap...@gmail.com> wrote:
> Scientists had better not be editing their own
> files and typing "dwc:Identification", or we (application
> developers, standards makers, etc.) are not doing it right.

You'd be surprised how much of the web is manually annotated:

http://schema.org/docs/datamodel.html
https://support.google.com/webmasters/answer/146898

With RDFa I've stumbled across various occasions where I typed in the
vocabulary I needed, manually in the HTML. Worked quite well.

Bob Morris

Feb 3, 2014, 10:54:58 AM
to tdwg...@googlegroups.com
You seem to be arguing that the real problem is with the specific term
"dwc:Identification", in that it is inconsistent with the very usage of
the 50% whose interests you are trying to protect, and that you do so only
because of that single(?) misnamed term. In that case, why not
propose deprecating the term in favor of a proposed new term
dwc:Determination, which is closer to what taxonomists use in the
first place? Fix what's broken, not what isn't.

As to "denotes" vs. "identifies", using "denotes" may run explicitly
afoul of the RDFS specification except in the case of datatype URIs.
[1]

[1] "RDF Semantics - datatypes and "identifies" vs "denotes" -
ISSUE-145" http://lists.w3.org/Archives/Public/public-rdf-comments/2013Oct/0096.html

Steve Baskauf

Feb 3, 2014, 11:11:55 AM
to tdwg...@googlegroups.com
Bob,
I don't know whether you were being copied at the time on the "ad hoc group" emails regarding the proposals to fix the definitions of DwC classes, but the issue of renaming dwc:Identification to dwc:Determination came up there.  The consensus of that group was that it would cause trouble, since dwc:Identification has already been in use for a number of years and we would be changing the actual URI, not just a textual definition.  If this issue is really important, perhaps it should be raised in a broader forum.  But I would be very sensitive to "breaking" implementations that expect the dwc:Identification URI. 

I'm not actually trying to protect the interests of taxonomists, since I'm not one and it's not a burning issue for me.  I'm simply stating the facts on the ground that using the term "identification" among a group of taxonomists is going to cause confusion whether we like it or not. 

I'll look at the "denotes" ref a little later - I've got 9 cockroaches to prep for this afternoon.  Hmmm.  What is more annoying for me, talking about taxonomy, talking about information science, or prepping cockroaches?  Hard to decide... :-)

Steve

Hilmar Lapp

Feb 3, 2014, 11:59:18 AM
to tdwg...@googlegroups.com

On Feb 2, 2014, at 3:23 PM, Bob Morris wrote:

> Really, we should banish the usage "refers to" and instead use "identifies" as in "<http://viaf.org/viaf/63557389> identifies the human who originated this thread." The fact that this URI has a resolution and dereference, if it does, is pretty much irrelevant from a purely RDF point of view.

Exactly.

--
Hilmar Lapp -:- informatics.nescent.org/wiki -:- lappland.io



joel sachs

Feb 3, 2014, 12:00:16 PM
to tdwg...@googlegroups.com
Steve,

I don't have anything helpful to add. I agree with your analysis, and
suggested wording. In choosing between "denotes", "refers to", and
"identifies", my least favourite is "refers to". I don't think that
taxonomists will be confused by using the term "identifies". (As Bob
points out, the fact that "identifies" is not formally defined might actually
help here.)

More generally, we probably don't want to resurrect the httpRange-14
debate. As Luca and Bob point out, this isn't about RDF semantics so much
as it is about convention, and I don't think anyone in TDWG objects to the
Linked Data conventions regarding distinguishing between information and
non-information resources. (Although I do think the terms
"information resource" and "non-information resource" are unfortunate.)

I do have a question: why did you introduce this discussion in terms of
the "meaning of the object of a triple", rather than "the meaning of a
URI"? Don't many of the questions you asked apply to subjects of triples
as well as to objects?

Cheers,
Joel.

Steve Baskauf

Feb 3, 2014, 2:01:10 PM
to tdwg...@googlegroups.com
Replies inline

joel sachs wrote:
> Steve,
>
> I don't have anything helpful to add. I agree with your analysis, and
> suggested wording. In choosing between "denotes", "refers to", and
> "identifies", my least favourite is "refers to". I don't think that
> taxonomists will be confused by using the term "identifies". (As Bob
> points out, the fact that "identifies" is not formally defined might
> actually help here.)
I'm still trying to absorb the reference Bob gave. It's going to take
me more than one read. Comforting to know I'm not the first person to
struggle with this.
>
> More generally, we probably don't want to resurrect the httpRange-14
> debate. As Luca and Bob point out, this isn't about RDF semantics so
> much as it is about convention, and I don't think anyone in TDWG
> objects to the Linked Data conventions regarding distinguishing
> between information and non-information resources. (Although I do
> think the terms "information resource" and "non-information resource"
> are unfortunate.)
Yes, I have no interest in resurrecting httpRange-14. However, I also
sat in on an iDigBio meeting a couple of years ago where a person who was
heavily involved in the project stated that specimen identifiers would
be changed for every update of a database. So I think I agree that
nobody who understands the conventions about distinguishing between
information and non-information resources will object to them. The
problem is that there are probably plenty of people in TDWG who DON'T
understand the distinction, and those are the people we need to speak
clearly to about this.
>
> I do have a question: why did you introduce this discussion in terms
> of the "meaning of the object of a triple", rather than "the meaning
> of a URI"? Don't many of the questions you asked apply to subjects of
> triples as well as to objects?
Yes, they do, although one of my points was about typed literals, which
at present can't be the subjects of triples. So that's why I specified
objects. Also, the point of the dwcuri: terms is that we expect their
values (a.k.a. objects of triples for which they serve as predicates) to
be URI references or blank nodes rather than literals. Since the quoted
text was in reference to dwcuri: terms, that's another reason why I
framed it in terms of objects. But what you say is true - most of this
discussion applies to URIs whether they refer to subjects, predicates, or
objects.

Hilmar Lapp

Feb 3, 2014, 2:32:00 PM2/3/14
to tdwg...@googlegroups.com
I'm with Joel in that the good things have all been said.

I'll just respond to Joel and Steve by asking how this question and problem is not in essence the httpRange-14 debate.

http://dbpedia.org/resource/Berlin identifies the German city of Berlin. It is not Berlin, but identifies it. By convention, DBpedia, the resource that minted the identifier, indicates the fact that the identifier is for something that isn't information by returning a 303 status code upon dereferencing it. You or a machine would not know this before dereferencing. Depending on the Accept header you send when dereferencing, the 303 comes with another URI. For example, if you dereference with an "Accept: application/rdf+xml" header, you get http://dbpedia.org/data/Berlin.xml. If you dereference that one, you get an RDF document with information about Berlin, with a 200 response code. http://dbpedia.org/data/Berlin.xml and http://dbpedia.org/resource/Berlin are not the same, and they don't refer to the same thing. http://dbpedia.org/data/Berlin.xml doesn't refer to Berlin, nor does it identify Berlin. It identifies information about Berlin.

Isn't that in essence what we're talking about here?

-hilmar
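Hilmar's description of the 303 pattern can be sketched as a tiny offline simulation in Python (no network access). The lookup tables are assumptions that mirror his example; the HTML target `http://dbpedia.org/page/Berlin` and the placeholder document bodies are illustrative, not DBpedia's actual configuration.

```python
# Hypothetical lookup tables modeled on the behavior described above;
# they are not DBpedia's real configuration.
THING_URIS = {
    # URI for a non-information resource -> document URI per media type
    "http://dbpedia.org/resource/Berlin": {
        "application/rdf+xml": "http://dbpedia.org/data/Berlin.xml",
        "text/html": "http://dbpedia.org/page/Berlin",
    },
}
DOCUMENTS = {
    # Placeholder bodies standing in for real documents about Berlin
    "http://dbpedia.org/data/Berlin.xml": "<rdf:RDF>...</rdf:RDF>",
    "http://dbpedia.org/page/Berlin": "<html>...</html>",
}


def dereference(uri, accept="text/html"):
    """Simulate an HTTP GET: return (status, Location header or body)."""
    if uri in THING_URIS:
        alternatives = THING_URIS[uri]
        # Pick the document for the requested media type, else fall back to HTML.
        return 303, alternatives.get(accept, alternatives["text/html"])
    if uri in DOCUMENTS:
        return 200, DOCUMENTS[uri]
    return 404, None


# Usage: follow the redirect the way a client would.
status, location = dereference("http://dbpedia.org/resource/Berlin", "application/rdf+xml")
print(status, location)   # 303 http://dbpedia.org/data/Berlin.xml
status, body = dereference(location)
print(status)             # 200
```

The two URIs never collapse into one: the client only ever receives a document from the second request, while the first URI stays reserved for the city.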

joel sachs

Feb 3, 2014, 3:09:39 PM2/3/14
to tdwg...@googlegroups.com
On Mon, 3 Feb 2014, Hilmar Lapp wrote:

> I'm with Joel in that the good things have all been said.
>
> I'll just respond to Joel and Steve by asking how this question and problem is not in essence the httpRange-14 debate.
>
> http://dbpedia.org/resource/Berlin identifies the German city of Berlin. It is not Berlin, but identifies it. By convention, DBpedia, the resource that minted the identifier, indicates the fact that the identifier is for something that isn't information by returning a 303 status code upon dereferencing it. You or a machine would not know this before dereferencing. Depending on the Accept header you send when dereferencing, the 303 comes with another URI. For example, if you deference with an "Accept: application/rdf+xml" header, you get http://dbpedia.org/data/Berlin.xml. If you dereference that one, you get an RDF document with information about Berlin, with a 200 response code. http://dbpedia.org/data/Berlin.xml and http://dbpedia.org/resource/Berlin are not the same, and they don't refer to the same. http://dbpedia.org/data/Berlin.xml doesn't refer to Berlin, not does it identify Berlin. It identifies information about Berlin.
>
> Isn't that in essence what we're talking about here?

Yes. I know that Steve is doing something with cockroaches this afternoon,
so I'll anticipate his answer: A goal of the guide is to summarize the
DwC-related aspects of httpRange, RDF semantics, and web architecture in a
way that non-experts can effectively use.

Regarding your example, a 303 redirect does not indicate that
the requested URI identifies a non-information resource. Further
information (such as the URI being typed via an RDF assertion in the
returned document) is required to know what type of resource it represents
[1].

Joel.

1. http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
(Although the issue lives on:
https://www.w3.org/2001/tag/group/track/issues/57)
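Joel's caveat can be illustrated with a short Python sketch: the client decides what kind of resource a URI identifies by looking for an explicit rdf:type assertion in the returned data, not from the 303 status. The set of "non-information" classes and the returned triples are hypothetical choices made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical choice of classes this client treats as non-information resources.
NON_INFORMATION_CLASSES = {
    "http://xmlns.com/foaf/0.1/Person",
    "http://dbpedia.org/ontology/City",
}


def asserted_types(uri, triples):
    """Classes that the (subject, predicate, object) triples assert for uri."""
    return {o for s, p, o in triples if s == uri and p == RDF_TYPE}


def is_asserted_non_information(uri, triples):
    """True only if the returned data types uri with a known
    non-information class; the 303 status alone is not evidence."""
    return bool(asserted_types(uri, triples) & NON_INFORMATION_CLASSES)


berlin = "http://dbpedia.org/resource/Berlin"
# Hypothetical triples parsed from the document the 303 pointed to.
returned = [
    (berlin, RDF_TYPE, "http://dbpedia.org/ontology/City"),
    (berlin, RDFS_LABEL, "Berlin"),
]
print(is_asserted_non_information(berlin, returned))      # True
print(is_asserted_non_information(berlin, returned[1:]))  # False: no typing triple
```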



Bob Morris

Feb 3, 2014, 3:17:52 PM2/3/14
to tdwg...@googlegroups.com
cockroaches??? I thought he meant rockcoaches. I figured he was
preparing a lesson on paleobotany.

Steve Baskauf

Feb 3, 2014, 4:30:15 PM2/3/14
to tdwg...@googlegroups.com
joel sachs wrote:
>
> Yes. I know that Steve is doing something with cockroaches this
> afternoon, so I'll anticipate his answer: A goal of the guide is to
> summarize the DwC-related aspects of httpRange, rdf semantics, and web
> architecture in a way that non-experts can effectively use.
Electroretinogram; grab the roach, tape it to a Petri dish, and implant
a tiny wire in its eye under a microscope. Very time consuming and
tedious. I was going to draw some comparisons to working with TDWG, but
then decided that would be too snarky.

Yes, Joel is right. It is true that in the future some RDF data may be
produced by nicely written software that has a user interface that keeps
the ugliness hidden from the user who entered the data. But there may
be other data that came from old Excel spreadsheets with DwC terms as
the column headers. Some poor programmer who isn't familiar with either
RDF or Darwin Core may get stuck with writing code to translate that
spreadsheet into RDF. So writing the DwC RDF guide (and the SW Journal
paper about it) in a way that is understandable to non-experts but which
is correct in the technical language is the goal.

The problem was in fixing the unclear statement:

"For example, the existing Darwin Core term dwc:recordedBy would continue to be used with a value that consisted of a name string for agents who recorded an occurrence, whereas the new term dwcuri:recordedBy would refer to a non-literal object (represented by a URI reference or blank node) that was the agent itself."

I wanted to accomplish two things:
1. make it clear that dwcuri:recordedBy should not be used with the
agent's name but rather be used with a URI reference that identified the
agent herself (as opposed to identifying some data about the agent)
[making the point clearly to non-experts]
2. correctly use the terms "refer", "object", "URI reference", etc. [use
the correct technical language].
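The two goals above can be made concrete with a small data-structure sketch in Python. The subject URI is the Arctos specimen from the opening message; the agent URI `http://example.org/agent/anita-pearson` is hypothetical, and prefixed names stand in for full namespace URIs. The object slot holds a term (a literal name string for dwc:recordedBy, a URI reference for dwcuri:recordedBy), and a check flags a name string used with a dwcuri: term.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str  # prefixed names such as "dwc:recordedBy" are used as shorthand


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BlankNode]
    predicate: IRI
    obj: Term  # the object is a term in the graph, not the agent herself


def has_valid_object(triple):
    """dwcuri: properties expect a URI reference or blank node, never a literal."""
    if triple.predicate.value.startswith("dwcuri:"):
        return not isinstance(triple.obj, Literal)
    return True


specimen = IRI("http://arctos.database.museum/guid/MVZ:Mamm:115956")
name_string = Triple(specimen, IRI("dwc:recordedBy"), Literal("Oliver P. Pearson; Anita K. Pearson"))
agent_uri = Triple(specimen, IRI("dwcuri:recordedBy"), IRI("http://example.org/agent/anita-pearson"))
misuse = Triple(specimen, IRI("dwcuri:recordedBy"), Literal("Anita K. Pearson"))

print(has_valid_object(name_string), has_valid_object(agent_uri), has_valid_object(misuse))
# True True False
```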

My confusion revolved around the question:
Is an "object" of a triple the "URI", the "URI reference", or the real
thing that the URI identifies?

I'm currently thinking that it is correct to say these things:
- the object of a triple can BE a URI reference
- a URI identifies a real thing like an agent
- the object of a triple is a node that can REPRESENT a real thing like
an agent

I am currently not sure that it is correct to say these kinds of things:
- the object of a triple can BE a URI
- the object of a triple can BE a real thing like an agent
- the object of a triple is a node that can DENOTE a real thing like an agent
- a URI REFERENCE identifies a real thing like an agent

Still haven't re-read Bob's reference. The roaches were calling...
Steve

Hilmar Lapp

Feb 3, 2014, 10:37:16 PM2/3/14
to tdwg...@googlegroups.com

On Feb 3, 2014, at 4:30 PM, Steve Baskauf wrote:

> Is an "object" of a triple the "URI", the "URI reference", or the real thing that the URI identifies?

The object of a triple is the thing identified by the URI (if the object isn't a literal). As is the subject, BTW.

> I'm currently thinking that it is correct to say these things:
> - the object of a triple can BE a URI reference

You mean, can be a URI.

> - a URI identifies a real thing like an agent

Can identify a real thing.

> - the object of a triple is a node that can REPRESENT a real thing like an agent

Confusing to me. What do you mean by "represent"? And what by "node"? You sure this will be clear to non-experts? Do you mean "the object of a triple can be a list of statements about a real thing, without expressly naming the thing" (if by node you mean blank node).

-hilmar

Steve Baskauf

Feb 4, 2014, 12:50:03 AM2/4/14
to tdwg...@googlegroups.com
OK, what Hilmar wrote completely confused me.  I have now scoured the URI specification and the W3C recs about RDF yet another time for the purpose of pulling out the sections that describe the terminology we have been discussing.  They are summarized on the wiki page:

http://code.google.com/p/tdwg-rdf/wiki/RdfTerminology

After some careful reading, I wrote out some conclusions about how terminology is used in these specifications (at the bottom of the wiki page).  With regard to previous posts about use of the words "identify", "denote", and "represent", the RDF specs often use "denote" to indicate the relationship between {URIs and literals} and the resources they represent symbolically.  They also often use "identify" to indicate the relationship between URIs and the resources they represent symbolically in RDF.  "Represent" is also used occasionally.  Responses to Hilmar (based on this research) inline:


Hilmar Lapp wrote:
> On Feb 3, 2014, at 4:30 PM, Steve Baskauf wrote:
>
>> Is an "object" of a triple the "URI", the "URI reference", or the real thing that the URI identifies?
>
> The object of a triple is the thing identified by the URI (if the object isn't a literal). As is the subject, BTW.

This is apparently incorrect.  The object of the triple is the URI (a.k.a. URI reference).  The object of the triple denotes the thing identified by the URI.

>> I'm currently thinking that it is correct to say these things:
>> - the object of a triple can BE a URI reference
>
> You mean, can be a URI.

"URI", "URI reference", and "URIref" are apparently equivalent.

>> - a URI identifies a real thing like an agent
>
> Can identify a real thing.

Yes, that is more accurate.

>> - the object of a triple is a node that can REPRESENT a real thing like an agent
>
> Confusing to me. What do you mean by "represent"? And what by "node"? You sure this will be clear to non-experts?

Now I would use "denotes" instead of "represent" to be consistent with the language of the RDF specs.  "Node" has the meaning described in the graph model of RDF.  The phrasing I listed in that particular section of the email was not directed at non-experts.  It was intended to be a list of statements in which I was using the terminology correctly (or so I thought at the time).

> Do you mean "the object of a triple can be a list of statements about a real thing, without expressly naming the thing" (if by node you mean blank node).

I don't know what you mean here by "list of statements".

Steve

> -hilmar
  

Hilmar Lapp

Feb 4, 2014, 8:04:43 AM2/4/14
to tdwg...@googlegroups.com

On Feb 4, 2014, at 12:50 AM, Steve Baskauf wrote:

>> Do you mean "the object of a triple can be a list of statements about a real thing, without expressly naming the thing" (if by node you mean blank node).
>
> I don't know what you mean here by "list of statements".

Yes, not a good (nor accurate) reference. What I meant was the (sub)graph of RDF triples describing a blank node, or the graph of RDF triples you might obtain when dereferencing a URI that is an information resource.

I was looking for something less technical than "graph", but as you experienced with your own emails, trying to replace accurate and precise technical terminology with something more "accessible" is fraught with peril. Probably best to just stick to terminology consistent with W3C specifications, and then have a glossary that tries to translate the more technical terms to common English.
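The "(sub)graph of RDF triples describing a blank node" Hilmar mentions can be sketched in Python as a simplified form of what is sometimes called a Concise Bounded Description: take every triple with the node as subject, and recursively follow any blank node that appears as an object. The graph, the `_:` label convention, and the quoted-string convention for literals are assumptions made for this example; reification handling from the full definition is omitted.

```python
def describe(node, triples):
    """Return the triples describing node: those with node as subject,
    plus, recursively, those describing any blank node reached as an object."""
    result, todo, seen = [], [node], set()
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if s == current:
                result.append((s, p, o))
                if o.startswith("_:"):  # blank nodes are written "_:label"
                    todo.append(o)
    return result


# Hypothetical graph; literals are written with surrounding double quotes.
graph = [
    ("http://example.org/specimen1", "dwcuri:recordedBy", "_:agent"),
    ("_:agent", "foaf:name", '"Anita K. Pearson"'),
    ("http://example.org/specimen1", "dwc:catalogNumber", '"115956"'),
    ("http://example.org/other", "foaf:name", '"Someone else"'),
]
print(describe("_:agent", graph))
# [('_:agent', 'foaf:name', '"Anita K. Pearson"')]
```

Describing `http://example.org/specimen1` instead returns its own two triples plus the blank-node triple, but nothing about `http://example.org/other`.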

Steve Baskauf

Feb 4, 2014, 12:11:37 PM2/4/14
to tdwg...@googlegroups.com
I have added an example and some statements about that example that I believe to use terminology correctly:

http://code.google.com/p/tdwg-rdf/wiki/RdfTerminology#Example

Steve

Steve Baskauf wrote:
--
You received this message because you are subscribed to the Google Groups "TDWG RDF/OWL Task Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tdwg-rdf+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Steve Baskauf

Feb 5, 2014, 11:05:16 AM2/5/14
to tdwg...@googlegroups.com
I've now had a couple days to digest content of this thread, the W3C and IETF documents, and the email that Bob cited in his message below.  It has been useful for me to get straight in my brain several things, notably that there doesn't seem to be any real difference between a "URI" and a "URI reference", and that subjects and objects of triples are the URIs or literals themselves and not the things denoted by them. 


I don't exactly understand what Bob means by "run explicitly afoul of the RDFS specification".  If what we were doing was writing a standard that defined something, then using the term "denotes" might be problematic if our definition conflicted with the use of "denotes" in a W3C Recommendation.  However, that's not what we are doing.  We are writing a paper where we are trying to explain to uninitiated people how something works and the question is the clear and appropriate language for doing that.  Based on my recent reading and pondering that email Bob cited, it seems that "identify", "refers", and "denotes" are all words that people use in technical documents to describe the mapping from URIs and RDF nodes to meanings. 


The most commonly used term and also the one that has the broadest meaning seems to be "denotes".  That's because it can be used with an RDF node that is either a URI/URI reference, literal, or blank node.  "Identify" is also frequently used, particularly in the case of URIs since they are designed particularly to identify things, although typed and untyped literals also identify things in their own way (perhaps poorly in the case of untyped literals where names of things are intended to denote non-literal resources but which actually identify the character sequence of the name).  But blank nodes don't identify anything, so "denotes" is a better term there.  "Refers" seems to be the least commonly used term, although it does appear at several points in the W3C documents as a way to express the mapping between RDF nodes and the things they denote.  Since it carries baggage Bob doesn't like, perhaps it's better to go with "denotes" or "identifies".  If we fear confused taxonomists, then "denotes" works.  I am planning to use it preferentially in the future to refer to the mapping between URIs/literals and what they refer to.  If the identification role of URIs is important in what I'm talking about, I'll probably use "identifies".
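The distinction drawn here (the node in the graph versus what it denotes) can be illustrated with a toy interpretation function in Python. This is a deliberate simplification written for this thread, not an implementation of the W3C model theory; the IRI table and the agent URI are hypothetical. It shows a URI denoting a thing through an interpretation, an untyped name string denoting only its characters, a typed literal denoting a value, and a blank node denoting something only relative to an assignment.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical interpretation: the strings stand in for things in the world.
IRI_DENOTATION = {
    "http://example.org/agent/anita-pearson": "the person Anita K. Pearson",
}


def denotes(term, bnode_assignment=None):
    """What a graph term denotes under this toy interpretation.
    Terms are tuples: ("iri", uri), ("literal", lexical, datatype), ("bnode", label)."""
    kind = term[0]
    if kind == "iri":
        # None means this interpretation says nothing about the URI.
        return IRI_DENOTATION.get(term[1])
    if kind == "literal":
        lexical, datatype = term[1], term[2]
        if datatype == XSD + "dateTime":
            return datetime.fromisoformat(lexical)  # a point in time
        return lexical  # a name string denotes the characters, not the person
    if kind == "bnode":
        # A blank node identifies nothing; it denotes something only
        # relative to an assignment chosen while interpreting the graph.
        return (bnode_assignment or {}).get(term[1])
    raise ValueError(f"unknown term kind: {kind}")


print(denotes(("literal", "Anita K. Pearson", XSD + "string")))  # Anita K. Pearson
print(denotes(("bnode", "b0"), {"b0": "some agent"}))            # some agent
```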

Steve

John Wieczorek

Feb 5, 2014, 11:14:35 AM2/5/14
to tdwg...@googlegroups.com
Nice summary. I feared this discussion at the outset (semantics of
semantics). I find these to be clear and reasonable conclusions.
Thanks, Steve.

Luca Matteis

Feb 5, 2014, 3:22:17 PM2/5/14
to tdwg...@googlegroups.com
Steve, to get more clarity on the subject I'd suggest you communicate
your concerns to the linked data or rdf mailing lists on w3c.

I'm sure they had to write papers in the past and had to confront
these same terminology issues.

Personally, I wouldn't lose sleep if you told me that URIs referred,
denoted or identified something.

Hilmar Lapp

Feb 5, 2014, 5:43:37 PM2/5/14
to tdwg...@googlegroups.com

On Feb 5, 2014, at 11:05 AM, Steve Baskauf wrote:

> Since it carries baggage Bob doesn't like, perhaps it's better to go with "denotes" or "identifies".  If we fear confused taxonomists, then "denotes" works.  I am planning to use it preferentially in the future to refer to the mapping between URIs/literals and what they refer to.  If the identification role of URIs is important in what I'm talking about, I'll probably use "identifies".

Sounds fully reasonable and defensible. And for the record, I don't fear confused taxonomists.

Paul Murray

Feb 12, 2014, 11:18:24 PM2/12/14
to tdwg...@googlegroups.com

On 02/02/2014, at 8:22 AM, Steve Baskauf wrote:

> The basic question is:
> What exactly is the thing to which we refer as the object of an RDF triple?

Late to the party, and at the risk of exposing my laughable high-school knowledge of the topic,  the words 'subject', 'predicate', and 'object' are borrowed from grammar.

A simple declarative sentence has two bits: you identify the thing you are talking about - the subject, and then you say something about it - the predicate. 

In English, a predicate will have a verb. Some verbs are transitive, some are intransitive. Transitive verbs have an 'object'.

"The balloon rises." - the verb 'to rise' is intransitive.
"The balloon raises the flag." - the verb 'to raise' is transitive.

RDF has borrowed these names, but confusingly with a slightly different meaning of the word 'predicate'. 

Usually, RDF models declarative sentences with transitive verbs with a subject/predicate/object triple; and ones with intransitive verbs via class membership. Consider this pair of declarations.

"The balloon is red".
"The balloon has colour red".

'is' is the verb 'to be', and is intransitive, whereas 'has' is the verb 'to have' and is transitive. In the first instance, you'd use a class hierarchy with a root class of 'ColouredThing'; in the second you'd model colours as a set of entities.

As to which is best, the answer - of course - is "it depends".

So my in-a-nutshell answer is "'Object' means the object of the transitive verb which in RDF-speak we call a 'predicate'. A lot of the time, we use the verb 'to have' as a catch-all."
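Paul's two modelling options can be put side by side as plain triples. A minimal Python sketch, with made-up `ex:` names throughout, showing that the same question ("which things are red?") is asked differently under each choice, and that only the second makes the colour itself a describable resource:

```python
# Hypothetical names throughout (ex:RedThing, ex:hasColour, ex:red).
as_class = [
    ("ex:balloon", "rdf:type", "ex:RedThing"),
    ("ex:RedThing", "rdfs:subClassOf", "ex:ColouredThing"),
]
as_entity = [
    ("ex:balloon", "ex:hasColour", "ex:red"),
    ("ex:red", "rdf:type", "ex:Colour"),
]


def red_by_class(triples):
    """'The balloon is red': redness is membership in a class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == "ex:RedThing"}


def red_by_entity(triples):
    """'The balloon has colour red': red is a resource that can be described further."""
    return {s for s, p, o in triples if p == "ex:hasColour" and o == "ex:red"}


print(red_by_class(as_class), red_by_entity(as_entity))  # {'ex:balloon'} {'ex:balloon'}
```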

Paul Murray

Feb 18, 2014, 7:46:03 PM2/18/14
to tdwg...@googlegroups.com

On 02/02/2014, at 12:53 PM, Steve Baskauf wrote:

> John,
>
> The distinction between resources and metadata about resources is what I want to be careful to maintain.

Some information here:

http://www.w3.org/TR/webarch/#id-resources

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.”
This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transfered in a message. In the case of this document, the message payload is the representation of this document.
However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you've printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information. Although it is possible to describe a great many things about a car or a dog in a sequence of bits, the sum of those things will invariably be an approximation of the essential character of the resource.

Paul Murray

Feb 18, 2014, 8:08:31 PM2/18/14
to tdwg...@googlegroups.com

On 02/02/2014, at 1:08 PM, Luca Matteis wrote:

> If we accept that the URI http://museum-x.org/personnel/akp is a reference to some information about Anita Pearson rather than a reference to Anita Pearson herself, how can we clearly make statements where Anita Pearson herself is the subject of a triple?


If <http://museum-x.org/personnel/akp> is a *reference to some information about Anita Pearson*, then the document fetchable at <http://museum-x.org/personnel/akp> will (should) contain triples about the person, and triples about itself. One would hope that there are triples in it like:

@prefix : <http://example.org/terms#> .   # hypothetical namespace, for illustration only
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://museum-x.org/personnel/akp> a :personnelDocument .
<http://museum-x.org/personnel/akp> :wasLastUpdated "Sometime in June"  .
<http://museum-x.org/personnel/akp> :physicalLocation "4th shelf, compactus 3A, in the orange folder with the peanut butter stain"  .
<http://museum-x.org/personnel/akp> :isAbout <person:Anita_Pearson> .
<person:Anita_Pearson> a foaf:Person .
<person:Anita_Pearson> :hasName 'Anita Pearson' .
<person:Anita_Pearson> :hasBirthdate '1/1/1980' .

etc.

That is, if <http://museum-x.org/personnel/akp> is a reference to the document, then the person herself will need a separate URI. But what if we just don't have this? Well, given that information about Anita Pearson is not the same as Anita Pearson herself, then the question reduces down to "if we have an identifier for thing A and don't have an identifier for thing B, how can we talk about thing B?" The answer, I suppose, is that we can't.

Paul Murray

Feb 18, 2014, 8:25:38 PM2/18/14
to tdwg...@googlegroups.com

On 19/02/2014, at 12:08 PM, Paul Murray wrote:

> <http://museum-x.org/personnel/akp> a :personnelDocument .
> <http://museum-x.org/personnel/akp> :wasLastUpdated "Sometime in June"  .
> <http://museum-x.org/personnel/akp> :physicalLocation "4th shelf, compactus 3A, in the orange folder with the peanut butter stain"  .
> <http://museum-x.org/personnel/akp> :isAbout <person:Anita_Pearson> .
> <person:Anita_Pearson> a foaf:Person .
> <person:Anita_Pearson> :hasName 'Anita Pearson' .
> <person:Anita_Pearson> :hasBirthdate '1/1/1980' .

Incidentally, the way I'd do this would be to assign the URI <http://museum-x.org/personnel/akp> to Anita herself, and the URI <http://museum-x.org/personnel/akp#personnel-record> to the personnel record. Make use of the fragment part of a http URI. That way, a single HTTP fetch gets all the relevant information.

I'd also include the triples


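Paul's fragment suggestion works because an HTTP client strips the fragment before sending a request. A minimal Python stdlib sketch using `urllib.parse.urldefrag` and the museum-x URIs from his example (the function name `document_to_fetch` is made up for illustration) shows that both identifiers resolve to a single fetch while remaining two different URIs in RDF:

```python
from urllib.parse import urldefrag

person = "http://museum-x.org/personnel/akp"                   # Anita herself
record = "http://museum-x.org/personnel/akp#personnel-record"  # her personnel record


def document_to_fetch(uri):
    """The URL an HTTP client actually requests: the fragment stays on the client."""
    return urldefrag(uri).url


print(document_to_fetch(record))   # http://museum-x.org/personnel/akp
print(urldefrag(record).fragment)  # personnel-record
print(record == person)            # False: still two distinct identifiers
```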

Steve Baskauf

Feb 19, 2014, 10:03:36 AM2/19/14
to tdwg...@googlegroups.com
This brings up a point about which I would like to hear some discussion.  One could type the record as an owl:Ontology as Paul suggests, if it were one.  But that would not be true for many records.  More generic would be a http://rdfs.org/ns/void#Dataset if the dataset were a set of triples.  But again, there is no requirement that the record will exist as RDF and we still might want to talk about it.  I'm thinking that the broadest type would be a http://purl.org/dc/dcmitype/Dataset  (dcmitype:Dataset).  A group such as ours (i.e. the RDF Task Group) could recommend that all records be typed as dcmitype:Dataset with additional types if the record is of a more specific type that a client might want to try to retrieve and interpret. 

However, I'm still not sure that this is right because a record is not a dataset.  But I don't know any term specifically for a record within a dataset.  The definition of dcmitype:Dataset is "Data encoded in a defined structure." with  the comment "Examples include lists, tables, and databases. A dataset may be useful for direct machine processing."  So I guess that doesn't exclude particular records within a dataset.  It seems likely to me that providers would like to make statements about particular records (such as lastModified date) because records can be updated without requiring an entire dataset to be updated to a new version.

As a Task Group, it would be good to recommend something about this as a best practice.
Steve


Paul Murray

Feb 19, 2014, 7:00:53 PM2/19/14
to tdwg...@googlegroups.com

Class: void:Dataset dataset – A set of RDF triples that are published, maintained or aggregated by a single provider.

Property: void:inDataset in dataset – Points to the void:Dataset that a document is a part of.


Well, there's no stipulation about granularity there. Arguably a record is a set of data, and a set of records is also one. But I'm getting the impression that a dataset is meant to be a collection of foaf:Document objects. There's a predicate void:inDataset to say 'this foaf:Document is "in" this void:Dataset'. There are also some artifacts for documents that are *about* datasets, but I don't think they mean that all datasets are 'in' a Document by way of the foaf:topic predicate.

The LinkSet class is … interesting. It's not OWL compliant: the object of the linkPredicate predicate is another predicate. In OWL, you just can't do that except with an annotation property. It seems to be a way of saying 'this is a set of triples, the subjects are from dataset A and the objects are from dataset B'. And it goes against what I just said - that a Dataset is not so much a collection of triples as a collection of foaf:Documents. I really don't know what they are trying to achieve. If "The Vocabulary of Interlinked Datasets (VoID) is an RDF Schema vocabulary for expressing metadata about RDF datasets", then what does this have to do with foaf:Document?

What does void:Dataset buy you that dctype:Dataset does not? Seeing as they are talking about "The Vocabulary of Interlinked Datasets (VoID)", I suspect that the main point of the vocabulary is to bolt this LinkSet thing onto the dctype vocabulary. If you are not doing that (and I still can't see what it's for), then dctype:Dataset is probably enough.

On a different topic:

When I used owl:Ontology, I was using the term in a pretty broad sense. An OWL ontology may contain property 'axioms', which in the RDF mapping are simply triples. An OWL ontology in that sense is simply 'a set of things that can be reasoned over'.

Steve Baskauf

Feb 20, 2014, 11:26:10 AM2/20/14
to tdwg...@googlegroups.com
Interesting...  I hadn't looked at the VoID vocabulary closely enough to pick up on the LinkSet thing.  One of the things that got me interested in it was the void:inDataset term which makes it possible to link a document to a dataset.  FOAF is pretty loose about what constitutes a foaf:Document: "The Document class represents those things which are, broadly conceived, 'documents'. "  So I suppose we would have the latitude to consider a database record to be a foaf:Document.  I would see the availability of the void:inDataset property to be a primary benefit of using the VoID vocabulary rather than just typing datasets as dctype:Dataset (which would also happen if void:Dataset were used since it's a subclass of dctype:Dataset) . 

I'm trying to imagine how this all would work with something like the iDigBio use case (which is what got me thinking seriously about this whole thing).  Imagine that some small institution like the Noname State University herbarium is managing their collection with Excel spreadsheets.  They participate in the iDigBio digitization project.  Somehow miraculously iDigBio figures out a way to assign reusable URI guids to the specimens in Noname S.U.'s collection.  Periodically Noname S.U. sends new spreadsheets to iDigBio with metadata updates for their specimens, but the updates only contain the changed records and not all records.  iDigBio even more miraculously decides to expose RDF about the specimens whose metadata they aggregate.  [I don't know if this is actually how iDigBio works, but I'm pretending anyway.] So we have these things:

1. specimens
2. an original Excel spreadsheet with the entire collection's metadata on it.  The provider probably will never expose these metadata as RDF on its own, but probably has some kind of local identifier based on the date generated or something.
2.a. within the original spreadsheet there is a row for each specimen representing the metadata record for that specimen
3. another (later) Excel spreadsheet with an update of metadata for some images.
3.a. within the later spreadsheet, rows ...
4. images of the specimens

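RDF/Turtle (many more properties not shown, pretend that dwc:PreservedSpecimen class exists):

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<nsu:specimen1>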
     a dwc:PreservedSpecimen;
     foaf:depiction <nsu:specImage1>.
<nsu:specImage1>
    a dctype:StillImage.
<nsu:record1>
    void:inDataset <nsu:spreadsheet1>;
    dcterms:created "2013-11-14T21:51:26-06:00"^^xsd:dateTime;
    dcterms:references <nsu:specimen1>.
<nsu:record2>
    void:inDataset <nsu:spreadsheet2>;
    dcterms:created "2014-01-16T09:42:11-06:00"^^xsd:dateTime;
    dcterms:references <nsu:specimen1>.
<nsu:spreadsheet1>
    a void:Dataset;
    dcterms:identifier "excel-update-2013-11-14";
    dc:creator "Noname State University Herbarium".
<nsu:spreadsheet2>
    a void:Dataset;
    dcterms:identifier "excel-update-2014-01-16";
    dc:creator "Noname State University Herbarium".

Note that use of some terms entails types not explicitly declared. 
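To make that concrete, here is a minimal sketch in plain Python (no RDF library) of the two RDFS rules involved: a property's rdfs:domain types its subject, and its rdfs:range types its object. The domain/range table is an assumption based on my reading of the FOAF and VoID vocabularies (foaf:depiction ranging over foaf:Image; void:inDataset with domain foaf:Document and range void:Dataset) and should be checked against the published specs. The triples are a subset of the example above, written as prefixed-name strings.

```python
# Minimal sketch of RDFS domain/range entailment (RDFS rules rdfs2 and rdfs3).
# URIs are treated as opaque prefixed-name strings; "nsu:" is the hypothetical
# namespace from the Turtle example.

# ASSUMED axioms, from my reading of the FOAF and VoID specs; verify before use.
DOMAIN = {"void:inDataset": "foaf:Document"}
RANGE = {"foaf:depiction": "foaf:Image", "void:inDataset": "void:Dataset"}

RDF_TYPE = "rdf:type"


def entailed_types(triples):
    """Return rdf:type triples implied by the domain/range table but not stated."""
    stated = set(triples)
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # rdfs2: subject gets the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))  # rdfs3: object gets the range class
    return inferred - stated


triples = [
    ("nsu:specimen1", "rdf:type", "dwc:PreservedSpecimen"),
    ("nsu:specimen1", "foaf:depiction", "nsu:specImage1"),
    ("nsu:specImage1", "rdf:type", "dctype:StillImage"),
    ("nsu:record1", "void:inDataset", "nsu:spreadsheet1"),
    ("nsu:record1", "dcterms:references", "nsu:specimen1"),
    ("nsu:spreadsheet1", "rdf:type", "void:Dataset"),
]

for t in sorted(entailed_types(triples)):
    print(t)
# ('nsu:record1', 'rdf:type', 'foaf:Document')
# ('nsu:specImage1', 'rdf:type', 'foaf:Image')
```

The void:Dataset type of nsu:spreadsheet1 is also entailed, but it is filtered out because it was already declared.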

There are many ways to express this information and there are many questions that this example raises.  How does one capture the fact that iDigBio has generated the RDF?  Is every spreadsheet a different void:Dataset (as shown here) or is the more nebulous database of the Noname State U. Herbarium over time a single void:Dataset?  Do we assign a single dcterms:created property to the dataset or to each record in it?  Is there even a need to establish each record as a separate entity if each spreadsheet update is tracked as an identifiable entity?  I'm not asking these questions to suggest that there is a single right answer to any of them.  Rather, I think best practices should evolve and be use-case driven.  What do we want to be able to "do" with these triples when we generate them? 
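As one example of something we might want to "do" with these triples: ask which record about a specimen is the most recent. The sketch below runs over plain Python tuples copied from the example above; in practice this would more likely be a SPARQL query against whatever store iDigBio exposed. `records_about` and `latest_record` are hypothetical helper names. Note that the specimen URI stays fixed while the records about it change.

```python
from datetime import datetime

# Hypothetical triples mirroring the Turtle example: each record points at the
# specimen it describes (dcterms:references) and carries a creation timestamp.
triples = [
    ("nsu:record1", "dcterms:references", "nsu:specimen1"),
    ("nsu:record1", "dcterms:created", "2013-11-14T21:51:26-06:00"),
    ("nsu:record1", "void:inDataset", "nsu:spreadsheet1"),
    ("nsu:record2", "dcterms:references", "nsu:specimen1"),
    ("nsu:record2", "dcterms:created", "2014-01-16T09:42:11-06:00"),
    ("nsu:record2", "void:inDataset", "nsu:spreadsheet2"),
]


def records_about(triples, thing):
    """Records whose dcterms:references value is the given thing."""
    return {s for s, p, o in triples if p == "dcterms:references" and o == thing}


def latest_record(triples, thing):
    """The record about `thing` with the most recent dcterms:created value, or None."""
    created = {
        s: datetime.fromisoformat(o)  # parses the timezone offset, e.g. -06:00
        for s, p, o in triples
        if p == "dcterms:created"
    }
    candidates = [r for r in records_about(triples, thing) if r in created]
    return max(candidates, key=created.get) if candidates else None


print(latest_record(triples, "nsu:specimen1"))  # nsu:record2
```

If each spreadsheet update were tracked only at the dataset level, the same question would have to be answered through void:inDataset and a date on the dataset instead, which is one concrete way the modelling choices above affect what queries are possible.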

But I also think it would be important to establish some kind of conventions about how to do this because data integration would be very messy if every provider just made up their own ad hoc structure for dealing with this.

Steve

Paul Murray

Mar 24, 2014, 9:52:27 PM
to tdwg...@googlegroups.com

On 06/02/2014, at 3:05 AM, Steve Baskauf wrote:

> I've now had a couple days to digest content of this thread, the W3C and IETF documents, and the email that Bob cited in his message below. It has been useful for me to get straight in my brain several things, notably that there doesn't seem to be any real difference between a "URI" and a "URI reference", and that subjects and objects of triples are the URIs or literals themselves and not the things denoted by them.

As far as I know, a URI reference is a URI with a hash in it somewhere.

Thus:

http://en.wikipedia.org/wiki/Indian_Ocean

is a URL, and

http://en.wikipedia.org/wiki/Indian_Ocean#Hydrology

is a URI reference: a reference to some spot in a resource that is accessible via a URL. Both of these kinds of things are URIs - 'URI' is a broader term, and 'URI reference' is perhaps a bit of a misnomer. But nobody seems to care about the distinction between URI and URL anymore, and it's probably pointless getting picky about it.
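Python's standard library makes the split concrete: `urllib.parse.urldefrag` separates the part of a URI that identifies the retrievable document from the fragment that points to a spot inside it. For RDF purposes, though, the two full strings remain two different names.

```python
from urllib.parse import urldefrag

# The two Wikipedia URIs from the message above.
examples = [
    "http://en.wikipedia.org/wiki/Indian_Ocean",
    "http://en.wikipedia.org/wiki/Indian_Ocean#Hydrology",
]

for uri in examples:
    # urldefrag returns (url_without_fragment, fragment); fragment is '' if absent.
    base, fragment = urldefrag(uri)
    print(f"{uri}\n  document: {base}\n  fragment: {fragment or '(none)'}")
```

Both URIs resolve to the same document, yet an RDF store compares them as whole strings and treats them as distinct names.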

When we are talking about reasoning and ontology, the distinction doesn't matter. A URI is an opaque name for a thing. "URI" even covers things like 'mailto:b...@x.org', LSIDs, or HTTP URLs with query strings.

When we are talking about how best to organise the documents that comprise an ontology for purposes of linked data, it matters quite a bit, because an HTTP client strips the fragment before sending its request, so a fetch always gets back the entire page. If your database has a million records and you use a hash to tack the ID onto the end of the URI, then the only way to get a single record over HTTP would be to get the whole dataset. On the other hand, if you are using (for instance) a SPARQL endpoint to enquire, then it wouldn't matter - you'd do a DESCRIBE <the uri> and get back something useful about only that.
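A rough sketch of the point about hash URIs, using made-up `example.org` record URIs (placeholders, not a real dataset) and a smaller N so it runs quickly; a million records behaves the same way. Since the fragment is dropped before the request is sent, counting distinct defragmented URIs counts the separate HTTP requests needed to reach every record. `documents_to_fetch` is a hypothetical helper.

```python
from urllib.parse import urldefrag

# Placeholder record URIs on a hypothetical host; N kept small for speed.
N = 1000
hash_uris = [f"http://example.org/dataset#rec{i}" for i in range(N)]
slash_uris = [f"http://example.org/dataset/rec{i}" for i in range(N)]


def documents_to_fetch(uris):
    """Distinct documents an HTTP client would request for these URIs.

    The fragment (after '#') is never sent to the server, so only the part
    before it determines which document comes back.
    """
    return {urldefrag(u).url for u in uris}


print(len(documents_to_fetch(hash_uris)))   # 1
print(len(documents_to_fetch(slash_uris)))  # 1000
```

With hash URIs every record lives in one document, so fetching rec42 alone means downloading all of them; with slash URIs each record can be served on its own. A SPARQL DESCRIBE sidesteps the question entirely.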
