Hi, Folks,
I know I may be walking into a minefield here, but I would like to raise an issue about how Fedora4 handles blank nodes. My immediate use case emerges from investigating whether to use MODSRDF to model descriptive metadata for objects in the repository. For those of you familiar with MODSRDF, it uses a large number of blank nodes to represent concept hierarchies, similar to how the existing XML-based representation works. For example, a snippet of an RDF document describing an article published in the journal "BMC Bioinformatics" is shown here:
https://gist.github.com/acoburn/749624114e08233a0185#file-modsrdf-ttl
Serialized as Turtle, the MODS-based metadata includes five blank nodes. These may be easier to see in the n-triples serialization:
https://gist.github.com/acoburn/749624114e08233a0185#file-mods-nt
If I send this RDF document to fedora, here's what happens behind the scenes:
The incoming document is parsed, and when each triple is sent to the persistence layer, each blank node is skolemized, meaning that the blank nodes are given URIs under the .well-known/genid space inside fedora. To clients retrieving this RDF document back from fedora, this is serialized as N-Triples like so:
https://gist.github.com/acoburn/749624114e08233a0185#file-ntriples-from-fedora
Or as Turtle:
https://gist.github.com/acoburn/749624114e08233a0185#file-turtle-from-fedora
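To make the skolemization step concrete, here is a rough sketch of the idea in Python. This is not Fedora's actual code: the function name, the base URL and the simplified blank node pattern are all assumptions for illustration, and a real parser would not use a regular expression (this one would also rewrite a `_:label` that happened to appear inside a string literal).

```python
import re
import uuid

# Simplified pattern for blank node labels such as _:b0 in N-Triples.
BNODE = re.compile(r"_:([A-Za-z0-9]+)")

def skolemize(ntriples, base="http://localhost:8080/rest"):
    """Replace each blank node label with a newly minted Skolem IRI.

    `base` is a hypothetical repository URL, used only for illustration.
    Returns the rewritten document and the label -> IRI mapping.
    """
    minted = {}  # repeated labels must map to the same Skolem IRI

    def replace(match):
        label = match.group(1)
        if label not in minted:
            minted[label] = f"<{base}/.well-known/genid/{uuid.uuid4().hex}>"
        return minted[label]

    return BNODE.sub(replace, ntriples), minted


doc = (
    '<http://example.org/article> <http://example.org/p> _:b0 .\n'
    '_:b0 <http://example.org/q> "BMC Bioinformatics" .\n'
)
skolemized, mapping = skolemize(doc)
print(skolemized)
```

After this step the document contains no blank nodes at all: every former blank node is an ordinary IRI, which is the crux of what follows.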
This is where things start to get strange.
My understanding of blank nodes is that they are, by definition, locally-scoped anonymous structures, but now fedora tells me not only that they have dereferenceable URIs but that they have types and all manner of properties that previously didn't exist. Notice, too, that the rdf:type property is fedora:Blanknode. In a word, these nodes are no longer anonymous; they are no longer blank nodes.
From the W3C recommendation on RDF 1.1 [1]:
> Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation. The syntactic restrictions on blank node identifiers, if any, therefore also depend on the concrete RDF syntax or implementation. Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this is supported by the syntax.
My reading of this is that blank node identifiers are either internal to an RDF store or an artifact of a serialization syntax. This is certainly not how a fedora:Blanknode object behaves.
The W3C recommendation continues:
> 3.5 Replacing Blank Nodes with IRIs
>
> Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
>
> In situations where stronger identification is needed, systems may systematically replace some or all of the blank nodes in an RDF graph with IRIs. Systems wishing to do this should mint a new, globally unique IRI (a Skolem IRI) for each blank node so replaced.
>
> This transformation does not appreciably change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else. It does however permit the possibility of other graphs subsequently using the Skolem IRIs, which is not possible for blank nodes.
>
> Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace blank nodes. This allows a system to map IRIs back to blank nodes if needed.
>
> Systems that want Skolem IRIs to be recognizable outside of the system boundaries should use a well-known IRI [RFC5785] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs; and whose path component starts with /.well-known/genid/.
>
> For example, the authority responsible for the domain example.com could mint the following recognizable Skolem IRI:
>
> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
This is more or less in line with what Fedora is doing with blank nodes, in that it mints URIs under .well-known/genid (though not as a prefix to the URL path -- only as a prefix to the application-scoped context of the web app). However, as the W3C recommendation suggests, once those Skolem URIs are publicly dereferenceable, the resources no longer behave as blank nodes, and it becomes possible to reuse those URIs in other graphs. The recommendation does permit letting Skolem URIs be recognizable outside the boundaries of the system, but those identifiers are no longer blank nodes, even though Fedora asserts that they are [2]. And once these RDF graphs are represented outside the system (for example, in a triplestore), the representation of the "blank nodes" becomes at best confusing and at worst internally inconsistent with what the ontology states.
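The difference between "path starts with /.well-known/genid/" and "path contains /.well-known/genid/ under the web app context" can be shown with a small check. This is only a sketch: the two function names are made up, and the localhost URL is a hypothetical example of a Fedora-style Skolem URI, not one taken from a real repository.

```python
from urllib.parse import urlparse

GENID = "/.well-known/genid/"

def is_wellknown_skolem(iri):
    """True if the IRI follows the recognizable form quoted above:
    HTTP(S) scheme and a path that *starts with* /.well-known/genid/."""
    parsed = urlparse(iri)
    return parsed.scheme in ("http", "https") and parsed.path.startswith(GENID)

def contains_genid(iri):
    """Looser check: the genid segment appears somewhere in the path."""
    return GENID in urlparse(iri).path


w3c_example = "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6"
fedora_style = "http://localhost:8080/rest/.well-known/genid/abc123"  # hypothetical

print(is_wellknown_skolem(w3c_example))   # True
print(is_wellknown_skolem(fedora_style))  # False
print(contains_genid(fedora_style))       # True
```

A client relying on the strict form would therefore not recognize the second URI as a Skolem IRI without knowing where the application context begins.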
And yet, for resources marked with fedora:Blanknode, the returned representation (i.e. from a GET request) includes those embedded resources as if they really are blank nodes -- though the serialization suggests something different.
In short, I don't think the current implementation is entirely consistent. If fedora is going to "support blank nodes", it can't accept blank nodes from an RDF document, skolemize them and then turn around and claim they are still blank nodes. If fedora supported blank nodes qua blank nodes, then these skolemized URIs would not escape the application boundary (as they currently do) -- which is a serialization issue; nor would those URIs under .well-known/genid be dereferenceable: you can't make an anonymous structure dereferenceable and still claim that it is an anonymous structure.
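For comparison, keeping Skolem URIs inside the application boundary would mean mapping them back to blank node labels at serialization time, which the recommendation explicitly allows for. A minimal sketch of that reverse mapping over N-Triples follows; the function name, the regular expression and the sample URIs are assumptions for illustration and say nothing about how Fedora would actually implement it.

```python
import re

# Any IRI whose path contains /.well-known/genid/<key> is treated as a Skolem IRI.
SKOLEM = re.compile(r"<[^<>\s]*/\.well-known/genid/([^<>\s/]+)>")

def deskolemize(ntriples):
    """Replace Skolem IRIs with document-local blank node labels."""
    labels = {}  # genid key -> blank node label, assigned in order of appearance

    def replace(match):
        key = match.group(1)
        if key not in labels:
            labels[key] = f"_:b{len(labels)}"
        return labels[key]

    return SKOLEM.sub(replace, ntriples)


stored = (
    '<http://example.org/article> <http://example.org/p> '
    '<http://localhost:8080/rest/.well-known/genid/ab12> .\n'
    '<http://localhost:8080/rest/.well-known/genid/ab12> '
    '<http://example.org/q> "BMC Bioinformatics" .\n'
)
print(deskolemize(stored))
```

With something like this in the serializer, clients would see only locally-scoped labels and the generated URIs would never leave the system.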
Just to be clear, I don't have any concerns with the minting of new URIs for blank nodes per se. My concern really boils down to the fact that these fedora:Blanknode objects look and act like blank nodes from one angle, but from another angle they look and behave altogether differently. What are they? Because they don't quite fit the description of a blank node.
Thanks,
Aaron
[1] http://www.w3.org/TR/rdf11-concepts/
[2] http://fedora.info/definitions/v4/repository#Blanknode
--
Aaron Coburn
System Administrator / Programmer
Web Services, Amherst College