Fedora4 and blank nodes

Aaron Coburn

Feb 18, 2015, 11:56:06 AM
to fedor...@googlegroups.com
Hi, Folks,

I know I may be walking into a minefield here, but I would like to raise an issue about how Fedora4 handles blank nodes. My immediate use case for this emerges from investigating whether to use MODSRDF to model descriptive metadata for objects in the repository. As those of you familiar with MODSRDF will know, it uses a large number of blank nodes to represent concept hierarchies, similar to how the existing XML-based representation works. For example, a snippet of an RDF document describing an article published in the journal "BMC Bioinformatics" is shown here:

https://gist.github.com/acoburn/749624114e08233a0185#file-modsrdf-ttl

Serialized as Turtle, the MODS-based metadata includes five blank nodes. These may be easier to see in the n-triples serialization:

https://gist.github.com/acoburn/749624114e08233a0185#file-mods-nt

If I send this RDF document to fedora, here's what happens behind the scenes:

The incoming document is parsed, and when each triple is sent to the persistence layer, each blank node is skolemized, meaning that the blank nodes are given URIs under the .well-known/genid space inside fedora. To clients retrieving this RDF document back from fedora, this is serialized as N-Triples like so:

https://gist.github.com/acoburn/749624114e08233a0185#file-ntriples-from-fedora

Or as Turtle:

https://gist.github.com/acoburn/749624114e08233a0185#file-turtle-from-fedora
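
For readers who cannot follow the gists, the skolemization step described above can be sketched roughly as follows. This is only an illustration in Python, not Fedora's actual code: the `skolemize` function, the `http://localhost:8080/rest` base and the example.org predicates are all assumptions, and the regular expression assumes that `_:` never occurs inside a literal.

```python
import re
import uuid

# Matches blank node labels such as _:b0 in N-Triples.
# Simplification (assumption): "_:" never appears inside a literal.
BNODE = re.compile(r"_:([A-Za-z0-9]+)")


def skolemize(ntriples, base="http://localhost:8080/rest"):
    """Replace each blank node label with a freshly minted genid IRI.

    `base` is a hypothetical repository root, not a value from the thread.
    """
    minted = {}  # label -> Skolem IRI, so a repeated label maps to one IRI

    def mint(match):
        label = match.group(1)
        if label not in minted:
            minted[label] = f"<{base}/.well-known/genid/{uuid.uuid4()}>"
        return minted[label]

    return BNODE.sub(mint, ntriples), minted


# Hypothetical two-triple document with a single blank node.
doc = (
    '<http://example.org/article> <http://example.org/name> _:b0 .\n'
    '_:b0 <http://example.org/namePart> "Coburn" .\n'
)
skolemized, mapping = skolemize(doc)
print(skolemized)
```

Every occurrence of a label maps to the same minted IRI, so the shape of the graph is preserved while the nodes stop being anonymous.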

This is where things start to get strange.

My understanding of blank nodes is that they are, by definition, locally-scoped anonymous structures, but now fedora tells me not only that they have dereferenceable URIs but that they have types and all manner of properties that previously didn't exist. Notice, too, that the rdf:type property is fedora:Blanknode. In a word, these nodes are no longer anonymous; they are no longer blank nodes.

From the W3C recommendation on RDF 1.1 [1]

> Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation. The syntactic restrictions on blank node identifiers, if any, therefore also depend on the concrete RDF syntax or implementation. Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this is supported by the syntax.

My reading of this is that blank node identifiers are either internal to an RDF store or an artifact of a serialization syntax. This is certainly not how a fedora:Blanknode object behaves.

The W3C recommendation continues:

> 3.5 Replacing Blank Nodes with IRIs
>
> Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
>
> In situations where stronger identification is needed, systems may systematically replace some or all of the blank nodes in an RDF graph with IRIs. Systems wishing to do this should mint a new, globally unique IRI (a Skolem IRI) for each blank node so replaced.
>
> This transformation does not appreciably change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else. It does however permit the possibility of other graphs subsequently using the Skolem IRIs, which is not possible for blank nodes.
>
> Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace blank nodes. This allows a system to map IRIs back to blank nodes if needed.
>
> Systems that want Skolem IRIs to be recognizable outside of the system boundaries should use a well-known IRI [RFC5785] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs; and whose path component starts with /.well-known/genid/.
>
> For example, the authority responsible for the domain example.com could mint the following recognizable Skolem IRI:
>
> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6

This is more or less in line with what Fedora is doing with blank nodes, in that it mints URIs under .well-known/genid (though not as a prefix to the URL path -- only as a prefix to the application-scoped context of the web app). However, as the W3C recommendation suggests, once those Skolem URIs are publicly de-referenceable, then the resources are no longer behaving as blank nodes, and it becomes possible to reuse those URIs with other graphs. The recommendation does permit the possibility of letting Skolem URIs be recognizable outside the boundaries of the system, but those identifiers are no longer blank nodes, even though Fedora asserts that they are [2]. And once these RDF graphs are represented outside the system (for example, in a triplestore), then the representation of the "blank nodes" becomes at best confusing and at worst internally inconsistent with what the ontology states.
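
To make the path-prefix distinction concrete, here is a minimal sketch of a check that follows the wording of the recommendation quoted above. The function names and the `localhost` URL are hypothetical; only the example.com IRI comes from the recommendation itself.

```python
from urllib.parse import urlparse

GENID_PREFIX = "/.well-known/genid/"


def is_recognizable_skolem(iri):
    """True if the IRI's path component starts with /.well-known/genid/."""
    parts = urlparse(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)


def to_blank_label(iri):
    """Map a recognizable Skolem IRI back to a blank node label, else None."""
    if not is_recognizable_skolem(iri):
        return None
    return "_:" + iri.rsplit("/", 1)[1]


# The example from the recommendation.
spec_example = "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6"
# A hypothetical URI of the kind minted under a web-app context such as /rest.
app_scoped = "http://localhost:8080/rest/.well-known/genid/abc"

print(is_recognizable_skolem(spec_example))  # True
print(is_recognizable_skolem(app_scoped))    # False
```

Under this strict reading, a URI minted below an application context is not recognizable as a Skolem IRI by a generic client, even though the repository itself can still tell what it is.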

And yet, for resources marked with fedora:Blanknode, the returned representation (i.e. from a GET request) includes those embedded resources as if they really are blank nodes -- though the serialization suggests something different.

In short, I don't think the current implementation is entirely consistent. If fedora is going to "support blank nodes", it can't accept blank nodes from an RDF document, skolemize them and then turn around and claim they are still blank nodes. If fedora supported blank nodes qua blank nodes, then these skolemized URIs would not escape the application boundary (as they currently do) -- which is a serialization issue; nor would those URIs under .well-known/genid be de-referenceable: you can't make an anonymous structure dereferenceable and still claim that it is an anonymous structure.

Just to be clear, I don't have any concerns with the minting of new URIs for blank nodes per se. My concern really boils down to the fact that these fedora:Blanknode objects look and act like blank nodes from one angle, but from another angle they look and behave altogether differently. What are they? Because they don't quite fit the description of a blank node.


Thanks,
Aaron


[1] http://www.w3.org/TR/rdf11-concepts/
[2] http://fedora.info/definitions/v4/repository#Blanknode



--
Aaron Coburn
System Administrator / Programmer
Web Services, Amherst College




aj...@virginia.edu

Feb 18, 2015, 12:02:41 PM
to fedor...@googlegroups.com
I think you're quite right to point to inconsistency here. The best possible solution would be to eliminate support for blank nodes. They have almost no place in Linked Data work, and their use in legacy metadata constructions is an unfortunate mistake.

If that's not possible, the next-best choice would be to eliminate the publication of skolemized identifiers, leaving "blank nodes in, blank nodes out". The .well-known/genid URIs created for blank nodes are, as you say, confusing and inappropriate. The crucial remark in your quote from the spec is "In situations where stronger identification is needed, systems may systematically replace some or all of the blank nodes in an RDF graph with IRIs." Fedora sites have no need for such a stronger system of identification, and it should be removed.

---
A. Soroka
the University of Virginia Library





--
You received this message because you are subscribed to the Google Groups "Fedora Tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fedora-tech...@googlegroups.com.
To post to this group, send email to fedor...@googlegroups.com.
Visit this group at http://groups.google.com/group/fedora-tech.
For more options, visit https://groups.google.com/d/optout.

Benjamin J. Armintor

Feb 18, 2015, 12:06:53 PM
to fedor...@googlegroups.com
Isn't this an artifact of the only way that F4 has to store them? My reading of Aaron's message was that ideally the bnodes wouldn't be dereferenceable outside F4 internals, which implies some sort of filtering on the REST API. 

aj...@virginia.edu

Feb 18, 2015, 12:11:22 PM
to fedor...@googlegroups.com
That's not quite clear to me, Ben.

No matter how they are stored, when they are retrieved (via HTTP or any other API) the JCR representation that encodes them must become RDF. (This happens now in the many classes in the fcrepo-kernel-impl module under the bizarrely-named package org.fcrepo.kernel.impl.rdf.impl.)

It's at that point that (skolem) identifiers would be re-converted to anonymous RDF nodes.
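
A rough sketch of what that re-conversion could look like at serialization time, working on N-Triples text. This is not the code in fcrepo-kernel-impl; the `deskolemize` helper and the `localhost` base URL are assumptions made for illustration.

```python
import re

# Hypothetical repository base; a real deployment would configure its own.
GENID = re.compile(r"<http://localhost:8080/rest/\.well-known/genid/([^>]+)>")


def deskolemize(ntriples):
    """Rewrite internal Skolem IRIs as blank node labels local to one response."""
    labels = {}  # Skolem key -> label, so a repeated IRI maps to one blank node

    def anonymize(match):
        key = match.group(1)
        if key not in labels:
            labels[key] = f"_:b{len(labels)}"
        return labels[key]

    return GENID.sub(anonymize, ntriples)


stored = (
    "<http://example.org/article> <http://example.org/name> "
    "<http://localhost:8080/rest/.well-known/genid/1f3a> .\n"
    "<http://localhost:8080/rest/.well-known/genid/1f3a> "
    '<http://example.org/namePart> "Coburn" .\n'
)
print(deskolemize(stored))
```

Because the labels are generated per response, nothing in the output can be used to address the node from outside.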

---
 A. Soroka
the University of Virginia Library

Benjamin J. Armintor

Feb 18, 2015, 12:17:55 PM
to fedor...@googlegroups.com
That makes sense to me, and is probably less annoying to do than the filtering would be.

I share Adam's vision of an ideal world without blank nodes, by the bye. My understanding is that the next rev of the MODS-RDF specs will have substantially fewer of them.

- Ben

Robert Sanderson

Feb 18, 2015, 1:26:54 PM
to fedor...@googlegroups.com

My 2c:

* Blank nodes make updating horrific
* I've never heard a strong argument for why a resource must NOT have a URI. Would be happy to hear one, of course.
* Fedora follows the recommendation of skolemizing, and takes the option of adding additional information. +1 to following standards and recommendations.

So is the concern about the assignment of fedora:Blanknode as a class?  Is the fix just to remove that meaningless assignment?  Or am I missing some other feature that's adding to the confusion?

Thanks!

Rob

Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

aj...@virginia.edu

Feb 18, 2015, 1:49:19 PM
to fedor...@googlegroups.com
The provision and maintenance of .well-known/genid URIs is causing nasty headaches:

https://jira.duraspace.org/browse/FCREPO-1258

It's not impossible to make some fix for these kinds of problems. But we will have to keep doing that, forever. That's a real opportunity cost. The question at hand is not "What is the worst problem with publishing Skolem URIs for blank nodes?" The question is "Is there anything at all to be gained by so doing?" and the answer, so far, appears to be "No".

---
A. Soroka
the University of Virginia Library


Robert Sanderson

Feb 18, 2015, 2:04:13 PM
to fedor...@googlegroups.com

Thanks Adam!

I see how creating *every* blank node in the same container would scale very poorly.  The fix then seems to be to create a blank node container per resource that needs it?

So if /rest/resource is created with blank nodes, they could be skolemized as /rest/resource/genid/1 and incrementing. Or a UUID or whatever.  That would limit the number to a manageable scale, and have the added benefit of easier maintenance.
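
As a sketch of the incrementing variant, assuming a simple in-memory counter per resource (the `GenidMinter` class is hypothetical, and a real implementation would have to persist the counters):

```python
from itertools import count


class GenidMinter:
    """Mint Skolem paths scoped to the resource that owns the blank node."""

    def __init__(self):
        self._counters = {}  # resource path -> its own counter

    def mint(self, resource_path):
        counter = self._counters.setdefault(resource_path, count(1))
        return f"{resource_path}/genid/{next(counter)}"


minter = GenidMinter()
first = minter.mint("/rest/resource")   # /rest/resource/genid/1
second = minter.mint("/rest/resource")  # /rest/resource/genid/2
other = minter.mint("/rest/other")      # /rest/other/genid/1
print(first, second, other)
```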

There's no benefit (I can see) to having /rest/.well-known/ ... the purpose of .well-known is to live at the root of the server and provide space for discovery of resources by a consistent path segment.

The benefits of assigning URIs to blank nodes:

* You don't have mindbending issues with updating the resource -- how do you refer to a blank node that by definition has no identity?
* You can refer to them from outside of the current resource.  Just because the creator doesn't think others will want to refer to them doesn't mean that others agree with that position.  The most interesting thing to be done with your data will be done by somebody else ... if they can refer to it :)

Rob

Andrew Woods

Feb 18, 2015, 2:27:26 PM
to fedor...@googlegroups.com
Hello All,
The issue [1] of scalability for blank nodes being created under the top-level ".well-known/genid" container has been resolved by creating pairtree paths for the blank node resources. That is not to say that there will not be other related issues in the future.
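
For anyone unfamiliar with pairtree paths, the idea is roughly the following. The segment length and depth used here are assumptions for illustration, not values taken from the Fedora code, and `pairtree_path` is a hypothetical helper.

```python
def pairtree_path(identifier, segment_length=2, depth=4):
    """Spread identifiers over nested containers using leading character pairs."""
    compact = identifier.replace("-", "")
    segments = [
        compact[i * segment_length:(i + 1) * segment_length]
        for i in range(depth)
    ]
    return "/".join(segments + [identifier])


node_id = "d26a2d0e-9833-4696-f4ad-70a677abc1f6"
path = ".well-known/genid/" + pairtree_path(node_id)
print(path)
# .well-known/genid/d2/6a/2d/0e/d26a2d0e-9833-4696-f4ad-70a677abc1f6
```

No single container then has to hold every blank node resource as a direct child, which is what made the flat layout scale poorly.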

Although the use of blank nodes may not be recommended in the context of linked data (as well as potentially other contexts), there will be Fedora 4 users that expect to be able to create resources that contain blank nodes (e.g. MODSRDF). If Fedora 4 removed its support for blank nodes, what would the expected user interaction be in those cases?

The practical issue at hand seems to be one of an inconsistent implementation within Fedora, as stated by Aaron. I will echo Rob's question, "Is the fix just to remove that meaningless [fedora:Blanknode] assignment"?

Andrew

aj...@virginia.edu

Feb 18, 2015, 3:19:11 PM
to fedor...@googlegroups.com
Removing that assignment will do nothing to solve the essential problem, and that assignment is not at all relevant to it. That problem is that blank nodes submitted to the repository do not remain anonymous. They acquire an identity, whether or not the client actually desires that behavior or even if the client would like to avoid it. It is, as Rob said, unclear what use such an identity has (he phrased the point in a positive way -- I don't see it as any kind of positive). It is, however, clear that maintaining it has a cost. To put the matter shortly, if Fedora is going to support the use of blank nodes, let them be actual blank nodes.

---
A. Soroka
the University of Virginia Library

Robert Sanderson

Feb 18, 2015, 3:32:41 PM
to fedor...@googlegroups.com

Hi Adam,

Normally we see things very similarly, so I'm interested to drill down into the use cases behind this situation where we clearly aren't on the same page.  

Is it just MODSRDF? Or similar ontologies which are derived straight from XML rather than returning to the domain and doing the modeling properly? 
I can see that in IIIF (for example) we have some blank nodes that associate a label and a value together ... but I'm not offended by the thought that a system might want to give that little micro-resource a URI.  

Other than the cost of maintaining a URI, which is not very high when you're just assigning UUIDs and have millions of other URIs to maintain, are there any other costs that you see?

Thanks!

Rob


aj...@virginia.edu

Feb 18, 2015, 3:48:19 PM
to fedor...@googlegroups.com
For my money, Rob, I would be happy to disallow all blank node usage in an LDP implementation like Fedora. MODSRDF is an example that has come up in this discussion, but I find it neither stronger nor more interesting than any other. It is a good example, as you say, of failing to "do the modeling properly", and instead taking refuge in the well-worn and comfortable implicit context of XML hierarchy.

As far as the "cost accounting", I'm not particularly concerned with the marginal cost of publishing a URI. I'm concerned with the cost (aggregated over time and possibly scale) of maintaining the resource behind it, once that URI has "escaped" into the wild (which it inevitably will) and has been used (as you applaud) by agents at large on the Web. I'm especially concerned by the doubly-aggregated cost of many such resources.

A URI, of itself, is just an identifier, and as you say, is cheap. A URL (which is what all Fedora URIs now are) is both an identifier and a pointer (from which coincidence arises the interest and power of Linked Data). The life of that pointer, in this context, has a cost for maintenance that is very hard to calculate. I don't want to levy it on Fedora sites who never asked for it, don't want it, and even more, won't want to continue paying for it so that other agents can take advantage of it. As we design Fedora, we are right to encourage people to make in-bound links to their data as easy as possible. We are wrong to require it, or even worse, to unexpectedly make it the default behavior.

---
A. Soroka
the University of Virginia

Egbert Gramsbergen

Feb 18, 2015, 7:58:11 PM
to fedor...@googlegroups.com
I did not follow all of this discussion, so I may be digging up an old point. In that case, please yawn and ignore.

The one thing I think about as a use-case for blank nodes is rdf collections, which imo should be called lists. It is easy to say, assuming that the order of authors is important:
<article> :authorList (<author1> <author2>)
meaning
<article> :authorList [rdf:first <author1>; rdf:rest [rdf:first <author2>; rdf:rest rdf:nil]]
Here, the blank nodes are actually an asset for the integrity of the list as they make it impossible to alter it from outside (other graphs).
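
The expansion from the short form to the rdf:first / rdf:rest chain can be sketched as below. The `expand_collection` function and the `_:l0`, `_:l1` labels are illustrative choices only; a real parser picks its own labels.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(subject, predicate, members):
    """Return the triples a Turtle collection (m1 m2 ...) stands for."""
    nil = f"<{RDF}nil>"
    if not members:
        return [(subject, predicate, nil)]
    # One anonymous list cell per member.
    cells = [f"_:l{i}" for i in range(len(members))]
    triples = [(subject, predicate, cells[0])]
    for i, member in enumerate(members):
        rest = cells[i + 1] if i + 1 < len(cells) else nil
        triples.append((cells[i], f"<{RDF}first>", member))
        triples.append((cells[i], f"<{RDF}rest>", rest))
    return triples


author_triples = expand_collection("<article>", ":authorList", ["<author1>", "<author2>"])
for s, p, o in author_triples:
    print(s, p, o, ".")
```

Each list cell is a blank node, so a second graph has no identifier with which to attach a different rdf:rest to it.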

Egbert Gramsbergen
TU Delft Library / 3TU Datacentrum

From: fedor...@googlegroups.com [fedor...@googlegroups.com] on behalf of aj...@virginia.edu [aj...@virginia.edu]
Sent: Wednesday, February 18, 2015 21:47
To: fedor...@googlegroups.com
Subject: Re: [fedora-tech] Fedora4 and blank nodes

Andrew Woods

Feb 18, 2015, 10:55:31 PM
to fedor...@googlegroups.com
Hello Egbert,
In short, you are advocating for the support of blank nodes within Fedora. I appreciate having your opinion in the mix.
Assuming Fedora continues to support blank nodes, the outstanding question is whether they are Skolemized and exposed via dereferenceable URIs, or serialized as anonymous resources.
Andrew

aj...@virginia.edu

Feb 19, 2015, 6:52:10 AM
to fedor...@googlegroups.com
To be clear, no one is arguing against skolemization. The question at hand is whether to publish the skolemized identifiers, and Egbert's use case argues directly against so doing.

---
A. Soroka
The University of Virginia Library

Neil Jefferies

Feb 19, 2015, 7:05:33 AM
to fedor...@googlegroups.com

Agreed - the maintenance cost of escaped URIs is a killer. However, blank nodes are permitted, so we should support them but keep them as blank as possible.

N

Aaron Coburn

Feb 23, 2015, 11:20:31 AM
to fedor...@googlegroups.com
Hi, Folks,

On last Thursday's tech call, we discussed the issue of blank nodes and came to a consensus about how fedora ought to handle them [1].

With that in mind, below is a proposal for what that would entail:

1) The REST API would continue to accept RDF documents with blank nodes (this is not a change from the current behavior).

2) The fedora:Blanknode class in the fedora ontology would be eliminated.

3) Blank nodes will no longer be published at the .well-known/genid location.

That is, when RDF documents with blank nodes are added to fedora, those blank nodes would remain anonymous and not be made available at any skolemized location. This change would not prevent clients from skolemizing blank nodes before adding them to fedora, but unlike the current behavior, it would not do so automatically.

4) When fedora:Container nodes are requested, any blank nodes contained in that document will be serialized as blank nodes, according to the concrete RDF syntax that is generated.

5) When a fedora:Container (one that contains blank nodes) is deleted, the corresponding blank nodes will also be removed from fedora.


Andrew,
I believe you suggested that this proposal would require a committer vote. If this proposal is in line with what we discussed last week, I will turn things over to you for a vote.

Regards,
Aaron

[1] https://wiki.duraspace.org/display/FF/2015-02-19+-+Fedora+Tech+Meeting

Christopher Johnson

Jul 1, 2016, 3:55:38 PM
to Fedora Tech
Hi Aaron,

I am trying to track the status of this proposal to not automatically skolemize blank nodes on INSERT, which AFAIK remains the current behavior in FCREPO 4.5.1.  This may be relevant to the current import requirement specification here.

Thanks,
Christopher Johnson   

Aaron Coburn

Jul 1, 2016, 4:19:46 PM
to fedor...@googlegroups.com
Hello Christopher,

As I recall, there was quite a bit of work on this over a year ago, with an implementation that got pretty close to working properly.

https://github.com/fcrepo4/fcrepo4/commit/ceacc9832890906c8dee8f4533f5b33679d9b875

However, in the end, this commit was reverted because there were too many bugs and the amount of code required to make this work properly exceeded anyone's desire to actually work on this.

https://github.com/fcrepo4/fcrepo4/commit/ae5aa5d2b2cafc8ebcb67baf226e57547e81bc5d

The upshot is that blank nodes that are added to Fedora resources are skolemized as before and given the type "fedora:Skolem" instead of "fedora:Blanknode", which addressed my fundamental concern.

There are a lot of opinions about what Fedora should or shouldn't do with blank nodes. Generally speaking, though, the kind of hierarchy people tend to want to model with blank nodes can be achieved with hash URIs. And furthermore, at this point, I don't think anyone is suggesting that the current behavior be changed.
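
As a small illustration of the hash URI approach, a client could rewrite its blank nodes as fragments of the resource's own URI before sending the document. The `to_hash_uris` helper, the example URL and the predicates are hypothetical, and the regular expression assumes that `_:` never occurs inside a literal.

```python
import re

# Matches blank node labels such as _:b0 in N-Triples.
BNODE = re.compile(r"_:([A-Za-z0-9]+)")


def to_hash_uris(ntriples, resource):
    """Rewrite each blank node label as a fragment of the resource's URI."""
    return BNODE.sub(lambda m: f"<{resource}#{m.group(1)}>", ntriples)


doc = (
    "<http://localhost:8080/rest/article> <http://example.org/name> _:b0 .\n"
    '_:b0 <http://example.org/namePart> "Coburn" .\n'
)
rewritten = to_hash_uris(doc, "http://localhost:8080/rest/article")
print(rewritten)
```

The nested structure stays part of the same resource, and no separate node has to be minted or maintained for it.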

Regards,
Aaron

Christopher Johnson

Jul 1, 2016, 4:52:19 PM
to Fedora Tech
Hi Aaron,

Thanks for the info.  I have posted a comment about my bnode use case on the Design+-+Import+-+Export page.  I guess I am suggesting that bnodes are still (albeit problematically) relevant...  

Cheers,
Christopher

A. Soroka

Jul 1, 2016, 8:29:35 PM
to fedor...@googlegroups.com
Aaron's characterization is quite accurate, and I (as one of the authors of that implementation) would add that any such work would have to begin more-or-less afresh.

---
A. Soroka
The University of Virginia Library