Hash URI Fragments


Esmé Cowles

Oct 28, 2015, 9:42:53 AM
to Fedora Tech
On the tech call last week[1], we discussed the behavior of hash URI fragments, and agreed on two changes to the current behavior:

https://jira.duraspace.org/browse/FCREPO-1800: suppressing server-managed triples

https://jira.duraspace.org/browse/FCREPO-1801: preventing inbound links from other repository resources

This is somewhat related to another issue with hash URI fragments (you can't delete them): https://jira.duraspace.org/browse/FCREPO-1764

If you are using hash URI fragments or would like to use them, we would like to know if these changes would help or hinder your usage of them. The intent is to make them behave like lightweight, transitory resources that are only referenced from within the resource that contains them.

-Esme

1. https://wiki.duraspace.org/display/FF/2015-10-22+-+Fedora+Tech+Meeting

Anna Headley

Oct 28, 2015, 10:33:23 AM
to fedor...@googlegroups.com
+1 This fits my use case. Thanks for taking up this topic.

I see the new issues have priority 'minor.' I'm not sure how trivial the implementations are, but would be interested in whether someone is planning to work on them, or whether a specific release is targeted.

Thanks!
Anna




Esmé Cowles

Oct 28, 2015, 10:38:26 AM
to fedor...@googlegroups.com
Anna-

The new issues are 'minor' because that's the default priority. We can certainly bump that up if we have general agreement on them. FCREPO-1764 is 'major' since it's a pretty serious issue that you can't currently delete the hash URI fragments without deleting the parent resource.

-Esme

Arif Shaon

Oct 28, 2015, 10:40:11 AM
to fedor...@googlegroups.com
+1 from UNSW Library.




Justin Coyne

Oct 28, 2015, 10:42:44 AM
to fedora-tech@googlegroups.com Tech
FCREPO-1800 is responsible for a large percentage of the slowness in PCDM ordered members (see my comment on the ticket). So I'm excited for that to be fixed.

-Justin

Andrew Woods

Oct 28, 2015, 10:58:35 AM
to fedor...@googlegroups.com
Hello Anna,
Thank you for your show of interest in the topic of hash-resources and in these specific tickets. As for whether anyone is currently working on these (see: assignee), the answer at the moment is "no". Esmé has been doing an excellent job of representing Hydra-community concerns at the Fedora level and addressing related issues. It would be great if he had additional support from the Hydra developer community.
Do you have any suggestions for specific approaches to encouraging technical users of Fedora to participate in its development?
Regards,
Andrew

Anna Headley

Oct 28, 2015, 11:50:02 AM
to fedor...@googlegroups.com
Hi Andrew,

It's obviously very important and I understand how my inquiry prompted you to raise it -- actually I would have been surprised if that hadn't happened!

I'm happy to share my own thoughts from my own experience; ping me on #projecthydra or /msg me on IRC -- HackmasterA.

Best,
Anna

Robert Sanderson

Oct 28, 2015, 7:51:31 PM
to Fedora Tech
A strong -1 to 1801. Preventing some identities from being linked to is the antithesis, literally, of *Linked* data.
Secondly, you can't prevent links to them from outside of Fedora. Treating them specially within a single piece of software ignores one of the fundamental decentralized notions of the web architecture.

-0 to 1800, as "hash URI fragments are intended to be lightweight portions of a resource" takes a very opinionated stance on a part of an opaque string (the URI).
For example, ORE offers an implementation pattern in which the URI of the aggregation is the resource map's URI with a fragment added to it. The Resource Map (or any other serialization, generally) is part of a transport mechanism for a description of an object ... which is really the majority of the information, not a "lightweight portion". This positioning is damaging to the ability to treat Fedora4 as an implementation of a general piece of web infrastructure, rather than a system intended for a single, specific community.

That said, server managed triples in Fedora4 are quite verbose and likely unnecessary for 99% of use cases, and suppressing them for fragment URIs is thus unlikely to be problematic in practice... but do it for the right reasons :)

Rob
 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Aaron Birkland

Oct 29, 2015, 12:19:45 AM
to fedor...@googlegroups.com
Hi all, 

1801 makes me uncomfortable as well.  If my memory serves me correctly, the motivation for preventing links to such resources was tied to a JCR implementation detail that made it difficult to do so.  So viewing 1801 as a limitation rather than an assertion makes it more palatable.   That being said, we have use cases related to ORE that would tend to encourage links to resources identified by hash URIs, so 1801 would be a definite obstacle.  We would need to use something like the API Extension architecture to present a representation of Fedora resources that corresponds to the concept of an ORE Resource Map, and behaves 'normally' with respect to linking to hash URIs.  

Server-managed triples in fragments are undesirable noise in all our use cases, so 1800 is great.

  -Aaron

Esmé Cowles

Oct 29, 2015, 8:26:55 AM
to fedor...@googlegroups.com
There was definitely some ambivalence about fcrepo-1801 in the committers call discussion. One important consideration was thinking of it from an API specification point-of-view, rather than an implementation point-of-view.

In the current Fedora implementation, we can say that we know when anything in the repository links to a node or hash URI fragment (because the underlying JCR API gives us inbound references). But if someone wanted to create another implementation of (this part of) the Fedora API, then it would be onerous to insist that they be able to know if there were any inbound links.

As it turns out, I was reminded yesterday that the current Hydra/CurationConcerns ordering implementation uses links to hash URI fragments in child nodes (e.g., </obj> iana:first </obj/ordered_page_proxies#proxy1>). So fcrepo-1801 would break that.

-Esme

A. Soroka

Oct 29, 2015, 10:22:12 AM
to fedor...@googlegroups.com
I think there’s some confusion about 1801 expressed here. I’ll try to clarify.

I don’t think anyone desires (or at least, I know of no one who desires) to reduce the ability of outside-the-repo agents to link in to hash-URIs. 1801, far from reducing that ability, would improve it.

After 1801, linking to hash-URIs in a Fedora repo would work exactly the way that you would expect them to work on the wider Web. Retrievals from those URIs would resolve to the underlying (non-hashed) resource and mutations would affect that resource, allowing the semantics of # to be client-defined, as they are on the wider Web. That’s exactly what should happen and what most people would expect. But that’s not what happens now. Currently, retrievals and mutations on a hash-URI affect a mysterious and slippery extra resource that isn’t the underlying resource but is somehow contained by it, but not in an intelligible LDP way. Confusing at best!
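
As a small illustration of the "client-defined" semantics of # described above (a sketch only, not Fedora code): under the generic URI and HTTP rules, the fragment is not part of what a client sends to the server, so the server only ever sees the non-hash URI. The repository URL below is a made-up example.

```python
from urllib.parse import urldefrag

# Hypothetical repository URL, used only for illustration.
uri = "http://localhost:8080/rest/obj#proxy1"

# urldefrag splits a URI into the part a client would request from the
# server and the fragment, which is left for the client to interpret.
request_target, fragment = urldefrag(uri)

print(request_target)  # http://localhost:8080/rest/obj
print(fragment)        # proxy1
```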

The concern that 1801 addresses is the one to which Aaron Coburn has pointed in another email.

---
A. Soroka
The University of Virginia Library

A. Soroka

Oct 29, 2015, 10:28:30 AM
to fedor...@googlegroups.com
1801 is definitely intended to address a limitation. It is yet another situation in which we are trying to avoid building a general triplestore because that is neither what the Fedora architecture is intended to do nor a reasonable proposition for the software project as it exists.

I appreciate the analysis you’ve presented here with ORE and it really does show the effects in a nice, concrete way. But to be perfectly frank, I think you’ve hit the nail on the head when you say “We would need to use something like the API Extension architecture to present a representation of Fedora resources that corresponds to the concept of an ORE Resource Map” and I have no problem whatsoever with that. It seems to me to be exactly what the API Extension system should be able to do— represent Fedora resources in another model. As important as ORE certainly is, it doesn’t seem to me to rise even close to the level of a core concern for the repository process itself. It is instead an important way to present repository contents to a particular audience— grist for API-X’s mill.

---
A. Soroka
The University of Virginia Library


Aaron Birkland

Oct 29, 2015, 10:57:12 AM
to fedor...@googlegroups.com

> I don’t think anyone (or at least, I know of no one) who desires to reduce the ability of outside-the-repo agents to link in to hash-URIs. 1801, far from reducing that ability, would improve it.

I think the point of confusion is "why would anybody want to specifically eliminate the ability of inside-the-repo resources to link in to hash URIs?" Could anybody with a use case please comment on this?


> The concern that 1801 addresses is the one to which Aaron Coburn has pointed in another email.

If it isn't part of this thread, could you link to it?  Thanks!

  -Aaron

Aaron Coburn

Oct 29, 2015, 10:58:46 AM
to fedor...@googlegroups.com
> The concern that 1801 addresses is the one to which Aaron Coburn has pointed in another email.
>
> If it isn't part of this thread, could you link to it? Thanks!

I believe Adam just confused the two of us ;-) He meant your email message.

-Aaron Coburn

A. Soroka

Oct 29, 2015, 11:10:00 AM
to fedor...@googlegroups.com
Aaron (C.) is quite right. Apologies!

---
A. Soroka
The University of Virginia Library

A. Soroka

Oct 29, 2015, 11:12:46 AM
to fedor...@googlegroups.com
> I think the point of confusion "why would anybody want to specifically eliminate the ability of inside-the-repo resources to link in to hash URIs?" Could anybody with a use case please comment on this?

There is, to my knowledge, no use case— it’s not a function we are adding but a limitation we are imposing. See Esmé’s remarks for an explanation of the implementation concerns.

---
A. Soroka
The University of Virginia Library

Esmé Cowles

Oct 29, 2015, 11:18:01 AM
to fedor...@googlegroups.com
The use cases are related to using hash URI fragments, and having them be deleted when they no longer have properties (see FCREPO-1764). The concern is that there are obvious integrity issues with automatically deleting things from the repository, and implementation burden concerns about requiring full referential integrity functionality.

So: if we limit the scope of what can link to hash URI fragments, then it makes it easier to know when a hash URI fragment can be safely deleted.
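
To make that reasoning concrete, here is a toy sketch in Python. It uses an in-memory set of triples per resource, not the JCR-backed implementation, and all names and URIs in it are hypothetical. The point it tries to show: if only the containing resource may link to a hash URI fragment, then deciding whether the fragment can be removed only requires looking at that one resource's triples, rather than scanning the whole repository.

```python
from urllib.parse import urldefrag

# Toy model: each repository resource maps to the triples stored with it.
Triple = tuple[str, str, str]


def can_delete_fragment(frag_uri: str, repo: dict[str, set[Triple]]) -> bool:
    """A fragment is collectable when it has no properties and nothing in its
    containing resource still points at it. Assumes the 1801 restriction
    holds, so no other resource needs to be inspected."""
    parent, _ = urldefrag(frag_uri)
    triples = repo.get(parent, set())
    has_properties = any(s == frag_uri for s, _, _ in triples)
    has_inbound = any(o == frag_uri for _, _, o in triples)
    return not has_properties and not has_inbound


repo = {
    "/obj": {
        ("/obj", "ex:first", "/obj#proxy1"),
        ("/obj#proxy1", "ex:proxyFor", "/page1"),
    }
}
print(can_delete_fragment("/obj#proxy1", repo))  # False: still in use
print(can_delete_fragment("/obj#proxy2", repo))  # True: nothing refers to it
```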

-Esmé

Aaron Birkland

Oct 29, 2015, 12:51:23 PM
to fedor...@googlegroups.com

> The use cases are related to using hash URI fragments, and having them be deleted when they no longer have properties (see FCREPO-1764).  The concern is that there are obvious integrity issues with automatically deleting things from the repository, and implementation burden concerns about requiring full referential integrity functionality.

I'm not sure I fully understand the concerns. Does the first concern, "Integrity issues with automatically deleting things from the repository," use the word 'automatically' to refer to the implicit deletion of hash resources that could occur as a result of PUT or PATCH, rather than just DELETE? If so, then is the concern that a user may delete such a resource unintentionally? My own take on this is that implicit deletion of hash resources via PUT or PATCH is well specified and can be easily articulated and understood.

As far as the other concern, "implementation burden concerns about requiring full referential integrity functionality," who bears this burden? Is this a concern that implementing FCREPO-1764 would be burdensome if it had to consider incoming links when implicitly deleting a hash resource? If so, then why is it more burdensome than the current situation of deleting any Fedora resource that happens to be linked to by other resources? The current behaviour is that all objects that link to the deleted resource are implicitly modified to remove such links. Given that this is an established pattern, I suppose I don't understand why it would be more burdensome to do this for hash resources too.


  -Aaron

A. Soroka

Oct 29, 2015, 12:59:08 PM
to fedor...@googlegroups.com

> On Oct 29, 2015, at 12:50 PM, Aaron Birkland <ap...@cornell.edu> wrote:
>
>
> The use cases are related to using hash URI fragments, and having them be deleted when they no longer have properties (see FCREPO-1764). The concern is that there are obvious integrity issues with automatically deleting things from the repository, and implementation burden concerns about requiring full referential integrity functionality.
>
> I'm not sure I fully understand the concerns. Does the first concern, "Integrity issues with automatically deleting things from the repository," use the word 'automatically' to refer to the implicit deletion of hash resources could occur as a result of PUT or PATCH, rather than just DELETE? If so, then is the concern that a user may delete such a resource unintentionally? My own take on this is that implicit deletion of hash resources via PUT or PATCH is well specified and can be easily articulated and understood.

No, this is not the concern. The concern is that if the repository automatically deletes an internal resource that persists a hash-URI when there are no more properties on that hash-URI, it may be leaving behind links from elsewhere in the repository into that hash-URI resource. But if it doesn’t do that kind of “garbage collection”, there is a monotonic buildup of invisible hash-URI cruft in the repository forever. This is because it is not possible to have a subject of a triple in the reference implementation without there existing a JCR node that persists that subject. Notice the difference between this and the HTTP understanding of #, for which the use of a hash-URI somewhere in the world as a subject of a triple certainly does not imply the existence of some resource that “backs” that URI. It is this mismatch that we are trying to rectify.

> As far as the other concern "implementation burden concerns about requiring full referential integrity functionality," who bears this burden? Is this a concern that that implementing FCREPO-1764 would be burdensome if it had to consider incoming links when implicitly deleting a hash resource? If so, then why is it more burdensome than the current situation of deleting any Fedora resource that happens to be linked to by other resources? The current behaviour is that all objects that link to the deleted resource are all implicitly modified to remove such links. Given that this is an established pattern, I suppose I don't understand why it would be more burdensome to do this for hash resources too.

We’re not talking about the same implementation. It is certainly possible to do this in the reference implementation. The concern is that it may not be a reasonable burden to put on _other_ implementations.

Aaron Birkland

Oct 29, 2015, 2:32:54 PM
to fedor...@googlegroups.com

 
> The concern is that if the repository automatically deletes an internal resource that persists a hash-URI when there are no more properties on that hash-URI, it may be leaving behind links from elsewhere in the repository into that hash-URI resource. But if it doesn’t do that kind of “garbage collection”, there is a monotonic buildup of invisible hash-URI cruft in the repository forever.

I guess I don't quite understand how that problem differs from the one presented when removing an existing normal (non-hash) resource from the repository, in the case where there are links to it. There already is an established pattern for handling cleanup of inbound links to a deleted resource...(right?). 1801 does not follow that pattern, instead opting for "repository resources that cannot be linked under certain circumstances." Is the cleanup involved in removing a hash resource really fundamentally different (and/or more difficult) than the cleanup involved with regular repository resources?


> We’re not talking about the same implementation. It is certainly possible to do this in the reference implementation. The concern is that it may not be a reasonable burden to put on _other_ implementations.

Do other implementations have the burden of referential integrity in the first place, or is this something unique to the reference implementation?   In other words, is referential integrity of links between repository resources (as it stands now in Fedora) part of the core specification?  If so, then why is this reasonable, while requiring RI for hash resources isn't?  

   -Aaron

A. Soroka

Oct 29, 2015, 2:41:30 PM
to fedor...@googlegroups.com
I don’t think this is really an issue of referential integrity at base. I may be in disagreement with Esmé there. This is an issue of avoiding filling up the repository with invisible, inaccessible JCR nodes that used to persist hash-URIs but now cannot be deleted because someone, somewhere might be referring to them.

The question of referential integrity within the JCR is straightforward. JCR enforces integrity for property types that we are using, and will not allow a modification that leaves the repository in an invalid state. We don’t really have much choice there.

The idea, then, is to avoid exporting assumptions about what the technology supporting an implementation can do in favor of exporting restrictions on what the API is required to do.

---
A. Soroka
The University of Virginia Library

Aaron Birkland

Oct 29, 2015, 3:08:43 PM
to fedor...@googlegroups.com
Hi Adam,

I totally agree with "avoid exporting assumptions about what the technology supporting an implementation can do in favor of exporting restrictions on what the API is required to do", but it seems that 1801's limitations are shaped exactly to fit a particular implementation detail.  Is JCR (and its associated characteristics that motivate this limitation) a fundamental part of Fedora the specification?  

For that matter, is referential integrity a fundamental part of the Fedora specification, a required part of Fedora APIs?

Thanks,


  -Aaron

A. Soroka

Oct 29, 2015, 3:22:09 PM
to fedor...@googlegroups.com
I think we’ve somehow each got the wrong end of the stick here, Aaron. {grin}

1801’s limitations are indeed meant to meet a particular implementation’s limits, and I have no problem with that. I have no problem exporting _restrictions_ of any kind into the future. I want to avoid exporting _assumptions_, which, from a future perspective, become requirements that future implementations will have to meet. Restrictions, on the other hand, are limitations on what future implementations will have to be able to do to implement the Fedora architecture, and I can’t get enough of those to satisfy me. 1801 declares that future implementations will _not_ have to be able to support arbitrary repository resources linking to arbitrary in-repository hash-URIs, whereas the no-1801 state of affairs (the current state of affairs) is that future implementations _will_ have to be able to support that case. I want the state of affairs that is more restrictive now to avoid burdening later implementors.

Please notice that nothing we’ve been discussing would _prevent_ future implementations from providing for this case, if they so chose. We’re just declaring that they mustn’t be _required_ to support it, by stating outright that the reference implementation won’t support it and that the API won’t be written in such a way as to assume it.

The same logic goes for referential integrity.

---
A. Soroka
The University of Virginia Library

Justin Coyne

Oct 29, 2015, 3:44:50 PM
to fedor...@googlegroups.com
Does this mean we will be unable to upgrade past 4.4.0? We are currently structuring our data model so that our repo managed resources point at hash code resources. Will this break in the future? Big -1 if that's the case.

-Justin

A. Soroka

Oct 29, 2015, 3:48:52 PM
to fedor...@googlegroups.com
The restriction to be placed here is that hash-URI resources can only be the objects of triples for which their “non-hash” versions are the subjects. So,

<x> relatesTo <x#a>

and

<y> relatesTo <y#a>

but not

<x> relatesTo <y#a>

---
A. Soroka
The University of Virginia Library

Aaron Birkland

Oct 29, 2015, 3:59:46 PM
to fedor...@googlegroups.com
Hi Adam,

This response was quite helpful and greatly helps clear up specification vs. implementation. So let's see if I have it straight (using somewhat sloppy vernacular). In a post-1801 world:

The following are part of the specification:
- Support creation, modification, and deletion of hash resources via PUT, PATCH, etc to some containing resource
- Allow any repository resource to express 'outbound' links to another repository resource
- Allow a repository resource to link to a hash resource it contains.
- Allow a hash resource to express outbound links to any repository resource (?)

The following are choices our particular implementation makes (in part due to the fact that it happens to be built on top of a JCR repository):
- Enforce referential integrity of links between repository resources
- Disallow linking from repository resources to hash resources that are not contained by it

The following are choices another implementation of Fedora could make:
- Not enforce referential integrity (i.e. like Fedora 3)
- Allow links from any repository resources to any hash resource

  -Aaron

A. Soroka

Oct 29, 2015, 4:32:15 PM
to fedor...@googlegroups.com
Yes, I think we’re on the same page. Now, whether we’re on the same page as everyone else remains to be seen. {grin}

The big question, to my mind, is the second point under “choices our particular implementation makes”, that is "Disallow linking…" If that is a great inconvenience to too many folks, then we will need further discussion.

---
A. Soroka
The University of Virginia Library

Simeon Warner

Oct 29, 2015, 5:57:22 PM
to Fedora Tech
The 1801 restriction of not allowing reference to hash URIs of other local resources seems very odd and likely too restrictive to me (Justin's point about PCDM using them seems significant). One seems to end up with the odd situation of 

<local/a> relatesTo <local/a#1>  # allowed
<local/a> relatesTo <remote/b#1> # allowed
<local/a> relatesTo <local/c#1> # not allowed ?!?

If, as is suggested by 1800, Fedora is to treat local hash URIs as "lightweight resources" (no server-managed triples), then should they also be exempted from the restrictions of referential integrity? Is that possible with the way Fedora is mapped onto JCR?

Cheers,
Simeon

A. Soroka

Oct 29, 2015, 8:08:57 PM
to fedor...@googlegroups.com
The scenario you offer is mistaken:

> <local/a> relatesTo <remote/b#1> # allowed

would not, in fact, be allowed. Only the first triple would be allowed. I think this makes the “odd situation” seem a bit less odd?

As for being “exempted from the restrictions of referential integrity”, that is not a choice we currently have in the reference implementation: we cannot change the JCR standard. The immediate aim here is to accommodate the restrictions on our reference implementation without letting them leak into the Fedora architecture as published. It _is_ possible to think of entirely different ways to use JCR to implement Fedora, and that may be what we need to do. I think that would be unfortunate, though. It would cost a lot of time spent rebuilding functionality that we already have in the current project.

PCDM’s requirements are interesting and important to Fedora, of course. But PCDM is a separate project, which (it seems to me) should be informed by but not dependent on the particular characteristics of one LDP implementation.

---
A. Soroka
The University of Virginia Library

Robert Sanderson

Oct 29, 2015, 8:23:00 PM
to Fedora Tech

A local resource cannot refer to a remote fragment URI??

All the -1s.

A. Soroka

Oct 29, 2015, 8:42:40 PM
to fedor...@googlegroups.com
No, Rob, after 1801:

A hash-URI resource in the repository could link to whatever non-hash-URI it wants, and any hash-URI _outside_ the repo.
A non-hash-URI resource in the repository could link to whatever non-hash-URI it wants, any hash-URI _outside_ the repo, and any hash-URI that is a hash on itself.

I did not understand Simeon Warner’s example to be discussing inside-the-repo and outside-the-repo URIs, but perhaps he can explain more fully?

In any event, the fact that a new rule here might take some grasping isn’t really much of an objection, in my opinion, when compared with this:

https://jira.duraspace.org/browse/FCREPO-1764

(especially see the first comment there) and 1764 is a good example of the kind of confusing nonsense the current situation has created. It’s perhaps important to say that this discussion didn’t arise because the Fedora developers enjoy imposing arbitrary restrictions on users of Fedora. (I’m not saying that I don’t, just that this isn’t an example.) It arose because the current design isn’t working. We have conflated a semantic of containment that came from JCR with the “fragment” semantic that # already possessed on the Web and the result is unworkable and increasingly unintelligible. I’m sure that there are other possibilities than 1801, and I would be happy to hear some.

---
A. Soroka
The University of Virginia Library

Simeon Warner

Oct 29, 2015, 9:02:31 PM
to Fedora Tech


On Thursday, October 29, 2015 at 8:42:40 PM UTC-4, ajs6f wrote:
> No, Rob, after 1801:
>
> A hash-URI resource in the repository could link to whatever non-hash-URI it wants, and any hash-URI _outside_ the repo.
> A non-hash-URI resource in the repository could link to whatever non-hash-URI it wants, any hash-URI _outside_ the repo, and any hash-URI that is a hash on itself.
>
> I did not understand Simeon Warner’s example to be discussing inside-the-repo and outside-the-repo URIs, but perhaps he can explain more fully?

Apologies for not being clear enough. I intended 'local'='inside-the-repo', 'remote'='outside-the-repo'. To rewrite with your terms that would be... one seems to end up with the odd situation of

<inside-the-repo/a> relatesTo <inside-the-repo/a#1>  # allowed
<inside-the-repo/a> relatesTo <outside-the-repo/b#1> # allowed
<inside-the-repo/a> relatesTo <inside-the-repo/c#1> # not allowed ?!?

and I think you just added

<inside-the-repo/a#1> relatesTo <inside-the-repo/a#2>  # not allowed
 
I think the problem will be that if Fedora implements a very restricted subset of RDF then it will be harder to use with practices developed in the full-RDF world, and that will limit the use cases supported. My sense is that a local-to-resource-only restriction on hash URIs will essentially mean that hash URIs have to be avoided entirely in most cases (and then you run back into the performance issues that 1800 seeks to get around).

Cheers,
Simeon

Robert Sanderson

Oct 29, 2015, 9:03:17 PM
to Fedora Tech
Top-of-post-ing:

> On Fri, Oct 30, 2015 at 9:08 AM, A. Soroka <aj...@virginia.edu> wrote:
> The scenario you offer is mistaken:
> > <local/a> relatesTo <remote/b#1> # allowed

I assumed that Simeon meant that a Fedora4-hosted resource without a hash URI (local/a) can reference a hash URI that lives outside that particular instance (remote/b#1) ... but indeed, perhaps Simeon can clarify.

For further reductio ad absurdum... you could run two Fedora4 instances, and f4a.org/rest/a#1 could reference f4b.org/rest/a#2 and vice versa... but not internally? 

This is just not the right solution for the problem.


> On Fri, Oct 30, 2015 at 9:42 AM, A. Soroka <aj...@virginia.edu> wrote:
> No, Rob, after 1801:
>
> A hash-URI resource in the repository could link to whatever non-hash-URI it wants, and any hash-URI _outside_ the repo.
> A non-hash-URI resource in the repository could link to whatever non-hash-URI it wants, any hash-URI _outside_ the repo, and any hash-URI that is a hash on itself.

So ...

Possible:
  * repo/res1 rels external/res1
  * repo/res1 rels external/res1#a
  * repo/res1#a rels external/res1 
  * repo/res1#a rels external/res1#a 

  * repo/res1 rels repo/res1
  * repo/res1 rels repo/res1#a
  * repo/res1 rels repo/res2
  * repo/res1#a rels repo/res1
  * repo/res1#a rels repo/res2


Not possible:
  * repo/res1 rels repo/res2#a
  * repo/res1#a rels repo/res1#b
  * repo/res1#a rels repo/res2#a

 
> In any event, the fact that a new rule here might take some grasping isn’t really much of an objection, in my opinion, when compared with this:
>
> https://jira.duraspace.org/browse/FCREPO-1764
>
> (especially see the first comment there) and 1764 is a good example of the kind of confusing nonsense the current situation has created. It’s perhaps important to say that this discussion didn’t arise because the Fedora developers enjoy imposing arbitrary restrictions on users of Fedora. (I’m not saying that I don’t, just that this isn’t an example.) It arose because the current design isn’t working. We have conflated a semantic of containment that came from JCR with the “fragment” semantic that # already possessed on the Web and the result is unworkable and increasingly unintelligible. I’m sure that there are other possibilities than 1801, and I would be happy to hear some.

+1 to possibilities other than #1801 :)

 Rob

A. Soroka

Oct 29, 2015, 9:08:49 PM
to fedor...@googlegroups.com
On Oct 29, 2015, at 9:02 PM, Simeon Warner <simeon...@gmail.com> wrote:
>
> apologies for not being clear enough. I intended 'local'='inside-the-repo', 'remote'='outside-the-repo'. To rewrite with your terms that would be... one seems to end up with the odd situation of
>
> <inside-the-repo/a> relatesTo <inside-the-repo/a#1> # allowed
> <inside-the-repo/a> relatesTo <outside-the-repo/b#1> # allowed
> <inside-the-repo/a> relatesTo <inside-the-repo/c#1> # not allowed ?!?
>
> and I think you just added
>
> <inside-the-repo/a#1> relatesTo <inside-the-repo/a#2> # not allowed

No, that would be fine, although my explanation didn’t correctly provide for it, for which I apologize.

> I think the problem will be that if Fedora implements a very restricted subset of RDF then it will harder to use with practices developed in the full-RDF world and that will limit the use-cases supported. My sense is that with a local-to-resource-only restriction on hash URIs that will essentially mean that hash URIs have to be avoided entirely in most cases (and then you run back into the performance issues that 1800 seeks to get around).

Certainly this is a restrictive practice, and I don’t feel any need to defend that fact. Fedora, as I never tire of saying, is _not_ a triplestore. It’s an object repository that provides its API as an enhanced form of LDP. It is entirely to be expected that many practices from the “full-RDF world” will not be available with Fedora. That is what real triplestores are for. {grin}

But I will have to disagree with you about avoiding hash-URIs entirely. We’ve already heard from several users that this restricted form would be suitable for their uses.

A. Soroka

unread,
Oct 29, 2015, 9:20:31 PM10/29/15
to fedor...@googlegroups.com
Simeon has clarified that point and that discussion can go on in that part of this thread— I’ll extract the other points here:

> For further reductio ad absurdum... you could run two Fedora4 instances, and f4a.org/rest/a#1 could reference f4b.org/rest/a#2 and vice versa... but not internally?

Certainly. It is indeed absurd to imagine that two independent and unconnected servers in different parts of the Web could somehow without cost manage to align their resources and manage links interdependently— just because they are both implemented with the same brand of software. That is as much as to be surprised that two different websites could accidentally host copies of the same file, when, after all, aren’t they both running Apache httpd? {grin}

> This is just not the right solution for the problem.

I (and I’m sure other Fedora developers) would be happy to hear about other solutions that work within the limits of the reference implementation. As I wrote earlier in this thread, I won’t claim that 1801 is a _necessary_ change, but I do claim that the current state of affairs is intolerable, and that change is necessary. Perhaps you have another design in mind?

---
A. Soroka
The University of Virginia Library

Chris Beer

unread,
Oct 29, 2015, 10:02:54 PM10/29/15
to fedor...@googlegroups.com
To me, Tom Johnson’s approach rdf-ldp [1] is an attractive one. He has (perhaps wisely?) segregated the management triples from user content.

The upside is we don’t have to worry about the content of user data at all, we just manage the server managed triples and the LDP API. The downside is we may lose a lot of the tools that are exposed by Modeshape and/or Infinispan for managing, discovering, and manipulating content (and, if we do lose that, we may want to reconsider using those platforms or JCR in general).

Tom’s approach also implements the LDP notion that resources are named graphs. I’m not a big fan, but it seems to be the standard understanding of the LDP spec, and would allow us to exchange our current problems with new and different ones.

Tom Johnson

unread,
Oct 29, 2015, 10:11:58 PM10/29/15
to fedor...@googlegroups.com
Chipping in for the -1's on #1801.

As Justin pointed out, this restriction would make Fedora virtually unusable for PCDM adopters, and similarly for any number of otherwise reasonable data models.

- Tom

--
-Tom Johnson

Tom Johnson

unread,
Oct 29, 2015, 11:22:36 PM10/29/15
to fedor...@googlegroups.com
> Tom’s approach also implements the LDP notion that resources are named graphs. I’m not a big fan, but it seems to be the standard understanding of the LDP spec, and would allow us to exchange our current problems with new and different ones.

I would quibble with this. `rdf-ldp` relies on named graphs because that yields a clear target for `RDF::Repository` compatibility in quad support. It's a natural implementation in that both RDFSources and named graphs are represented as a (URI, Graph) pair; but on the client side, I don't think it matters one way or another.

- Tom
--
-Tom Johnson

A. Soroka

unread,
Oct 30, 2015, 7:39:33 AM10/30/15
to fedor...@googlegroups.com
Tom—

You’ve clearly investigated LDP implementation in a pretty serious way. Do you have any alternative-to-1801 suggestions on how to deal with the very real problems in the current Fedora reference implementation?

---
A. Soroka
The University of Virginia Library

A. Soroka

unread,
Oct 30, 2015, 7:54:20 AM10/30/15
to fedor...@googlegroups.com
Chris (and/or Tom)—

Can you explain a little more about what it means to have “segregated the management triples from user content”? Does this mean that in Tom’s approach, what we’ve been calling “server-managed triples” just aren’t available in a request for ordinary content, but must be retrieved separately (perhaps through some magic endpoint, like Fedora’s current fcr:metadata)?

I may be mistaken, but am I right in reading that the approach Tom has taken hasn’t included actually persisting data? If that’s so, this makes it a little harder for me to see how it overcomes the problems we now have using JCR to do just that. I’m not suggesting that we are wedded irrevocably to the JCR-based implementation we currently have, but dispensing with it would cost a great deal of time and effort just to rebuild the work of the last few years. That should give us pause.

I’m now intensely opposed to the notion that resources in a Fedora repository would be named graphs (by which I assume you mean that they could contain arbitrary triples). I can’t see that the fact that it is a common understanding of LDP counts for much— Fedora is an object repository that happens to implement LDP as a convenient way of offering a standardized API. LDP implementation is _not_ the purpose of the Fedora project.

---
A. Soroka
The University of Virginia Library

A. Soroka

unread,
Oct 30, 2015, 8:14:16 AM10/30/15
to fedor...@googlegroups.com
On a second reading, something peculiar jumps out at me in your remark, Tom: how is it that PCDM (a data model that is supposed to be agnostic to the implementing software and usable for modeling any repository content at all) actually collapses in the absence of one very specific type of identifier syntax? Shouldn’t identifiers be opaque in PCDM? That seems extraordinarily fragile, and then PCDM seems like a problematic use case for judging this question of hash-URIs in Fedora.

---
A. Soroka
The University of Virginia Library

> On Oct 29, 2015, at 10:11 PM, Tom Johnson <johns...@gmail.com> wrote:
>

Justin Coyne

unread,
Oct 30, 2015, 8:17:34 AM10/30/15
to fedora-tech@googlegroups.com Tech
We're using hash based URIs because ORE requires a single writer (manipulating a linked list).  Putting all the nodes of the list in a single resource is a simple way of doing that. Otherwise we need to have a more complex locking/transaction strategy that is not commonly available in various LDP implementations.

-Justin

A. Soroka

unread,
Oct 30, 2015, 8:20:54 AM10/30/15
to fedor...@googlegroups.com
I’m sorry, Justin, but that doesn’t really help me. Are you saying that PCDM requires ORE notions, and ORE requires atomic transactions against linked lists?

---
A. Soroka
The University of Virginia Library

Esmé Cowles

unread,
Oct 30, 2015, 8:48:58 AM10/30/15
to fedor...@googlegroups.com
Adam-

PCDM requires ORE notions (specifically ore:Proxy for ordering), and the PCDM group decided to implement ordering of the proxies with a doubly-linked list, using iana:first/last/prev/next predicates to link to the head/tail of the list and between items in the list. Doing that efficiently requires atomic transactions against the linked list, hence the move to using hash URIs on a resource to store the linked list, instead of having each proxy be a container.
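To make the shape of that structure concrete, here is a rough Python sketch that builds the ordering triples with every proxy named as a hash URI of the aggregating resource, so the whole list lives in one document. The `#proxyN` naming and the function name are assumptions made for illustration, not PCDM requirements.

```python
ORE = "http://www.openarchives.org/ore/terms/"
IANA = "http://www.iana.org/assignments/relation/"


def proxy_list_triples(aggregation, members):
    """Build doubly-linked-list triples for members, in order.

    Every proxy is a hash URI of `aggregation` (hypothetical '#proxyN'
    naming), so all triples can be written with one PUT or PATCH.
    """
    proxies = [f"{aggregation}#proxy{i}" for i in range(len(members))]
    triples = []
    if proxies:
        # Head and tail pointers on the aggregation itself
        triples.append((aggregation, IANA + "first", proxies[0]))
        triples.append((aggregation, IANA + "last", proxies[-1]))
    for i, (proxy, member) in enumerate(zip(proxies, members)):
        triples.append((proxy, ORE + "proxyFor", member))
        triples.append((proxy, ORE + "proxyIn", aggregation))
        if i > 0:
            triples.append((proxy, IANA + "prev", proxies[i - 1]))
        if i < len(proxies) - 1:
            triples.append((proxy, IANA + "next", proxies[i + 1]))
    return triples
```

Because every subject shares the same non-hash URI, reordering the list is a single request against that one resource rather than one request per proxy.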

-Esme

A. Soroka

unread,
Oct 30, 2015, 9:29:32 AM10/30/15
to fedor...@googlegroups.com
Thanks, Esmé and Justin, this detail helps.

So my understanding now is: PCDM made several specific implementation decisions that eventually led to using hash-URIs to “imitate” atomic transactions. If that hash-URI facility went away, PCDM would have to remake those decisions, perhaps finding a different way to implement ordering or atomic transactions. I see that it would mean real work for people committed to PCDM, but I don’t see that it would be a terrible expense for solving the problems that are now present in the Fedora reference implementation for _all_ users of Fedora, and I haven’t yet heard any other suggestions for solving those problems…

---
A. Soroka
The University of Virginia Library

Justin Coyne

unread,
Oct 30, 2015, 9:34:01 AM10/30/15
to fedora-tech@googlegroups.com Tech
The major problem I have is that the change you mention would be a backwards incompatible one.  We can and are using Hash based URIs now in Fedora. We've invested significant time into working on this approach. It's going to cost us significantly to have to take another approach.  People who have deployed this solution are going to have to migrate their data to a different structure. That will be costly too.

I'm profoundly disappointed.

-J

A. Soroka

unread,
Oct 30, 2015, 9:41:41 AM10/30/15
to fedor...@googlegroups.com
The back-compatibility issue is a real one, and I don’t pretend that I have any simple amelioration for it. But I would like to know how extensive it actually is before assuming that it’s a major blocker or requires a special migration tool.

I am myself disappointed that no one from the PCDM community (which seems, at a glance, to be the only segment of the Fedora community that is speaking up against this idea) has offered any alternative proposal. I wouldn’t mind taking more time over this if there’s a chance of a good alternative that we can consider, but there are outstanding major bugs (1764 is a good example) that are waiting on some resolution to this design problem.

---
A. Soroka
The University of Virginia Library

Aaron Birkland

unread,
Oct 30, 2015, 9:46:54 AM10/30/15
to fedor...@googlegroups.com
There was actually a use case submitted to the API Extension effort (API-X) related to PCDM and linked lists:

Ultimately, the problems related to 1801 lie in how applications wish to use Fedora as a piece of web architecture. Neither ORE nor PCDM requires atomic transactions against linked lists. Really, all they want to do is issue a PUT or PATCH to modify a document on the web: one that contains several resources (in the RDF sense) named with hash URIs. Fedora's hash URI support ostensibly made that possible and elegant. They also want to be able to link to hash URIs contained within these documents. With 1801, this is forbidden. Now if they want to represent their model in Fedora *and* link to resources that had been hash resources, these hash resources need to be first-order Fedora resources. *This* takes what had been a simple, standard, understandable PUT or PATCH and turns it into a sequence of operations on distinct Fedora resources.

In the context of the API-X discussion, we figured there would be a common pattern where an extension would expose a 'nice' document on the web that clients can interact with in standard ways... one that Fedora cannot support via its own APIs.  A PUT or PATCH to that document may imply  a sequence of modifications to Fedora objects to persist the changes, but the extension would be an actor in a transaction that is completely hidden from the user - de-coupling the client from the complexities of interacting with Fedora, or from having to have knowledge of how a particular domain model or RDF construct maps to Fedora resources.  
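As a very rough illustration of that pattern, the Python sketch below takes the triples of one client-facing document and plans one write per first-order resource. Everything here is hypothetical: the `doc#frag` to `doc/frag` mapping, the function name, and the use of PUT are assumptions for illustration, not part of any API-X design.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def plan_operations(doc_uri, triples):
    """Split one document's triples into a sequence of per-resource writes."""

    def promote(term):
        # Hypothetical mapping: a hash URI of this document becomes a child
        # resource; anything else (other URIs, literals) is left alone.
        base, frag = urldefrag(term)
        if frag and base == doc_uri:
            return f"{doc_uri}/{frag}"
        return term

    by_resource = defaultdict(list)
    for s, p, o in triples:
        by_resource[promote(s)].append((promote(s), p, promote(o)))
    # One write per resource; the extension would wrap these in a transaction
    return [("PUT", uri, body) for uri, body in sorted(by_resource.items())]
```

The client still sends a single document; the fan-out into several writes is what the extension would hide.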

So API-X I think offers us a way out, if Fedora goes forward with 1801.  That being said, re-thinking 1801 and figuring out some way that Fedora can natively link to hash resources would be a really big win.  

   -Aaron

Esmé Cowles

unread,
Oct 30, 2015, 9:47:43 AM10/30/15
to fedor...@googlegroups.com
Adam-

The alternative is the current behavior: allow resources to link to hash URI fragments in other resources, and use the existing JCR referential integrity functionality to avoid accumulating unused nodes.

I agree with Justin that it would be a significant inconvenience to remove features like this, and would punish early adopters. So far, I haven't heard a clear consensus in favor of fcrepo-1801, and the main use case is making second and further implementations of the Fedora API less burdensome. I have a hard time seeing how we can go forward with a breaking change like this unless there is a real use case that requires it.

-Esme

Justin Coyne

unread,
Oct 30, 2015, 9:51:23 AM10/30/15
to fedora-tech@googlegroups.com Tech
Aaron,

  While an API-X option may be a workable alternative for FCR4, it's not preferred as we would like PCDM to work on multiple LDP implementations. Using an API-X solution would tie us to Fedora.

-Justin

David Chandek-Stark

unread,
Oct 30, 2015, 9:58:31 AM10/30/15
to fedor...@googlegroups.com
If it’s of any use to consider the view of someone much less familiar with the technical issues around the relevant systems, standards and implementations ...

1) I would emphasize A. Soroka’s statement "LDP implementation is _not_ the purpose of the Fedora project”.  While the enthusiasm for linked data is I think broadly shared, LDP should not interfere with fundamental uses and functions of the repository without respect to LDP concerns.
 
AND … 

2) In the process of migrating from F3 to F4 I find conversations like this one disconcerting, especially when it raises questions of the underlying architecture.  Since in F3 I put RDF statements in datastreams which are content agnostic, I could use features like blank nodes without issue.  Of course, I can possibly still do that in F4 with files, but then I’m not really taking advantage of what F4 mainly offers over F3, which is the ability to natively handle RDF (or at least in a more general way than F3’s RELS-EXT and RELS-INT).

I hope that we can find a way forward ...

Cheers,
David

Aaron Birkland

unread,
Oct 30, 2015, 10:14:27 AM10/30/15
to fedor...@googlegroups.com
Hi Justin,

Actually, API-X offers a very important alternative to placing features in the Fedora core, in a way that *lessens* the dependence of an application on knowledge of a Fedora repository, and how to interact with it.  Consider a "linked data" extension that allowed the POST, PUT, PATCH to Fedora resources to contain forbidden constructs such as blank nodes or links to hash URIs.  The effect of such an extension would make Fedora appear to the outside world (and to the components of your infrastructure that use its LDP API) to be a more 'normal' LDP implementation like any of the others on the w3c list, free of Fedora's unique constraints on RDF. 

Thought of another way, API-X can be used to reduce the business risk of breaking changes in Fedora's APIs.  If your application depends on a particular behaviour of hash URIs, and Fedora introduces a backwards-incompatible change like 1801, it would be possible to change the apparent behaviour of Fedora resources with an extension, rather than changing your infrastructure to adapt to Fedora.  

  -Aaron

A. Soroka

unread,
Oct 30, 2015, 10:19:45 AM10/30/15
to fedor...@googlegroups.com
+1 to what Aaron is saying here.

I think there are probably at least a few graybeards on this list who are nodding right now and thinking, “Yes, this is why earlier versions of Fedora were equipped with a disseminator architecture, to avoid too-close coupling between Fedora and other systems.” There were plenty of difficulties with the implementation of that architecture, but the idea was and remains sound. I’m hopeful that API-X will bring the idea forward with a really excellent implementation.

---
A. Soroka
The University of Virginia Library

A. Soroka

unread,
Oct 30, 2015, 10:28:57 AM10/30/15
to fedor...@googlegroups.com
Esmé—

That’s not the current behavior. The current behavior is broken, and doesn’t avoid accumulating unused nodes. They just accumulate and can never be removed. We can commit to using JCR’s functions for this; but it’s one more leak from JCR into the Fedora API.

Getting other Fedora implementations into play (and thereby reducing the importance of the current reference codebase) is by far the most important priority I have for this project. The use case is simple: reduce (and eventually eliminate) the dependence of the Fedora community on a single piece of software maintained largely by volunteer effort, which is continually being tugged in different directions.

---
A. Soroka
The University of Virginia Library

Justin Coyne

unread,
Oct 30, 2015, 10:29:09 AM10/30/15
to fedora-tech@googlegroups.com Tech
I guess I haven't given up on trying to push Fedora so that it doesn't need an extension architecture. I think Fedora may need some refactoring to get us there, but I think it's possible to do everything we need in an LDP-focused direction. I suspect the current design of Fedora has limited your capability to get there. Is it too early to begin talking about Fedora 5?


-Justin

PS: As a programmer on an OSS project I realize you may resent me for telling you what I think you should do and not for doing it myself.  But I really would love to pitch in if I had the availability.  So, apologies.


Benjamin Armintor

unread,
Oct 30, 2015, 10:33:55 AM10/30/15
to fedor...@googlegroups.com
All-

I don't really want to wade into the PCDM side of things (though severing the early adopters seems cruel), nor do I want to dwell too much on the folly of pushing JCR implementation characteristics into the network API in a rather unspecified way. However, I would like to note that prohibiting resolution of URIs with a hash:

1. Makes little sense to me, as allowing them seems like correction of an oversight/inadequacy in FCR 3
2. Makes my FCR 4 use case of managing and providing Linked Open Data more annoying, enough that I would probably just look at a triplestore backed LDP implementation and restrict FCR 4 to blob management.

API-X is a non-starter for me as a *Hydra* committer: If we're not comfortable articulating it as part of an API re-implementable outside of the reference implementation, I'm -1 to writing against it in the Hydra stack.

I think FCREPO-1764 has a conversation that has moved on from the description, which is very frustrating. But it also has an implicit feature request, and that too is annoying. SPARQL update shouldn't be a trap that ensnares you because of magic triples. That's a bug, and it should be resolved by either restricting the selected triples to non-"system" ones or by allowing all the triples to be updated.

FCREPO-1764(A) requests that a resource should be deleted when "its" graph is empty. I can see where an implementation of something might want that as a convenience, but I am -1 on that feature request. Send a DELETE to the resource or its container if you want it removed. Doing otherwise tramples my ability to have identified resources without description.

The reporter needs to be encouraged (looking at you, Trey) to split that ticket.

Some of the other concerns seem like they might be addressable by understanding some predicates to be invertible- there's going to be a similar transactionality issue in any LDP implementation with moving a resource between containers, right? Unless you can characterize all the updates in a way POSTable to a single resource?

Groggily yours,
Ben

A. Soroka

unread,
Oct 30, 2015, 10:39:12 AM10/30/15
to fedor...@googlegroups.com
Ben, I may be misunderstanding you, but I don’t see how this is possible in HTTP.

You can’t send a message to <foo#bar>. It goes to <foo>. You literally can’t send a message to a hash-URI, because from HTTP’s point of view, it’s not a resource. From RDF’s point of view, it _is_ an independent subject, so it can be addressed in SPARQL, which is why this ticket came up in the form that it did.


---
A. Soroka
The University of Virginia Library

Benjamin Armintor

unread,
Oct 30, 2015, 10:40:16 AM10/30/15
to fedor...@googlegroups.com
I respect what you're saying here, but I have to disagree: If our response to LDP at this point is to require another less idiosyncratic LDP implementation that we commit to writing around FCR 4 and porting forward, I have to ask why I don't just wire ISPN into Blazegraph and call it a day.

To borrow David's and Adam's point, if "LDP implementation is _not_ the purpose of the Fedora project” that's fine, but what *is* the purpose? Because "Fedora 4: LDP with arbitrary URI restrictions and the word 'Preservation'" is not compelling.

Esmé Cowles

unread,
Oct 30, 2015, 10:41:45 AM10/30/15
to fedor...@googlegroups.com
Adam-

You're right: only the first part of what I wrote was the current behavior (allow resources to link to hash URI fragments in other resources). So the alternatives we're talking about are:

1a. current behavior: repository nodes can link to hash URIs of other nodes
1b. fcrepo-1801: disallow that behavior

Regardless of which of these we choose, we still have a separate problem of how unwanted hash URI nodes get removed. So far, I've heard:

2a. automatically delete them when they have no properties and/or no references
2b. develop some HTTP convention for deleting them

My vote is 1a and 2b.

I think the best way to square 1a with an API specification is to not specify referential integrity. The reference implementation is using a platform that has referential integrity, so it makes sense to use it. But there's no reason why some other implementation couldn't make other choices, and enjoy the benefits and limitations of those choices.
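For comparison, option 2a above could be sketched roughly as follows in Python. This is an assumption-laden illustration, reading "no properties and/or no references" as "neither subject nor object of any triple"; the function name is hypothetical and nothing here reflects how fcrepo4 or JCR actually tracks nodes.

```python
def prune_hash_nodes(hash_nodes, triples):
    """Return the hash URI nodes that would survive option 2a.

    A node is dropped only when it has no properties (never a subject)
    and no references (never an object) in `triples`.
    """
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    return {n for n in hash_nodes if n in subjects or n in objects}
```

Note that under this reading a named but otherwise undescribed hash resource disappears as soon as nothing points at it, which is the behaviour objected to earlier in the thread.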

-Esme

Benjamin Armintor

unread,
Oct 30, 2015, 10:45:12 AM10/30/15
to fedor...@googlegroups.com
Adam,

Do you mean HTML? Every HTTP implementation I know allows messages to <foo#bar>. It's true that if <foo> is a plain old file it has the behavior of collapsing the resources, but HTTP is not unintermediated access to storage, it's requests sent to a process listening on TCP.

Esmé Cowles

unread,
Oct 30, 2015, 10:49:41 AM10/30/15
to fedor...@googlegroups.com
Ben-

But the problem is that the #bar isn't included in the HTTP request sent to Fedora (or any web service):

$ curl -v http://localhost:8983/fedora/rest/foo#bar
* Trying ::1...
* Connected to localhost (::1) port 8983 (#0)
> GET /fedora/rest/foo HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.43.0
> Accept: */*

If the hash was included in the request, then it would be simple to just update our DELETE implementation. But we're not getting it at all.
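The same point can be shown without a server. The Python sketch below builds the request line a client would send for a URL, using only the standard library; `request_line` is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit


def request_line(method, url):
    """Build the HTTP/1.1 request line a client would send for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately unused: it never leaves the client
    return f"{method} {target} HTTP/1.1"


print(request_line("DELETE", "http://localhost:8983/fedora/rest/foo#bar"))
# prints "DELETE /fedora/rest/foo HTTP/1.1"
```

A DELETE aimed at `foo#bar` is therefore indistinguishable, on the wire, from a DELETE aimed at `foo`.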

-Esme

Benjamin Armintor

unread,
Oct 30, 2015, 10:53:36 AM10/30/15
to fedor...@googlegroups.com
Esme-

Hmm. Right. This is true. I see that we are basically replaying

Benjamin Armintor

unread,
Oct 30, 2015, 11:07:33 AM10/30/15
to fedor...@googlegroups.com
A quick search of my email tells me I have a long-term inability to remember that fragment identifiers aren't included in the HTTP request. So I probably can't blame it all on grogginess.

That said, SPARQL-UPDATE 4.2.2 http://www.w3.org/Submission/SPARQL-Update/ suggests another, less ambiguous syntax for deleting the hash URI resources. Is that an option?

Robert Sanderson

unread,
Oct 30, 2015, 11:11:55 AM10/30/15
to Fedora Tech

A question for y'all to discuss while I'm in the air...

Can local/res1, an RDF Source, refer to local/res2#a, a NonRDFSource?

According to the current description, no.
According to sanity, yes.

You have 12 hours... go!

Rob

Sent from S6 Edge, please excuse any typos or brevity

A. Soroka

unread,
Oct 30, 2015, 11:20:34 AM10/30/15
to fedor...@googlegroups.com
Rob, <local/res2#a> cannot exist as an independent resource in the current implementation, so the problem doesn’t seem to arise. I would be very happy to make that the case for all hash-URIs (make them just resolve to their “underlying” resource), which would eliminate _all_ of the problems under discussion at a stroke, but that would mean discarding the ability to use them as the subjects of triples in the current design.

You, amongst others, seem to have some considerable desire to keep doing that.

---
A. Soroka
The University of Virginia Library

A. Soroka

unread,
Oct 30, 2015, 11:33:38 AM10/30/15
to fedor...@googlegroups.com
I appreciate the idea here, Esmé, but I’m not convinced that doing one thing with the reference implementation and just ignoring it in a specifying document is a healthy approach, given the early stages of specification. But that may very well be the best option.

I’m very much against 2b. For those who are concerned for Fedora to avoid idiosyncratic behaviors, that would seem to me to be a real red flag. “Tunneling” the special nature of a particular kind of URI through an HTTP message is going to be hard to port, by anyone, to anything.

---
A. Soroka
The University of Virginia Library

Benjamin Armintor

unread,
Oct 30, 2015, 11:51:11 AM10/30/15
to fedor...@googlegroups.com
I agree that some bespoke tunneling is strange and unsustainable. I don't have FCR 4 fired up here, but can someone answer off hand whether sending a SPARQL-UPDATE:

DROP GRAPH <foo#bar>

... is a workable approach?

- Ben

Tom Johnson

unread,
Oct 30, 2015, 11:56:57 AM10/30/15
to fedor...@googlegroups.com
I'm struggling to keep up with this fast moving thread early in the morning, but I'm going to take a hack at responding usefully to the `rdf-ldp` questions. I may jump into the fray on other issues further down later.

> You’ve clearly investigated LDP implementation in a pretty serious way. Do you have any alternative-to-1801 suggestions on how to deal with the very real problems in the current Fedora reference implementation?

Not really. It seems to me like the problem originates with JCR; and specifically with the practical restriction on the allowable subjects of statements. This is what causes problems for PCDM and other "otherwise reasonable data models". I'd like to suggest a specific requirement that is somewhat looser than "atomic transactions against linked lists": Fedora *really* needs a mechanism that allows repository resources to be ordered (and for one resource to belong to multiple ordered groups) that doesn't explode the number of required HTTP requests as the list grows. How would people propose ordering after 1801?

There are certainly other cases in graph modeling where we would want to use intermediary nodes to create an indirect relationship between resources; many of those will suffer similar scalability problems. That said, I think I would be happy just for a vaguely reasonable ordering proposal that would be possible in Fedora after 1801.

> Can you explain a little more about what it means to have “segregated the management triples from user content”? Does this mean that in Tom’s approach, what we’ve been calling “server-managed triples” just aren’t available in a request for ordinary content, but must be retrieved separately (perhaps through some magic endpoint, like Fedora’s current fcr:metadata)?

This is not quite right. `rdf-ldp` makes a distinction between internal server triples (those that are required for the server to know things like the Resource's interaction model), and LDP-server-managed triples[0]. The former are maintained separately from the Resource's state. Because the goal is to require only a quad-enabled `RDF::Repository` implementation, these are stored in a named graph. I think that's totally incidental, though, and those graphs are not exposed to the client at all. The point is that the server doesn't require any special triples in resource representations to do its work. The LDP-server-managed triples (membership and containment triples) are handled as required by the spec.
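A toy Python sketch of that separation, under stated assumptions: this is not `rdf-ldp` code (which is Ruby), and the `QuadStore` class, the `urn:internal:` graph naming, and the `ex:interactionModel` predicate are all hypothetical stand-ins.

```python
class QuadStore:
    """Tiny in-memory stand-in for a quad-capable repository."""

    def __init__(self):
        self.quads = set()  # (subject, predicate, object, graph)

    def add(self, s, p, o, graph):
        self.quads.add((s, p, o, graph))

    def graph(self, name):
        return {(s, p, o) for s, p, o, g in self.quads if g == name}


def internal_graph(resource):
    # Hypothetical naming scheme for the server's private graph
    return "urn:internal:" + resource


def representation(store, resource):
    """What a client sees: only the resource's own graph."""
    return store.graph(resource)


store = QuadStore()
res = "http://localhost:8080/rest/a"
store.add(res, "dc:title", "A", res)  # client-supplied state
store.add(res, "ex:interactionModel", "ldp:RDFSource", internal_graph(res))  # server bookkeeping
```

Here `representation(store, res)` contains only the `dc:title` triple; the bookkeeping triple stays in the private graph and never has to be stripped from, or tolerated in, a client's PUT or PATCH.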

I think Fedora developers should take note that any triples with special status in Fedora that is not required by LDP *are not* "LDP-server-managed triples"; it would probably be a good idea to call them something unlikely to be confused with this bit of specification jargon.

> I may be mistaken, but am I right in reading that the approach Tom has taken hasn’t included actually persisting data?

The implementation persists data to an abstract interface: `RDF::Repository`[1]. That interface has a reference implementation that runs in memory. There's a lack of feature-complete, production-ready implementations that use a persistent datastore, and while that state of affairs persists, I'm leaving the in-memory implementation as the default one. It's fairly trivial to swap in one of the dozen-or-so non-feature-complete, non-production-ready implementations at your own risk. My current favorite is `RDF::Blazegraph`[2], which is ready to ship apart from some caveats about blank node handling on PATCH requests.

> If that’s so, this makes it a little harder for me to see how it overcomes the problems we now have using JCR to do just that. I’m not suggesting that we are wedded irrevocably to the JCR-based implementation we currently have, but dispensing with it would cost a great deal of time and effort just to rebuild the work of the last few years.

I think a starting point would be to reconsider whether internal server properties need to be visible to clients. The `rdf-ldp` approach is informed by the goal of being a generic LDP implementation; it wants to be un-opinionated about what clients can express in their resource representations. I understand that Fedora has other goals, but I'm not sure there's been enough consideration about which "server managed" triples are helping clients with their repository use cases and which are implementation cruft. The client shouldn't need to see the cruft, much less dance around it in PUT and PATCH updates.

- Tom


-Tom Johnson

Esmé Cowles

unread,
Oct 30, 2015, 11:59:38 AM10/30/15
to fedor...@googlegroups.com
Ben-

I like that -- existing standard we're already supporting, with explicit delete syntax that has support for hash URIs built in.

Doing that SPARQL Update command against fcrepo4 doesn't do anything now, but I hope we can add support for it in the SPARQL Update handling.
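For illustration, here is a Python sketch of what a client might send if that support were added. It only builds the request pieces and does not contact a server; the function name is hypothetical, and as noted above the command has no effect in fcrepo4 today, so the server-side behaviour is an assumption.

```python
from urllib.parse import urldefrag


def drop_hash_resource_request(hash_uri):
    """Describe a SPARQL Update request that names a hash URI explicitly.

    The request is addressed to the parent resource, because the fragment
    cannot travel in the request target; the hash URI goes in the body.
    """
    parent, fragment = urldefrag(hash_uri)
    if not fragment:
        raise ValueError("expected a hash URI, got: " + hash_uri)
    return {
        "method": "PATCH",
        "url": parent,
        "headers": {"Content-Type": "application/sparql-update"},
        "body": f"DROP GRAPH <{hash_uri}>",
    }
```

The hash URI survives because it is carried as data in the update body instead of in the URL being requested.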

-Esme

Tom Johnson

unread,
Oct 30, 2015, 12:08:23 PM10/30/15
to fedor...@googlegroups.com
>  To borrow David's and Adam's point, if "LDP implementation is _not_ the purpose of the Fedora project” that's fine, but what *is* the purpose? Because "Fedora 4: LDP with arbitrary URI restrictions and the word 'Preservation'" is not compelling.

It would be really great if this question could be resolved clearly; it seems like there's a severe lack of unity on this point.  It's OK with me that Fedora is not a generic LDP implementation, but when Fedora's functionality seems to run counter to general LDP use cases the reasons given feel terribly vague to me. "LDP implementation is _not_ the purpose of the Fedora project", said in the absence of real discussion about the use cases in question, reads a bit like "don't care".

If we can be articulate about how Fedora's repository modelling requirements differ from general RDF modelling requirements, how Fedora's CRUD needs differ from LDP's CRUD needs, etc... conversations like this would be much easier.

Independent of that larger ask, I do hope that 1801 can be considered with respect to how it affects ordering. This seems to me to be a pretty basic repository requirement.

- Tom

A. Soroka

unread,
Oct 30, 2015, 12:11:48 PM10/30/15
to fedor...@googlegroups.com
I think that’s a fair question, Ben. In the absence of the “traditional” Fedora model and the claims it made about the connection between durability and public ontology, I would have to say that there really isn’t much left to advertise. If Fedora is just an idiosyncratic LDP implementation, it surely isn’t very attractive, either to use or to work on.

---
A. Soroka
The University of Virginia Library

A. Soroka

unread,
Oct 30, 2015, 12:25:50 PM10/30/15
to fedor...@googlegroups.com
Tom—

To be very explicit, when it comes to LDP implementation for its own sake, I (speaking only for myself here) really don’t care. If I wanted to work on an LDP implementation, there are already several to choose from and I would work on one of them.

I feel like this topic has already been hashed out (see what I did there?) pretty thoroughly in other threads on this list, so I’ll keep my response relatively brief (for me):

Fedora is an object repository. Until the adoption of an LDP-centric API during the evolution of Fedora 4, that meant a repository for objects in the “traditional” Fedora data model. After that change, it’s not yet become totally clear what that means, as various kinds of object models are considered and discussed (from one point of view, PCDM could be considered an example of this). If that ceases to have any meaning, if Fedora really goes into the future without any object model shared through the whole community, then it really isn’t an object repository any more. I don’t know what it would be at that point. It certainly wouldn’t be a very interesting project.

But let’s say that we do successfully adopt a new object model. What’s the engagement with LDP? The API for that object model should conform to LDP. That is to say, an LDP client should be able to interact with a Fedora API instance. But that’s a different claim than it would be to say that Fedora is an LDP implementation. The Fedora API (as hopefully it will be specified and published) exists for the purpose of publishing objects in that object model. LDP is a useful tool to that end. That’s my understanding of the relationship. That means that implementing LDP for its own sake is simply not a goal of the project.

One of the reasons this seems vague to you might be because the object model is so unclear. I sympathize. I feel to this day that discarding the “traditional” Fedora model was a tremendous mistake. I hope we can rectify it by developing a new and more modern model, but that’s where we are, to my eye.

---
A. Soroka
The University of Virginia Library

A. Soroka

Oct 30, 2015, 12:27:20 PM10/30/15
to fedor...@googlegroups.com
That is at least pleasantly explicit. Then for the case of FCREPO-1764, the idea would be to say that deleting all of the triples involving a #-URI does not delete that resource, and that if you want to do that, you need to explicitly delete it? That would also help Ben create his named but content-free resources…

It seems like an improvement, but it does require the user to do something that seems very strange from the HTTP point of view: it requires them to manage the lifecycle of a #-URI independently of its non-# “parent”. It’s hard for me to think of another HTTP situation where I have to remember to delete a hash-URI while leaving the URI alone.

I keep coming back to the nagging thought that we don’t really understand yet what #-URIs represent in Fedora.
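
One way to see why this looks strange from the HTTP side: a client strips the fragment before it sends a request, so a request addressed to a #-URI reaches the server as a request for the non-# “parent”. The following is a minimal sketch using only Python's standard library. The URIs are made up for illustration, and the helper name `http_request_target` is hypothetical; none of this is Fedora code.

```python
from urllib.parse import urldefrag, urlsplit


def http_request_target(uri: str) -> str:
    """Return the path (and query) an HTTP client puts on the request line.

    The fragment is handled on the client side and is never sent to the server.
    """
    without_fragment, _fragment = urldefrag(uri)
    parts = urlsplit(without_fragment)
    return parts.path + (f"?{parts.query}" if parts.query else "")


# Hypothetical repository URIs, for illustration only.
parent = "http://localhost:8080/rest/a"
child = "http://localhost:8080/rest/a#1"

print(http_request_target(parent))
print(http_request_target(child))
```

Both calls produce the same request target, which is why the lifecycle of a #-URI cannot be managed with a plain HTTP DELETE against the #-URI itself; it has to go through an update to the parent's representation or some other mechanism.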

---
A. Soroka
The University of Virginia Library

Tom Johnson

Oct 30, 2015, 12:55:07 PM10/30/15
to fedor...@googlegroups.com
To be very explicit, when it comes to LDP implementation for its own sake, I (speaking only for myself here) really don’t care.

I don't think I'm arguing for "LDP implementation for its own sake". I think I'm simply calling out a set of use cases involving repository objects with internal structure (e.g. ORE proxy nodes) where the structural elements have relationships to other resources in the repository. Ordering is the main example; and *that* is what I'm concerned I'm hearing "don't care" about. A large user community (Hydra/PCDM) does care about that use case very much, and has been engaged in modelling and implementation for it actively over the last year.
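
For readers who have not followed the PCDM work, here is a rough sketch of the ordering use case as it is described elsewhere in this thread: ore:Proxy nodes stored as hash URIs on a single resource, chained into a doubly linked list with iana:first/last/prev/next. The resource layout (`BOOK`, `ORDER`), the `#p0`-style fragment names, and the helper functions are assumptions for illustration, not the actual Hydra implementation.

```python
IANA = "http://www.iana.org/assignments/relation/"
ORE = "http://www.openarchives.org/ore/terms/"

# Hypothetical repository URIs, for illustration only.
BOOK = "http://localhost:8080/rest/book"
ORDER = BOOK + "/order"  # one resource that holds every proxy as a hash URI


def build_order(members):
    """Return triples for a doubly linked list of proxies over `members`."""
    proxies = [f"{ORDER}#p{i}" for i in range(len(members))]
    triples = set()
    for i, (proxy, member) in enumerate(zip(proxies, members)):
        triples.add((proxy, ORE + "proxyFor", member))
        if i > 0:
            triples.add((proxy, IANA + "prev", proxies[i - 1]))
        if i < len(proxies) - 1:
            triples.add((proxy, IANA + "next", proxies[i + 1]))
    if proxies:
        # Head and tail pointers live on another resource (assumed layout).
        triples.add((BOOK, IANA + "first", proxies[0]))
        triples.add((BOOK, IANA + "last", proxies[-1]))
    return triples


def ordered_members(triples):
    """Walk the list from iana:first via iana:next and return the members."""
    lookup = {(s, p): o for s, p, o in triples}
    node = lookup.get((BOOK, IANA + "first"))
    result = []
    while node is not None:
        result.append(lookup[(node, ORE + "proxyFor")])
        node = lookup.get((node, IANA + "next"))
    return result


def inbound_links(triples):
    """Triples from outside ORDER whose object is a hash URI on ORDER."""
    return {t for t in triples
            if t[2].startswith(ORDER + "#") and not t[0].startswith(ORDER)}


pages = [BOOK + "/page1", BOOK + "/page2", BOOK + "/page3"]
graph = build_order(pages)
```

In this layout the head and tail pointers are inbound links from another repository resource to hash URIs, which is the kind of link the ordering use case depends on.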

From the rest of what you've said I'm still not able to identify what the goal of the project is or what definition of an "object repository" motivates the idiosyncrasies in the LDP implementation (can we at least agree that Fedora _has_ an implementation; even if it's not true that it _is_ one?). I have the creeping sense that if I did understand, it would be immediately obvious that LDP is a bad fit for those goals.

I think LDP is a good fit for my use cases, and for Hydra's, so I'm eager to understand the disconnect.

- Tom

Mike Giarlo (Google Groups)

Oct 30, 2015, 12:58:39 PM10/30/15
to fedor...@googlegroups.com
> After that change, it’s not yet become totally clear what that
> means, as various kinds of object models are considered and
> discussed (from one point of view, PCDM could be considered
> an example of this).

Based on what I'm seeing in both the Hydra and Islandora communities -- and what I expect to see in the Hydra-in-a-Box project -- PCDM *will be* the object model that you and many of us wish to see in Fedoraland. Development atop PCDM is quickly becoming mainstream development in Hydra and will be the basis of the next major version of Islandora. It's the de facto object model at the moment and we should possibly consider formalizing that.

It doesn't seem strategic (to me!) for Fedora committers to consider making changes to the codebase that punish the PCDM community (which may, let's face it, become the sustaining core of Fedora 4 given the growth of both Hydra and Islandora).

For now I'm -1 to 1801, and hopeful that further discussion in the thread can turn up alternatives that are good enough to fix what's broken without breaking ordering in PCDM.

--
Michael J. Giarlo
Technical Manager, Hydra-in-a-Box project
Software Architect, Digital Library Systems & Services
Stanford University Libraries
mjgi...@stanford.edu
+1 (206) 402-4473

________________________________________
From: fedor...@googlegroups.com <fedor...@googlegroups.com> on behalf of A. Soroka <aj...@virginia.edu>
Sent: Friday, October 30, 2015 09:25
To: fedor...@googlegroups.com
Subject: Re: [fedora-tech] Hash URI Fragments

A. Soroka

Oct 30, 2015, 1:05:17 PM10/30/15
to fedor...@googlegroups.com
Mike—

I appreciate your point, and PCDM is a great example of the kind of discussion of object models that I personally think the Fedora community very much needs, but I think it’s really premature to call it the "de facto object model" for Fedora. It is a growing and important effort shared by a good portion of the Fedora community, but that’s all at the moment, and in my opinion it is much too early to even think about formalizing any such designation.

And to be honest, I don’t think language like “punish the PCDM community” is helpful. No one is trying to punish anyone here. There are some tricky problems and limitations in the current reference implementation, and we’re trying to find good ways to work with/around them and get good functionality to the Fedora users.

---
A. Soroka
The University of Virginia Library

A. Soroka

Oct 30, 2015, 1:06:25 PM10/30/15
to fedor...@googlegroups.com
Tom—

I’m getting a bit of a sense of deja vu here, because we’ve been over this ground before. Let’s take a simple example:

Why does Fedora not permit arbitrary subjects in triples, a restriction about which you have vociferously complained? Because the purpose of Fedora is not to publish arbitrary RDF, it is to publish representations of objects, and when you are doing that, the triples you publish at an identifier/URI represent the properties of the object with that identifier/URI, and it makes no sense to publish a property of object B at the identifier for object A. And we could give similar explanations for most of the other restrictions that Fedora imposes. (Some, admittedly, come out of implementation limitations or simple lack of resources.)
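
To make the restriction concrete, here is a minimal sketch of the rule as it is described in this thread: a triple published at a resource must have that resource, or a hash URI derived from it, as its subject. The function names and URIs are hypothetical, and this is not Fedora's actual validation code.

```python
from urllib.parse import urldefrag


def subject_allowed(resource_uri: str, subject: str) -> bool:
    """True if `subject` is the resource itself or a hash URI derived from it."""
    base, _fragment = urldefrag(subject)
    return base == resource_uri


def rejected_triples(resource_uri, triples):
    """Return the triples that describe some other resource."""
    return [t for t in triples if not subject_allowed(resource_uri, t[0])]


# Hypothetical URIs, for illustration only.
A = "http://localhost:8080/rest/a"
B = "http://localhost:8080/rest/b"
TITLE = "http://purl.org/dc/terms/title"

update_for_a = [
    (A, TITLE, "Object A"),            # property of A: accepted
    (A + "#part", TITLE, "Part of A"), # hash URI derived from A: accepted
    (B, TITLE, "Object B"),            # property of B published at A: rejected
]

print(rejected_triples(A, update_for_a))
```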

---
A. Soroka
The University of Virginia Library

A. Soroka

Oct 30, 2015, 1:08:36 PM10/30/15
to fedor...@googlegroups.com
Ordering, as I know you know well, Tom, has always been a tricky issue with RDF of any kind. There have been a few proposals for making ordering a first-class function in Fedora, but they have met objections of one kind or another, or simply never gained much priority.

In fact the JCR underneath the current reference implementation does allow some kinds of ordering, and we could develop ways to expose that, but it’s not immediately clear to me how we do that in an LDP-sensible way, which I would think is important. Are you thinking of some kind of system of headers or the like? Are you thinking that a single total order is associated to a single container?

---
A. Soroka
The University of Virginia Library

David Chandek-Stark

Oct 30, 2015, 1:09:34 PM10/30/15
to fedor...@googlegroups.com
FWIW, to the extent that I hitched my wagon to “LDP implementation is not the purpose of Fedora”, Adam’s clarification is useful.  The intention on my part was certainly not to imply “don’t care”. I do care :).  But I think, as Ben also indicated, if LDP is the *principal* driver, then one has to wonder what’s the case for F4 (a question BTW that I take seriously).  We didn’t get into Fedora (or Hydra) for linked data; we needed a digital asset preservation environment, and that’s still the primary concern.  The evolution of linked data is, from my point of view, all to the good, but doesn’t eclipse this fundamental purpose of the repository.

—D

From: <fedor...@googlegroups.com> on behalf of "A. Soroka" <aj...@virginia.edu>
Reply-To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>
Date: Friday, October 30, 2015 at 12:25 PM
To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>
Subject: Re: [fedora-tech] Hash URI Fragments

Neil Jefferies

Oct 30, 2015, 1:11:23 PM10/30/15
to fedor...@googlegroups.com
Some comments on Adam's statements below...with which I have a lot of
sympathy.

Fedora was conceived as an object repository first and foremost, its
use case is to store things. LDP, frankly, considers real things a bit
of an inconvenience and I'm really not convinced that the LDP API is an
efficient way to carry out preservation-like operations or file-like
operations or any number of things that I really want to do with real
things.

However, if I can paraphrase Adam's words a little...

What’s the engagement with LDP? *An* API for that object model should
conform to LDP. That is to say, an LDP client should be able to
interact with a Fedora instance. That’s a different claim than
it would be to say that Fedora is an LDP implementation.

Neil

Mike Giarlo (Google Groups)

Oct 30, 2015, 1:37:25 PM10/30/15
to fedor...@googlegroups.com

Hi Adam,

> I appreciate your point, and PCDM is a great example of the kind of discussion
> of object models that I personally think the Fedora community very much needs,
> but I think it’s really premature to call it the "de facto object model" for Fedora.

I grant you that PCDM is fairly young, yes, and there aren't dozens of production implementations in the wild. (Digression: how many *total* production implementations are there of Fedora 4? And how many of those won't be based on PCDM in the next year or so, I wonder.)

Given that dozens of people have been working on PCDM over the past year; that there is working code in both Hydra and Islandora for PCDM; and that major versions of both Hydra and Islandora will be based on PCDM, I'm not sure how premature it is. Are there other efforts to define a common Fedora object model that I'm not aware of? It'd be good to know about those.

> It is a growing and important effort shared by a good portion of the Fedora
> community, but that’s all at the moment, and in my opinion it is much too early
> to even think about formalizing any such designation.

A fair point. And I appreciate the change in subject line here, since this is a bit of a tangent from the original discussion. To make the tie-back explicit: merging 1801 will specifically inconvenience a large swath of the Fedora community that's already got a common object model (which, as you pointed out, many of us feel we sorely need for Fedora to be an interesting project).

-Mike

________________________________________
From: fedor...@googlegroups.com <fedor...@googlegroups.com> on behalf of A. Soroka <aj...@virginia.edu>
Sent: Friday, October 30, 2015 10:05
To: fedor...@googlegroups.com
Subject: Object models Was: [fedora-tech] Hash URI Fragments

Aaron Coburn

Oct 30, 2015, 1:40:40 PM10/30/15
to fedor...@googlegroups.com
As a fedora committer who hasn't spoken up on this thread, I would say that, while I also have a lot of sympathy with Adam's comments, I am -1 on fcrepo-1801.

In my opinion, any *named* resource that exists in fedora should be addressable from any other resource.

The ticket mentions that hash URI nodes are similar to blank nodes. That is true insofar as their representation and lifespan is tied to the parent resource. But unlike blank nodes, these resources are not anonymous; they have a URI and are therefore addressable. I don't think fedora should get in the way of that. If we really wanted to prevent inbound links to these resources, then we should implement support for blank nodes (which is something I am not advocating).

Clearly, there is some complexity w/r/t referential transparency, JCR, inbound links, and how, when or whether these hash URIs are cleaned up. But most of that, I think, is orthogonal to this discussion.

Aaron

--
Aaron Coburn
System Administrator / Programmer
Web Services, Amherst College

Benjamin Armintor

Oct 30, 2015, 1:45:17 PM10/30/15
to fedor...@googlegroups.com
I feel like I should offer a brief clarification to something I said earlier:

Fedora's API ought to be a refinement of LDP in support of acting as a repository. It should be implementable over other LDP implementations, and not controvert LDP expectations.

PCDM ought to strive not to refer to LDP or Fedora.

Hydra, inasmuch as it ought to strive for compatibility with other LDP implementations, ought not to implement PCDM in ways that make Fedora the only possible LDP option.



A. Soroka

Oct 30, 2015, 1:59:56 PM10/30/15
to fedor...@googlegroups.com
I think it’s pretty clear by now that 1801 isn’t going through any time soon.

That having been said, I’d like to dig into the principle you are offering here, Aaron. I think it’s attractive, but I’m unsure about its viability in general. Let me give some examples:

1) If I set up a repository with some “private” spaces (sections that, by use of Fedora authZ components, aren’t accessible to any but one user) are resources in those sections really addressable? Isn’t globally disallowing certain kinds of resources to be linked to just another flavor of that?

2) How does your principle play out against versioning? Versions are named, of course, and that naming is independent of #. Suppose a user makes a link to a hash-URI (from a resource that is not its unhashed “parent”) and then the unhashed “parent” is mutated a few times. Is the link to a version of the hash-URI JCR node? To an unversioned hash-URI JCR node, and if so, where does that node live? Is either case actually capturing what the user intended, or can we even be sure about what that was? Suppose two different users made the link and the versioning mutations?

3) Fixity results, as I understand it (and I don’t pretend to understand it too well) include transient resources, resources that aren’t persisted in the repository but have names/URIs which are used in triples. Are those resources to be addressable?

---
A. Soroka
The University of Virginia Library

David Chandek-Stark

Oct 30, 2015, 2:05:16 PM10/30/15
to fedor...@googlegroups.com
+1 Ben.  Nicely said.

From: <fedor...@googlegroups.com> on behalf of Benjamin Armintor <armi...@gmail.com>
Reply-To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>
Date: Friday, October 30, 2015 at 1:45 PM
To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>
Subject: Re: [fedora-tech] Hash URI Fragments

A. Soroka

Oct 30, 2015, 2:21:29 PM10/30/15
to fedor...@googlegroups.com, Mike Giarlo (Google Groups)
> On Oct 30, 2015, at 1:37 PM, Mike Giarlo (Google Groups) <michael...@stanford.edu> wrote:
>> I appreciate your point, and PCDM is a great example of the kind of discussion of object models that I personally think the Fedora community very much needs, but I think it’s really premature to call it the "de facto object model" for Fedora.
>
> I grant you that PCDM is fairly young, yes, and there aren't dozens of production implementations in the wild. (Digression: how many *total* production implementations are there of Fedora 4? And how many of those won't be based on PCDM in the next year or so, I wonder.)

That is a really good question for which I think you know as well as I do that we have literally no hope of getting an answer. One of my favorite memories of Fedora 3 is from an Open Repositories conference a few years back. A group that I want to say (Swiss-cheese memory banks!) was from the Czech Republic showed up to talk about their experiences scaling Fedora 3 to many tens of millions of objects and an incredible amount of data and RDF. Their techniques were clever and worked well. Their results were fantastic. They were running one of the largest, possibly _the_ largest Fedora repository in the world. And no one in the dev group had ever heard of them or heard from them. They had never participated in the mail lists, hadn’t advertised their project, nothing. I’ve never heard of them since.

That learnt me once and for all how very dangerous it is to make assumptions about the Fedora community (which is large, globally-distributed, and polyglot) based on your neighbors.

> Given that dozens of people have been working on PCDM over the past year; that there is working code in both Hydra and Islandora for PCDM; and that major versions of both Hydra and Islandora will be based on PCDM, I'm not sure how premature it is. Are there other efforts to define a common Fedora object model that I'm not aware of? It'd be good to know about those.

I’m not aware of any, but I haven’t been following the question very carefully, and the fact that we aren’t aware of them, as I tried to show with my story of the Czechs, doesn’t mean very much.

PCDM may very well be the next Fedora object model, or at least the basis of it, but for my money it is _way_ too early to commit that decision. For one thing, I suspect we’ll need a lot more community experience with migration from 3 -> 4 to get the information people will need for an informed discussion of what that next model should look like.

Another point: even if Hydra and Islandora between them represented more of the community using Fedora Commons than they do (less than half of respondents to the last survey I saw) that would still not represent all of the Fedora community with a stake in this kind of decision. I think here of Oxford with its Databank project as a good example of folks with an interest in participating in shared modeling but no particular stake in a particular scripted Web framework.

I’m not trying to rain on your parade, Mike. I just don’t want us to try to predict the weather too far into the future, either.

Trey Terrell

Oct 30, 2015, 2:26:26 PM10/30/15
to Fedora Tech
+1.

Trying to catch up here. I'm totally okay with "don't start removing named resources because of SPARQL queries scoped to a different resource which don't drop the whole graph" (totally okay with having to issue DROP GRAPH, though!). I'd like to be okay with "you have to issue a DELETE", but then the entire conversation is moot because we're going to have to use Fedora transactions in a pretty serious way at that point anyways.

Robert Chavez

Oct 30, 2015, 2:31:34 PM10/30/15
to fedor...@googlegroups.com
+1 for graybeards.

Rob

On Friday, October 30, 2015, A. Soroka <aj...@virginia.edu> wrote:
+1 to what Aaron is saying here.

I think there are probably at least a few graybeards on this list who are nodding right now and thinking, “Yes, this is why earlier versions of Fedora were equipped with a disseminator architecture, to avoid too-close coupling between Fedora and other systems.” There were plenty of difficulties with the implementation of that architecture, but the idea was and remains sound. I’m hopeful that API-X will bring the idea forward with a really excellent implementation.


---
A. Soroka
The University of Virginia Library

> On Oct 30, 2015, at 10:13 AM, Aaron Birkland <ap...@cornell.edu> wrote:
>
> Hi Justin,
>
> Actually, API-X offers a very important alternative to placing features in the Fedora core, in a way that *lessens* the dependence of an application on knowledge of a Fedora repository, and how to interact with it.  Consider a "linked data" extension that allowed the POST, PUT, PATCH to Fedora resources to contain forbidden constructs such as blank nodes or links to hash URIs.  The effect of such an extension would make Fedora appear to the outside world (and to the components of your infrastructure that use its LDP API) to be a more 'normal' LDP implementation like any of the others on the w3c list, free of Fedora's unique constraints on RDF.
>
> Thought of another way, API-X can be used to reduce the business risk of breaking changes in Fedora's APIs.  If your application depends on a particular behaviour of hash URIs, and Fedora introduces a backwards-incompatible change like 1801, it would be possible to change the apparent behaviour of Fedora resources with an extension, rather than changing your infrastructure to adapt to Fedora.
>
>   -Aaron
>
>
> On Fri, Oct 30, 2015 at 9:51 AM, Justin Coyne <jus...@curationexperts.com> wrote:
> Aaron,
>
>   While an API-X option may be a workable alternative for FCR4, it's not preferred as we would like PCDM to work on multiple LDP implementations. Using an API-X solution would tie us to Fedora.
>
> -Justin
>
>
> On Fri, Oct 30, 2015 at 8:47 AM, Esmé Cowles <esco...@ticklefish.org> wrote:
> Adam-
>
> The alternative is the current behavior: allow resources to link to hash URI fragments in other resources, and use the existing JCR referential integrity functionality to avoid accumulating unused nodes.
>
> I agree with Justin that it would be a significant inconvenience to remove features like this, and would punish early adopters.  So far, I haven't heard a clear consensus in favor of fcrepo-1801, and the main use case is making second and further implementations of the Fedora API less burdensome.  I have a hard time seeing how we can go forward with a breaking change like this unless there is a real use case that requires it.
>
> -Esme
>
> > On Oct 30, 2015, at 9:41 AM, A. Soroka <aj...@virginia.edu> wrote:
> >
> > The back-compatibility issue is a real one, and I don’t pretend that I have any simple amelioration for it. But I would like to know how extensive it actually is before assuming that it’s a major blocker or requires a special migration tool.
> >
> > I am myself disappointed that no one from the PCDM community (which seems, at a glance, to be the only segment of the Fedora community that is speaking up against this idea) has offered any alternative proposal. I wouldn’t mind taking more time over this if there’s a chance of a good alternative that we can consider, but there are outstanding major bugs (1764 is a good example) that are waiting on some resolution to this design problem.
> >
> > ---
> > A. Soroka
> > The University of Virginia Library
> >
> >> On Oct 30, 2015, at 9:34 AM, Justin Coyne <jus...@curationexperts.com> wrote:
> >>
> >> The major problem I have is that the change you mention would be a backwards incompatible one.  We can and are using Hash based URIs now in Fedora. We've invested significant time into working on this approach. It's going to cost us significantly to have to take another approach.  People who have deployed this solution are going to have to migrate their data to a different structure. That will be costly too.
> >>
> >> I'm profoundly disappointed.
> >>
> >> -J
> >>
> >> On Fri, Oct 30, 2015 at 8:29 AM, A. Soroka <aj...@virginia.edu> wrote:
> >> Thanks, Esmé and Justin, this detail helps.
> >>
> >> So my understanding now is: PCDM made several specific implementation decisions that eventually led to using hash-URIs to “imitate” atomic transactions. If that hash-URI facility went away, PCDM would have to remake those decisions, perhaps finding a different way to implement ordering or atomic transactions. I see that it would mean real work for people committed to PCDM, but I don’t see that it would be a terrible expense for solving the problems that are now present in the Fedora reference implementation for _all_ users of Fedora, and I haven’t yet heard any other suggestions for solving those problems…
> >>
> >> ---
> >> A. Soroka
> >> The University of Virginia Library
> >>
> >>> On Oct 30, 2015, at 8:48 AM, Esmé Cowles <esco...@ticklefish.org> wrote:
> >>>
> >>> Adam-
> >>>
> >>> PCDM requires ORE notions (specifically ore:Proxy for ordering), and the PCDM group decided to implement ordering of the proxies with a doubly-linked list, using iana:first/last/prev/next predicates to link to the head/tail of the list and between items in the list.  Doing that efficiently requires atomic transactions against the linked list, hence the move to using hash URIs on a resource to store the linked list, instead of having each proxy be a container.
> >>>
> >>> -Esme
> >>>
> >>>> On Oct 30, 2015, at 8:20 AM, A. Soroka <aj...@virginia.edu> wrote:
> >>>>
> >>>> I’m sorry, Justin, but that doesn’t really help me. Are you saying that PCDM requires ORE notions, and ORE requires atomic transactions against linked lists?
> >>>>
> >>>> ---
> >>>> A. Soroka
> >>>> The University of Virginia Library
> >>>>
> >>>>> On Oct 30, 2015, at 8:17 AM, Justin Coyne <jus...@curationexperts.com> wrote:
> >>>>>
> >>>>> We're using hash based URIs because ORE requires a single writer (manipulating a linked list).  Putting all the nodes of the list in a single resource is a simple way of doing that. Otherwise we need to have a more complex locking/transaction strategy that is not commonly available in various LDP implementations.
> >>>>>
> >>>>> -Justin
> >>>>>
> >>>>> On Fri, Oct 30, 2015 at 7:14 AM, A. Soroka <aj...@virginia.edu> wrote:
> >>>>> On a second reading, something peculiar jumps out at me in your remark, Tom: how is it that PCDM (a data model that is supposed to be agnostic to the implementing software and usable for modeling any repository content at all) actually collapses in the absence of one very specific type of identifier syntax? Shouldn’t identifiers be opaque in PCDM? That seems extraordinarily fragile, and then PCDM seems like a problematic use case for judging this question of hash-URIs in Fedora.
> >>>>>
> >>>>> ---
> >>>>> A. Soroka
> >>>>> The University of Virginia Library
> >>>>>
> >>>>>> On Oct 29, 2015, at 10:11 PM, Tom Johnson <johns...@gmail.com> wrote:
> >>>>>>
> >>>>>> Chipping in for the -1's on #1801.
> >>>>>>
> >>>>>> As Justin pointed out, this restriction would make Fedora virtually unusable for PCDM adopters, and similarly for any number of otherwise reasonable data models.
> >>>>>>
> >>>>>> - Tom
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Oct 29, 2015 at 6:20 PM, A. Soroka <aj...@virginia.edu> wrote:
> >>>>>> Simeon has clarified that point and that discussion can go on in that part of this thread— I’ll extract the other points here:
> >>>>>>
> >>>>>>> For further reductio ad absurdum... you could run two Fedora4 instances, and f4a.org/rest/a#1 could reference f4b.org/rest/a#2 and vice versa... but not internally?
> >>>>>>
> >>>>>> Certainly. It is indeed absurd to imagine that two independent and unconnected servers in different parts of the Web could somehow without cost manage to align their resources and manage links interdependently— just because they are both implemented with the same brand of software. That is as much as to be surprised that two different websites could accidentally host copies of the same file, when, after all, aren’t they both running Apache httpd? {grin}
> >>>>>>
> >>>>>>> This is just not the right solution for the problem.
> >>>>>>
> >>>>>> I (and I’m sure other Fedora developers) would be happy to hear about other solutions that work within the limits of the reference implementation. As I wrote earlier in this thread, I won’t claim that 1801 is a _necessary_ change, but I do claim that the current state of affairs is intolerable, and that change is necessary. Perhaps you have another design in mind?
> >>>>>>
> >>>>>> ---
> >>>>>> A. Soroka
> >>>>>> The University of Virginia Library
> >>>>>>
> >>>>>> --
> >>>>>> -Tom Johnson



David Chandek-Stark

Oct 30, 2015, 2:52:05 PM10/30/15
to fedor...@googlegroups.com
Mike, 

FWIW, as you know we are in the process of migrating from F3 to F4, and have not planned at this point to implement PCDM as part of the migration.  Obviously as a Hydra partner we are very interested in PCDM and believe that it’s probably on our roadmap, but I don’t think we have a solid sense yet of what that means in terms of development effort and priorities.

—D

From: <fedor...@googlegroups.com> on behalf of "Mike Giarlo (Google Groups)" <michael...@stanford.edu>
Reply-To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>
Date: Friday, October 30, 2015 at 1:37 PM
To: "fedor...@googlegroups.com" <fedor...@googlegroups.com>

Mike Giarlo (Google Groups)

Oct 30, 2015, 2:56:14 PM10/30/15
to fedor...@googlegroups.com
Hi Adam,

(Office 365 doesn't seem to provide a clean way to do in-line responses, sadly, so here I am top-posting my response again...)

Your points are well taken, and my soggy parade has more to do with living in a rain forest than what you've written. :)

To be clear: I am not advocating for fast-tracking PCDM as *the* Fedora object model. There are good reasons -- discussed earlier this year -- why PCDM is named PCDM and not the Fedora Common Data model. My point was that investment in PCDM in the Fedora community is significant and growing, and that Fedora should not, where avoidable, make their lives harder. I believe that concern is now moot since no one is in a rush to approve 1801 and this thread seems likely to surface more PCDM-friendly alternatives.

-Mike


________________________________________
From: fedor...@googlegroups.com <fedor...@googlegroups.com> on behalf of A. Soroka <aj...@virginia.edu>
Sent: Friday, October 30, 2015 11:21
To: fedor...@googlegroups.com; Mike Giarlo (Google Groups)
Subject: Re: Object models Was: [fedora-tech] Hash URI Fragments

Andrew Woods

Nov 13, 2015, 1:30:03 PM11/13/15
to fedor...@googlegroups.com
Hello All,
Over the last two weeks, "hash URI fragments" has been a central topic on the Fedora Tech calls. The initial objective has simply been the establishment of a shared understanding of:
- what hash URI fragments are in the Fedora context and
- how we expect them to behave.

Some of this conversation is consolidated in the wiki:
https://wiki.duraspace.org/display/FF/Design+-+Hash+URI+Fragments

As a starting point, it would be helpful if we could agree on the expected behaviors of Fedora-related hash URI fragments. Below is a draft of such behaviors:

1. Triples whose subject is a hash URI can be added or removed via standard Fedora CRUD operations (POST, PATCH or PUT) on the resource from which the hash URI is derived
1a. When a resource is updated to include properties on hash URI fragments, they are automatically created.
2. Triples whose object is a hash URI can be added or removed via standard Fedora CRUD operations (POST, PATCH or PUT)
3. The RDF representation of a repository resource includes all triples from hash fragments associated with it.  That is to say, all triples whose subject is a hash URI derived from that resource's URI
4. Any repository resource may link to any hash URI regardless of the presence of any triples associated with the hash fragment
5. A hash fragment should automatically be deleted when all of the triples that have that fragment as their subject have been deleted
6. Hash fragments should be permitted to include zero or more '/' characters
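
As an aid to discussion, the draft behaviors above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the `Repository` class, its method names, and the URIs are hypothetical and bear no relation to the Fedora codebase. In particular, the sketch assumes that a hash fragment "exists" exactly while it is the subject of at least one triple.

```python
from urllib.parse import urldefrag


class Repository:
    """Toy in-memory model of the draft behaviors; not Fedora code."""

    def __init__(self):
        self.resources = {}  # resource URI -> set of (subject, predicate, object)

    def put(self, uri, triples):
        """Replace the triples stored at `uri` (behaviors 1, 1a, 2 and 4)."""
        for subject, _, _ in triples:
            # Subjects must be the resource itself or a hash URI derived from it.
            # Fragments containing '/' pass this check as well (behavior 6).
            if urldefrag(subject)[0] != uri:
                raise ValueError(f"{subject} is not derived from {uri}")
        # Objects are deliberately not checked: behavior 4 allows links to any
        # hash URI, whether or not it has triples of its own.
        self.resources[uri] = set(triples)

    def representation(self, uri):
        """All triples for `uri`, including those on its hash URIs (behavior 3)."""
        return set(self.resources.get(uri, ()))

    def fragments(self, uri):
        """Hash URIs that currently exist under `uri`.

        A fragment exists exactly while it is the subject of some triple, so
        creation (1a) and automatic deletion (5) both follow from this rule.
        """
        return {s for s, _, _ in self.resources.get(uri, ()) if s != uri}


A = "http://localhost:8080/rest/a"  # hypothetical resource URIs
B = "http://localhost:8080/rest/b"
TITLE = "http://purl.org/dc/terms/title"
RELATION = "http://purl.org/dc/terms/relation"

repo = Repository()
repo.put(A, [(A, TITLE, "Parent"), (A + "#t", TITLE, "Fragment")])
repo.put(B, [(B, RELATION, A + "#t")])  # inbound link from another resource
created = repo.fragments(A)

repo.put(A, [(A, TITLE, "Parent")])     # drop the fragment's last triple
remaining = repo.fragments(A)
```

After the second update to `A`, the fragment is gone but the link from `B` is still stored and now points at a hash URI with no triples. That is permitted by item 4, and it is one place where items 4 and 5 interact in a way that may deserve comment on the wiki page.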

Please comment directly on the above mentioned wiki page about these behaviors or other behavioral expectations that are not listed. Once we have a common understanding, it will be much easier to direct Fedora's implementation.
Regards,
Andrew
