HC3 Side Event: Hydra/Islandora Common Practices

Andrew Woods

Aug 18, 2015, 6:29:41 PM
to fedor...@googlegroups.com, Hydra-Tech, island...@googlegroups.com
Hello All,
Taking advantage of the face-time afforded by Hydra Connect in September, I would like to propose a technical meetup Monday evening (9/21) to advance towards Hydra/Islandora interoperability. Given the facts that:
- Islandora is currently in the process of porting content models into PCDM
- Hydra has groundwork in place (the hydra-pcdm and hydra-works gems)
- Hydra has some LDP practices to share
- Islandora has some asynchronous workflow practices to share
- There is common interest in WebAC (and a working implementation in hydra-access-controls)

it seems to be an opportune time to define the minimal requirements for having both Hydra and Islandora be able to make sense of resources in the same underlying Fedora.

DCE has offered a room (TBD) that can fit 16 people. Please add your name to the Google-Doc to reserve a spot:

Regards,
Andrew

Tom Cramer

Aug 18, 2015, 9:44:40 PM
to fedor...@googlegroups.com, Hydra-Tech, island...@googlegroups.com
Andrew,

Thanks for organizing this. Has there been any substantial discussion in either community yet that has crossed your radar on possibly converging on a common / interoperable solr indexing method? At HC2, e.g. there was some discussion that F4 might possibly let us deprecate solrizer. If similar discussions are underway within Islandorlandia, it seems like we should talk it through further. 

Likewise, I wonder if there isn't potential interest in interoperable (common, modular) approaches to loading a triple store for SPARQL querying.

Finally, it would be great to see if IIIF-based interoperability could and should be added to your shortlist of touchpoints. 

- Tom



Typed with my thumbs.
--
You received this message because you are subscribed to the Google Groups "Fedora Tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fedora-tech...@googlegroups.com.
To post to this group, send email to fedor...@googlegroups.com.
Visit this group at http://groups.google.com/group/fedora-tech.
For more options, visit https://groups.google.com/d/optout.

Nick Ruest

Aug 19, 2015, 9:47:47 AM
to fedor...@googlegroups.com, hydra...@googlegroups.com, island...@googlegroups.com
Hi Tom-

Solr and triplestore indexing are definitely on our radar in Islandorlandia.

We've been taking advantage of the fcrepo-camel-toolbox[1] in Islandora
7.x-2.x, and discussions around common ways of implementing it would be
great. Danny Lamb will be at Hydra Connect, and can talk about it. I
wish I could be there too, but y'all are in excellent hands with Danny :-)

Big +1 to IIIF-based interoperability!

-nruest

[1] https://github.com/fcrepo4-exts/fcrepo-camel-toolbox


Andrew Woods

Aug 19, 2015, 10:35:44 AM
to Hydra-Tech, fedor...@googlegroups.com, island...@googlegroups.com
Hello Tom,
I think the notion of "Hydra/Islandora inter-operability" comes in levels or categories.
The first, most basic, level of interop is having a single F4 repository with container and binary resources which both Hydra and Islandora can make sense of and expose through the respective applications. This level is hopefully achievable with shared PCDM (and other ontologies) modeling approaches.
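
As a rough illustration of what "shared PCDM modeling" could mean at this first level, the sketch below emits Turtle for a collection and its member objects using the PCDM namespace (`http://pcdm.org/models#`). The function name and the example URIs are hypothetical, and neither Hydra nor Islandora is claimed to serialize resources exactly this way.

```python
PCDM = "http://pcdm.org/models#"


def pcdm_turtle(collection_uri, member_uris):
    """Return Turtle describing a pcdm:Collection and its pcdm:Object members."""
    lines = [f"@prefix pcdm: <{PCDM}> .", ""]
    lines.append(f"<{collection_uri}> a pcdm:Collection")
    for uri in member_uris:
        # Close the previous predicate with ';' and add one hasMember per object.
        lines[-1] += " ;"
        lines.append(f"    pcdm:hasMember <{uri}>")
    lines[-1] += " ."
    for uri in member_uris:
        lines.append(f"<{uri}> a pcdm:Object .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical repository URIs, for illustration only.
    print(pcdm_turtle(
        "http://localhost:8080/rest/images",
        ["http://localhost:8080/rest/images/1", "http://localhost:8080/rest/images/2"],
    ))
```

If both applications read and write only these shared classes and predicates, each can at least recognize the other's collections and objects, whatever else it layers on top.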

A second category of interop is likely a common approach for interacting with the standards-based API that F4 provides:
- common LDP interactions/expectations for CRUD operations
- common WebAC interactions/expectations for authorization
- common Memento (to-be-implemented) interactions/expectations for versioning
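
To make "common LDP interactions/expectations for CRUD operations" a little more concrete, here is a minimal sketch that builds (but does not send) the HTTP requests such an agreement would cover. The base URL `http://localhost:8080/rest/` and the helper names are assumptions; the `Link: <...>; rel="type"` header is the LDP convention for requesting an interaction model, and whether a given Fedora release honours it should be checked against that release.

```python
from urllib.request import Request

LDP_BASIC_CONTAINER = "http://www.w3.org/ns/ldp#BasicContainer"


def ldp_create_request(parent_uri, turtle, slug=None):
    """Build an LDP POST that asks the server to create a container under parent_uri."""
    headers = {
        "Content-Type": "text/turtle",
        "Link": f'<{LDP_BASIC_CONTAINER}>; rel="type"',
    }
    if slug:
        # Slug is only a naming hint; the server may choose a different URI.
        headers["Slug"] = slug
    return Request(parent_uri, data=turtle.encode("utf-8"),
                   headers=headers, method="POST")


def ldp_delete_request(resource_uri):
    """Build an LDP DELETE for an existing resource."""
    return Request(resource_uri, method="DELETE")


if __name__ == "__main__":
    req = ldp_create_request(
        "http://localhost:8080/rest/",  # assumed Fedora base URL
        "<> a <http://pcdm.org/models#Object> .",
        slug="image-1",
    )
    print(req.get_method(), req.full_url, dict(req.header_items()))
```

Agreeing on requests at this level (which headers are sent, which interaction model is requested) is what would let two clients create resources the other can later read.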

A third category of interop likely includes common recipes for using F4's Apache Camel-based external integrations, such as:
- (re)populating an external triplestore for SPARQL-Queries
- (re)populating an external Solr index
- generating derivatives
- maintaining a raw, external RDF serialization of resources
- running repository-wide fixity checks
- exposing image resources via IIIF

Specifically with respect to having a shared approach for populating a Solr index, my understanding from Hydra and Islandora discussions is that this may be one of the trickier nuts to crack... but understanding those details is the objective of this initiative. 

It would be helpful to begin the detailed discussions, here, in advance of HC3. For example, in terms of common modeling approaches, a Hydra-based modeling example of an image collection to juxtapose with the one recently posted from Islandora would be constructive:

Regards,
Andrew


--
You received this message because you are subscribed to the Google Groups "Hydra-Tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hydra-tech+...@googlegroups.com.

aj...@virginia.edu

Aug 21, 2015, 10:57:38 AM
to fedora-tech@googlegroups.com Tech, Hydra-Tech, island...@googlegroups.com
I think you have these levels confused, Andrew. Committing to consuming shared standards-based APIs is a considerably _lighter_ and _easier_ burden than committing to shared ontology.

What is more, it's not obvious to me that the third category really has anything to do with UI frameworks for Fedora like Hydra or Islandora. It's an independent area of work. If both Hydra and Islandora cease to exist, that kind of workflow automation will be no less interesting.

---
A. Soroka
The University of Virginia Library

Andrew Woods

Aug 21, 2015, 2:33:08 PM
to Hydra-Tech, fedora-tech@googlegroups.com Tech, island...@googlegroups.com
Hello Aaron,
Thanks for the input. 

Which exact points of "common practice" across the community turn out to be easier or harder will become evident with time. 

On your second point, agreed, the aim of this thread is not so much about UI frameworks but rather to highlight the types of community efforts that may represent entry-points for collaboration and shared practice.
Regards,
Andrew

Justin Coyne

Aug 21, 2015, 2:46:31 PM
to hydra...@googlegroups.com, fedora-tech@googlegroups.com Tech, island...@googlegroups.com
I have to think that "populating an external Solr index" is not something that is easily shared.  Every Hydra application I write has a different indexing procedure.  Primarily because I'm aggregating multiple data sources besides what is in Fedora to create the Solr record.  For example, I might be pulling the creator URI out of Fedora, grabbing the skos:prefLabel for that resource from Getty or LOC (or a local cache), and adding a URI to the image server where the thumbnail is located. Additionally, I might be aggregating several Fedora resources into one solr record for display in the front end.  Then I'll probably have a different indexing procedure for each administrative collection or object type. 

Dealing with this sort of variability is pretty easy in Hydra. Here's an example of an indexer for images:  https://github.com/curationexperts/alexandria-v2/blob/master/app/indexers/image_indexer.rb
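
Independent of the linked Ruby indexer (this is not a translation of it), the kind of aggregation described above can be sketched in a few lines: repository data, a label fetched from an external vocabulary, and an image-server URL are merged into one flat Solr document. The function name, the input dictionary shape, and the thumbnail URL pattern are all hypothetical; the `_tesim` / `_ssim` / `_ss` suffixes follow the dynamic-field naming commonly seen in Hydra applications.

```python
def build_solr_doc(resource, label_lookup, image_server):
    """Merge repository data with external sources into one flat Solr document.

    resource     -- dict with 'id', 'title', 'creator_uri' (stand-in for Fedora data)
    label_lookup -- callable mapping a creator URI to a preferred label, or None
    image_server -- base URL of a (hypothetical) image server
    """
    doc = {
        "id": resource["id"],
        "title_tesim": [resource["title"]],
        "creator_uri_ssim": [resource["creator_uri"]],
        "thumbnail_url_ss": f"{image_server.rstrip('/')}/{resource['id']}/thumbnail",
    }
    # The label comes from outside the repository (vocabulary service or local cache).
    label = label_lookup(resource["creator_uri"])
    if label is not None:
        doc["creator_label_tesim"] = [label]
    return doc


if __name__ == "__main__":
    cache = {"http://vocab.example.org/agent/1": "Example Creator"}
    print(build_solr_doc(
        {"id": "img1", "title": "Example Image",
         "creator_uri": "http://vocab.example.org/agent/1"},
        cache.get,
        "http://images.example.org/",
    ))
```

The point of the sketch is that most of the logic is application-specific: which external sources are consulted and which fields are emitted would differ per collection or object type.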

I don't see how moving this into Fedora/Java will make it better. Please share your ideas about it.

-Justin

aj...@virginia.edu

Aug 21, 2015, 2:55:03 PM
to fedor...@googlegroups.com
I don't see how moving that into Fedora would make it easier, either, or even how one would do so, or what that would mean.

I also don't see that anyone has suggested doing so. The suggestion was to use "F4's Apache Camel-based external integrations": processes explicitly external to the repository, based on software (Camel) supported by a community that dwarfs that of Hydra, or Islandora, or Fedora itself, for that matter.

---
A. Soroka
The University of Virginia Library

Nick Ruest

Aug 23, 2015, 4:52:24 PM
to hydra...@googlegroups.com, fedor...@googlegroups.com, island...@googlegroups.com
Can folks double-check their address books? It seems we keep on having
issues sending messages to the Islandora list. The correct address is:
island...@googlegroups.com

NOT: island...@fedora-commons.org

-nruest

On 2015-08-21 03:29 PM, Stefano Cossu wrote:
> FWIW, at the AIC we are in a R&D phase to implement a Solr index for
> Hydra built on top of a triplestore index. This gives us the advantage
> of having the fast full-text search, faceting and aggregation features
> of Solr with reasoning, federation and other capabilities offered by the
> triplestore.
>
> Also this allows us to use Fedora for what it shines at the most, i.e.
> preservation. We want to limit read access to Fedora (possibly only to
> retrieving non-RDF sources) because it is slower than Solr, and also
> avoid storing RDF properties that can be inferred by a common triplestore.
>
> To Justin's issue, if you could federate the external vocabularies at
> the triplestore level, with this method you would have that information
> in Solr.
>
> Here is a draft document:
> https://docs.google.com/a/artic.edu/document/d/1766zCRcH1ymNmL9xTPfltqYYW0gd0u8GiY5aVMWSm8o/edit?usp=sharing
>
> Stefano
>
> --
>
> Stefano Cossu
> Director of Application Services, Collections
>
> The Art Institute of Chicago
> 116 S. Michigan Ave.
> Chicago, IL 60603
> 312-499-4026

Hector Correa

Sep 8, 2015, 1:57:47 PM
to Fedora Tech, hydra...@googlegroups.com, island...@googlegroups.com
I second the idea of keeping the repository (Fedora/Solr) from becoming the integration point. I think we will be better served in the long term if we allow our applications to communicate via APIs (as Adam suggested). This will allow each application to evolve at its own pace (as Justin has already experienced). The downside of not using the repository as the integration point is that there will be some duplication of effort, but that is a cheaper price to pay than forcing everybody to use the same repo and ending up with a repository that cannot be changed without breaking something.

Martin Fowler has a short piece [1] on the disadvantages of using the "database as an integration point", in which he states:

    > On the whole integration databases lead to serious problems because the database becomes
    > a point of coupling between the applications that access it. This is usually a deep coupling
    > that significantly increases the risk involved in changing those applications and making it
    > harder to evolve them. As a result most software architects that I respect take the view that
    > integration databases should be avoided.

I suppose somebody could make the case that Fedora is an API given that it is exposed as an LDP server, but for the most part we use it as a repository (e.g. hidden behind web apps, without public access) as opposed to a public API.

[1] http://martinfowler.com/bliki/IntegrationDatabase.html

Ben Cail

Sep 8, 2015, 2:52:58 PM
to hydra...@googlegroups.com, Fedora Tech, island...@googlegroups.com
I like the sound of applications communicating via APIs, instead of using Fedora/Solr as an integration database.

That's something we've worked on some - we have APIs for reading/accessing the data in our repository, so different applications that view repository content can hit the API instead of Fedora/Solr. We also have an API for ingesting items into the repository, so if an app wants to create an item, they can just send the information to the API.

It's worked well for us so far.

-Ben

aj...@virginia.edu

Sep 8, 2015, 3:04:24 PM
to fedor...@googlegroups.com, hydra...@googlegroups.com, island...@googlegroups.com
Some thoughts on this question:

Martin Fowler is quite rightly concerned with the problem of evolving the _schema_ in an integration database that supports many applications. He's thinking of a SQL database. If Solr and Fedora are being used properly (dynamic index fields, proper content modeling based on intellectual arrangement and not aping physical layouts), this should not be a difficult issue.

Coupling via API dissolves the problem of a brittle center of integration and crystallizes a new combinatorial problem. The more applications you need to integrate, the more cross-connections you must make between APIs until it becomes impossible to evolve any given API at all, because it is tied to too many things at once.

This is why integration toolchests like ESBs became popular many years ago. I'm not suggesting that many institutions in this conversation have the money or time to invest in that way. I am suggesting that there is no "perfect choice" here. You get to decide which costs you are going to avoid, and which you are going to accept, but integrating a large number of software services is going to cost you something. The right choice here is going to depend on what kind of skills an institution expects to be able to bring to bear _over time_.
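
The combinatorial point above can be made concrete with a little arithmetic. The sketch below assumes the worst case, in which every application integrates directly with every other one, and compares it with a hub-style arrangement (an ESB, or a shared repository) in which each application integrates only with the hub. The function names are illustrative, and real deployments usually fall somewhere between the two extremes.

```python
def point_to_point_links(n):
    """Pairwise integrations needed when each of n applications talks to every other."""
    return n * (n - 1) // 2


def hub_links(n):
    """Integrations needed when each of n applications talks only to a shared hub."""
    return n


if __name__ == "__main__":
    print("apps  point-to-point  hub")
    for n in (2, 5, 10, 20):
        print(f"{n:>4}  {point_to_point_links(n):>14}  {hub_links(n):>3}")
```

Pairwise links grow quadratically while hub links grow linearly, which is the trade being described: the hub removes the cross-connections but becomes the single thing everyone depends on.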

---
A. Soroka
The University of Virginia Library


Aaron Paul Birkland

Sep 8, 2015, 4:16:36 PM
to fedor...@googlegroups.com, hydra...@googlegroups.com, island...@googlegroups.com
Hi Ben, Hector, all,

There is a relatively new effort going on right now to produce an
architecture/framework for exposing public APIs or endpoints on top of
Fedora resources. This "API Extension Architecture" is conceptualized
as a layer outside of Fedora that can bind services (e.g. an ingest API)
to objects (e.g. a container resource to ingest into), exposed at URIs
that can be reasoned about. It was created as a response to issues that
sound very similar to those discussed in this thread.

Here is the original proposal that started this work:
https://docs.google.com/document/d/1UHQMXgnGoIXfe33a6CM6k3IkL3SsYqzkkWBRMaJxMew/edit

This evolved into a community effort, which is now in the stage of
collecting and reviewing use cases in order to come up with design
requirements:
https://wiki.duraspace.org/x/f5ApB

A couple of months ago Stefano Cossu (Art Institute of Chicago) floated
some initial thoughts on the fedora tech list that suggested that this
framework had the potential to be of use to the Hydra and Islandora
communities:
https://groups.google.com/d/msg/fedora-tech/UovyLGZeToE/lNRtWZZ4JM4J

Please feel free to take a look at the above documents, particularly the
list of use cases on the wiki. If it looks at all interesting, then
there is a good chance the shared API needs of the Hydra and Islandora
communities could inform the trajectory of this nascent effort, and
produce something really useful.

Thanks,

-Aaron

Daniel Lamb

Sep 10, 2015, 12:52:28 PM
to Fedora Tech, hydra...@googlegroups.com, island...@googlegroups.com
I'd like to echo what ajs6f said.  Sharing the organization in Fedora should be enough to allow us all to do what needs to be done.  I'd even go so far as to say that sharing Solr (while nice) shouldn't be necessary for integration.  Maybe folks don't even want to use Solr at all and prefer Elasticsearch.

Sharing APIs would be icing on the cake, but there are still a lot of unknowns out there.  I think we'll have a lot of interesting conversations about core things we do (e.g. CRUD operations on PCDM Objects and Collections) and how that may or may not fit into the application extension architecture Aaron Birkland is driving forward.

But giving application developers a consistent standard from which to make decisions should be the starting point and may very well be enough.  At the end of the day, we might all go off and re-code our various wheels differently (I'd hope not, but that's a real possibility) and not share any sort of API whatsoever.  Having the standard at the Fedora level will still allow us all to inter-operate to some extent, and PCDM *should* buffer the instability we're worried about.

Having pushed through the foundation of our PCDM implementation in Islandora in the past week or so, I'm really starting to see its benefit and also what we haven't covered yet.  Manipulating non-RDF descriptive metadata certainly warrants a common practice, IMO.

Can't wait to talk to everyone about it in person at Hydra Connect!

~Danny

Andrew Woods

Sep 10, 2015, 5:17:10 PM
to fedor...@googlegroups.com, Justin Coyne, Hydra-Tech, island...@googlegroups.com
Hello All,
Following on from Danny's comments, we will probably make better use of our limited time on Monday evening if we establish the details of the agenda in advance. I have updated the Google-Doc sign-up [1] sheet (spots still available) adding initial agenda items, mentioned inline:
- How are we defining “minimal interoperability”?
- What modeling approaches can we generalize?
- What metadata approaches can we generalize?
- What goals should we set for the short term? (Hartford, Conn, Oct 23)
- ...

Please update the Google Doc agenda directly or brainstorm agenda items in this thread.

Justin,
From a logistics point of view, please pass along the time and location of the meeting room when you have the details.

Regards,
Andrew



Andrew Woods

Sep 16, 2015, 5:43:05 PM
to fedor...@googlegroups.com, Hydra-Tech, island...@googlegroups.com
Hello All,
Mark your calendars. The logistics have been confirmed for the HydraConnect3 "Side Event: Hydra/Islandora Common Practices" as follows:
- Time: 6 to 8pm
- Location:  212 3rd Ave N. Ste. 390 ( https://goo.gl/maps/RRyHF )
- Dinner: Whole Foods or Pizza Luce for take-out ( http://pizzaluce.com/locations/downtown-minneapolis )
- Getting there: 
-- Walk: 20 minute walk from the University of St. Thomas. 

See you on Monday.
Andrew

Michael J. Giarlo

Sep 16, 2015, 5:59:23 PM
to fedor...@googlegroups.com, Hydra-Tech, island...@googlegroups.com

Thank you, Andrew!
