Fedora 4 resolver for IIIF server


Stefano Cossu

Mar 30, 2016, 1:26:03 PM
to iiif-d...@googlegroups.com, fedor...@googlegroups.com
Hello,
I am trying to make images stored in Fedora 4 available through an IIIF-compliant image server.

The simplest solution I could think of was to have some asynchronous process generate a PTIFF or JP2 derivative when a master image is created or updated in Fedora, and push it to the image server. This is not ideal, however, because those derivatives would not be managed by Fedora and would require extra management, e.g. for deletion.

Therefore I have been looking at how the image server can resolve an image identifier into a Fedora URI.

My first question is: has any work been done in this direction?

To my understanding, IIPImage (which we are currently using) is only able to search within the local file system. Loris seems much more flexible in resolving identifiers to any local or remote resources.

I have thought about a few solutions (assuming Loris is the image server):

1. Use the SimpleHTTPResolver to build the Fedora URI from an identifier common to all the derivatives of that image. E.g. if my image identifier is 1234abcd, the URI would be https://myrepo.edu/fcrepo/rest/12/34/ab/cd/1234abcd/pyramidal_deriv.jp2 . This may require extending the SimpleHTTPResolver, and may be a brittle implementation because it relies on path conventions that may change in the future.

2. Use the vanilla SimpleHTTPResolver and Hydra as a proxy to simplify the URI building. Our Sufia head returns the original image for a URI such as http://myhydrahead.edu/downloads/<UID> . This is a pre-PCDM implementation, however, and I am not sure how the URIs would look in a multi-file asset setup.

3. Write a custom resolver that sends a SPARQL query looking for an image with a particular relationship to the parent asset, given a unique identifier. This seems to be the most flexible and durable solution, but may add some overhead due to the additional SPARQL query. I am not sure whether this overhead would matter when downloading large images, though, which would likely be cached anyway.
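For illustration, option 3 might be sketched roughly as below, in Python (Loris's own language). The endpoint URL and the dc:identifier / ex:hasDerivative / ex:format predicates are placeholders, not vocabulary any actual repository uses; a real resolver would plug its own ontology into the query.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://localhost:3030/fedora/query"  # assumed endpoint

def build_query(identifier):
    """Build a SPARQL query for the JP2 derivative of the asset whose
    unique identifier matches the one from the IIIF request."""
    return """
        PREFIX dc: <http://purl.org/dc/terms/>
        PREFIX ex: <http://example.org/ns#>
        SELECT ?deriv WHERE {
          ?asset dc:identifier "%s" .
          ?asset ex:hasDerivative ?deriv .
          ?deriv ex:format "image/jp2" .
        } LIMIT 1
    """ % identifier

def resolve(identifier):
    """Return the Fedora URI of the JP2 derivative, or None if not found."""
    data = urllib.parse.urlencode({"query": build_query(identifier)}).encode()
    req = urllib.request.Request(
        SPARQL_ENDPOINT, data=data,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        bindings = json.load(resp)["results"]["bindings"]
    return bindings[0]["deriv"]["value"] if bindings else None
```

If the relationship URIs change later, only the query template needs updating, which is the durability argument for this option.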

Thoughts?


Thanks,
Stefano

--
Stefano Cossu
Director of Application Services, Collections

The Art Institute of Chicago
116 S. Michigan Avenue
Chicago, IL 60603

Esmé Cowles

Mar 30, 2016, 1:40:46 PM
to iiif-d...@googlegroups.com
Stefano-

Plum is currently doing the simple method you describe: using a hydra-derivatives background worker to generate the JP2 and push it to Loris's storage. We've just started thinking about cache expiration issues, though, and pulling images directly from Fedora could help make that easier.

I would think option #2 would still be a good approach with Works. If the default download behavior doesn't work, it should be pretty easy to write a custom controller to use Solr to find and deliver the JP2.

-Esmé

Stefano Cossu

Mar 30, 2016, 2:17:09 PM
to Esmé Cowles, iiif-d...@googlegroups.com, fedor...@googlegroups.com
Esmé, thanks for the feedback.

One thing I forgot to mention: we use Hydra only internally as a CMS and we have a dedicated image server for internal images. Our web-facing area has only a Fedora repository, its triplestore + Solr indexes and a public image server.

So #3 may be the best choice for our case, while #2 would be simpler for a completely Hydra-managed repository.

Thanks,
Stefano

Kevin S. Clarke

Mar 30, 2016, 2:30:25 PM
to Stefano Cossu, iiif-d...@googlegroups.com, fedor...@googlegroups.com
Hi,

I haven't done any work on this, but I've been thinking about it as well. My thought has been to have an IIIF server component that hooks into the Camel message queue to watch for creates, updates, and deletes of the master images. The image server component would then pull the master from Fedora and do the JP2 (or submaster) conversion within the image server. It might then push the JP2 back to Fedora, but would not do so with any of the other images that an image server would generate.

I've not been as concerned with Fedora being able to resolve the IIIF requests (I expect to point to the image server directly for that, though discovery of the image server through Fedora is a possibility). In my arrangement, I think I'm giving the control to the image server rather than to Fedora (I see the component as being more of an image server component than a Fedora component; it's just consuming Fedora's events). It keeps itself in sync through the events sent out on the message queue.
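A minimal sketch of the dispatch logic this describes, assuming a message shape of {"type": ..., "uri": ...} and injected conversion/deletion callables; both the message shape and the callables are hypothetical stand-ins for the Camel plumbing and the actual TIFF -> JP2 conversion.

```python
# Sketch of the event-driven sync component: consume Fedora create/update/
# delete messages and keep the image server's derivatives in step.
def handle_event(event, convert, delete):
    """Dispatch one repository event: pull and convert the master on
    create/update, drop the derived JP2 on delete."""
    if event["type"] in ("create", "update"):
        convert(event["uri"])
    elif event["type"] == "delete":
        delete(event["uri"])
```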

I'm interested in your #1 example as a URI pattern.

I'm also interested in hearing what others have been thinking or doing with IIIF and Fedora.

We've dipped our toes in the Fedora 4 waters but aren't using it yet (which is why my ideas are all just ideas at this point).

Kevin

Benjamin Armintor

Mar 30, 2016, 2:39:28 PM
to fedor...@googlegroups.com, Stefano Cossu, iiif-d...@googlegroups.com
Hello,

We're doing almost exactly what Kevin describes, though on the Fedora 3 stack. Updates to content push messages to Resque, which creates the JSON info and JP2 for the master image and pushes them to Fedora. These jobs in turn process the tiling determined by the JSON info, but the resulting tiles are only cached, not pushed to Fedora. The IIIF service fills in the gaps if the image is requested while the tiling jobs run.

- Ben


Stefano Cossu

Mar 30, 2016, 3:03:05 PM
to Kevin S. Clarke, iiif-d...@googlegroups.com, fedor...@googlegroups.com
Hi Kevin,
I think you are in line with my thinking. There should be no special arrangement on the Fedora side to integrate with the image server, only an established pattern for finding that specific derivative.

My issues with #1 are: 1) it is not a simple prefix + suffix solution, since the Fedora path minter breaks the identifier up into pairtree elements. This would require a custom resolver, but may not be a big deal. 2) Most importantly, if the path minting pattern changes at any time, the image server has to be aware of both the old pattern and the new one (unless you want to migrate all your resources to the new path scheme and face the breakage that would entail). This can add up to the point where it becomes unmanageable.
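The pairtree expansion in question can be sketched as below; the base URL, pair width/depth, and derivative filename are assumptions matching the example URI earlier in the thread. The brittleness is visible in the hard-coded parameters: any change to the minting convention silently breaks every URI this produces.

```python
# Sketch of the path expansion a custom resolver for option 1 would need:
# Fedora 4's default minter spreads the identifier over pairtree segments.
FCREPO_BASE = "https://myrepo.edu/fcrepo/rest"  # assumed repository root

def pairtree_uri(identifier, filename="pyramidal_deriv.jp2", width=2, depth=4):
    # Split the first width*depth characters into pairs: 1234abcd -> 12/34/ab/cd
    pairs = [identifier[i:i + width] for i in range(0, width * depth, width)]
    return "/".join([FCREPO_BASE] + pairs + [identifier, filename])
```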

That is why I think #3 is the one requiring the most setup but also the most future-proof. The relationship URIs may change, but it would be much easier to adapt the SPARQL query to include both new and old patterns.

Stefano


Matt McGrattan

Mar 30, 2016, 3:22:48 PM
to IIIF Discuss, kscl...@ksclarke.io, fedor...@googlegroups.com
On Digital.Bodleian we use a Fedora-like repository, and we have a custom resolver that:

i) parses the image request and extracts the UUID
ii) does the pairtree conversion on the ID and uses that to get the absolute filepath to the image file in the pairtree store.
iii) 'caches' the filepath in a Redis store, so that the resolution only happens once.
iv) stores the info.json for the images as a blob in Redis, which massively minimises the amount of round-trips to the file store (which is slow) and makes building manifests on the fly nice and quick
v) redirects to a vanilla IIP service which provides the jpeg tiles from the jp2 at the absolute filepath generated by the resolver.
vi) pre-warms the cache with a thumbnail

There's load-balancing and caching throughout. The resolver only resolves the file paths and stores the info when an image is requested for the first time. We use a messaging and queuing based system for ingesting images into the repository and we generate JP2s as part of the ingest workflow. Much like the model Kevin describes below. The latest version, not in deployment, also does all of the resolver 'stuff' as part of the handling of the queue, so it gets done in advance of any image requests being served and any changes to the image content in the repository triggers an update to the resolver service.
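Steps ii)-iv) above could be sketched roughly as follows, with a plain dict standing in for Redis; the key names, store mount point, and info.json fields are made up for illustration.

```python
import json

_cache = {}  # stand-in for a Redis store
STORE_ROOT = "/mnt/pairtree"  # assumed read-only mount of the pairtree store

def filepath_for(uuid):
    """Steps ii/iii: pairtree-expand the ID and cache the absolute filepath
    so the resolution only happens once."""
    key = "path:" + uuid
    if key not in _cache:
        pairs = [uuid[i:i + 2] for i in range(0, 8, 2)]
        _cache[key] = "/".join([STORE_ROOT] + pairs + [uuid + ".jp2"])
    return _cache[key]

def info_json_for(uuid, width, height):
    """Step iv: store the info.json as a blob so manifest building never
    touches the (slow) file store."""
    key = "info:" + uuid
    if key not in _cache:
        _cache[key] = json.dumps({
            "@context": "http://iiif.io/api/image/2/context.json",
            "@id": "https://iiif.example.org/" + uuid,
            "width": width,
            "height": height,
        })
    return _cache[key]
```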

What we do, which is maybe less standard, is mount the storage holding the pairtree store read-only on the pool of image servers. So we don't need an additional tier of storage with JP2s sitting on it; they can just serve the images directly.

We are moving the repository to Fedora 4 and we are in the process of rebuilding the resolver and image delivery stack, so I'm very interested in this discussion.

Matt



Kevin S. Clarke

Mar 30, 2016, 5:41:16 PM
to fedor...@googlegroups.com, Kevin S. Clarke, iiif-d...@googlegroups.com
On Wed, Mar 30, 2016 at 3:03 PM, Stefano Cossu <sco...@artic.edu> wrote:
> 2) if the path minting pattern is changed at any time, the image server has to be aware of the old and the new one (unless you want to migrate all your resources to the new path scheme and face the breakage that this would entail). This can add up to a point where it becomes unmanageable.

Hmm, yes. So I plan to have a copy of the JP2 on the image server in addition to what's pushed back up to Fedora. In the normal course of operations, I wouldn't expect the image server to need to pull the JP2 from Fedora unless something catastrophic has happened (and then having it there just saves the expensive TIFF -> JP2 conversion).

The image server would know the object by its ID, so I think it would just be the component that watches the Fedora message queue where you'd need to update the pattern used to retrieve the object -- i.e., how do I get from this ID to a pattern (or query) that I can use to extract from Fedora? I'd need a way to tell it to get everything, too, in case that catastrophic event has happened.
 
> That is why I think #3 is the one requiring most set up but also the most future-proof one. The relationship URIs may change, but it would be much easier to adapt the SPARQL query to the changes to include new and old patterns.

I might be able to be swayed towards this, though. My initial preference for #1 is mostly for its simplicity, since my image server is not currently SPARQL-aware (and I'm not that knowledgeable about SPARQL, unlike many others in this community, so I may be overlooking it for that reason). Were you thinking of periodically pulling the JP2 from Fedora (keeping a particular sized cache of them on the image server) or something else?

Kevin


Stefano Cossu

Mar 30, 2016, 6:10:37 PM
to Kevin S. Clarke, fedor...@googlegroups.com, Kevin S. Clarke, iiif-d...@googlegroups.com
Hi Kevin,
I guess that if you are dealing with Fedora4 it will be hard to evade SPARQL :)

I have not dug into the Loris resolver interface, but I assume it should not be too hard for one or two people to write a resolver based on a SPARQL query and put it in a shared repository.

Again, I am not a Loris implementer, but I recall talking with Jon about Loris caching source images. If that is true, there should be no reason to set up periodic pulls. The image would be retrieved on demand and cached for X time, so the next request for the same image would bypass the SPARQL query and the Fedora pull. Anybody stop me if I am talking nonsense.

I also support Matt's idea of caching the query results themselves so the resolution is done only once. If the source image changes or its cache expires, its URI will not change, so at least you save the SPARQL part.

Stefano

Glen Robson

Mar 30, 2016, 6:13:18 PM
to fedor...@googlegroups.com, Stefano Cossu, iiif-d...@googlegroups.com
Hi,

We are also using Fedora 3 and a slightly outdated infrastructure diagram is available at:


We’ve now changed to a custom Java proxy which sits alongside Fedora and in front of two IIP servers. The Fedora storage is mounted on the IIP servers, so there is only one copy of the JP2. The Java proxy converts a public ID to a file path the IIP server can use, then proxies the image request so the IIP servers are not accessible to the public. The Java proxy also checks the rights for the image on first access and, if it's open access, caches the result for 24 hours. The file-path-to-PID mapping is also cached.
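The 24-hour rights cache described above might look roughly like this (sketched in Python rather than Java for consistency with the rest of the thread); the injected `lookup` callable is a hypothetical stand-in for the real rights service.

```python
import time

RIGHTS_TTL = 24 * 60 * 60  # cache open-access decisions for 24 hours
_rights_cache = {}  # pid -> (is_open, checked_at)

def is_open_access(pid, lookup):
    """Return the cached rights decision for `pid`, refreshing it via the
    (slow, hypothetical) `lookup` callable when the entry is missing or
    older than the TTL. Only the first access per image pays the cost."""
    entry = _rights_cache.get(pid)
    now = time.time()
    if entry is None or now - entry[1] > RIGHTS_TTL:
        entry = (lookup(pid), now)
        _rights_cache[pid] = entry
    return entry[0]
```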

I’d be interested to hear how other people are managing access rights and if this is made any easier with Fedora 4.

Cheers

Glen

Robert Sanderson

Mar 30, 2016, 6:13:48 PM
to Fedora Tech, Kevin S. Clarke, iiif-d...@googlegroups.com
On Wed, Mar 30, 2016 at 12:03 PM, Stefano Cossu <sco...@artic.edu> wrote:

> My issues with #1 are: 1) it is not a simple prefix + suffix solution, since the Fedora path minter breaks up the identifier into pairtree elements.

This could be fixed at the Fedora 4 level -- including the pairtree in the URI seems like bad practice, exposing underlying technology. There's no actual value (as far as I can tell) in having those path segments. If that were done ... would #1 be acceptable?

Rob






--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Kevin S. Clarke

Mar 30, 2016, 6:21:55 PM
to Stefano Cossu, fedor...@googlegroups.com, Kevin S. Clarke, iiif-d...@googlegroups.com
On Wed, Mar 30, 2016 at 6:10 PM, Stefano Cossu <sco...@artic.edu> wrote:

> I guess that if you are dealing with Fedora4 it will be hard to evade SPARQL :)

True :-)

I know I need to dig in (or find a way to get it to sink in).
 
> Again, I am not a Loris implementer but I recall talking with Jon about Loris caching source images. If that is true, there should be no reason to set up periodic pulls. The image would be retrieved on demand and cached for X time, so next request to the same image would bypass the SPARQL query and the Fedora pull. Anybody stop me if I am talking nonsense.

Right, by "periodic" I just meant it would continue to be pulled as needed, vs. what I'm doing of keeping the JP2s on the image server (I didn't mean pulled on a specific schedule or anything -- sorry, I should have phrased that better). I think you're accurately describing what Loris does (though someone should feel free to correct me as well).

Kevin

Stefano Cossu

Mar 30, 2016, 6:25:56 PM
to Robert Sanderson, Fedora Tech, Kevin S. Clarke, iiif-d...@googlegroups.com
Rob,
The unfortunate pairtree splitting is mandated by a current limitation at the Modeshape level. If that issue were removed, #1 might be acceptable, but it would still require that all the resources be at the same level, under the same root (i.e. no multi-tenancy), and that all the source PTIFF/JP2s have the same sub-path. I still see a linked-data approach as a more solid solution.

Stefano

Trey Pendragon

Mar 30, 2016, 6:50:05 PM
to IIIF Discuss, fedor...@googlegroups.com
These all seem like pretty involved options - especially #1! I wonder what the performance of something like this would be:

A) The resolver gets the first request for an ID (if it's in Hydra-land, it's probably a NOID); the application expands the NOID through some mapping to get the Fedora pairtree URL. The resolver sends a GET, obtains the JP2, and caches it with the ETag.

B) The resolver gets a second request for that ID. The resolver sends a GET to Fedora with the ETag. If 304 is returned, use the cached JP2. If 200 is returned (ETag mismatch), re-download, cache, and use that.

No complicated message systems or extra architecture. It's predicated on 304 being fast enough, but I really hope it is or what's the point of having 304?

Jason Ronallo

Mar 30, 2016, 8:02:05 PM
to iiif-d...@googlegroups.com, fedor...@googlegroups.com
Very interesting to hear all the current approaches and consider the
proposed approaches to resolving JP2s as well as other workflow and
caching considerations.

Since a pairtree resolver was mentioned before, I don't know if this
helps, but writing a pairtree resolver for an image server ought to be
trivial. Here's the pairtree resolver implementation in my
iiif-image-server-node:
https://github.com/jronallo/iiif-image-server-node/blob/master/src/resolver.coffee#L26

I'd also be willing to accept either pull requests or pseudo-code for
other more complicated resolver implementations. I'm willing to
incorporate new implementations even just to have a proof of concept
resolver for working with Fedora or otherwise.

I've also found the discussion about caching very interesting.
iiif-image-server-node does not handle JP2 caching yet, but I am doing
some caching based on a profile document [1] that may be interesting
to others. The profile document contains which different image sizes
and crops get created to warm the cache. These would be image regions
and sizes commonly used for search results pages and show views across
various applications. This profile is used both to warm the cache and
to clean the cache. Any IIIF path that's in the profile can get cached
for longer than images (like tiles) that are not in the profile
document. And the info.json gets cached as well.

For the info.json and JPG derivatives my approach (though not the only
approach that iiif-image-server-node can work with) is to deploy the
application behind nginx and Passenger and to cache everything to the
public directory of the application (actually a symlink). This means
that for images that are cached the image server is never hit and
nginx serves up the info.json and images directly. So it won't give
the performance of having this all in RAM but is simple and does
improve performance.

I figured that different folks would have different approaches to
resolving files and caching, so I started with creating Node modules
which could be the building blocks for creating your own image server.
If you like that idea and don't mind Node you can check out the
modules here: https://github.com/jronallo/iiif-image Maybe this could
give you a head start if you really need something custom for
Fedora/Hydra.

I also want to thank folks for this discussion as it has given me the
idea to create an API endpoint in the image server to warm the cache
for a particular identifier. This will allow my JP2 processing script
to hit a URL to create at least the info.json as soon as the JP2 is
created.
https://github.com/jronallo/iiif-image-server-node/issues/1

Best,

Jason

[1] Here's an example:
https://github.com/jronallo/iiif-image/blob/master/config/profile.yml

Stefano Cossu

Mar 31, 2016, 6:42:02 PM
to Trey Pendragon, IIIF Discuss, fedor...@googlegroups.com
Trey,
+1 for using e-tags.

I am not very familiar with Hydra's NOIDs, but are they something that you can infer a Fedora URI from without an additional HTTP call?

Stefano

Stefano Cossu

Apr 21, 2016, 11:43:38 PM
to Trey Pendragon, IIIF Discuss, fedor...@googlegroups.com
I found a couple spare hours and wrote a simple triplestore-based Fedora resolver for Loris. It needs several improvements to be production-worthy, but it works. I have submitted a pull request: https://github.com/loris-imageserver/loris/pull/208

Stefano
--

116 S. Michigan Ave.
Chicago, IL 60603
312-499-4026

Robert Sanderson

May 4, 2016, 11:49:51 AM
to Fedora Tech, iiif-d...@googlegroups.com

Hi Christopher,

You could try using the JSON-LD framing algorithm to turn the expanded form into the right compacted form.
There are frames provided in the JSON LD implementation notes document:

With a (very) minimal python example of how to use them.

Please let us know if that approach works for you, as it will doubtless be a common question!

Rob


On Wed, May 4, 2016 at 8:08 AM, Christopher Johnson <chjoh...@gmail.com> wrote:
I am using the fcrepo camel triplestore indexer in Apache Karaf with Fedora 4. It seems that it could work very well as a method to sync repository changes with an IIIF Presentation API front end served by djatoka, simply via a SPARQL CONSTRUCT query with a JSON-LD response. The client could then control the construction of the manifest directly with a query, which as far as I can tell is not that easy to do with other pipelines. The client request that generates the IIIF-formatted manifest could even specify the image server, or a fallback location, depending on the query.

This means that all of the manifest metadata for the object is pushed from fcrepo to the triplestore, where it is available to the client on demand. I think the image server does not need to know anything about SPARQL, just how to handle subsequent API queries against the manifest.

The main problem I am having is that the IIIF Presentation API does not conform to the "standard" Jena RIOT format of JSON-LD. For example, in this malformed result, @context is a root element of its own (without values) and the row results are returned in an array under @graph.
{
  "@graph" : [ {
    "@id" : "http://iiif.io/api/presentation/2/context.json/resource",
    "http://iiif.io/api/presentation/2/context.json/format" : [ "image/jpeg" ],
    "http://iiif.io/api/presentation/2/context.json/service" : [ "AIG_546__OSmitUK_42.jpg" ]
  }, {
    "@id" : "http://www.example.org/iiif/book1/annotation/p0001-image",
    "@type" : "http://www.w3.org/ns/oa#Annotation",
    "motivation" : "http://iiif.io/api/presentation/2#painting"
  } ],
  "@context" : {
    "motivation" : {
      "@id" : "http://iiif.io/api/presentation/2/context.json/motivation",
      "@type" : "@id"
    },
    "format" : {
      "@id" : "http://iiif.io/api/presentation/2/context.json/format",
      "@type" : "http://www.w3.org/2001/XMLSchema#string"
    },
    "service" : {
      "@id" : "http://iiif.io/api/presentation/2/context.json/service",
      "@type" : "http://www.w3.org/2001/XMLSchema#string"
    }
  }
}

I do not know whether it is possible to write a custom Jena RIOT output formatter for IIIF, to parse and reformat multiple SPARQL responses on the client side, to modify the query somehow, or whether I should abandon this idea altogether!

Cheers,
Christopher





--
Rob Sanderson
Semantic Architect
The Getty Trust
Los Angeles, CA 90049

Christopher Johnson

May 10, 2016, 7:32:18 AM
to IIIF Discuss, fedor...@googlegroups.com
Hi Rob,

Thank you for the clues and help. I may have a solution, but a few questions about a canonical format for the client remain. This gist shows a normalized N-Triples version of the "Test 1 Manifest: Minimum Required Fields" JSON-LD sample, which would be a basic template for the SPARQL CONSTRUCT query. You will notice that when the JSON-LD tree is normalized, the positions of the list elements are noted with rdf:first and rdf:rest, and bnodes are added. When the SPARQL results are returned as JSON-LD and compacted with the context, they look like this. Then they need to be framed, and they look like this. So the process (construct a normalized JSON-LD result from SPARQL, compact, and frame) seems to work.

I do not yet know how the client (I am probably using Mirador) handles the @graph element and the traces of normalization. I assume that it should be aware of this variation. I will use the jsonld npm module and do this in Node.js with some kind of extension in the client.

For my use case, the question of synchronizing binaries in Fedora with the image server seems solved with fcrepo-serialization.

Cheers,
Christopher