Putting a cache in front of IIIF image servers

Dan Field

Nov 8, 2017, 11:59:31 AM
to iiif-d...@googlegroups.com
Is there any benefit to running image servers behind a reverse proxy with a disk-based cache like Apache Traffic Server? Our current IIIF infrastructure has a small cluster of IIP servers with shared Fibre Channel storage behind an Apache reverse proxy, but I'm looking to possibly put the cache right at the top of the stack. Does this work well in a zoomable image scenario where each tile request would be cached upstream? Curious to hear experiences or theoretical issues.

Also, is there a way to extract each tile at each level from a JP2 in order to get an idea of the total storage requirement per JP2 if caching tiles? I have access to Kakadu and OpenJPEG tools if that helps.

-- 
Dan Field <d...@llgc.org.uk>                   Ffôn/Tel. +44 1970 632 582
Pennaeth Isadran Datblygu                     Head of Development Section
Llyfrgell Genedlaethol Cymru                  National Library of Wales

Adams, Chris

Nov 8, 2017, 1:28:44 PM
to iiif-d...@googlegroups.com
Yes: if your image server doesn’t aggressively cache tiles, putting a cheap storage pool in front of it will help your most active content substantially. Even if your image server does cache tiles, a cache in front of it still helps when the server-side cache isn’t shared across image servers: otherwise, a load balancer that distributes incoming requests evenly across the backend pool causes the same image master to be pulled off storage by multiple servers. (That's also an argument for having load balancers use something like the client IP or the image identifier to favor sending requests to the same backend server.) Zoomable viewers like OpenSeadragon are good at generating requests for multiple tiles across multiple zoom levels, so having a warm cache for the image master is usually a win.
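
As a rough illustration of the identifier-based routing mentioned above, here is a minimal Python sketch. It assumes IIIF URLs live under a hypothetical `/iiif/` prefix and that the backend pool is a fixed list of made-up host names; a real load balancer would do this in its own configuration rather than in application code.

```python
import hashlib
from urllib.parse import urlsplit


def iiif_identifier(url: str, prefix: str = "/iiif/") -> str:
    """Return the identifier segment of an IIIF Image API URL path."""
    path = urlsplit(url).path
    if not path.startswith(prefix):
        raise ValueError(f"not an IIIF path: {path}")
    # The identifier is the first path segment after the prefix; slashes
    # inside an identifier are percent-encoded, so splitting on "/" is safe.
    return path[len(prefix):].split("/", 1)[0]


def pick_backend(url: str, backends: list[str]) -> str:
    """Map every request for the same image to the same backend server."""
    ident = iiif_identifier(url)
    digest = hashlib.sha256(ident.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend pool and tile requests for one image.
pool = ["iip-1.internal", "iip-2.internal", "iip-3.internal"]
tile_a = "https://example.org/iiif/book1/0,0,256,256/256,/0/default.jpg"
tile_b = "https://example.org/iiif/book1/256,0,256,256/256,/0/default.jpg"
same_backend = pick_backend(tile_a, pool) == pick_backend(tile_b, pool)
```

Plain modulo hashing reshuffles most identifiers whenever the pool size changes; consistent hashing would be the usual alternative if servers come and go often.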

One aspect to consider: when do you need to purge the cache? You'd want to automate the process of purging tiles when the upstream image changes if you can't incorporate something like a last-modified timestamp or fixity into the image URL itself. Some caches & CDNs support a Cache-Key header which could be set to something like the image identifier and would then be used to purge all of the cached objects sharing a cache-key when content is updated.
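
One way to picture the "timestamp in the URL" option is the Python sketch below. The `-v<seconds>` suffix, the function names, and the example base URL are all assumptions made for illustration; the image server (or a rewrite rule in front of it) would have to strip the suffix before resolving the identifier.

```python
def versioned_identifier(identifier: str, mtime: float) -> str:
    """Embed the master's last-modified time (whole seconds) in the identifier."""
    return f"{identifier}-v{int(mtime)}"


def tile_url(base: str, identifier: str, mtime: float, region: str, size: str) -> str:
    """Build an IIIF Image API 2.x style tile URL using the versioned identifier."""
    ident = versioned_identifier(identifier, mtime)
    return f"{base}/{ident}/{region}/{size}/0/default.jpg"


# When the master changes, its mtime changes, so every tile URL changes too
# and stale cache entries are simply never requested again.
before = tile_url("https://example.org/iiif", "book1", 1510142371.0, "0,0,256,256", "256,")
after = tile_url("https://example.org/iiif", "book1", 1510142999.5, "0,0,256,256", "256,")
```

With this approach nothing has to be purged explicitly; old tiles age out of the cache on their own.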

Chris

--
-- You received this message because you are subscribed to the IIIF-Discuss Google group. To post to this group, send email to mailto:iiif-d...@googlegroups.com. To unsubscribe from this group, send email to mailto:iiif-discuss...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/iiif-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "IIIF Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mailto:iiif-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew Hankinson

Nov 8, 2017, 5:08:19 PM
to iiif-d...@googlegroups.com
Yes; no matter how good your tile server is, caching can still help it. We have two levels of cache: one on the image server (we use IIP with Memcached) and one on our front-end server (we use nginx's built-in caching mechanisms). We treat the front-end cache as volatile, so it is purged automatically after a certain amount of time. The IIP cache is also purged, but with a longer timeout.
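
To make the two-tier arrangement concrete, here is a toy Python model of it, not the actual nginx or IIP/Memcached configuration. The class name, the TTL values, and the `render` callback are all hypothetical; the point is only the lookup order and the shorter timeout on the front tier.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: dict[str, tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (self.clock() + self.ttl, value)


def fetch_tile(key: str, front: TTLCache, back: TTLCache,
               render: Callable[[str], bytes]) -> bytes:
    """Check the front cache, then the back cache, then render the tile."""
    tile = front.get(key)
    if tile is not None:
        return tile
    tile = back.get(key)
    if tile is None:
        tile = render(key)  # only reached when both tiers miss
        back.put(key, tile)
    front.put(key, tile)
    return tile


# Assumed timeouts: 10 minutes at the front, 24 hours at the image server.
front_cache = TTLCache(ttl_seconds=600)
back_cache = TTLCache(ttl_seconds=86400)
```

A front-tier expiry then costs only a back-tier lookup, and the image master is decoded again only after the longer timeout has also passed.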

I would estimate that, should you want to "pre-warm" your cache with all your tiles, then the theoretical limit would be somewhere around 1/2 to 1/3 the size of the JP2, assuming a ~90% JPEG tile quality. Since it's the same quality per tile as it is for the whole image, I would think that you could get a good estimate by requesting the entire image at a given zoom level, and then adding a few KB for overhead.
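
For the tile-count side of that estimate, a short Python sketch can enumerate how many tiles exist at each zoom level. It assumes 256-pixel tiles and power-of-two scale factors, which is what many servers advertise in info.json; the real values should come from your server's info.json, and the average tile size in bytes has to be measured by sampling real tiles.

```python
import math


def scale_factors(width: int, height: int, tile_size: int = 256) -> list[int]:
    """Power-of-two scale factors until the whole image fits in one tile."""
    factors = [1]
    while max(width, height) / factors[-1] > tile_size:
        factors.append(factors[-1] * 2)
    return factors


def tiles_per_level(width: int, height: int, tile_size: int = 256) -> dict[int, int]:
    """Number of tiles at each scale factor for a width x height image."""
    counts = {}
    for s in scale_factors(width, height, tile_size):
        span = tile_size * s  # full-resolution pixels covered by one tile edge
        counts[s] = math.ceil(width / span) * math.ceil(height / span)
    return counts


def estimate_bytes(counts: dict[int, int], avg_tile_bytes: int) -> int:
    """Rough storage estimate: total tile count times a sampled average size."""
    return sum(counts.values()) * avg_tile_bytes


# A 1024 x 768 image with 256-pixel tiles: 12 + 4 + 1 = 17 tiles in total.
counts = tiles_per_level(1024, 768)
```

Because each level has roughly a quarter of the tiles of the level below it, the full-resolution level dominates the total.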

David Beaudet

Nov 9, 2017, 9:40:01 AM
to IIIF Discuss

I think this depends greatly on the traffic you're seeing, how you're using the IIIF Image API, the sizes and formats of your images, and the JP2 library you're using (since you seem to be using JP2 sources). You might find that IIP's caching alone is sufficient without introducing additional layers of caching, but you'll only really know by running performance tests that simulate a load greater than the maximum you've experienced to date, relieving whatever bottleneck they expose, and repeating.

A CDN can also come in handy and eliminate the need to roll your own secondary cache. A CDN can bring your images closer to the origin of the request, and you might get HTTP/2 support, IPv6, and other network enhancements as added benefits.

I have released some enhancements to IIP that might not have made it back into the general release yet. Some of those are caching enhancements that allow Memcached to be used for the tile cache rather than just the responses. I find that cuts down tremendously on the overall memory consumption of IIP, since the tile cache is no longer duplicated in each IIP process. In a multi-server configuration, you could have a single hefty Memcached server that caches both responses and tiles for multiple IIP servers. The only remaining per-process cache is the image header cache, which is generally pretty small compared to the other caches. There are also some significant quality improvement options, color profile support, PNG support, native IIIF URL support, progressive JPEG support, and some other tweaks. If you're using IIP, you might want to give it a whirl, although I have not tested exhaustively with JP2s since we use pyramidal TIFFs as IIIF source files.

Dave Beaudet
National Gallery of Art, Washington