IIIF image server implementations for large collections


Stefano Cossu

Nov 27, 2018, 8:12:30 PM
to iiif-d...@googlegroups.com

Hello,

The Getty (yes, I’m working there now) is seeking to adopt a IIIF image server capable of handling a very large collection of images (starting with 100K and ramping up to many millions over the course of our project timeline). Our main selection criteria are efficiency, scalability, and maturity/stability of the code base. We are looking for something that we don’t have to fork and maintain.

 

If you are running large or very large public IIIF image collections, I’d love to hear your experience about your server setup, image formats, etc. I will also be in Edinburgh next week so I hope there will be a chance to talk in person.

 

Thanks!

Stefano

Edwin DR

Mar 5, 2019, 9:32:09 AM
to IIIF Discuss
I'd also be interested to hear from anyone with experience of large collections. We also have millions of images, with sizes ranging from 50 MB to 12 GB.
I'm also in the process of figuring out which server to use for best performance, and I am only at the beginning stages of setting up my test lab environment.

We started our tests with the Cantaloupe IIIF server, which is incredibly user-friendly to set up compared to some others. I quickly figured out how to generate our manifests dynamically, pulling metadata from our collection database.
A few Python scripts ingest the images, convert them to pyramidal TIFFs, and then place them into a folder structure based on a naming scheme.
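For comparison, generating a manifest from database metadata can be as small as the sketch below, which builds a minimal IIIF Presentation API 2.1 manifest with one canvas per image, each pointing at an Image API service. The record layout (`title`, `metadata`, `images`), the function name `build_manifest`, and both base URLs are hypothetical; a real setup would fill the record from the collection database.

```python
import json

PRESENTATION_BASE = "https://example.org/iiif/presentation"  # hypothetical
IMAGE_BASE = "https://example.org/iiif/image"  # hypothetical


def build_manifest(object_id: str, record: dict) -> dict:
    """Build a minimal IIIF Presentation 2.1 manifest from one catalog record.

    Assumed (hypothetical) record shape:
      {"title": str, "metadata": {label: value},
       "images": [{"identifier": str, "width": int, "height": int}]}
    """
    canvases = []
    for n, img in enumerate(record["images"], start=1):
        canvas_id = f"{PRESENTATION_BASE}/{object_id}/canvas/{n}"
        service_id = f"{IMAGE_BASE}/{img['identifier']}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": f"Image {n}",
            "width": img["width"],
            "height": img["height"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{service_id}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": img["width"],
                    "height": img["height"],
                    # Viewers fetch zoom tiles through this Image API service.
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_id,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{PRESENTATION_BASE}/{object_id}/manifest",
        "@type": "sc:Manifest",
        "label": record["title"],
        "metadata": [{"label": k, "value": v} for k, v in record["metadata"].items()],
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


record = {"title": "Example object", "metadata": {"Date": "1850"},
          "images": [{"identifier": "ZR42423", "width": 6000, "height": 4000}]}
print(json.dumps(build_manifest("obj-1", record), indent=2))
```

Serving this from a small web endpoint keyed on the object ID is what makes the manifests "dynamic"; nothing has to be pre-generated on disk.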

I am currently running tests with two viewers:
- Leaflet-IIIF, which uses the Image API
- Mirador 3 (unfinished development version), which uses the Presentation API to load the images.

Everything works as expected, but I am disappointed by how slowly tiles load when zooming in either Leaflet or Mirador. My guess is that the performance issues come from one of these:
- Is the test server too slow? (8 vCPUs, 16 GB RAM)
- We use pyramidal TIFFs created with vips using 512×512 tiles. Should I use a different tile size, or JP2 instead?
- Or perhaps Cantaloupe simply cannot perform faster than what I'm currently seeing.
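On the tile-size question, the arithmetic is easy to check. A minimal sketch that counts tiles per pyramid level, assuming each level halves the previous one (rounding up) until it fits inside a single tile; the function name and the 40000 × 30000 image are hypothetical, and real writers such as vips may stop the pyramid at a slightly different level.

```python
import math


def pyramid_levels(width: int, height: int, tile: int):
    """Return (width, height, tile_count) per level, full resolution first.

    Assumes each level is the previous one halved (rounded up) and that the
    pyramid stops once a level fits inside a single tile.
    """
    if width < 1 or height < 1 or tile < 1:
        raise ValueError("width, height and tile must be positive")
    levels = []
    w, h = width, height
    while True:
        cols, rows = math.ceil(w / tile), math.ceil(h / tile)
        levels.append((w, h, cols * rows))
        if cols == 1 and rows == 1:
            return levels
        w, h = math.ceil(w / 2), math.ceil(h / 2)


for tile in (256, 512):
    levels = pyramid_levels(40000, 30000, tile)  # hypothetical 1.2-gigapixel scan
    print(tile, len(levels), levels[0][2], sum(n for _, _, n in levels))
```

For that image the full-resolution level has 18,526 tiles at 256 × 256 and 4,661 at 512 × 512: halving the tile edge roughly quadruples the number of tiles (and requests) per level, while each tile carries a quarter of the pixels. Which trade-off is faster depends on the server and the viewer.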


My next test will be with IIPImage, in the hope that it performs better. It's a bit more complicated to set up, though, and I'm not yet sure how I'm going to resolve the identifiers in the IIIF URL to the correct file path on the server with IIPImage. Cantaloupe has a Ruby delegate script for this, which can return the correct path for any identifier based on whatever logic you program into it.
Since I can't dump 1,000,000 files into one folder, I split them into subfolders based on their first three characters, so ZR42423.tif ends up in basepath/Z/R/4/ZR42423.tif.
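The sharding rule above fits in a few lines, whichever component ends up calling it (for example, the ingest script that writes the files, or a resolver in front of the image server; a Cantaloupe delegate would need the same logic ported to Ruby). A minimal sketch; the name `resolve_path` and the `depth` parameter are hypothetical, and rejecting identifiers containing "/" or starting with "." is an added assumption to keep requests from escaping the base folder.

```python
import posixpath


def resolve_path(identifier: str, base_path: str, depth: int = 3) -> str:
    """Map e.g. 'ZR42423.tif' to base_path/Z/R/4/ZR42423.tif.

    Each of the first `depth` characters becomes one directory level, so no
    single folder has to hold the whole collection.
    """
    if len(identifier) < depth:
        raise ValueError(f"identifier too short for {depth} levels: {identifier!r}")
    # Hypothetical safety check: refuse anything that could leave base_path.
    if "/" in identifier or identifier.startswith("."):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    shards = list(identifier[:depth])
    return posixpath.join(base_path, *shards, identifier)


print(resolve_path("ZR42423.tif", "basepath"))
```

Using the same function both when placing files at ingest and when resolving requests keeps the two sides from drifting apart.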

We currently have a project site, built by a third party, running on IIPImage. I don't think it uses the IIIF protocol, but its performance is exactly what it should be, so I'm hoping to get a better result than with Cantaloupe.
There also seems to be a partnership offering an IIPImage package with Kakadu JP2 support built in, for which you pay an initial fee and then a lower yearly one (www.iiifserver.com). If they have a free trial, I might try that one as well.

Is anyone else already running a very large image collection with IIIF?

Stefano Cossu

Mar 5, 2019, 1:11:43 PM
to iiif-d...@googlegroups.com
Hi Edwin,
I'm glad to hear that you are performing some performance tests as well
and I'd be curious to compare our results.

As of now our tests are complete, and I am compiling a results report
that, if it gets accepted, I intend to share at the next Community
Meeting in Göttingen. In short, IIPImage with pyramidal TIFFs performed
better across the board, not only in raw speed but also in resilience
under stress.

Pyramidal TIFFs with internal JPEG compression (quality 90) are overall
faster to decode (maybe to encode too, but the tools I had available at
the time were single-threaded, versus a 4-thread Kakadu) and, surprisingly,
yield better quality for a similar (+/- 10%) file size. I am of course
interested in hearing from anyone who had a different experience.

I agree that the ease of writing a resolver in Cantaloupe is a major
advantage of that platform, and the reason why we used it at my previous
institution. However, for a large deployment, I see that as a one-time
effort that will be outweighed by performance and maintenance factors in
the mid to long term. Our sources are in an S3 bucket, which IIPImage
cannot access directly. Instead of shoehorning some S3 functionality into
IIPImage, we want either to use the AWS Storage Gateway [1], which
promises to mount S3 buckets as a remote filesystem with good
performance, or to use a shim (which we have to build anyway for auth and
other purposes) that intercepts requests to the IIIF server, fetches
uncached sources from S3, and caches them on a local volume that
IIPImage can access. Neither of these solutions has been detailed yet.
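The second option, a shim that copies uncached sources from S3 onto a local volume, is essentially a read-through cache. A minimal sketch, assuming identifiers are already validated relative paths and that the `fetch` callable writes the whole object to the path it is given; `ensure_cached` and `make_s3_fetcher` are hypothetical names, the latter a thin wrapper around boto3's `download_file`. Eviction, auth, and locking are left out.

```python
import os
import tempfile
from typing import Callable

# fetch(identifier, dest_path) must write the complete source file to dest_path.
Fetcher = Callable[[str, str], None]


def ensure_cached(identifier: str, cache_dir: str, fetch: Fetcher) -> str:
    """Return a local path for `identifier`, fetching it on a cache miss."""
    local_path = os.path.join(cache_dir, identifier)
    if os.path.exists(local_path):
        return local_path  # cache hit: the image server can read it directly
    parent = os.path.dirname(local_path)
    os.makedirs(parent, exist_ok=True)
    # Download to a temp file in the same directory, then rename it into place,
    # so the image server never sees a half-written TIFF.
    fd, tmp_path = tempfile.mkstemp(dir=parent, suffix=".part")
    os.close(fd)
    try:
        fetch(identifier, tmp_path)
        os.replace(tmp_path, local_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return local_path


def make_s3_fetcher(bucket: str) -> Fetcher:
    """Hypothetical S3 fetcher; assumes boto3 and AWS credentials are available."""
    import boto3  # imported lazily so the cache logic runs without boto3

    s3 = boto3.client("s3")

    def fetch(identifier: str, dest_path: str) -> None:
        s3.download_file(bucket, identifier, dest_path)

    return fetch
```

Two concurrent misses for the same file may both download it, but because `os.replace` is atomic within one filesystem on POSIX, each rename installs a complete copy.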

I am also in the middle of writing a Python script to convert original
TIFFs to pyramidal TIFFs (thanks to extensive discussions with Dave
Beaudet). I can share that too once I have tested it.

Happy to discuss further if you have more questions.

Best,
Stefano

[1] https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html

> --
> -- You received this message because you are subscribed to the
> IIIF-Discuss Google group. To post to this group, send email to
> iiif-d...@googlegroups.com. To unsubscribe from this group, send
> email to iiif-discuss...@googlegroups.com. For more options,
> visit this group at https://groups.google.com/d/forum/iiif-discuss?hl=en
> ---
> You received this message because you are subscribed to the Google
> Groups "IIIF Discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to iiif-discuss...@googlegroups.com
> <mailto:iiif-discuss...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Edwin DR

Mar 5, 2019, 2:08:38 PM
to IIIF Discuss
Good to hear about IIPImage and pyramidal TIFFs. Over the coming weeks I will concentrate my tests on IIPImage.
As for a Python script to convert TIFF to pyramidal TIFF, this is what I use:

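import os

import pyvips


def convertImagesDirectory(inputDirectory, outputDirectory):
    '''Run through all the files in a directory tree and convert each one to a
    tiled, JPEG-compressed pyramidal TIFF in outputDirectory.'''
    os.makedirs(outputDirectory, exist_ok=True)
    for dirpath, _, filenames in os.walk(inputDirectory):
        for f in filenames:
            fullpath = os.path.abspath(os.path.join(dirpath, f))
            onlyname = os.path.splitext(f)[0]
            # Sequential access streams the source top to bottom, keeping
            # memory use low even for multi-gigabyte originals.
            image = pyvips.Image.new_from_file(fullpath, access='sequential')
            # Q is an integer (1-100); os.path.join avoids relying on a
            # trailing slash in outputDirectory. Output is flat, so files with
            # the same name in different subfolders overwrite each other.
            image.tiffsave(os.path.join(outputDirectory, onlyname + '.tif'),
                           compression='jpeg', Q=100,
                           tile=True, tile_width=256, tile_height=256,
                           pyramid=True)


convertImagesDirectory('/IIIF/ingest', '/IIIF/output/')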

Stefano Cossu

Mar 5, 2019, 2:29:22 PM
to iiif-d...@googlegroups.com
Yeah, my script is very similar at its core. I just added some quality
control and normalization steps (color profile, layers, alpha channel,
etc.), but whether you need those depends on where QC sits in your pipeline.

Stefano


Stefano Cossu

Mar 5, 2019, 2:30:09 PM
to iiif-d...@googlegroups.com
Also, do you need to set Q to 100 or is it just for testing?


Andrew Hankinson

Mar 5, 2019, 2:33:23 PM
to iiif-d...@googlegroups.com
Hi Stefano,

Thanks for sharing your results on image server testing. I look forward to hearing your paper in Göttingen!

You said that Pyramid TIFFs with JPEG compression "surprisingly, yield a better quality for a similar (+/- 10%) file size." How did you measure the quality? Did you use some sort of image comparison / diffing, or was it 'by eyeball'?

-Andrew

Stefano Cossu

Mar 5, 2019, 2:43:10 PM
to iiif-d...@googlegroups.com
Hi Andrew,
I worked with our Imaging department, which ran the resulting images
through some overlay filters. See attached: on the left is the original
image; on the right, as labeled, is the pixel-wise delta of each
derivative. Black pixels are exactly the same as in the original; white
ones differ.
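The exact filters the Imaging department used aren't described, but a black/white delta of this kind can be approximated in a few lines of NumPy. A sketch, assuming both images have already been decoded (e.g. with pyvips or Pillow) into uint8 arrays of identical shape, height × width × channels; the function names are hypothetical, and PSNR is a swapped-in, standard numeric summary that was not used in this thread.

```python
import numpy as np


def difference_mask(original: np.ndarray, derivative: np.ndarray) -> np.ndarray:
    """0 (black) where a pixel matches the original in every channel, 255 (white) otherwise."""
    if original.shape != derivative.shape:
        raise ValueError("images must have the same shape")
    differs = np.any(original != derivative, axis=-1)
    return np.where(differs, 255, 0).astype(np.uint8)


def abs_delta(original: np.ndarray, derivative: np.ndarray) -> np.ndarray:
    """Per-channel absolute difference, to see where errors concentrate."""
    return np.abs(original.astype(np.int16) - derivative.astype(np.int16)).astype(np.uint8)


def fraction_identical(original: np.ndarray, derivative: np.ndarray) -> float:
    """Share of pixels that match the original exactly in every channel."""
    return float(np.mean(np.all(original == derivative, axis=-1)))


def psnr(original: np.ndarray, derivative: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (higher means closer to the original)."""
    diff = original.astype(np.float64) - derivative.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

Grouping `abs_delta` values by the original pixel's lightness would be one way to check the observation below that highlights are less faithful than shadows.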

One thing I noticed is that the PTIFF/JPEG compression varies its
quality with lightness, i.e. highlights are less faithful than shadows,
but overall the PTIFF delta image looks "darker", i.e. closer to the original.

Stefano

qc_b_s.png

Andrew Hankinson

Mar 5, 2019, 2:50:09 PM
to iiif-d...@googlegroups.com
That's really interesting; thanks! It looks like there was a significant difference between them. Did you notice any perceptible difference between the two output images as well, or was it just down to machine comparison to find the differences?

Stefano Cossu

Mar 5, 2019, 3:48:50 PM
to iiif-d...@googlegroups.com
To be honest, I cannot notice a naked-eye difference, so that may be
just a measurable, but not quite perceivable, difference.
Stefano

Simone Ceccolini

Mar 6, 2019, 10:03:14 AM
to IIIF Discuss
Hi Stefano,

we're working with Mirador + Cantaloupe Image Server and we're managing about 20K+ images, roughly 500 images per collection. We adopted Cantaloupe because of its flexible configuration, its support for plenty of formats, and the deployment modes it offers. If you want to find out more, don't hesitate to ask.

Regards,
Simone



Edwin DR

Mar 6, 2019, 10:38:17 AM
to IIIF Discuss
I was testing our Cantaloupe setup with different tile sizes and quality settings for pyramidal TIFFs.

I made 6 versions of the same image (2.4 GB):

- 256 × 256 tiles at 100%, 90%, and 80% quality
- 512 × 512 tiles at 100%, 90%, and 80% quality

None of these brought the performance for my pyramidal TIFF to an acceptable level, until I switched Cantaloupe's TIFF processor from Java2dProcessor to JaiProcessor.

Zooming and tile-loading performance went up by 1000%: I can now zoom smoothly in Leaflet, Mirador, and IIPMooViewer, with any of the tile-size or quality versions.

However, I read somewhere that JaiProcessor is no longer being developed or maintained, so I'm not sure whether this will pose a problem in the future. But the difference, at least for my test file, is incredible.

Stefano Cossu

Mar 6, 2019, 10:59:53 AM
to iiif-d...@googlegroups.com, Edwin DR
> However I read somewhere that jaiprocessor is no longer being developed
> or maintained, so not sure if this will pose a problem in the future.
> But the difference atleast for my test file is incredible.
>

I noticed that in the Cantaloupe documentation and that is why I decided
not to pursue any testing with Cantaloupe and PTIFFs. Stability is more
important than speed for our project.

The 10x speedup is impressive, though! My results with IIPImage are
somewhat different. I wonder if the 100% compression quality gives PTIFF
an extra edge (I am not expert enough to know whether that corresponds to
no compression at all, lossless compression, or something else).

Stefano

--
Stefano Cossu
Software Architect
J. Paul Getty Trust

Stefano Cossu

Mar 6, 2019, 8:46:20 PM
to iiif-d...@googlegroups.com
Thanks all for your feedback.

A question for all who are using Cantaloupe: have you experienced any
dropped requests (HTTP 502)?

The errors come from the HTTP server, which drops the connection after
timing out (TTL is at least a minute). Under very heavy load, the error
rate becomes quite significant (18% for 1K concurrent connections
requesting uncached images).
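For anyone wanting to reproduce this kind of measurement, a crude load generator can be sketched with the standard library: request tiles at random offsets (so they are unlikely to be cached), run them through a thread pool, and tally the status codes. The URL follows the IIIF Image API pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}; the helper names, base URL, and image dimensions are hypothetical, -1 stands in for connections that were dropped or timed out, and this is not necessarily how the tests above were run.

```python
import random
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def random_tile_url(base: str, identifier: str, width: int, height: int,
                    tile: int = 512, rng: Optional[random.Random] = None) -> str:
    """IIIF Image API region request at a random offset, so it is unlikely to be cached."""
    if rng is None:
        rng = random.Random()
    x = rng.randrange(0, max(1, width - tile))
    y = rng.randrange(0, max(1, height - tile))
    return f"{base}/{identifier}/{x},{y},{tile},{tile}/{tile},/0/default.jpg"


def http_status(url: str, timeout: float = 60.0) -> int:
    """Return the HTTP status code, or -1 if the connection failed or timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx, e.g. a 502 from the proxy
        return err.code
    except OSError:  # refused/reset connections and timeouts
        return -1


def run_load(urls: Iterable[str], concurrency: int,
             fetch: Callable[[str], int] = http_status) -> Counter:
    """Issue all requests with `concurrency` worker threads and tally status codes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, urls))


def error_rate(counts: Counter) -> float:
    """Share of responses that were not HTTP 200."""
    total = sum(counts.values())
    failed = sum(n for status, n in counts.items() if status != 200)
    return failed / total if total else 0.0
```

For example, build a few thousand URLs with `random_tile_url` against a test server, pass them to `run_load(urls, concurrency=1000)`, and report `error_rate` of the result. At that concurrency, Python threads or the client machine itself may become the bottleneck, so a dedicated load-testing tool is the more reliable choice for real numbers.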

I saw this happen at the Art Institute of Chicago where I used to work,
as well as in my recent stress tests, usually under a certain load. At
AIC, in a production setup, failures would keep going up until we had to
restart the server periodically. In the stress test setup, the server
keeps running at a constant error rate.

Stefano


Simone Ceccolini

Mar 7, 2019, 3:42:03 AM
to IIIF Discuss
Hi Stefano,

when we first adopted Cantaloupe we experienced some errors. However, after proper tuning and configuration, it is now a solid part of our IIIF solutions. If you need them, I can share the configuration details with you.

Thanks
Simone

Andrew Hankinson

Mar 7, 2019, 4:53:34 AM
to iiif-d...@googlegroups.com

...snip...
>
> The 10x speedup is impressive, though! My results with IIPImage are
> somewhat different. I wonder if the 100% compression quality gives PTIFF
> an extra edge (I am not as expert as to know whether that corresponds to
> no compression at all, a lossless compression, or what else).

JPEG is always lossy compression, so if you are choosing TIFF with JPEG compression you are saving a lossy version, even at Q=100.
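To make the Q=100 case concrete: under the IJG libjpeg quality scaling (assumed here; libjpeg-based encoders typically use it), quality 100 yields a scale factor of 0, so every quantization divisor is clamped to 1. The DCT coefficients are still rounded to integers, and the RGB to YCbCr conversion (plus any chroma subsampling) also discards information, so the result is very close to, but not exactly, the original. A short sketch of the table scaling, using the standard luminance table from the JPEG specification (ITU-T T.81, Annex K); the function names are hypothetical.

```python
# Standard JPEG luminance quantization table (ITU-T T.81, Annex K), row by row.
STD_LUMINANCE = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def quality_scaling(q: int) -> int:
    """Map a 1-100 quality setting to a percentage scale factor (IJG libjpeg rule)."""
    q = min(max(q, 1), 100)
    return 5000 // q if q < 50 else 200 - 2 * q


def scaled_table(q: int, base=STD_LUMINANCE) -> list:
    """Quantization divisors used at quality q (baseline JPEG clamps them to 1..255)."""
    scale = quality_scaling(q)
    return [min(max((b * scale + 50) // 100, 1), 255) for b in base]


for q in (50, 80, 90, 100):
    print(q, scaled_table(q)[:8])  # first row of the table at each quality
```

At quality 90 the first row is already down to single-digit divisors, and at 100 every divisor is 1: the quantization step stops throwing away precision, but the rounding and colour conversion described above still do.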

Kevin Ford

Mar 7, 2019, 7:59:43 AM
to iiif-d...@googlegroups.com, Stefano Cossu
What Stefano described about our experience at AIC is accurate, but
permit me to add a few details for anyone who may benefit:

Things have been quiet for some time, as I understand it, but we
encountered this under specific conditions: primarily when our website
suddenly started requesting a new size, and secondarily when that size
was on the large side. The first time we saw this was with full/full
requests, which, from what I found at the time, Cantaloupe might not
cache (I was never clear about this and my memory is fuzzy at this
point). (Also, we use JP2s, which may or may not exacerbate this; I
dunno.) From what I can tell, the server struggled to generate the
large requested derivative while handling many simultaneous requests.

This was ameliorated, but not eliminated, by adding memory to the
machine and configuring Cantaloupe to make use of it. Once Cantaloupe
had cached the derivatives, the issue subsided.

All the best,
Kevin