I am preparing some benchmark tools to compare retrieval and handling of
JP2 and PTIFF images with the goal of deciding which format the Getty
will use for the source images that are handled by the image server.
In preparation for that, I'd like to better understand how the JPEG2000
"partial retrieval" feature works and how it may affect my measurements.
1. I understand that the JPEG2000 specs allow retrieving a region of an
image rather than loading the whole file (likely from slow storage or
over the network) for each transformation. That sounds like a great
advantage; however, in most cases region retrieval is used for tiles,
which are often retrieved in larger groups at a time. Therefore it seems
to me that the advantage over retrieving a whole image once, caching it
on fast storage and/or in a memory cache, and generating tiles sequentially
would be less dramatic. Thoughts?
2. Does PTIFF have anything comparable to this region retrieval feature?
3. I foresee the main processing and UX bottleneck being generating a
thumbnail from a large source image, which will not be displayed as a
larger derivative in the immediate term. In this case, the full-size
image must be retrieved and cached, and the process gets slow, e.g. in
an index page with many uncached thumbnails. I have heard of "wavelet"
sampling with JP2. How does it work? Does it save data retrieval? I am
not sure how that is possible, since my understanding is that any kind
of interpolation when resizing an image still needs all the
information to interpolate the pixels. Does JP2 offer any concrete
advantage over TIFF in this regard?
Thanks for any insight you may provide.
Stefano
--
-- You received this message because you are subscribed to the IIIF-Discuss Google group. To post to this group, send email to iiif-d...@googlegroups.com. To unsubscribe from this group, send email to iiif-discuss...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/iiif-discuss?hl=en
See comments inline below.
On 1/3/19 4:24 AM, Andrew Hankinson wrote:
> Hi Stefano,
>
> First, I stumbled across these two pages when researching your question. They seem like they would be generally useful for the whole IIIF Community. I haven't seen them referenced anywhere before.
>
> https://github.com/plroit/Skyreach/wiki/Understanding-scalability-in-JPEG2000
> https://github.com/plroit/Skyreach/wiki/Introduction-to-JPEG2000-Structure-and-Layout
These are definitely worth adding to a resource collection, e.g. Awesome
IIIF.
>
> In answer to your questions:
>
> 1. Kinda. You would have to dig deep into a particular JPEG2000 implementation, but I suspect the clever ones only read the parts of the file that correspond to the data being requested. JPEG2000 files also implement a 'pyramid' encoding, with multiple resolution layers in a single file.
>
> The structure of a JPEG2000 file header provides an index into the data, so it's actually quite fast to read that, find out the specific location of the data you need, seek to that location, pull it out and send it along. So it doesn't necessarily need to pull the full file over the network.
>
> That would, however, depend on whether the network protocol supports this sort of activity (file seeking). NFS probably does; HTTP probably not, without some sort of magic?
HTTP supports byte-range requests; however, whether they are honored depends on the server.
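A minimal sketch of what the "magic" could look like over HTTP, assuming a server that honors RFC 7233 `Range` headers: fetch only the first few kilobytes of a JP2 and walk its top-level boxes (each starts with a 4-byte length and a 4-byte type, per the JP2 file format in ISO/IEC 15444-1 Annex I) to find where the codestream box (`jp2c`) begins. The function names are hypothetical. A real image server would go further and read the codestream's own index markers (e.g. the PLT packet-length markers that `ORGgen_plt=yes` asks Kakadu to write) before requesting only the packets it needs.

```python
import struct
import urllib.request


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET that asks for bytes start..end (inclusive) only (RFC 7233)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch a byte range; a server that honors it replies 206 Partial Content."""
    with urllib.request.urlopen(range_request(url, start, end)) as resp:
        if resp.status != 206:
            # 200 means the server ignored Range and is sending the whole file.
            raise RuntimeError(f"Range not honored (HTTP {resp.status})")
        return resp.read()


def list_boxes(data: bytes):
    """Yield (type, offset, length) for the top-level JP2 boxes found in data.

    length is None for a box whose LBox is 0 (it runs to the end of the file).
    A box extending past the fetched bytes is still reported from its header.
    """
    offset = 0
    while offset + 8 <= len(data):
        lbox, tbox = struct.unpack(">I4s", data[offset:offset + 8])
        box_type = tbox.decode("latin-1")
        if lbox == 0:
            yield box_type, offset, None
            return
        if lbox == 1:  # a 64-bit XLBox length follows the type field
            if offset + 16 > len(data):
                return
            (lbox,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        if lbox < 8:
            raise ValueError(f"invalid box length {lbox} at offset {offset}")
        yield box_type, offset, lbox
        offset += lbox
```

Usage, with a hypothetical URL: `for box in list_boxes(fetch_range("https://example.org/source.jp2", 0, 4095)): print(box)`. If the server answers 200 instead of 206, every request pulls the whole file, which is the case Andrew was worried about.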
>
> Again -- this is just based on my own observations, and may not be correct. We run all of our image servers over NFS, and while it can be a bit pokey on larger images (~1s for large image thumbnails) it's not as slow as I would expect it to be if we were pulling, say, 100 * 100MB (~10GB) JPEG2000 files over the wire. So there must be some sort of optimization happening at the server level.
>
> 2. Yes, Pyramid TIFF also (AFAIK) has an index for individual layers and tiles. When you create PTIFs you specify the tile sizes, and I believe this also creates an index to help readers seek to and extract the data from the file for a given region and size.
That's good to know.
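To make the TIFF "index" concrete: in a tiled TIFF (TIFF 6.0), each pyramid level is typically stored as its own image file directory (IFD), whose TileOffsets and TileByteCounts tags record where each tile's compressed bytes live, with tiles numbered left to right, top to bottom. A minimal sketch, assuming a single sample plane (PlanarConfiguration = 1) and using hypothetical helper names, of how a reader maps a requested region to the tiles and byte ranges it has to read:

```python
def tiles_for_region(image_w, image_h, tile_w, tile_h, x, y, w, h):
    """Indices (in TileOffsets order) of the tiles that intersect region (x, y, w, h)."""
    tiles_across = -(-image_w // tile_w)  # ceiling division
    # Clamp the region's far edge to the image, then convert pixels to tile coordinates.
    col0, col1 = x // tile_w, (min(x + w, image_w) - 1) // tile_w
    row0, row1 = y // tile_h, (min(y + h, image_h) - 1) // tile_h
    return [row * tiles_across + col
            for row in range(row0, row1 + 1)
            for col in range(col0, col1 + 1)]


def byte_ranges(indices, tile_offsets, tile_byte_counts):
    """Inclusive (first, last) byte positions to read for each tile, e.g. for HTTP Range."""
    return [(tile_offsets[i], tile_offsets[i] + tile_byte_counts[i] - 1) for i in indices]
```

For a 1000x600 level with 256x256 tiles, a 300x100 region at (300, 200) touches only tiles 1, 2, 5 and 6 out of 12, so only those tiles' bytes need to be read and decoded.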
>
> 3. The main processing and UX bottleneck won't be generating a thumbnail from a large image, given (1), since it will only request the bits of data it needs from the file. If your JPEG2000 files have been encoded properly, then you should be able to request the lower resolution 'pyramid' and send that data over, not touching the larger resolution parts of the file. Transformations and interpolations are then done on the smaller representations, not on the full file.
>
> Wavelet encoding *is* the JPEG2000 encoding format. It's not anything special.
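On the interpolation question: a wavelet transform does the low-pass filtering and downsampling once, at encoding time, and stores the result as a separate low-resolution band; the detail needed to rebuild the full image sits in other bands that a thumbnail request never has to read. A minimal sketch using the Haar wavelet, chosen only because it is the simplest case; it is NOT the filter JPEG2000 uses (the standard specifies the reversible 5/3 and irreversible 9/7 filters). The function names are illustrative.

```python
def haar_analyze(img):
    """One 2-D Haar level: split an image (list of rows, even width and height)
    into a half-size average band and three detail bands."""
    avg, horiz, vert, diag = [], [], [], []
    for y in range(0, len(img), 2):
        rows = ([], [], [], [])
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 4)  # low-pass: the thumbnail pixel
            rows[1].append((a - b + c - d) / 4)  # left/right difference
            rows[2].append((a + b - c - d) / 4)  # top/bottom difference
            rows[3].append((a - b - c + d) / 4)  # diagonal difference
        for band, row in zip((avg, horiz, vert, diag), rows):
            band.append(row)
    return avg, horiz, vert, diag


def haar_synthesize(avg, horiz, vert, diag):
    """Invert haar_analyze exactly, rebuilding the double-size image."""
    img = []
    for s, h, v, g in zip(avg, horiz, vert, diag):
        top, bottom = [], []
        for m, dh, dv, dg in zip(s, h, v, g):
            top += [m + dh + dv + dg, m - dh + dv - dg]
            bottom += [m + dh - dv - dg, m - dh - dv + dg]
        img += [top, bottom]
    return img


def thumbnail(img, levels):
    """Keep only the average band, level after level: 1/2**levels per side."""
    for _ in range(levels):
        img = haar_analyze(img)[0]
    return img
```

Because the detail bands are only needed to go back up in size, a decoder producing a thumbnail can stop at the coarse band it needs. In a JPEG2000 file those bands are stored in separately addressable packets, which is why, depending on the decoder, the full-resolution data does not have to be read.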
>
> JPEG2000 offers several advantages over Pyramid TIFF (which vary, of course, on what sort of compression you are using on the images in the TIFF):
>
> - Much smaller file size. In a full resolution uncompressed lossless TIFF file (say, 200MB) with five resolutions, a pyramid TIFF would be 200 + 100 + 50 + 25 + 12.5 = 387.5 MB. Most of this is just 'wasted' space. The same JPEG2000 file would probably be on the order of 80-100MB, lossless. For larger collections, this is likely a significant factor.
TIFF pyramids can be JPEG-compressed; however, even then a quick test
with a small image resulted in a PTIFF file much larger than a
JP2. This may not be representative, though, and I intend to perform a
conversion on a full set of sample full-size images with different sizes,
aspect ratios, and color distributions. These images will be published as
open content, so anyone will be able to compare size and quality.
It could also be that the wavelet algorithm is simply more advanced and
yields better quality at a smaller file size.
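A quick check of the pyramid arithmetic in the quoted message, assuming each level halves both width and height (the usual pyramid layout): each level then holds a quarter of the pixels of the one above it, so for uncompressed data all the extra levels together add at most one third of the full image. The 387.5 MB figure corresponds to halving the data, rather than the sides, at each level. A sketch:

```python
def pyramid_size(full_size, levels, ratio):
    """Total size of `levels` pyramid levels, each `ratio` times the previous one."""
    return sum(full_size * ratio ** i for i in range(levels))


halving_data = pyramid_size(200, 5, 0.5)     # 200 + 100 + 50 + 25 + 12.5
halving_sides = pyramid_size(200, 5, 0.25)   # 200 + 50 + 12.5 + 3.125 + 0.78125
limit = 200 / (1 - 0.25)                     # geometric-series bound for any number of levels
print(halving_data, halving_sides, round(limit, 2))  # prints: 387.5 266.40625 266.67
```

Compression changes the absolute numbers, but the one-third bound on the extra levels' pixel count still holds, so most of a compressed PTIFF's size comes from its full-resolution level.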
>
> - Smarter data storage. JPEG2000 can store 'lossy' representations of the image, even in a file with 'lossless' compression. With PTIF, the data is either compressed or uncompressed -- there is no way to say 'store the full image uncompressed, but smaller representations compressed.'
>
> - Smarter interpolation. I *think* that JPEG 2000 is able to support requests for images between the layers. Pyramid TIFF is only able to serve data from the sizes present, so if you have five layers you're only able to support five sizes (with any resizing of those layers happening after the fact). JPEG2000 will let you request sizes in between the layers and it will do the interpolation for you. Again, I think this is true, but would like to be corrected if not.
>
> On the other hand, I have observed higher CPU usage for JPEG2000 over PTIF.
This is definitely a decision factor, and one of the reasons why I am
performing this comparison. We have millions of images to process in the
next few years, and both storage size and computing power will be
important factors in evaluating costs.
>
> For reference, the JPEG2000 kdu_compress command we use to compress images is:
>
> kdu_compress -i input.tif -o output.jp2 Clevels=6 Clayers=6 "Cprecincts={256,256},{256,256},{128,128}" "Stiles={512,512}" Corder=RPCL ORGgen_plt=yes ORGtparts=R "Cblk={64,64}" Cuse_sop=yes Cuse_eph=yes -flush_period 1024 Creversible=yes -rate -
Thanks. I have used that (actually, the lossy variant) as my reference
conversion command.
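Related to point (3) above: with Clevels=6, the codestream holds 7 resolutions, each about half the width and height of the previous one. A minimal sketch, assuming an image origin at (0, 0), of listing those sizes and choosing how many levels a thumbnail request can discard before any resampling; the helper names and the 10000x8000 example image are hypothetical.

```python
def resolution_sizes(width, height, clevels):
    """(width, height) of each resolution, full size first; JPEG2000 rounds up."""
    return [(-(-width // 2 ** r), -(-height // 2 ** r)) for r in range(clevels + 1)]


def levels_to_discard(width, height, clevels, target_long_edge):
    """Largest reduction r whose long edge is still >= the requested size,
    so the final resize only ever scales down."""
    best = 0
    for r, (w, h) in enumerate(resolution_sizes(width, height, clevels)):
        if max(w, h) >= target_long_edge:
            best = r
    return best


sizes = resolution_sizes(10000, 8000, 6)
r = levels_to_discard(10000, 8000, 6, 200)
print(r, sizes[r])  # prints: 5 (313, 250)
```

In this example a 200-pixel thumbnail only requires decoding the 313x250 resolution and then scaling it down, rather than interpolating from the full 10000x8000 image. Whether a given image server actually decodes at reduced resolution depends on the implementation.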
HTTP range requests are specified in RFC 7233: https://tools.ietf.org/html/rfc7233
As I mentioned in my other posts, this is left up to the server.
Stefano
On 1/3/19 11:25 AM, David Beaudet wrote:
> In theory, it should be possible to use http's range request to do this.
>
> On Thu, Jan 3, 2019, 14:14 Kevin S. Clarke <kscl...@ksclarke.io
> <mailto:kscl...@ksclarke.io> wrote:
>
> I'm curious if there are folks on this list who are using JP2 over
> HTTP, and whether they're pulling the whole file or have figured out
> the magic mentioned below(?)
>
> Thanks,
> Kevin
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, January 3, 2019 7:24 AM, Andrew Hankinson
> <andrew.h...@gmail.com <mailto:andrew.h...@gmail.com>> wrote:
>
> > The structure of a JPEG2000 file header provides an index
> into the data, so it's actually quite fast to read that, find out
> the specific location of the data you need, seek to that location,
> pull it out and send it along. So it doesn't necessarily need to
> pull the full file over the network.
> >
> > That would, however, depend on whether the network protocol
> supports this sort of activity (file seeking). NFS probably does;
> HTTP probably not, without some sort of magic?
After a less-quick test with larger images and more fine-tuned
parameters, it seems that JPEG-encoded PTIFFs (90% quality) are roughly
the same size as JP2 (sometimes smaller, sometimes larger). Sizes in
bytes, PTIFF first:
-rwxrwx--- 1 root vboxsf 6955828 Jan 3 14:10 gm_00000201.tif
-rwxrwx--- 1 root vboxsf 10568044 Jan 3 14:10 gm_00001201.tif
-rwxrwx--- 1 root vboxsf 6617182 Jan 3 14:10 gm_00002501.tif
-rwxrwx--- 1 root vboxsf 33159356 Jan 3 14:10 gm_00002701.tif
-rwxrwx--- 1 root vboxsf 4689272 Jan 3 14:10 gm_00004301.tif
-rwxrwx--- 1 root vboxsf 5838922 Jan 3 14:10 gm_00007801.tif
-rwxrwx--- 1 root vboxsf 5013732 Jan 3 14:10 gm_00008801.tif
PTIFF encoding time (2 threads, apparently not fully used):
real 0m32.424s
user 0m10.312s
sys 0m10.417s
-rwxrwx--- 1 root vboxsf 8940324 Jan 3 14:11 gm_00000201.jp2
-rwxrwx--- 1 root vboxsf 11588527 Jan 3 14:11 gm_00001201.jp2
-rwxrwx--- 1 root vboxsf 7080557 Jan 3 14:11 gm_00002501.jp2
-rwxrwx--- 1 root vboxsf 33520379 Jan 3 14:11 gm_00002701.jp2
-rwxrwx--- 1 root vboxsf 5151315 Jan 3 14:11 gm_00004301.jp2
-rwxrwx--- 1 root vboxsf 4975438 Jan 3 14:11 gm_00007801.jp2
-rwxrwx--- 1 root vboxsf 5519055 Jan 3 14:11 gm_00008801.jp2
JP2 encoding time (2 threads):
real 0m21.882s
user 0m27.011s
sys 0m6.729s
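A summary of the two listings and `time` outputs above (sizes in bytes as reported by `ls -l`). CPU utilization here is (user + sys) / real, i.e. the average number of cores kept busy; a value well below 2 on a 2-thread run is what "apparently not fully used" looks like. The sketch only re-reads the numbers already posted:

```python
PTIFF = {
    "gm_00000201": 6955828, "gm_00001201": 10568044, "gm_00002501": 6617182,
    "gm_00002701": 33159356, "gm_00004301": 4689272, "gm_00007801": 5838922,
    "gm_00008801": 5013732,
}
JP2 = {
    "gm_00000201": 8940324, "gm_00001201": 11588527, "gm_00002501": 7080557,
    "gm_00002701": 33520379, "gm_00004301": 5151315, "gm_00007801": 4975438,
    "gm_00008801": 5519055,
}
# (real, user, sys) in seconds, from the `time` output after each listing.
TIMES = {"PTIFF": (32.424, 10.312, 10.417), "JP2": (21.882, 27.011, 6.729)}


def cpu_utilization(real, user, system):
    """Average number of cores busy during the run."""
    return (user + system) / real


for name in PTIFF:
    print(f"{name}: PTIFF/JP2 size ratio {PTIFF[name] / JP2[name]:.3f}")

ptiff_total, jp2_total = sum(PTIFF.values()), sum(JP2.values())
smaller = sum(PTIFF[n] < JP2[n] for n in PTIFF)
print(f"total PTIFF/JP2 = {ptiff_total / jp2_total:.3f}; PTIFF smaller in {smaller} of {len(PTIFF)}")

for fmt, t in TIMES.items():
    print(f"{fmt}: {t[0]:.1f} s wall clock, {cpu_utilization(*t):.2f} cores busy on average")
```

On these seven files the PTIFFs are smaller in six cases and about 5% smaller overall. JP2 encoding finished sooner in wall-clock time but kept about 1.54 cores busy on average, against about 0.64 for the PTIFF run.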
I used a fork of Dave Beaudet's libvips + tiffcp script to encode the
PTIFFs:
https://github.com/scossu/iipsrv/blob/master/imagescripts/tiff_to_pyramid.bash