TIFF vs. JPEG2000


Creighton Barrett

Mar 18, 2013, 1:04:03 PM
to digital-...@googlegroups.com
Hi everyone,

I'm curious to hear whether you are using TIFF or JPEG2000 as a preservation scan in your digitization workflows.  If you switched to JPEG2000, could you shed some light on the reasoning?  What has your experience been?  If you considered switching but stuck with TIFF, could you explain why?  Did you perform tests on image quality, storage requirements, etc.?

Any info would be much appreciated!

Cheers,

Creighton Barrett
Dalhousie University Archives





Christie Peterson

Mar 18, 2013, 1:50:30 PM
to digital-...@googlegroups.com
A recent blog post on the Library of Congress's Digital Preservation blog did the best job I've seen of laying out issues related to JPEG2000 and preservation (read the discussion in the comments, too): http://blogs.loc.gov/digitalpreservation/2013/01/is-jpeg-2000-a-preservation-risk/

Best,

Christie Peterson


--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digital-curati...@googlegroups.com.
To post to this group, send email to digital-...@googlegroups.com.
Visit this group at http://groups.google.com/group/digital-curation?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Creighton Barrett

Mar 18, 2013, 2:35:23 PM
to digital-...@googlegroups.com
Thanks Christie, much appreciated.  I didn't realize use of JPEG2000 was still so low.  I would love to hear from people who are using it.  Is anyone aware of efforts to introduce browser support?

Cliff, Peter

Mar 18, 2013, 3:23:31 PM
to digital-...@googlegroups.com

Perhaps a little old now, but you may find useful stuff on the Open Planets Foundation JP2 Working Group (now disbanded I believe):

http://wiki.opf-labs.org/display/JP2/Home

and also the Wellcome Library's blog:

http://www.jpeg2000wellcomelibrary.blogspot.co.uk/

again, last update 2011 but still has interesting things in it.

At the BL we have migrated TIFFs to JPEG2000s; as I understand it, the reasoning is that the file sizes are significantly smaller while image degradation is minimal.

As with all these things it will probably come down to your own use case - you have a load of things that can render TIFF? Keep the TIFF. You have a load of TIFFs and no significant processing power to convert to JP2 even if you wanted to? Keep the TIFF. You want to save disk space? Use JP2.
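Pete's rule of thumb above can be restated as a toy decision function (a playful sketch only; the inputs are my own naming, not any BL policy):

```python
def pick_master_format(can_render_tiff: bool,
                       can_afford_conversion: bool,
                       disk_space_is_tight: bool) -> str:
    """Toy restatement of the use-case heuristic; not a policy engine."""
    # A load of things that can render TIFF? Keep the TIFF.
    if can_render_tiff:
        return "TIFF"
    # No significant processing power to convert the backlog? Keep the TIFF.
    if not can_afford_conversion:
        return "TIFF"
    # You want to save disk space? Use JP2.
    if disk_space_is_tight:
        return "JP2"
    return "TIFF"
```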

My feeling is that neither format is a particular preservation risk at the moment, but as Chris' post points out, currently support for JP2 is patchy at best - so we as a community either need to get involved with the tools, be prepared to pay or use something else!

I'm not aware of any efforts to introduce JP2 support to any of the major browsers.

--
Pete Cliff
Digital Preservation Technical Lead
The British Library



Kevin S. Clarke

Mar 18, 2013, 3:24:58 PM
to digital-...@googlegroups.com
On Mon, Mar 18, 2013 at 2:35 PM, Creighton Barrett <csba...@gmail.com> wrote:
> Thanks Christie, much appreciated. I didn't realize use of JPEG2000 was
> still so low. I would love to hear from people who are using it. Is anyone
> aware of efforts to introduce browser-support?

Are you only interested in places using it as the preservation format?
The places I know using it are using it as the access format (with an
image server) but they store TIFF as the preservation copy.

Kevin

Chris Adams

Mar 18, 2013, 3:33:31 PM
to digital-...@googlegroups.com
As the author of that post, I'm glad to hear that you found it interesting and would definitely be interested in hearing others’ opinions. I think Peter summed it up nicely: if we want the space savings & interesting access options, we need to back that with development resources.

Chris

Edward M. Corrado

Mar 18, 2013, 3:41:32 PM
to digital-...@googlegroups.com
We looked into this here. We are using JPEG2000 as a preservation
format when we acquire digital objects as JPEG2000 files. In other
words, we do not convert JPEG2000 to TIFF or any other format (or
require depositors to do the conversion before providing images to
us). Really, I think a lot of the concern about using JPEG2000 is
FUD, although I recognize there are some real issues. I agree with
Peter Cliff when he says he feels "that neither [JPEG2000 or TIFF] is
a particular preservation risk at the moment." If JPEG2000 becomes an
actual risk, I believe we will have more than enough time to migrate
the images to something else in the future and see no reason for us to
do it at this time. If that time ever comes, we may have better
options than TIFF.

All that said, we have not gone as far as using JPEG2000 when we
perform our own digitization. We are still using TIFF. We review this
periodically and the next time we do, we may consider changing.

Edward

adam brin

Mar 18, 2013, 3:52:45 PM
to digital-...@googlegroups.com
David Lowe @ UConn has done a bunch of work in this area:



We're strongly considering it at the moment for the space savings alone.

Michael J. Giarlo

Mar 18, 2013, 4:38:34 PM
to digital-...@googlegroups.com, Edward M. Corrado
I'm not sure that dismissing JPEG2000 criticism as "FUD," without
substantiating it, does much to further the discussion. Especially when
folks *have* put forward thoughtful arguments against it as a
preservation format. I'd be interested in hearing which of the points
in the LC blog post that was linked earlier ring particularly FUDdy in
your ears.

-Mike

Edward M. Corrado

Mar 18, 2013, 5:52:53 PM
to Michael J. Giarlo, digital-...@googlegroups.com
A short answer:

1. The patent and other legal issues people raise seem to be extremely
overblown.

2. I disagree with the post that JPEG2000 is not widely used. True, it
is not used as much as other image formats directly on the Web, but
that is a chicken-and-egg issue. However, it is still used (and I
suspect it might be used more than some (non-image) formats that
people think are preservation worthy, although I don't have numbers). I
feel it passes the ubiquity test. Of course, this is a personal
judgement about how much use makes a format common, and you are free to
disagree.

3. Many, if not most major open source image programs support it, as
do many proprietary ones.

3. There are both Open Source and proprietary libraries readily
available. OpenJPEG, for instance, is under active development: version
2.0 was released just this past November and code was added to trunk as
recently as yesterday.

3. JPEG2000 is supported by a number of web browsers, including Konqueror
and Safari, and by QuickTime within other browsers. See the Bugzilla
thread linked to in the LC blog post for more details.

4. People can, and have, converted files from TIFF to JPEG2000 and
back without any loss.

Edward

Chris Adams

Mar 18, 2013, 7:14:58 PM
to digital-...@googlegroups.com
On Mon, Mar 18, 2013 at 5:52 PM, Edward M. Corrado <ecor...@ecorrado.us> wrote:
2. I disagree with the post that JPEG2000 is not widely used. True, it
is not used as much as other image formats directly on the Web, but
that is a chicken-and-egg issue.

The core of my argument is that a format which is not commonly supported on the web effectively doesn't exist for a large number of people. There's a particularly poignant comment on a previous LC post (http://blogs.loc.gov/digitalpreservation/2011/06/a-fine-view-at-the-summit-of-jp2/) which represented the problem nicely:

“After spending literally hours trying to find some way to view or convert the JP2 files from the LoC’s map collection, I have to surrender and assume the Library of Congress doesn’t want me to have access to any of these images. Why on Earth would an organization whose mission is to make information available to the public adopt an esoteric file format that requires third-party applications and plugins to view?”

I'm speaking unofficially of course but it should be obvious this is not what LC would want. It's also a given that had the images been PNG files the question would never have arisen. This is what I meant by wide support: right now, there are significant numbers of JP2 files in use around the world but they're concentrated in a few communities (libraries, medical imaging, etc.) which means most tool builders don't make JP2 a priority unless a significant percentage of their users are in those spaces.  Toss in things like Adobe removing support for JP2 from Photoshop Elements and it's reasonable to wonder how likely it is that future users will need to buy more expensive, specialist-oriented software simply to view a file.

I'd also like to note that I wanted JP2 to succeed for various reasons: I was hoping we'd see widespread support ages ago, when the file size savings were even more important. Years back, I wrote an export plugin for Apple's Aperture so I could publish higher-quality web galleries but the better part of a decade later it's still not feasible unless your audience is entirely comprised of people using Apple hardware. Right now most of the web community is looking for a way to manage images of varying resolutions as screen sizes and densities continue to expand – this would be a perfect use for JP2's progressive decoding but the support is simply not there.
 
3. Many, if not most major open source image programs support it, as
do many proprietary ones.

This point was specifically addressed in my post: while many open source projects have some level of JP2 support, almost all of them are using Jasper, frequently without much direct intention, because they're using a library like GraphicsMagick or ImageMagick to perform the actual image processing. This means that their JP2 support is very slow – which seriously discourages users, including digital image specialists who might appreciate some of the format's technical details but are pressed for time – and that there are valid .jp2 files which Jasper cannot open.

The other reason for concern is that while many programs can technically open a .jp2 file this isn't really reliable unless the community developing and using a program regularly exercise that functionality. That means there are possible compliance bugs and other areas of concern – proper color management, metadata handling, etc. which might be silently causing problems until someone notices.

I'll present one example of why I'm concerned: Debian and Ubuntu Linux had a bug for years which prevented any application not running as root from opening a .jp2 file using libjasper:


Despite a 100% failure rate for all users and all sane server applications, that bug has two people watching it. As an erstwhile open-source JPEG 2000 user that's disturbing, particularly because there are likely to be more subtle bugs which I haven't encountered yet.
  
3. There are both Open Source and proprietary libraries readily
available. OpenJPEG, for instance, is under active development: version
2.0 was released just this past November and code was added to trunk as
recently as yesterday.

OpenJPEG was specifically recommended in the post. Speaking strictly for myself, I would offer the even stronger argument that anyone investing in a large library of JP2 files would do well to make a proportionate investment helping the OpenJPEG project (testing all of the many format variations being a key area) and helping popular image manipulation tools – particularly ImageMagick and GraphicsMagick – make the switch.
 
3. JPEG2000 is supported by a number of web browsers, including Konqueror
and Safari, and by QuickTime within other browsers. See the Bugzilla
thread linked to in the LC blog post for more details.

More precisely, JPEG 2000 is not supported by any browser's own imaging code: WebKit browsers on OS X, and anything else which uses CoreImage, enjoy a high-quality implementation (Apple licenses Kakadu, if I recall correctly), while applications like Firefox or Google Chrome, which use their own portable imaging libraries, do not – and, of course, this doesn't help anyone on Windows or Linux. Even on OS X, browser support is not competitive with the high-performance JPEG or PNG implementations: a JP2 file may open on an iPad, but it won't progressively render and it's noticeably slower.

Chris
 

Ferran Jorba

Mar 19, 2013, 3:57:49 AM
to digital-...@googlegroups.com
Hi,

after reading for a long time about JPEG2000 troubles, and suffering huge TIFF
files and serious budget cuts, we are seriously considering migrating
those TIFFs to plain JPEGs.

Yes, we know that JPEG means lossy compression. But if it is good
enough for born-digital pictures, why can't it be good for reading old
books and newspapers, where the original print quality was even lower
than current digitization quality? It has 100% browser support, is
native to all kinds of digital gadgets, has a huge number of tools, and,
according to our conversions (using ImageMagick default values), is 10%
the size of the original TIFF files.

What are we really losing? Text pages are crisp, printed pictures are
clear, there are tools to losslessly rotate (http://jpegclub.org/jpegtran/)
or make all kinds of enhancements with ImageMagick and other alternatives.
Documents and their information are perfectly kept. And, as David
Rosenthal always stresses, the big preservation issue is economic, not
technical.

10% of the size is a strong economic argument, isn't it?

Best regards,

Ferran Jorba
Universitat Autònoma de Barcelona

adam brin

Mar 19, 2013, 12:32:47 AM
to digital-...@googlegroups.com
Hi Chris,
  I might frame the question a few different ways…

  1.  If we return to OAIS, we have the SIP, the DIP, and the AIP: the original version of the file, a derivative or accessibility version, and the archival version. Does the archival version have to be as accessible as the derivative version?  It may be that TIFF, ODT, or PDF/A in 200 years are perfectly reasonable archival formats, but aren't at all accessible.  With that said, they were or may have been popular in their time, and are open and viable archival formats.  Is it okay if an archival format is less accessible?
  2. I think the attraction of JPEG2000 is particularly strong because, for example, we have thousands of JPEGs and on average we see between 10 and 100 times the file size when stored as a TIFF compared to a JPEG, and a much smaller ratio when stored as a JPEG2000 (that's likely many TB of archive storage). To put the question a different way: over a 30-year period, is the extra effort of dealing with the JPEG2000 file worth it compared with the overall cost of decoding the JPEG2000? And remember, we still have the JPEG. While costs of space have come down significantly, storage is still likely, save staffing, one of the higher costs we have when you factor in backups and other infrastructure dependencies.
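To make the ratios concrete, here is a back-of-envelope sketch (the file count and average JPEG size are invented for illustration; only the 20:1 and 4:1 ratios echo the ranges described above):

```python
# Illustrative storage arithmetic; all inputs are assumptions.
n_files = 100_000                 # hypothetical collection size
jpeg_mb = 2.0                     # assumed average JPEG size in MB
tiff_ratio, jp2_ratio = 20, 4     # assumed TIFF:JPEG and JP2:JPEG size ratios

jpeg_tb = n_files * jpeg_mb / 1_000_000   # MB -> TB (decimal units)
tiff_tb = jpeg_tb * tiff_ratio
jp2_tb = jpeg_tb * jp2_ratio

print(f"JPEG: {jpeg_tb:.1f} TB, TIFF: {tiff_tb:.1f} TB, JP2: {jp2_tb:.1f} TB")
```

Even at the low end of the 10-100x range, the TIFF store dwarfs the JPEG one, which is why the smaller JP2 ratio looks attractive.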

I don't think there's a right answer here, and I'm sorry if I'm taking things down a tangent, but I think the questions and issues are a bit broader than you've laid out.  I'm really curious what others have to say though too. 

Best,

Adam Brin

Director of Technology
Digital Antiquity

Cliff, Peter

Mar 19, 2013, 9:21:46 AM
to digital-...@googlegroups.com

Just wanted to add, Oracle long ago dropped the Java Advanced Imaging API and with it JPEG2000 support in Java. In theory this was adopted as a java.net project, but the encoder/decoder has not been touched for some time.

An alternative, jj2000, is also seemingly without a home:

http://jpeg2000.epfl.ch/ (doesn't work)

and the fork at:

http://code.google.com/p/jj2000/

seems just as defunct (check the issues), albeit accessible.

An active imaging library for Java, Apache Commons Imaging (http://commons.apache.org/proper/commons-imaging/, formerly Sanselan), does not currently support JPEG2000 and I don't see any feature requests to make it do so.

I'm not coming to any conclusions here, but this is another example of major vendors failing to see the need for JPEG2000.

 

Pete Cliff

Digital Preservation Technical Lead

 


Chris Adams

Mar 19, 2013, 10:21:42 AM
to digital-...@googlegroups.com
That is a useful perspective to consider. I strongly agree with considering the space savings: there's a great appeal if you can apply the savings from storage costs towards expanding your collection. 

I would also agree that the savings for an archival copy might be compelling. The concern I have is that while it's certainly possible to maintain multiple file formats, there is a cost to managing the different copies, and the more formats we use, the less advantage we're taking of the JP2 space savings. I'd like that cost/benefit tradeoff to be more compelling, particularly since there are technical areas where the wins could be huge (e.g. using JP2 masters with a tiled decoder allows you to treat thumbnails, etc. as on-demand derivatives rather than files which need to be managed).

My main concern is that the other formats we're trying to preserve for long periods of time (e.g. TIFF, PDF/A) are far more widely used so the community supporting the software is much larger than just the greater library/archive community. Peter's note about the Java imaging libraries quietly fading is exactly the kind of thing I'm worried about and it's the kind of thing which we could avoid with a relatively modest investment of time.

Chris

Randy Fischer

Mar 19, 2013, 10:35:32 AM
to digital-...@googlegroups.com
On Tue, Mar 19, 2013 at 12:32 AM, adam brin <ad...@brin.org> wrote:
> I think the attraction of JPEG2000 is particularly strong because, for
> example, we have thousands of JPEGs and on average we see between 10 and
> 100 times the filesize when stored as a TIFF than as a JPEG, and a much
> smaller ratio when stored as a JPEG2000. (that's likely many TB of archive
> storage). To put the question a different way… over a 30 year period, is it
> worth the extra effort to deal with the JPEG2000 file than the overall cost
> of decoding the JPEG2000, and remember we still have the JPEG. While costs
> of space have come down significantly, they are still likely save staffing
> one of the higher costs we have when you factor in backups and other
> infrastructure dependencies.


I'm curious about this: typically preservationists shy away from
using compressed TIFFs, so I suspect that's part of the reason for the
1-2 orders of magnitude difference. If that's the case, I think you
would be better off using TIFF with lossless compression than JP2K.

As far as costs go, scaling down staffing and infrastructure effort is
the one big idea behind cloud-based storage systems. It does require a
certain amount of devops knowledge from us end-users, to be sure.



-Randy Fischer

Simon Spero

Mar 19, 2013, 4:53:46 PM
to digital-...@googlegroups.com
On Tue, Mar 19, 2013 at 10:35 AM, Randy Fischer <randy....@gmail.com> wrote:

I'm curious about this:  typically preservationists shy away from using compressed TIFFs, so I suspect that's part of the reason for the 1-2 orders of magnitude difference.   If that's the case, I think you
would be better off using TIFF with lossless compression than JP2K.

TIFF compression is generally lossless.

snarkive:ses$ compare -metric RMSE CRW_4237_tiff_8_compressed.tif CRW_4237_tiff_8_uncompressed.tif foo.png
0 (0)
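For anyone unfamiliar with the metric: `compare -metric RMSE` reports the root-mean-square difference between corresponding pixel values, so identical images score 0. A minimal pure-Python illustration of the same computation (toy pixel lists, not real image data):

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equal-length pixel sequences."""
    assert len(a) == len(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# A lossless round trip leaves every pixel untouched, hence RMSE 0.
assert rmse([0, 128, 255], [0, 128, 255]) == 0.0
# Lossy compression perturbs pixel values, giving a small positive RMSE.
print(rmse([0, 128, 255], [0, 130, 251]))
```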

The reason why TIFF with LZW compression has been disfavoured in the past is that the algorithm, LZW, was covered by a Unisys patent until the middle of 2004.

A reason to disfavour it in the present is that it does not get very good compression.

Here are some results using a processed Canon Raw image taken using an EOS-10D, with Raw processing by DXO Optics Pro 8.

Conversions and comparisons handled using ImageMagick

First some sizes:

-rw-r--r--@ 1 ses  staff    18M Mar 19 14:52 CRW_4237_tiff_8_uncompressed.tif
-rw-r--r--@ 1 ses  staff   9.4M Mar 19 14:53 CRW_4237_tiff_8_compressed.tif
-rw-r--r--  1 ses  staff   8.2M Mar 19 14:29 CRW_4237-0.png
-rw-r--r--@ 1 ses  staff   6.1M Mar 19 14:03 CRW_4237_quality_100-0.jp2

This shows that the lossless JP2 is about 65% of the size of the lossless compressed TIFF, and about 75% of the lossless PNG. To confirm that these formats are lossless:

$ compare -metric RMSE compressed.tif CRW_4237-0.png = 0 (0)
$ compare -metric RMSE compressed.tif CRW_4237_quality_100-0.jp2 = 0 (0)

When we start using compression, we need to consider the loss of quality vs. size saving. 

In the table below, I use uncompressed TIFF as the baseline for compression ratios; however, a more meaningful measure is comparing files to lossless JP2.

That gives a compression ratio for JPEG quality 90 of about 4.5:1, which isn't as automatic a win. JP2 at quality 75 has about the same error, but still only gives an improvement of about 5.5:1 over lossless JP2.

[Increasing order of RMS error relative to TIFF; the .jpg rows are plain JPEG.]
Size ratio is relative to uncompressed TIFF.

$compare  ... CRW_4237_quality_100-0.jp2 =   0.000 (0.00000000) [6.1M]  3:1
$compare  ... CRW_4237_quality_95.jp2 =    163.646 (0.00249708) [4.5M]  4:1
$compare  ... CRW_4237_quality_90-0.jp2 =  247.683 (0.00377941) [2.9M]  6:1
$compare  ... CRW_4237_quality_80.jp2   =  405.8   (0.00619211) [1.5M] 12:1
$compare  ... CRW_4237_quality_75.jp2 =    457.959 (0.006988)   [1.1M] 16:1
$compare  ... CRW_4237_jpg_90.jpg =        459.806 (0.00701619) [1.3M] 14:1
$compare  ... CRW_4237_quality_70-0.jp2 =  508.657 (0.0077616)  [911K] 20:1
$compare  ... CRW_4237_quality_60-0.jp2 =  628.744 (0.00959402) [610K] 30:1
$compare  ... CRW_4237_JPG_70.jpg =        672.716 (0.010265)   [694K] 26:1
$compare  ... CRW_4237_quality_50-0.jp2 =  739.68  (0.0112868)  [437K] 41:1
$compare  ... CRW_4237_quality_40.jp2 =    855.193 (0.0130494)  [328K] 55:1
$compare  ... CRW_4237_JPG_50.jpg =        879.356 (0.0134181)  [434K] 41:1
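The size ratios in that table can be cross-checked from the reported file sizes; a quick sketch against the 18 MB uncompressed baseline (sizes transcribed from the listing above):

```python
# Recompute rough size ratios from a few of the file sizes listed above.
baseline_mb = 18.0                      # uncompressed TIFF
sizes_mb = {
    "jp2 q100 (lossless)": 6.1,
    "jp2 q90": 2.9,
    "jpg q90": 1.3,
    "jp2 q75": 1.1,
}
for name, mb in sizes_mb.items():
    # Prints roughly the 3:1, 6:1, 14:1 and 16:1 entries from the table.
    print(f"{name}: {baseline_mb / mb:.0f}:1")
```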

Note that measurements for different image types (e.g. line art or b/w) will give different results, but JP2 does not typically excel on images of those kinds.

Anyone got some hi-res scans of newspapers?

Simon

Ferran Jorba

Mar 21, 2013, 5:02:40 AM
to digital-...@googlegroups.com
Hello Jacob,

we have more than 670,000 unique TIFF files that occupy more than 15 TB.
We store two copies of them, each in a 25 TB Nexsan Satabeast. The
yearly maintenance of those Satabeasts is now out of our reach, with the
current budget cuts. If we convert them (in fact, we have almost finished,
using ImageMagick default values) into same-quality JPEGs, we would
use less than 2 TB of space, which we could easily host on corporate
servers, and duplicate somewhere else if needed.
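The arithmetic here can be sketched quickly (figures as quoted above; the ~10% JPEG size is the fraction reported earlier in the thread):

```python
# Back-of-envelope check of the storage figures quoted above.
n_files = 670_000
tiff_tb = 15.0
jpeg_fraction = 0.10               # JPEG ~10% of TIFF size, as reported earlier

avg_tiff_mb = tiff_tb * 1_000_000 / n_files   # average size per TIFF, in MB
jpeg_tb = tiff_tb * jpeg_fraction             # projected JPEG store

print(f"average TIFF: {avg_tiff_mb:.1f} MB; projected JPEG store: {jpeg_tb:.1f} TB")
```

The projected JPEG store comes out around 1.5 TB, consistent with the "less than 2 TB" figure.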

My question stands: what are we _really_ losing? The data lost with
the lossy JPEG algorithm doesn't affect the _information_ stored in
those files: the text and the images. Why does the library community
insist on those huge files if, for the same storage money, it could
have 10 times more documents? (I know that I have done a
simplification here.) Any person needing to work on the original
high-resolution files appreciates it if they are in JPEG, and not in TIFF.

Who are we working for?

Ferran Jorba
Universitat Autònoma de Barcelona

jjn <jna...@gmail.com> wrote:
>
> Hello Ferran - I'm curious how many TB of image files you're
> maintaining and where the costs are piling up. My own institution
> operates on razor-thin margins, so I'm sympathetic to the cost
> concerns, but even cutting 90% of my 3TB storage bill wouldn't
> generate enough savings to do anything else really useful - we
> couldn't hire a programmer or buy much of anything else, for instance,
> and to the extent we're just trying to watch the bottom line, it's not
> a significant enough savings to bother with the risks and conversion
> process. If it's a question of storage and computation across a large
> set of images, though, then I can see where the JPGs might make a case
> for themselves.
>
> --Jacob Nadal

Ferran Jorba

Mar 21, 2013, 5:26:58 AM
to digital-...@googlegroups.com
Hello Simon,

[...]
> Anyone got some hi-res scans of newspapers?

In our 300x300 full color scans of newspapers we have found that,
against common expectations, JPEG compresses much better than PNG. I
attach an old chart with the comparison that I did for an internal
presentation. JPEG is, for this kind of document, 10 times smaller
than uncompressed TIFF, while PNG is 75%. Even for grayscale documents
JPEG compresses very well, without any noticeable quality loss. Using
ImageMagick, at that time I wasn't able to create any JPEG2000 version
of my files, after many trials. Finally, I gave up.

How can we trust our collective memories and documents to something so
esoteric and niche?
tiff-jpg-png.png

Nathan Tallman

Mar 21, 2013, 9:46:53 AM
to digital-...@googlegroups.com
I think the reason TIFFs or lossless compression are preferred is because of archival concerns. If you are scanning for preservation, that is, a true digital facsimile, the goal is to create an archival representation as true as possible to the original. Lossy compression is throwing away part of the original. Even if it's not detectable by the average human eye, the loss is there. It's no longer a true archival representation, but an access derivative or mezzanine file.

If your reason for digitizing is purely access, and you are keeping the originals, I think most people wouldn't argue for keeping them in TIFF. You might have to rescan some things later when a user wants a 600-ppi uncompressed version for their publisher, but if your goal is purely access, JPEG is great. The originals are the preservation copy, and if they're in good shape and under climate control, etc., they should still be around in 100 years without having to worry about the digital preservation aspects or huge datastores to back up. True, in a perfect world you might keep those TIFFs for convenience; who wants to rescan, especially if it's a big job? But we don't live in a perfect world and decisions have to be made.

I understand your point about budget and resource limitations. My institution is facing a similar situation. One wants to do the best, most archival thing. But when faced with the realities of budget and staff cuts, as well as technical resources, institutions need to do the best they can, even if it's not the industry standard. It's easy to say "storage is cheap! keep it all!", but it's not just the price of storage; it's having the staff, the bandwidth, the backup, the hardware, etc. Not every institution has an IT department at their disposal, and even if they do, it may be shared by 37 other departments, all competing for IT resources.

Nathan

Ferran Jorba

Mar 21, 2013, 10:43:07 AM
to digital-...@googlegroups.com
Hello Nathan,

but, again, _what_ are we preserving? The _information_ stored in that
paper, in high quality (say, 300x300, full color, good enough for
anybody interested in the information stored there), or the state that
that particular paper copy had at that time as seen by this optical
capture device?

The information that you say is lost in a lossy compression has
already been lost if you are interested in the kind of ink or the paper
fiber or the dust stored there. It's gone already! If a particular
researcher needs to study the kind of paper used for this document, he
or she will need the original paper anyway.

Does the cost of this storage, backup, electricity, etc., justify the
extra details that are lost in a typical JPEG lossy compression and
that, as everybody agrees, are beyond the average human eye? According
to the output of the ImageMagick identify command or JHOVE, the
difference between the original TIFF and the JPEG is negligible.

I think it is disproportionate. Even the best artistic pictures are
stored in JPEG. Why do we need more detail for text (old newspaper)
scans? The images in those old presses had much, much less detail.
Ferran

Jim Safley

Mar 21, 2013, 11:16:43 AM
to digital-...@googlegroups.com
Ferran,

High quality images preserve artifacts that can be lost during lossy
compression. This is especially true with manuscripts. I've come to
appreciate this in my work for the Papers of the War Department. [1]
We house thousands of hand-written scanned documents, and if we didn't
preserve high quality images we'd lose potentially important artifacts
such as watermarks, embossing, fingerprints, light writing, pen
impressions, smudges, stains, and ink bleed-through.

Jim

[1] http://wardepartmentpapers.org/

Ferran Jorba

Mar 21, 2013, 11:51:05 AM
to digital-...@googlegroups.com
Hello Jim,

JPEG's lossy compression loses what you won't notice. I do agree we
should preserve high quality images, of course. But I'm questioning
whether there is 10 times more information in a TIFF file than in a
JPEG one. Maybe there is a tiny fraction more. Is that tiny fraction of
extra information worth 10 times the cost? Again, we are talking about
retrospective digitisation, where the paper is still there.

I did some experiments converting full color, grayscale and B/W TIFF
files into JPEGs some time ago, and the results are eloquent: tiny
sizes, no noticeable quality differences. You can check them yourself,
if interested:

http://ddd.uab.cat/record/59776

In our repository, we have been keeping 15 TB of high quality original
digitisation scans (besides the public PDF version). We have been asked
for the originals a few times, but no one has specifically wanted the
TIFF version, even when we have offered it. JPEG has proven enough in
all cases.

I suspect that this TIFF (or JPEG2000) requirement is more a
fear-and-doubt helmet that we, the library community, have agreed to
wear, and that few people outside our professional circle would
endorse. If we asked our users whether they prefer 10 times more
documents in JPEG or 10 times fewer in TIFF, maybe we would have a few
surprises. (Again, there is a simplification here, I know, but the
argument stands.)

Ferran

Creighton Barrett

unread,
Mar 21, 2013, 12:31:12 PM3/21/13
to digital-...@googlegroups.com
Very interesting, Ferran.  When we get requests for high-res scans, it is usually for publication or distribution.  Many users do not have technical specs, but we have had requests that include a specific resolution and TIFF format.  It is hard to anticipate use requirements, and who knows, the users of TIFF files may very well be rendering JPEGs on their own.  But your point about the amount of lost information vs. the storage requirements is well taken, though I do think the fraction of information lost could end up being the most critical to someone like Jim or other manuscript/diplomatics researchers.  Perhaps it is more a question of end-use goals and requirements?

Ben O'Steen

unread,
Mar 21, 2013, 1:29:03 PM3/21/13
to digital-...@googlegroups.com

I'm going to comment on a few technical things to do with the old JPG standard to add some additional food for thought, and add in a few comments on more general things.

JPG works by breaking up a pixelated image into cells of pixels (the size is variable, but 8x8 is common) and then, in short, tries to fit the detail in each cell to something it can generate algorithmically from a much smaller set of data. One of the aspects that suffers most is colour - there can be quite sizeable variations of colour across adjacent pixels. We tend not to notice these artefacts much, as modern JPG rendering takes advantage of how LCDs/CRTs display images, and the compression algorithm tries to discard colour and detail differences that are (according to its model) psychologically hard to notice. However, its model isn't particularly good and there are better ways to do it. On the flip side, due to the codec's age, very many software applications can render these files to screen.
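The cell transform Ben describes can be illustrated with a toy 2-D DCT on one 8x8 cell. This is a sketch of only the transform step (real JPEG also does chroma subsampling, quantisation and entropy coding), and the flat sample cell below is invented for illustration:

```python
import math

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (a list of 8 lists of 8 numbers)."""
    def a(k):
        # Normalisation factor: sqrt(1/8) for the DC term, sqrt(2/8) otherwise
        return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = a(u) * a(v) * s
    return out

# A flat cell: every pixel 100, level-shifted by 128 as JPEG does.
cell = [[100 - 128] * 8 for _ in range(8)]
coeffs = dct2_8x8(cell)
# All the energy collapses into the single DC coefficient coeffs[0][0];
# the 63 AC terms are ~0, which is why smooth regions compress so well.
```

A quantiser then rounds most of those near-zero AC coefficients away entirely, which is where the actual (lossy) saving comes from.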

The question of the purpose of storing digital files is important. If it is a digitisation of in-print textbooks, where the content of the books is the important part, that is one thing. If it is a set of manuscripts, where the slightest colour variations may lead to further discoveries, then the question of archival image quality may lead to a different answer.

Back to the technical points: one thing JPG is notorious for is its compression artefacts, and how further processing and recompression create ever more noticeable artefacts. This effect might be important to consider for the longer-term use of the images. It may not seem huge[1] to the naked eye, but it adds up rapidly over repeated operations and should be considered a potential form of digital 'rot' if there are any decompression-recompression steps on the way to delivery (watermarking, highlighting, etc.). Again, I agree that this should be considered on a case-by-case basis.

I believe that the effort that people are putting into archiving "lossless" formats is due to the overall cost of digitisation, and I'm not talking about just straight money here. The effort required to select a set of items for digitisation, prepare them, handle logistics, and assign responsibility for these new digital items in an often *ahem* highly bureaucratic and committee-driven environment is not to be underestimated. Secondly, the items that are most valuable when digitised are often the oldest, the most fragile and the most rare. You often plan to digitise these sorts of items only once, as you may not get another shot at it. Consider DIAMM[2] - I've heard some truly depressing hints about the proportion of the digitised physical items that have since succumbed to theft, fire, rot or loss in the few years since the project started.

In conclusion, is there a straightforward answer? No, it's generally more complicated than that :) JPG is ubiquitous, perfectly reasonable for non-manuscript items and saves space compared to TIFF and PNG. On the other hand, it is a lossy compression, and certain uses may not be able to tolerate the loss of detail - detail which newer forms of compression, JPEG2000 for example, are said to retain. As should be common in archival work, the decision is about what is permissible to throw away, rather than starting from the notion of keeping all of it, forever.

jjn

unread,
Mar 21, 2013, 2:13:05 PM3/21/13
to digital-...@googlegroups.com, ferran...@uab.cat
Yes, the math is very different at 15 TB than at 3 TB! It sounds like you've thought through the issues, and if the JPEGs work for your community, then you've probably got your answer. I think you bring out a very important digital curation issue: the impact of a technically "lossy" JPEG on use may be more hypothetical than real. The image data needed for good reproduction of photos, maps, and artworks is a different case than printed text, for sure. JPEG compression affects the information if you treat the digital images as a visual resource, as you would for a digitized photograph, for example, but if the digital images are a step in text capture leading to OCR, then compression is a much less important concern. 

My own risk assessment would be to check for changes in color fidelity across different JPEG encoders (and decoders, though that should be less of a problem, or at least less of a controllable one), and for images of text, to look into an edge-detection test of some kind to see how much detail of the typefaces was being lost. 
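One cheap way to put a number on the fidelity comparison suggested here is PSNR between two decoded rasters. A minimal pure-Python sketch (the eight-sample "rows" below are invented data, not from any real scan):

```python
import math

def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences.
    Higher means closer; identical inputs give infinity."""
    if len(ref) != len(test):
        raise ValueError("images must have the same number of samples")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

# Toy example: a "decoded TIFF" row vs. the same row after a lossy round trip.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded  = [52, 56, 61, 65, 70, 62, 64, 72]
# psnr(original, decoded) is high here (around 51 dB), i.e. the two rows
# are visually indistinguishable despite not being bit-identical.
```

In practice you would run this (or SSIM, or an edge map difference for text) over whole images decoded by each encoder/decoder pair under test.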

All that said, it sounds like you've given this good thought, and I'm glad you raise the issue of the real purposes for which we're creating and preserving digital assets.  It's better to preserve all the JPEGs in a financially sustainable way than none of the TIFFs due to budget cuts, to be sure!

-JJN

Simon Spero

unread,
Mar 21, 2013, 10:03:08 PM3/21/13
to digital-...@googlegroups.com
One 3 TB drive is $140.  Five 3 TB drives are $700.  Make it $1000 and you can RAID-6, though with two full copies that's probably not necessary if you use RaptorQ. 

Ferran Jorba

unread,
Mar 22, 2013, 5:48:19 AM3/22/13
to digital-...@googlegroups.com
Hello Creighton, Ben, Jacob and Jim,

I'm answering all of you, as your answers have some common points,
mainly: (a) preserving the high-resolution image of a printed document
is not the same as preserving that of a unique (archival) one, (b) one
should take into account the purpose of the digitisation, (c) what do
we really lose using a lossy algorithm, and, last but not least,
(d) the economic impact of choosing one or the other.

I think that the campaign that David Rosenthal has been pursuing in his
blog and talks in recent years against the myth of digital obsolescence
risk (like, among so many others, in
http://blog.dshr.org/2007/04/format-obsolescence-scenarios.html, up to
the latest one so far,
http://blog.dshr.org/2013/02/rothenberg-still-wrong.html) could partly
be applied to this lossy-compression disaster scenario (this sentence
is from that last post):

"how can we use our limited resources to maximize the value delivered
to future readers?"

I'd like to know of a real case of
slightest-colour-variations-may-lead-to-further-discoveries (Ben's
answer) that goes beyond the theoretical, and where the *format* chosen
for the file makes *the* difference. For printed material, the
difference between my copy and your copy of a 19th century newspaper is
irrelevant (given that they are in good enough condition, of course).
What matters is what was published there, not whether my copy is
slightly more yellow than yours. Again, I'm thinking as a librarian
here, not as an archivist.

I'm using paper copies as a model: if a library gets a better copy of a
printed book (without unique marks, like valuable autographs, that is),
it replaces it, because the important thing is that the readers have a
good copy. The individual characteristics or damage of that particular
paper book are not relevant to the users.

Some of the examples against JPEG, like Ben's JPEG-resaving link
(http://www.impulseadventure.com/photo/jpeg-resaving.html), are, I'd
say, misleading. The original high-resolution copy has to be kept
intact. If somebody alters, recompresses, watermarks, saves and
recompresses again, that is bad handling of the file, just as handling
paper with dirty hands is bad handling of the paper. That doesn't argue
against paper (I can imagine the headlines: "Paper can get dirty, paper
is a poor information medium!"), but against the person who manipulates
valuable material without washing his or her hands. If we keep the
original high-resolution JPEG intact, we are safe against poor
manipulations.

Instead, I'm facing websites, like Jim's War Department Papers
(http://wardepartmentpapers.org/), where the papers have been scanned
in B/W or grayscale:

http://wardepartmentpapers.org/docimage.php?id=3958&docColID=4262

Maybe the high-resolution versions of those scans are kept in TIFF, but
*most* of the data was already lost when it was decided to scan in
grayscale instead of full color. In my experience, a full-color JPEG is
smaller than an uncompressed grayscale TIFF. Do historians really
prefer those grayscale scans *because* they are in TIFF, rather than
full color *even though* they are in JPEG?

I think that we, as a community, have to think twice, after all these
years, about our recommended practices if we want to be useful to
society and the future.

Thanks for your answers,

Andrew Woods

unread,
Mar 22, 2013, 10:37:43 AM3/22/13
to digital-...@googlegroups.com
Thanks Ferran and All, for the thought-provoking dialog.
One discussion point that seems to have been overlooked thus far is
the actual robustness of the various image formats in the face of bit
rot. The following article takes a systematic approach at analyzing
the resilience of JPEG, TIFF, and JPEG2000 to increasing percentages
of corrupted bits.
http://www.dlib.org/dlib/july08/buonora/07buonora.html
That is all to say that file size and data loss during digitization
are only two dimensions of consideration. Depending on the
preservation environment anticipated for the digital artifacts, format
robustness should be taken into consideration as well.
Andrew

Ferran Jorba

unread,
Mar 22, 2013, 2:31:11 PM3/22/13
to digital-...@googlegroups.com, awo...@duraspace.org
Hello Andrew,

Bit rot is a critical consideration if there is only a single copy of a
file. If there is more than one, plus a checksum value somewhere to
choose the right copy, and/or a parity file to repair the damaged one,
it becomes less critical. As a matter of fact, my library cannot expect
an external mirror of our high-quality TIFF files if they occupy 16 TB,
but we can easily find an external site for 2-3 TB if they are in JPEG.
So the bit-rot argument can be defeated easily; it becomes a
self-correcting problem.

But, anyway, at our university we have been keeping digital documents
(PDFs version 1.0) since 1996 that have been migrating from server to
server, and I've never found an unreadable file. For the last 6 years,
we have also kept the md5 and sha1 checksums of our 1.3 million files
(counting TIFFs, the corresponding JPEGs and thousands of PDFs) and,
again, no checksum has ever failed. And, I can assure you, we have not
been using high-end equipment at all.

My two cents experience,

Ferran Jorba
Universitat Autònoma de Barcelona

El Fri, 22 Mar 2013 10:37:43 -0400
Andrew Woods <awo...@duraspace.org> escrigué:

Matt Schultz

unread,
Mar 22, 2013, 3:21:16 PM3/22/13
to digital-...@googlegroups.com
Ferran,

I've really enjoyed this thread. You've raised brave questions and given everybody a good example of how an institution analyses a difficult problem in the context of its local resource constraints. Some of the best comments, I think, have been the ones that boiled these sorts of decisions down to use cases and local needs. Which is not to say that that is necessarily where the conversation should stop - the tension between local preferences and technical concerns probably warrants some more great expert comments.

The backdrop for all of your decision-making is important to me. As I've read it, you are an institution that suddenly finds yourself with an abundance of data and in need of making some quick decisions about how to continue supporting it. It is yet another reminder to me of how important it would be for all of us in the cultural community to start building toward some best practices literature aimed at the early stages of creation, selection and acquisition (of both born-digital and digitized content)--getting us all thinking more seriously about the consequences of the decisions we make at those early stages for the long-term stewardship of the data.

Who is my current Designated Community? Who are they likely to be in 5-10 years? Why am I creating all this metadata right now? What am I really going to do with it? How will this collection grow? What is my collection policy and how will this type of data likely accumulate in the next 5-10 years? Etc., etc. Digitization and Digital Preservation I think are too divorced from one another in this sense and yet often times conflated to the detriment of both ends of the lifecycle.

If anybody knows of such literature out there I would welcome the pointers.

In any case - I have found this discussion to be really useful so far. Thanks!
--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.s...@metaarchive.org
616-566-3204

jjn

unread,
Mar 22, 2013, 3:41:23 PM3/22/13
to digital-...@googlegroups.com, ferran...@uab.cat
This doesn't quite answer the color-fidelity issues raised, but this report may shed some better light on them: http://msc.mellon.org/research-reports/Direct%20Digital%20Capture%20of%20Cultural%20Heritage.pdf/view

24-bit RGB at 600-1200 DPI will show a lot, but will still not reveal IR, UV, or X-ray spectra. I think it's the right choice for photos and artworks, but we shouldn't confuse it with specialized imaging that enables conservation science research. 

In counterpoint, Ferran's argument for JPEGs as a good-enough facsimile of printed text seems stronger to me by comparison. He has a use case and his digital assets meet it, after all. 

Ferran Jorba

unread,
Mar 25, 2013, 2:51:10 AM3/25/13
to jjn, digital-...@googlegroups.com
Hello Jacob,

many years ago, a professional musician I was lucky to know told me
something I've never forgotten, because it impressed me so much: "When
I listen to music, I often wish I knew less about the technicalities of
music, because then I could simply enjoy it."

Many times, when I read digital preservation papers and discussions, I
feel that going so deep into the technical details makes us miss our
goal as digital preservationists. Or digital librarians; we don't call
librarians "paper preservationists", do we?

Thanks,

Ferran

El Fri, 22 Mar 2013 12:41:23 -0700 (PDT)
jjn <jna...@gmail.com> escrigué:

Ferran Jorba

unread,
Mar 25, 2013, 2:53:49 AM3/25/13
to digital-...@googlegroups.com, matt.s...@metaarchive.org
Hello Matt,

thanks for your answer. I'd like to comment a little on the last point
you made:

"Digitization and Digital Preservation I think are too divorced from
one another in this sense and yet often times conflated to the
detriment of both ends of the lifecycle."

This obsessive discussion about the lossless characteristics of formats
like TIFF and JPEG2000 happens almost exclusively in digitisation
circles. And it is precisely in digitisation where the paper is (still)
there and, in a significant number of cases, in many copies, as long as
it is printed material. And, to add even more contradiction to the
argument, so many times the capture was done in grayscale or, even
worse, in black and white, where the lossiness was perpetrated at the
first stage.

Myself, being an affectionate reader of old material, I find black and
white difficult to read, grayscale depressing to read, and color
capture a joy to read. And the argument against color capture is often
that it produces large files that are expensive to keep - and yet we
insist on TIFF! Please, let's come down to earth! I'm part of those
circles, but being also part of the readers, I'd like the community to
lean more toward the readers and less toward the specialists who
confuse us with too many technical details. Paper librarians are not
chemists, and they have been doing the job of preserving paper
documents for centuries. They need to know about humidity and
temperature conditions, some weight calculations and lots of common
sense.

About the "preservation role" that is often associated with those
formats, again, I'd like to quote David Rosenthal:

This suggests that the idea of "preservation formats" as opposed to
"access formats" is a trap. Precisely because they aren't access
formats, preservation formats are less likely to have the strong open
source support that enables successful preservation.
http://blogs.loc.gov/digitalpreservation/2013/01/is-jpeg-2000-a-preservation-risk/#comment-11802

Now, I'm not thinking only of my institution. The Internet has no
borders. I'm sad that so many documents have been digitised with poor
legibility for format-fundamentalist reasons.

Thank you all for your contributions,

Ferran

El Fri, 22 Mar 2013 15:21:16 -0400
Matt Schultz <matt.s...@metaarchive.org> escrigué:

Declan Fleming

unread,
Mar 28, 2013, 2:47:31 PM3/28/13
to digital-...@googlegroups.com
Hi - really interesting discussion!  This is a not a complete tangent, but photographers often have to make a similar decision whether to shoot RAW or JPG.  RAW allows for a lot of post processing control to recover low light parts of an image, or to control what color ranges are brought out on the final image.  This is because the data saved is what the camera actually captured (mostly) and not a firmware version of a lossy compression algorithm that happens on the fly.  RAW is much bigger than JPG, very much analogous to the TIFF format.

Most experienced photographers I know shoot in RAW because we aren't always sure what the end goal for the image will be, nor that we captured the scene perfectly, and RAW leaves the most options open for post-processing.  But, much as Ferran is positing about simple black-and-white document capture, when I'm shooting an event without a tripod and with a flash, I'll shoot JPG because the images are not about art or high-quality results.  They are about conveying a specific moment in time - and the flash flattens out most of what I want from the bit depth of a RAW image anyway.  I can also quickly post the JPG with almost no processing, trusting that the algorithm will do an OK job.

I shoot mainly in RAW to keep my options open.  I don't know how to quantify that in terms of storage cost, but it feels worth it to me, given the time invested in taking the picture and that I probably can't shoot that same sunset the same way again.

Declan

Ed Fay

unread,
Apr 26, 2013, 5:38:02 AM4/26/13
to digital-...@googlegroups.com
No-one has mentioned OCR yet...

If these are textual documents and you are interested in the informational content rather than artefactual accuracy, then does being able to extract the text in a machine-processable way matter?

I don't have references to hand to say whether the loss of detail through JPEG compression would adversely affect the OCR accuracy, but I'd certainly say it's a risk.

Also, if these are master files you are considering storing as JPEGs at their original pixel dimensions and resolution will you be generating smaller versions from these for presentation on the web? If so, then you would be double-compressing the files seen by most people as they access the content, leading to further loss of information compared to the originals.

We have a tiered approach to digitisation, where in some (extreme) cases we will keep a RAW file unaltered as it came from the capture device and a TIFF which will have undergone post-capture alteration (if the digitisation is contributing towards preservation of the original), while in most cases we keep only one (altered) version, and in others we simply preserve whatever original (born-digital) file we have been given.

For us, storage in terms of pure capacity is not our largest preservation concern. Staff costs and overheads contribute equally to our capacity planning costs in a non-linear way. We incur far higher costs to acquire and process digital content than to store it.

Ed
---
Ed Fay
Collection Digitisation Manager
Library, The London School of Economics and Political Science
10 Portugal Street, London WC2A 2HD

E....@lse.ac.uk
0207 955 7235
http://www.library.lse.ac.uk/
http://www.twitter.com/digitalfay

Tito Robe

unread,
Apr 30, 2014, 10:29:28 PM4/30/14
to digital-...@googlegroups.com
   About converting from TIFF to JPEG2000 in Java
 
Hello everyone, interesting thread, but no one has mentioned whether it is possible to convert from one format to the other. For example, right now I need to convert from TIFF to JPEG2000 in Java, but I haven't been able to find any documentation about that, even on this site, where you have talked so long about the subject. If someone has done something like that, I would be very grateful for any help.




Tito Robe

unread,
Apr 30, 2014, 10:37:41 PM4/30/14
to digital-...@googlegroups.com, ferran...@uab.cat
"I did some experiments converting full color, grayscale and B/W TIFF 
files into JPEGs some time ago and the results are eloquent."

Hi Ferran, could you please tell me which libraries or algorithms you used for that conversion? Right now I need to do something similar in Java, but I can't find documentation on how to do it anywhere. I would be very grateful for any pointers. Thanks. 

Kevin Hawkins

unread,
May 1, 2014, 11:15:57 AM5/1/14
to digital-...@googlegroups.com
I recommend finding a way to call ImageMagick from your Java application.  —Kevin
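Kevin's suggestion looks the same from any language: build the argv and shell out. A hedged sketch in Python (Java's ProcessBuilder is the direct analogue; `convert` is ImageMagick's classic CLI, which infers the JPEG 2000 codec from the .jp2 output extension):

```python
import shutil
import subprocess

def jp2_command(src, dst):
    """Build the argv for a TIFF -> JPEG 2000 conversion via ImageMagick."""
    return ["convert", src, dst]

def tiff_to_jp2(src, dst):
    """Run the conversion; raises if ImageMagick is absent or the call fails."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    subprocess.run(jp2_command(src, dst), check=True)
```

From Java, the same argv can be handed to `new ProcessBuilder("convert", src, dst)`, keeping all image handling in the external tool.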



Tom Creighton

unread,
May 1, 2014, 5:14:02 PM5/1/14
to digital-...@googlegroups.com
We have two primary origins for still image ingest into our repository.  One is ongoing camera capture stations all over the world.  These stations today are purpose-built capture stations; our software actually reads the CCD and produces the resulting bitmap file.  Long ago we chose TIFF, I think primarily because at the time there was not a better choice.  It also matched the choice for our other origin: scanning more than 3.3 million rolls of microfilm.  Our digital image processing system (DPS) takes these TIFF images and converts them to JPEG2000 for preservation, as well as JPEG for dissemination.  The JPEG images are at greatly reduced quality and size.  

We chose JPEG2000 over staying with TIFF because it is widely supported and understood and we had access to acceptable software libraries.  We believe the standard has longevity.  Also, without losing quality, representing the images as JPEG2000 with lossless compression takes TIFF images on the order of 10-12 MB each down to 5-6 MB each.  We process about a million images per day and already have a repository storing in excess of 10 PB of images for one copy.  The cost savings are significant.

In order to handle the volume of data, everything is automated from the capture/scan on to writing to tape and publishing to the Internet.  For details on how we handle image processing:

1) We ensure image quality by comparing an in-memory bitmap produced by decoding the TIFF with an in-memory bitmap produced by decoding the JPEG2000.  We do this on every image.  Every one matches exactly.
2) When we begin to support color (the above is grayscale) we will probably change the above comparison to some kind of histogram comparison with tolerances geared to acceptable conversion.
3) Our code looks like this:

For encoding/decoding TIFF images we use the open source library libtiff version 4.0.0 beta 7 (remotesensing.org/libtiff).  We patched the source to speed up the LZW decoding.  We should probably consider submitting the changes back.

For encoding/decoding JPEG2000 we use the commercial library Kakadu version 6.4.1 (kakadusoftware.com).

Here is a snippet of our Java code demonstrating decoding of a TIFF and encoding lossless JPEG2000:

// Create the Image Plan
List<Operation> plan = new ArrayList<Operation>();

Operation op = new Operation(Operation.Type.DECODE);
op.addParameter("inputimage", "img0");
op.addParameter("outputimage", "img0");
plan.add(op);

op = new Operation(Operation.Type.ENCODE);
op.addParameter("inputimage", "img0");
op.addParameter("outputimage", "img0");
op.addParameter("format", "jp2");
op.addParameter("quality", 100);
plan.add(op);

// Convert the image
byte[] img = // Read the input image into this byte array
ImageLibrary lib = ImageLibraryFactory.createImageLibrary();
int planId = lib.createOperationPlan(plan);
Map<String, ConvertResult> cnvResult = lib.convert(planId, img);

ConvertResult jp2 = cnvResult.get("img0");

// The lossless JP2 is found in the byte[] jp2.img
FileOutputStream fos = new FileOutputStream(xxxx);
fos.write(jp2.img);
fos.close();

// Cleanup
lib.deleteOperationPlan(planId);
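The exact-match check described in step 1 reduces to comparing the two decoded bitmaps sample by sample. A language-neutral sketch in Python (the decoded buffers are assumed to come from your TIFF and JP2 decoders, as in Tom's pipeline):

```python
def first_mismatch(a, b):
    """Byte-for-byte comparison of two decoded rasters.

    Returns None if the buffers are identical, otherwise the index of the
    first differing sample (or the shorter length, if the sizes differ)."""
    if len(a) != len(b):
        return min(len(a), len(b))
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None
```

A lossless round trip means `first_mismatch` returns None for every image; the histogram-with-tolerances comparison mentioned in step 2 would replace this strict equality for color.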



Hope this is useful.


tc



Tom Creighton

CTO FamilySearch


Tito Robe

unread,
May 1, 2014, 9:59:13 PM5/1/14
to digital-...@googlegroups.com
Hi Tom, thank you so much for your help - that was just what I needed. I was able to convert the images, but I'm now facing another problem: when I convert an image, I get a new file named filename.jp2 of 4.82 KB (so it is not empty), but I can't open it with Windows Photo Viewer or even Paint. Is this normal? Do I need a special viewer for .jp2 files, and if so, which one can I use, and where can I find a free one? 
 Thanks again for your time, I really appreciate your help. 



Kevin Hawkins

unread,
May 1, 2014, 11:50:29 PM5/1/14
to digital-...@googlegroups.com
Tito, there are some non-proprietary programs listed at
http://en.wikipedia.org/wiki/JPEG2000#Applications . --Kevin

Tom Creighton

unread,
May 2, 2014, 12:12:37 AM5/2/14
to digital-...@googlegroups.com
Tito,

I replied to your direct message, but I'll repeat it here in case it's useful to others on the list.

I am not familiar with Windows Photo Viewer or Paint.  I recommend downloading GIMP: it's open source, and version 2.7 supports JP2.  IrfanView also has a plugin that supports JP2; it is a little simpler than GIMP because it's really just a viewer.

Good luck!

tc

Chris Adams

unread,
May 2, 2014, 7:43:03 AM5/2/14
to digital-...@googlegroups.com
I would definitely test with ImageMagick, but be aware that its JPEG 2000 implementation uses JasPer, which is old and extremely slow. For production use you will likely want one of the commercial codecs, or to switch to the OpenJPEG Java wrapper.

For a really simple test, you could simply call the OpenJPEG image_to_j2k command-line utility to test the various encoding options for your application's needs:


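In case it helps anyone reproduce this kind of test, here is a rough sketch of the sort of invocation described above (file names are placeholders, and the exact flags should be double-checked against your OpenJPEG version's built-in help; in the 2.x series the utility was renamed opj_compress, though -i/-o/-r work the same way):

```shell
# Lossless encode (the default when no rate is given):
image_to_j2k -i master.tif -o master_lossless.j2k

# Lossy encode with three quality layers at 20:1, 10:1, and 5:1:
image_to_j2k -i master.tif -o master_lossy.j2k -r 20,10,5

# Compare the resulting file sizes against the TIFF original:
ls -l master.tif master_lossless.j2k master_lossy.j2k
```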
Chris

Jon Stroop

unread,
May 2, 2014, 7:56:13 AM5/2/14
to digital-...@googlegroups.com
I would strongly recommend getting a newer version of OpenJPEG than is available in most software repos:

https://code.google.com/p/openjpeg/

Also, I believe that since version 6.8.8 ImageMagick has been using OpenJPEG instead of JasPer; at least that's what I assume from the evidence in these threads:

 http://imagemagick.net/discourse-server/viewtopic.php?t=25357&p=109914

 http://imagemagick.org/discourse-server/viewtopic.php?f=3&t=25362
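A quick way to check which JPEG 2000 delegate a given ImageMagick build actually uses (a sketch only; the output format varies by version and build):

```shell
# List the build's delegate libraries; look for "openjp2" or "jasper":
convert -version

# Confirm the JP2 coder is registered and note its read/write support:
identify -list format | grep -i jp2
```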

-Jon

Chris Adams

unread,
May 2, 2014, 11:47:12 AM5/2/14
to digital-...@googlegroups.com
Good catch - http://www.imagemagick.org/script/changelog.php shows that OpenJPEG support landed at the very end of 2013, so now it's just the slow process of waiting for it to percolate through the various distributions.

Chris

Carl Fleischhauer

unread,
May 4, 2014, 9:51:34 AM5/4/14
to digital-...@googlegroups.com

Warning: this is a _lengthy_ set of, um, musings, devoid of the helpful and specific technical notes (and even examples of command-line software instructions!) that characterize many of the TIFF-vs.-JPEG2000 comments that preceded this one.  Skim-reading is advised. 

    I send my thanks to the many commentators--including my Library of Congress colleague Chris Adams--for this illuminating thread!  I sometimes deliver a stump speech about the lack of consensus about digital preservation practices for video and motion picture film scanning.  In the speech, I compare this lack of practice-maturity to the relatively more mature and settled state of digitization that employs still imaging or waveform sound formats.

    In the face of this thread, however, my stump speech seems to be a bit of an overstatement.  I should know better: I help out with the FADGI Still Image Working Group, and see first hand the still-evolving matter of imaging performance metrics (and related tools) and the slow implementation of fully realized color management in memory institutions.  These are two more indicators of the not-yet-settled-ness of still image digitization.  (I won't even mention the background topic of embedding metadata and the use of image-level identifiers. Yikes!)

    Regarding JPEG 2000 in the still image realm, I echo Chris's frustration over the state of tools: it is maddening that more have not come into play, and Chris is also correct that we ought to collectively put more resources into the game, especially supporting the development of open source tools. 

    There is a bit of a chicken-and-egg problem here, as one commentator noted: with more adoption, more tools would come along; with more tools, we'd have more adoption.  Sigh.  It is tempting to speculate about the underlying causes, which have nothing to do with image quality: everyone agrees that wavelet transforms (JPEG 2000) provide results that are visually superior to the DCT applied to 8x8-pixel blocks (old JPEG).  And I agree with the commentator who wrote that patents are also not a real issue.

    Chris mentioned one yardstick for adoption: "Will it play in a browser?"  There is a second yardstick that also counts for a lot in our age of ubiquitous digital photography and the social Web: "What formats are natively supported in the camera?"  Camera specifications are the outcome of decision-making in Japan, supported by JEITA. The old JPEG format has proven to be good enough for most photographers (in contrast, the pros shoot raw) and JPEG's wide support in many applications leaves camera manufacturers in no mood to change. 

    A few years ago, I attended a professional photography conference at Microsoft that included strong advocacy for their then-new Windows photo format (subsequently called HD Photo and now standardized as JPEG XR).  Like JPEG 2000, this Windows format was clearly technically superior to old JPEG.  But the sense of the conference was that if it didn’t get into cameras as a native format, there was little hope of adoption.  It is hard not to be reminded of the frustration we hear from our colleagues in professional sound recording (including Neil Young and his push for his Pono format): "How can people stand to listen to those awful MP3 files?"  Yet MP3 is everywhere and it passes the "good enough" test for the masses.

     It has been interesting to compare the relatively warm embrace that JPEG 2000 has received in moving image quarters with the cold shoulder we see for still images.  The Digital Cinema Package (DCP) is _the_ widely adopted format (truly universal these days) for the theatrical distribution of motion pictures and it employs lossy JPEG 2000 compression.  To be sure, this is a closed system in which the content creators have dictated the specification to theater operators, and projector manufacturers are required to have their systems tested for conformance.  But the very high level of clarity in the lossy JPEG 2000 imagery--thanks again to wavelet transforms--was very important to the motion picture industry.  The picture has to look good on the screen; in comparison, Internet-delivered MPEG is still pretty awful.  For theaters, JPEG 2000 is unshakably in place.

     We also see the adoption and use of JPEG 2000 elsewhere in professional moving image circles, ranging from digital moving image cameras to post-processing systems.  Beginning in 2007, the Library of Congress began reformatting its holdings of older video recordings into a format that employs lossless JPEG 2000 compression, and we now have tens of thousands of digital video master recordings in this format, petabytes-worth.  My sense is that there is a bit of open source development to support JPEG 2000 with moving images but, as noted, the adoption has mostly been among professional users, and these are folks who are very accustomed to purchasing commercial products.

    There are a couple of other considerations to throw into this idea marketplace.  One came up at our last FADGI still image meeting and was echoed in the flow of comments in this Digital Curation thread.  At the FADGI meeting, we heard from a representative from a federal agency that partners with commercial entities.  One of their partners scans microfilmed census and other historical records using an all-JPEG-2000 workflow, producing millions of images every year, which are then given to the agency.  I don't remember the details, but my impression is that the commercial partner produces JPEG 2000 master images with a modest level of lossy compression.  That would be easy to believe: wavelet transforms in modest-lossy compression will retain the legibility even of handwriting.  Some folks even argue that a lot of what is discarded in modest-lossy JPEG 2000 compression is noise, thereby (according to this account) improving the image quality.  In any case, for these forms of content, neither the originals nor the microfilm masters are being discarded, so the anxieties about authenticity come down a notch.

    Meanwhile, I have been told that one or more of the so-called book mass-digitization projects also use lossy-JPEG-2000 workflows.  The critiques of those projects tend not to complain about technical image quality but rather about "housekeeping": missing pages, pages out of sequence, thumbs or fingers obscuring text, or blurring caused by a page flipping or moving when being scanned.  When the thumbs are out of the shot, the JPEG 2000 imagery delivers a perfectly readable typographic page.  And some presentation applications take advantage of JPEG 2000's tiling options to support zooming in the browser, as the Library does for, say, its map collections.  (This happens under the covers since, as has been noted, JPEG 2000 imagery cannot be directly displayed in browsers.)

    At our FADGI meeting, our colleague's remarks did not concern JPEG 2000 as an output for the agency's own image production; rather, the question had to do with the receipt and preservation of JPEG 2000 images delivered to the agency under the terms of the partner agreement, images that the agency can make publicly available in a few years' time.  This circumstance--JPEG 2000 images delivered by a partner--also arises in some libraries as a result of their participation in mass book-digitization partnerships.  The point is that some organizations have, ipso facto, large bodies of JPEG 2000 imagery that they need to be able to manage over the medium or long term.  This is just another reminder that many archives must support the long-term management of more than one digital format.

    The topic of when and where to use JPEG 2000 at the Library of Congress for still image digitization comes up from time to time in informal conversations.  (As noted, this is pretty much a settled matter for moving image materials.)  Alas, our informal conversations have thus far produced little more consensus than what is manifest in this thread. 

    These informal conversations sometimes conflate a pair of topics, the second one being whether, where, and when to accept lossy compression for master files.  Some have looked at classes or categories of material, for example (a) catalog cards, (b) widely held twentieth-century printed matter (think "congressional documents"), (c) maps, (d) manuscripts, and (e) photographs (including negatives).  Reading from left to right, one can make the argument that the imaging stakes go up in a progressive manner.  Might lossy compression be acceptable for the first one or two in the spectrum, and not for the others?  The Library carried out a bit of an exploration of the lossy-vs-lossless question in the mid-1990s but the findings did not take root (http://memory.loc.gov/ammem/pictel/).  That nearly-twenty-year-old project compared old JPEG to uncompressed images and, for certain classes of manuscript, lossy seemed acceptable.  But the context was different.  For example, we were then shy about color--we remembered that monochrome had always been accepted for preservation microfilms and we worried about big files.  Today, I think we would likely embrace color for most manuscripts and, if we still accepted lossy, we would be glad to see the added clarity provided by JPEG 2000.

   At the risk of forking this thread into a tangent, I would like to ask for experience narratives about the use of PNG, a format that Chris mentioned.  For several years, due to inattentiveness on my part, I had relegated PNG to the use case of access-via-browser.  This was because the format was initially created in the 1990s as a reaction to the threat of licensing fees for GIF (another good-for-browsers format).  Last month, however, I re-read the W3C specification for PNG and found lots of nifty features, on paper at least.  For example, there are specs that provide helpful support for color management, including a group of tagged color-space metadata elements that document things like primary chromaticities, white point, and image gamma, and that can carry an embedded ICC profile.  In addition, as has been reported in this thread, PNG offers lossless compression with good results (albeit apparently a bit less efficient than JPEG 2000).  Do any libraries or archives use PNG as a mastering file?  Have people found that tools support some of the features that caught my eye, like the ones that support color management?
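To make the point about PNG's metadata concrete: the color-management chunks alluded to above (cHRM for chromaticities and white point, gAMA for gamma, iCCP for an embedded ICC profile) are easy to inspect because PNG's container format is so simple.  A minimal sketch using only the Python standard library, which builds a tiny PNG in memory and then lists whatever color-management chunks it carries; the chunk layout follows the W3C PNG specification:

```python
import struct, zlib

def png_chunks(data):
    """Yield (chunk_type, chunk_data) pairs from a PNG byte string."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype.decode("ascii"), data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC

def chunk(ctype, payload):
    """Serialize one chunk: length, type, data, CRC-32 over type + data."""
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))

# A 1x1 8-bit grayscale PNG with a gAMA chunk, so the sketch needs no file.
png = (b"\x89PNG\r\n\x1a\n"
       + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + chunk(b"gAMA", struct.pack(">I", 45455))    # 1/2.2, stored * 100000
       + chunk(b"IDAT", zlib.compress(b"\x00\x00"))  # filter byte + one pixel
       + chunk(b"IEND", b""))

present = {name for name, _ in png_chunks(png)}
print(sorted(present & {"cHRM", "gAMA", "iCCP", "sRGB"}))  # prints ['gAMA']
```

The same loop run against a real master file would show at a glance whether a scanner or conversion tool preserved these chunks.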

   Whew!  Thanks for your patience with this long posting.

   Carl Fleischhauer

 

Simon Spero

unread,
May 5, 2014, 4:43:51 PM5/5/14
to digital-...@googlegroups.com
I did some measurements comparing file sizes and signal loss for different lossless and lossy formats and settings, including PNG: see my message in this thread from Tue, 19 Mar 2013 16:53:46 -0400.
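For anyone repeating that kind of measurement: signal loss in such comparisons is commonly summarized as peak signal-to-noise ratio (PSNR).  A minimal sketch (my illustration, not the actual code behind the measurements above) computing PSNR over two pixel sequences with only the Python standard library:

```python
import math

def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images, e.g. after a lossless round trip
    return 10 * math.log10(max_value ** 2 / mse)

# Toy example: one "scan line" and a slightly perturbed lossy version of it.
line  = [10, 50, 90, 130, 170, 210, 250, 250]
lossy = [11, 49, 90, 131, 169, 210, 251, 249]
print(round(psnr(line, lossy), 1))  # prints 49.4
```

A lossless codec should give infinite PSNR on a decode/re-read round trip, which is a handy sanity check on an encoding pipeline.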




Jay Gattuso

unread,
May 6, 2014, 3:47:36 PM5/6/14
to digital-...@googlegroups.com

"Anyone got some hi-res scans of newspapers?"


Hi Simon,

Do you still want some high-res scans of newspapers?

From our digital newspaper archive http://paperspast.natlib.govt.nz/ you can get to more high-res lossless PNG scans than you'll ever need.

We have some helper methods for getting at the content if this is still an active project / concern for you.


Simon Spero

unread,
May 6, 2014, 4:25:19 PM5/6/14
to digital-...@googlegroups.com
That was a poke at Ed Summers :-)

