Ingesting TIFF files

Evelyn McLellan

Apr 2, 2010, 2:09:40 PM
to archivematica
Hi everyone,

As you know, our default normalization path for raster images is
uncompressed TIFF 6.0 (see http://www.archivematica.org/wiki/index.php?title=Raster_images).
So all the gifs, jpegs, bmps etc. coming in are converted to that
format using ImageMagick. But what about incoming TIFF files? Should
we just skip over them?

The problem is that although we can always tell when a file is a TIFF
file, we don't always know what version it is or whether it's
compressed. Unfortunately, right now we can't really rely on our format
identification/validation tools to give us that information. So I
think what we might want to consider doing is normalizing them to TIFF
using ImageMagick, even though they're already TIFFs, because
ImageMagick will uncompress them and update them to 6.0 if they're
compressed and/or older versions. This might mean storing extra
TIFFs unnecessarily if the incoming TIFFs were already uncompressed
6.0s. But it might also be worth it because it's easy to do and avoids
accidentally preserving old compressed TIFFs.
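
A minimal sketch of that normalization step in Python, assuming ImageMagick's `convert` command is on the PATH. The helper names are hypothetical and this is not necessarily the command line Archivematica uses; `-compress None` is the ImageMagick option that requests uncompressed output, and the `tiff:` prefix forces the output format.

```python
import subprocess


def build_normalize_command(source, target):
    """Build an ImageMagick command that rewrites `source` as an
    uncompressed TIFF at `target` (hypothetical helper, for illustration)."""
    return ["convert", str(source), "-compress", "None", f"tiff:{target}"]


def normalize_to_uncompressed_tiff(source, target):
    """Run the conversion; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_normalize_command(source, target), check=True)
```

Because the same command is applied to every raster file, incoming TIFFs need no special handling: they go through the same path as GIFs, JPEGs and BMPs.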

Any thoughts on this?

Evelyn

Jordan, Paul

Apr 2, 2010, 2:13:14 PM
to archiv...@googlegroups.com
I'd go with that, especially when you consider how long people have been using tiff. That's a lot of non-6.0 files out there. Besides, sooner or later someone is going to develop a tool that can accurately differentiate between the various types of tiff. Once that happens, it should be fairly straightforward to run that tool against your entire collection and weed out the duplicates. You'd probably want to put a note to that effect in the Archivematica tiff documentation, so people don't forget in the intervening years.

Paul
IMF

Evelyn McLellan

Apr 2, 2010, 2:24:57 PM
to archivematica
Hi Paul! Thanks for the input. Yes, I agree that if this is our
thinking we should definitely put it in the preservation plan.

Evelyn

Bigelow, Sue

Apr 2, 2010, 6:15:19 PM
to archiv...@googlegroups.com
I guess we should be able to turn this off when we are ingesting known TIFFs, for instance the product of our own digitization efforts.

Also or alternatively, would it be possible to run a special hash check at some point on the original TIFF vs the normalized TIFF? If they were the same value, then you'd know you'd just "normalized" an uncompressed TIFF 6.0 to an uncompressed TIFF 6.0, and you could delete the normalized version.

Because an uncompressed TIFF 6.0 should come out with exactly the same hash when you run it through ImageMagick, right?

Sue Bigelow
Digital Conservator
City of Vancouver Archives
1150 Chestnut Street,
Vancouver, B.C. V6J 3J9
604.829.4271 Tel
604.736.0626 Fax
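
The whole-file hash check suggested here can be sketched in Python with the standard library. The function names are hypothetical and SHA-256 is an arbitrary choice of hash; whether ImageMagick really produces byte-identical output is exactly the open question in this thread, so a mismatch does not prove the images differ.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_redundant_copy(original_path, normalized_path):
    """True when the normalized file is byte-for-byte identical to the
    original, in which case the normalized copy could be deleted."""
    return file_digest(original_path) == file_digest(normalized_path)
```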

eve...@artefactual.com

Apr 2, 2010, 6:42:16 PM
to archiv...@googlegroups.com
Hi Sue,

Yes, it would be quite simple to have ImageMagick skip TIFFs when you knew
that you were ingesting uncompressed 6.0 TIFFs.

I was also wondering about recognizing identical TIFFs and deleting
normalized ones that are identical to the originals. Next week I'll do
some tests in ImageMagick and see if the results really are identical...it
sure would be nice if they were.

Evelyn

Bigelow, Sue

Apr 2, 2010, 7:04:00 PM
to archiv...@googlegroups.com
It would be disturbing if they weren't.

eve...@artefactual.com

Apr 2, 2010, 7:14:10 PM
to archiv...@googlegroups.com
Not necessarily - I was thinking that the checksum could be entirely
different because of slight differences in metadata, such as the
timestamp, for example.

Bigelow, Sue

Apr 2, 2010, 7:59:41 PM
to archiv...@googlegroups.com
Hmm. If we're running checksums to see if the 2 files are identical, is it possible to run the checksum of only the image data part of the file, and not the other parts (like header chunks) that would contain changeable metadata?
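
One way to checksum only the image data, sketched in Python with the standard library. It reads the first image file directory of a classic TIFF, looks up the StripOffsets (273) and StripByteCounts (279) tags, and hashes just the strips they point to. The function names are hypothetical, and the sketch assumes a strip-based (not tiled) file; matching digests are only meaningful when both files are uncompressed and use the same byte order and pixel layout.

```python
import hashlib
import struct

STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279
TYPE_FORMATS = {3: ("H", 2), 4: ("I", 4)}  # TIFF SHORT and LONG


def _read_values(data, endian, entry_offset):
    """Decode the values of one 12-byte IFD entry (SHORT or LONG only)."""
    _tag, field_type, count = struct.unpack_from(endian + "HHI", data, entry_offset)
    fmt, size = TYPE_FORMATS[field_type]
    value_pos = entry_offset + 8
    if size * count > 4:
        # Values that do not fit in 4 bytes are stored elsewhere in the file.
        (value_pos,) = struct.unpack_from(endian + "I", data, value_pos)
    return list(struct.unpack_from(endian + fmt * count, data, value_pos))


def image_data_digest(data):
    """SHA-256 of the pixel strips of the first image in a TIFF byte string."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for index in range(entry_count):
        entry = ifd_offset + 2 + 12 * index
        (tag,) = struct.unpack_from(endian + "H", data, entry)
        if tag in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
            tags[tag] = _read_values(data, endian, entry)
    digest = hashlib.sha256()
    for offset, length in zip(tags[STRIP_OFFSETS], tags[STRIP_BYTE_COUNTS]):
        digest.update(data[offset:offset + length])
    return digest.hexdigest()
```

To compare an original with its normalized copy, read each file as bytes (for example with `pathlib.Path(name).read_bytes()`) and compare the two digests. Timestamps and other metadata stored outside the strips do not affect the result.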