TIFF access normalization failure

Geoff Edwards

Jun 20, 2012, 10:40:04 AM
to archiv...@googlegroups.com
Hello
 
I'm having real problems when it comes to creating access copies of TIFF files. I'm testing the process on a few pretty small files, but each time the process fails. Any suggestions as to what the problem might be? Oddly, I'm getting different errors today to those I got yesterday; previously the error message reported a problem with the TIFF headers, whereas today it doesn't seem to give any specific details, unless I'm looking in the wrong place (see error below).
 
Any suggestions/recommendations would be much appreciated.
 
Thanks.
 
Geoff
 

Task UUID: dff5ccbb-bea1-4c81-830f-ce17440c39a4
File UUID: 52e45594-b471-44dd-9a3f-f4ff1ae81e40
File name: 41D02-9901-DF-00001.TIF
Client: ubuntu_1

(exit code: 255)

ST: June 20, 2012, 2:29 p.m.
ET: June 20, 2012, 2:32 p.m.
CT: June 20, 2012, 2:20 p.m.

--inputFile "%sharedPath%currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/objects/41D02-9901-DF-00001.TIF" --commandClassifications "access" --fileUUID "52e45594-b471-44dd-9a3f-f4ff1ae81e40" --taskUUID "%taskUUID%" --excludeDirectory "%sharedPath%currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/objects/submissionDocumentation/" --objectsDirectory "%sharedPath%currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/objects/" --logsDirectory "%sharedPath%currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/logs/" --date "%date%" --accessDirectory "%sharedPath%currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/DIP/objects/"

STDOUT

Operating on file:  /var/archivematica/sharedDirectory/currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/objects/41D02-9901-DF-00001.TIF
Using access command classifications
Running:
[COMMAND]
PK: 16
Type: command
command: convert "/var/archivematica/sharedDirectory/currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/objects/41D02-9901-DF-00001.TIF" -sampling-factor 4:4:4 -quality 60 "/var/archivematica/sharedDirectory/currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/DIP/objects/52e45594-b471-44dd-9a3f-f4ff1ae81e40-41D02-9901-DF-00001.jpg"
description: Transcoding to jpg with convert
outputLocation: /var/archivematica/sharedDirectory/currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/DIP/objects/52e45594-b471-44dd-9a3f-f4ff1ae81e40-41D02-9901-DF-00001.jpg
verificationCommand: 1

Running:
[COMMAND]
PK: 1
Type: command
command: test -s "/var/archivematica/sharedDirectory/currentlyProcessing/2020100-f079a607-6ca2-4f4e-b6dc-38b8d4f84ab6/DIP/objects/52e45594-b471-44dd-9a3f-f4ff1ae81e40-41D02-9901-DF-00001.jpg"
description: Standard verification command
outputLocation: None
verificationCommand: None

STDERR

Failed:
Failed:

Joseph Perry

Jun 20, 2012, 3:05:08 PM
to archiv...@googlegroups.com
First let's break down what's happening here:
The normalization command (using convert) runs and exits with an exit code of zero, indicating success.

Because the exit code isn't always reliable with some tools, Archivematica then runs a verification command, "test -s", which checks that the output file exists and is not of size zero.
It is failing this test, meaning the output file was not created, or was created as a 0-byte file (no content).
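
As a rough illustration of that verification step, the sketch below shows how "test -s" treats a missing file, an empty file, and a file with content. The temporary directory, the file names, and the check helper are made up for this example and are not part of Archivematica.

```shell
#!/usr/bin/env bash
# Illustration only: how `test -s` treats missing, empty, and non-empty files.
# The temporary directory and file names are hypothetical.
workdir=$(mktemp -d)

: > "$workdir/empty.jpg"                  # exists, but is 0 bytes
printf 'data' > "$workdir/nonempty.jpg"   # exists and has content

check() {
    # Prints "pass" when the file exists and is larger than 0 bytes, else "fail".
    if test -s "$1"; then echo "pass"; else echo "fail"; fi
}

missing_result=$(check "$workdir/missing.jpg")
empty_result=$(check "$workdir/empty.jpg")
nonempty_result=$(check "$workdir/nonempty.jpg")

echo "missing:   $missing_result"
echo "empty:     $empty_result"
echo "non-empty: $nonempty_result"

rm -rf "$workdir"
```

Only the non-empty file passes, so a failed verification cannot by itself tell you whether the output was never written or was written empty.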

So where do we go from here?
Gather more information.

Is Archivematica failing on the same files consistently?

Could you run the convert command on the file outside of Archivematica and see whether it fails?

What is the FITS output on the file? Is it well formed and valid?

Do you know what created the file? What software, what version, what scanner firmware, drivers etc.?

Joseph

Geoff Edwards

Jun 20, 2012, 3:51:49 PM
to archiv...@googlegroups.com
I tried running the conversion outside of Archivematica and it worked. However, the TIF file I was working with seems to be a multi-page file, so for the one TIF file I converted, 6 JPEGs (each one being a single page of the original document) were produced. Each of the JPEGs was then labelled individually, e.g. filename-1.jpg, filename-2.jpg, etc. Is this the root of the issue: the fact that AM is looking to check a file called filename.jpg and not finding it?
Geoff
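
The situation described above can be reproduced without ImageMagick. This sketch only simulates it: it writes six placeholder files named the way Geoff describes and then applies the same kind of "test -s" check to the single expected name. The directory and the "filename" base name are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: simulates per-page output without running ImageMagick.
# The directory and the "filename" base name are hypothetical.
outdir=$(mktemp -d)

# Pretend a six-page TIFF was converted and produced one JPEG per page.
for page in 1 2 3 4 5 6; do
    printf 'fake jpeg data' > "$outdir/filename-$page.jpg"
done

# The verification step checks only the single expected output name.
if test -s "$outdir/filename.jpg"; then
    single_check="pass"
else
    single_check="fail"
fi

# Count the non-empty per-page files that were actually written.
page_count=0
for f in "$outdir"/filename-*.jpg; do
    [ -s "$f" ] && page_count=$((page_count + 1))
done

echo "check for filename.jpg: $single_check"
echo "per-page files found:   $page_count"

rm -rf "$outdir"
```

The check on filename.jpg fails even though six non-empty files exist, which is consistent with a conversion that succeeds while the verification reports a failure.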

Joseph Perry

Jun 20, 2012, 4:29:38 PM
to archiv...@googlegroups.com
Geoff,
That does sound like an issue, but can I get you to confirm you're using the same parameters as the archivematica command:

convert INPUTFILE.TIF -sampling-factor 4:4:4 -quality 60 -layers merge OUTPUTFILE.jpg

...

Some part of my memory is telling me we've addressed converting multi-page TIFFs in the past,
but I think that was for normalizing to a preservation format.

We'd create a directory as the output of the command, put the output files in that directory, and then use a different verification command to check that it produced files and that none of them is of size 0:
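exitCode=0
filesFound=0
# Recursively check that every file under the given directory is non-empty.
# Paths are built from "$1" instead of using cd, so the working directory
# stays the same when the function returns from a subdirectory.
checkDirectory() {
    local f
    for f in "$1"/*; do
        # Skip the unexpanded glob pattern left behind by an empty directory.
        [ -e "$f" ] || continue
        if [ -d "$f" ]; then
            checkDirectory "$f"
        elif [ -s "$f" ]; then
            filesFound=1
        else
            echo "0 byte file: $f"
            exitCode=2
        fi
    done
}

checkDirectory "%outputLocation%"
# Exit 3 if no non-empty file was found; otherwise exit 2 if any 0 byte file was seen.
if [ "$filesFound" -eq 0 ]; then
    exit 3
else
    exit "$exitCode"
fi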


The problem with this approach for access normalization is that Archivematica DIPs are flat and don't support subdirectories.


Joseph




--
You received this message because you are subscribed to the Google Groups "archivematica" group.
To view this discussion on the web visit https://groups.google.com/d/msg/archivematica/-/hHXWWuh6GD8J.
To post to this group, send email to archiv...@googlegroups.com.
To unsubscribe from this group, send email to archivematic...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/archivematica?hl=en.

Joseph Perry

Jun 20, 2012, 4:44:39 PM
to archiv...@googlegroups.com
Geoff,
My apologies. The command in my previous message is the one in the current development trunk for the 0.9 release; it appears '-layers merge' has been added there. I'll speak with Evelyn, who is more familiar with TIFF normalization than I am, about this and get back to you.

In the meantime, would it be possible for you to send us a sample of one of the multi-page TIFFs to do some testing with?

Joseph

Evelyn McLellan

Jun 21, 2012, 6:50:12 PM
to archiv...@googlegroups.com
Hi Geoff,

I believe your problems are related to the fact that you are processing multi-page TIFF files. The problem is that jpeg doesn't support multi-page files, so during conversion Archivematica creates multiple single-page jpegs from the multi-page tiff file. This is problematic because the system relies on linking one access copy to one original file (i.e. through the file UUID). It's failing because it's trying to connect multiple jpeg access files to a single original file.

Have you had any success with single-page files?

Evelyn McLellan
Artefactual Systems Inc.

Geoff Edwards

Jun 28, 2012, 3:00:20 PM
to archiv...@googlegroups.com
Hi Evelyn
 
No problems with single page TIFFs.
 
Do you think there's any solution for the multi-page issue?
 
Thanks,
 
Geoff

Evelyn McLellan

Jul 2, 2012, 8:41:35 PM
to archiv...@googlegroups.com
Hi Geoff,

It might be possible to convert them to multi-page GIF files using ImageMagick, which is the tool we use for raster image normalization in Archivematica. I have filed this as a possible enhancement for Archivematica 1.0 (scheduled for release in early 2013). If you do any experiments yourself, please feel free to add your comments to the issue at http://code.google.com/p/archivematica/issues/detail?id=1048. Thanks.
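
For anyone who wants to experiment, a minimal sketch of that conversion might look like the following. It assumes ImageMagick's convert is on the PATH; the tiff_to_gif function name and its return codes are invented for this example and are not part of Archivematica. It writes one GIF for the whole TIFF and then applies the same kind of "test -s" check as the standard verification command.

```shell
#!/usr/bin/env bash
# Sketch only: convert a multi-page TIFF into a single multi-frame GIF.
# Assumes ImageMagick's `convert` is installed; the function name and
# return codes are hypothetical choices for this example.
tiff_to_gif() {
    local input=$1 output=$2
    if [ ! -s "$input" ]; then
        echo "input missing or empty: $input" >&2
        return 2
    fi
    if ! command -v convert >/dev/null 2>&1; then
        echo "ImageMagick convert not found on PATH" >&2
        return 3
    fi
    # One output file for the whole TIFF, so there is a single access copy.
    convert "$input" "$output" || return 1
    # Same kind of check as the standard verification command.
    test -s "$output"
}
```

A call such as tiff_to_gif scan.tif scan.gif (file names hypothetical) should leave a single scan.gif; running ImageMagick's identify on the result should list one line per frame, which is a quick way to confirm that all pages were carried over.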

Evelyn

Geoff Edwards

Jul 5, 2012, 2:36:08 PM
to archiv...@googlegroups.com
Thanks for your help. I'll be sure to post any further developments.