FITS 1.2.0: problem detecting a TIFF file

situ...@gmail.com

Nov 26, 2017, 8:38:48 PM
to fits-users
Hi,

I tried to use fits.sh (FITS 1.2.0) to get the PRONOM PUID for my files. For example, here is the result for a JPEG file:
<identity format="JPEG File Interchange Format" mimetype="image/jpeg" toolname="FITS" toolversion="1.2.0">
<tool toolname="Droid" toolversion="6.3"/>
<tool toolname="Jhove" toolversion="1.16"/>
<tool toolname="file utility" toolversion="5.14"/>
<tool toolname="Exiftool" toolversion="10.00"/>
<tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA"/>
<version toolname="Droid" toolversion="6.3">1.02</version>
<externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/44</externalIdentifier>
</identity>

However, when I input a TIFF file, I couldn't find a PUID in the result. Here is the result:

<?xml version="1.0" encoding="UTF-8"?>
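<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.2.0" timestamp="11/27/17 1:26 AM">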
  <identification status="SINGLE_RESULT">
    <identity format="TIFF EXIF" mimetype="image/tiff" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Exiftool" toolversion="10.00" />
      <version toolname="Exiftool" toolversion="10.00">6.7</version>
    </identity>
  </identification>
  <fileinfo>
    <size toolname="Jhove" toolversion="1.16">56612260</size>
    <creatingApplicationName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Adobe Photoshop CS5 Macintosh</creatingApplicationName>
    <lastmodified toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">2017:11:01 09:58:18</lastmodified>
    <created toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">2017:09:15 09:55:31</created>
    <filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/vagrant/P1922-001-085.tiff</filepath>
    <filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">P1922-001-085.tiff</filename>
    <md5checksum toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">6bc3503cd02774fb7a25871e5973dd69</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1511742705000</fslastmodified>
  </fileinfo>
  <filestatus />
  <metadata>
    <image>
      <compressionScheme toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Uncompressed</compressionScheme>
      <imageWidth toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">3772</imageWidth>
      <imageHeight toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">5000</imageHeight>
      <colorSpace toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">RGB</colorSpace>
      <iccProfileName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Adobe RGB (1998)</iccProfileName>
      <iccProfileVersion toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">2.1.0</iccProfileVersion>
      <orientation toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">normal*</orientation>
      <xSamplingFrequency toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">600</xSamplingFrequency>
      <ySamplingFrequency toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">600</ySamplingFrequency>
      <bitsPerSample toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">8 8 8</bitsPerSample>
      <samplesPerPixel toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">3</samplesPerPixel>
      <digitalCameraManufacturer toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Leaf</digitalCameraManufacturer>
      <digitalCameraModelName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Credo 60</digitalCameraModelName>
      <scanningSoftwareName toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Adobe Photoshop CS5 Macintosh</scanningSoftwareName>
      <fNumber toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">11.0</fNumber>
      <exposureTime toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">0.125</exposureTime>
      <exposureProgram toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">Manual</exposureProgram>
      <isoSpeedRating toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">100</isoSpeedRating>
      <exifVersion toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">0230</exifVersion>
      <shutterSpeedValue toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">1/8</shutterSpeedValue>
      <apertureValue toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">11.0</apertureValue>
      <exposureBiasValue toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">0</exposureBiasValue>
      <lightSource toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">other light source</lightSource>
      <focalLength toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">80.0</focalLength>
      <sensingMethod toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">One-chip color area sensor</sensingMethod>
    </image>
  </metadata>
  <statistics fitsExecutionTime="3685">
    <tool toolname="MediaInfo" toolversion="0.7.75" status="did not run" />
    <tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
    <tool toolname="ADL Tool" toolversion="0.1" status="did not run" />
    <tool toolname="VTT Tool" toolversion="0.1" status="did not run" />
    <tool toolname="Droid" toolversion="6.3" executionTime="3644" />
    <tool toolname="Jhove" toolversion="1.16" executionTime="2144" />
    <tool toolname="file utility" toolversion="5.14" executionTime="1080" />
    <tool toolname="Exiftool" toolversion="10.00" executionTime="1262" />
    <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="did not run" />
    <tool toolname="OIS File Information" toolversion="0.2" executionTime="1418" />
    <tool toolname="OIS XML Metadata" toolversion="0.2" status="did not run" />
    <tool toolname="ffident" toolversion="0.2" executionTime="1005" />
    <tool toolname="Tika" toolversion="1.10" executionTime="3188" />
  </statistics>
</fits>

Am I doing something wrong? Has anyone had the same problem?

Many thanks

Elin


dne...@g.harvard.edu

Dec 12, 2017, 3:19:09 PM
to fits-users
Hello Elin,

Here is why you are not seeing the PRONOM PUID for some of the files you are processing.
As you can see from the example <identity> element you posted, the "puid" value is provided by the Droid tool.
In the second output you posted, the <identity> element in the <identification> section does not include the Droid tool. Thus, no "puid". There is a reason for this.
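For scripted checks, a minimal Python sketch (standard library only) can pull any PUID out of the consolidated <identification> section. The function name `find_puids` is hypothetical, and the sketch relies only on the element and attribute names visible in the outputs in this thread (`identity`, `externalIdentifier`, `type="puid"`). It compares local names, so it works whether or not the `<fits>` root declares a namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so matching works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def find_puids(fits_xml):
    """Return the PUIDs listed under <identification>, in document order."""
    root = ET.fromstring(fits_xml)
    puids = []
    for section in root.iter():
        if local_name(section.tag) != "identification":
            continue
        for identity in section:
            if local_name(identity.tag) != "identity":
                continue
            for child in identity:
                # Droid's PUID appears as <externalIdentifier type="puid">
                if (local_name(child.tag) == "externalIdentifier"
                        and child.get("type") == "puid" and child.text):
                    puids.append(child.text.strip())
    return puids


if __name__ == "__main__":
    sample = """<fits><identification>
      <identity format="JPEG File Interchange Format" mimetype="image/jpeg">
        <tool toolname="Droid" toolversion="6.3"/>
        <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/44</externalIdentifier>
      </identity>
    </identification></fits>"""
    print(find_puids(sample))  # ['fmt/44']
```

An empty list for a file means no tool contributing to the chosen identity supplied a PUID, which is exactly the TIFF EXIF situation above.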

In the first example, all tools in that section identified the input file format as "JPEG File Interchange Format" (JFIF). That is, the Droid, Jhove, file utility, Exiftool, and NLNZ Metadata Extractor tools all identified this format.

In the second example, only Exiftool identified the input file format as "TIFF EXIF".
But what about the other tools???
They most likely identified a file format that was less specific than "TIFF EXIF". Thus, their output was not included in the <identification>, <fileinfo>, or <metadata> sections.

The hierarchy of file formats is contained in the FITS deployment in the file xml/fits_format_tree.xml. If you look at the first section of that file, you'll find "TIFF EXIF" nested under <branch format="Tagged Image File Format">. So if the other tools identified the file format as "Tagged Image File Format" (TIFF) and Exiftool identified it as "TIFF EXIF", then only Exiftool's output will be used, because "TIFF EXIF" is a more specific format than plain TIFF.
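The selection rule described above can be modeled roughly as follows. This is a simplified illustration, not FITS's actual consolidation code: the two-entry `FORMAT_PARENT` mapping is a hypothetical stand-in for a tiny slice of xml/fits_format_tree.xml, and which tools reported plain TIFF is the guess from the paragraph above.

```python
# Hypothetical two-entry stand-in for xml/fits_format_tree.xml:
# each format maps to its parent branch (None means top level).
FORMAT_PARENT = {
    "Tagged Image File Format": None,
    "TIFF EXIF": "Tagged Image File Format",
}


def ancestors(fmt):
    """List fmt's parent, grandparent, ... according to FORMAT_PARENT."""
    chain = []
    parent = FORMAT_PARENT.get(fmt)
    while parent is not None:
        chain.append(parent)
        parent = FORMAT_PARENT.get(parent)
    return chain


def drop_less_specific(tool_results):
    """Keep only results whose format is not a parent of another reported format."""
    superseded = set()
    for fmt in tool_results.values():
        superseded.update(ancestors(fmt))
    return {tool: fmt for tool, fmt in tool_results.items()
            if fmt not in superseded}


if __name__ == "__main__":
    reported = {
        "Droid": "Tagged Image File Format",
        "Jhove": "Tagged Image File Format",
        "file utility": "Tagged Image File Format",
        "Exiftool": "TIFF EXIF",
    }
    print(drop_less_specific(reported))  # {'Exiftool': 'TIFF EXIF'}
    reported.pop("Exiftool")  # like commenting Exiftool out of fits.xml
    print(sorted(drop_less_specific(reported)))  # ['Droid', 'Jhove', 'file utility']
```

Comparing ancestry rather than raw depth in the tree means two unrelated formats are both kept instead of one winning arbitrarily.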

There are a couple of ways to test this.
1) Open the configuration file xml/fits.xml. In the <output> section change the value of the <display-tool-output>false</display-tool-output> to true. Then, when running FITS, you will see the output of all the tools being used by FITS even if their output is not included in the FITS output.

2) Try commenting out the line in the <tools> section that contains the Exiftool and run the file through FITS again. There's a good chance you will see Droid tool output with the "puid" along with output from other tools as well.
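If you prefer to script step 1, here is a minimal Python sketch that flips the flag. It assumes, as described above, that `<display-tool-output>` sits inside an `<output>` element directly under the root of fits.xml; the function name `enable_tool_output` is hypothetical. Comments inside the root element are preserved (Python 3.8+), so a tool line commented out for step 2 survives, but ElementTree does not re-emit the XML declaration.

```python
import xml.etree.ElementTree as ET


def enable_tool_output(fits_config_xml):
    """Return fits.xml text with <output>/<display-tool-output> set to true."""
    # insert_comments=True (Python 3.8+) keeps comments inside the root element,
    # e.g. a tool line commented out for testing.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(fits_config_xml, parser=parser)
    flag = root.find("./output/display-tool-output")
    if flag is None:
        raise ValueError("no <output>/<display-tool-output> element under the root")
    flag.text = "true"
    return ET.tostring(root, encoding="unicode")
```

Usage might look like `path = pathlib.Path("xml/fits.xml")` followed by `path.write_text(enable_tool_output(path.read_text()))`; keeping a backup of the original fits.xml first is a good idea.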

I'd be interested to hear back your results.

Best,
David N.

situ...@gmail.com

Dec 13, 2017, 9:36:29 PM
to fits-users
Thanks David,

As you said, Exiftool's result is chosen for <identity> and <metadata> because of the <branch> settings. The OIS File Information tool is still used for extracting file information.

I have done some quick tests as you suggested:

1) Open the configuration file xml/fits.xml. In the <output> section change the value of the <display-tool-output>false</display-tool-output> to true. Then, when running FITS, you will see the output of all the tools being used by FITS even if their output is not included in the FITS output.

I set <display-tool-output>true</display-tool-output>, and the output file now contains a <toolOutput> section listing each tool's results, including Droid's result with the PUID. It is good to see the details of each tool's result.
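With <display-tool-output> enabled, the PUID can also be read straight from Droid's raw result. A minimal Python sketch, assuming the `<toolOutput>` / `<tool name="Droid">` / `<filePuid>` layout shown in the .doc output later in this thread; `droid_puids_from_tool_output` is a hypothetical name. Droid's `<results>` element resets the namespace with `xmlns=""`, which is why the sketch compares local names only.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix; Droid's <results xmlns=""> has none."""
    return tag.rsplit("}", 1)[-1]


def droid_puids_from_tool_output(fits_xml):
    """Collect <filePuid> values from the Droid entry in <toolOutput>."""
    root = ET.fromstring(fits_xml)
    puids = []
    for section in root.iter():
        if local_name(section.tag) != "toolOutput":
            continue
        for tool in section:
            if local_name(tool.tag) == "tool" and tool.get("name") == "Droid":
                # Search the whole Droid subtree for <filePuid> elements
                for node in tool.iter():
                    if local_name(node.tag) == "filePuid" and node.text:
                        puids.append(node.text.strip())
    return puids
```

This is a fallback for when Droid's identity loses out during consolidation; the raw tool output is still there even though it is not merged into <identification>.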

2) Try commenting out the line in the <tools> section that contains the Exiftool and run the file through FITS again. There's a good chance you will see Droid tool output with the "puid" along with output from other tools as well.

After commenting out the Exiftool line, Droid's result was used and the PUID was shown. The result is:
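<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.2.0" timestamp="12/13/17 10:38 PM">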
  <identification>
    <identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Droid" toolversion="6.3" />
      <tool toolname="Jhove" toolversion="1.16" />
      <tool toolname="file utility" toolversion="5.14" />
      <tool toolname="ffident" toolversion="0.2" />
      <tool toolname="Tika" toolversion="1.10" />
      <version toolname="Jhove" toolversion="1.16">6.0</version>
      <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/353</externalIdentifier>
    </identity>
  </identification>
  <fileinfo>
    <size toolname="Jhove" toolversion="1.16">32895032</size>
    <filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/vagrant/TIFF_Image-MS_0056_143_001.tif</filepath>
    <filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">TIFF_Image-MS_0056_143_001.tif</filename>
    <md5checksum toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">3b50d711804e51c1574d527fab54b207</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1436502558000</fslastmodified>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">true</valid>
    <message toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Value offset not word-aligned: 32894971 offset=32894814</message>
    <message toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Value offset not word-aligned: 32895003 offset=32894826</message>
  </filestatus>
  <metadata>
    <image>
      <byteOrder toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">little endian</byteOrder>
      <compressionScheme toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Uncompressed</compressionScheme>
      <imageWidth toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">2964</imageWidth>
      <colorSpace toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">RGB</colorSpace>
      <referenceBlackWhite toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">0 255 0 255 0 255</referenceBlackWhite>
      <orientation toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">normal*</orientation>
      <samplingFrequencyUnit toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">in.</samplingFrequencyUnit>
      <xSamplingFrequency toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">300</xSamplingFrequency>
      <ySamplingFrequency toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">300</ySamplingFrequency>
      <bitsPerSample toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">8 8 8</bitsPerSample>
      <samplesPerPixel toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">3</samplesPerPixel>
      <scanningSoftwareName toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Adobe Photoshop CS6 (Windows)</scanningSoftwareName>
      <fNumber toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">11</fNumber>
      <exposureTime toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">1</exposureTime>
      <isoSpeedRating toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">100</isoSpeedRating>
      <exposureBiasValue toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">0</exposureBiasValue>
      <meteringMode toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Pattern</meteringMode>
      <flash toolname="Jhove" toolversion="1.16" status="SINGLE_RESULT">Flash did not fire, compulsory flash mode</flash>
    </image>
  </metadata>
  <statistics fitsExecutionTime="6085">
    <tool toolname="MediaInfo" toolversion="0.7.75" status="did not run" />
    <tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
    <tool toolname="ADL Tool" toolversion="0.1" status="did not run" />
    <tool toolname="VTT Tool" toolversion="0.1" status="did not run" />
    <tool toolname="Droid" toolversion="6.3" executionTime="2192" />
    <tool toolname="Jhove" toolversion="1.16" executionTime="1867" />
    <tool toolname="file utility" toolversion="5.14" executionTime="722" />
    <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="did not run" />
    <tool toolname="OIS File Information" toolversion="0.2" executionTime="728" />
    <tool toolname="OIS XML Metadata" toolversion="0.2" status="did not run" />
    <tool toolname="ffident" toolversion="0.2" executionTime="668" />
    <tool toolname="Tika" toolversion="1.10" executionTime="6038" />
  </statistics>
</fits>

The set of tools that FITS runs can also be configured in fits.xml.
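To confirm which tools actually ran after editing fits.xml, the <statistics> section is a convenient check. In the outputs above, skipped tools carry status="did not run", the others report an executionTime, and a tool commented out of fits.xml (Exiftool in the second run) is not listed at all. A minimal Python sketch with the hypothetical helper name `tools_that_ran`:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so matching works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def tools_that_ran(fits_xml):
    """Split <statistics> tool entries into (ran, did_not_run) name lists."""
    root = ET.fromstring(fits_xml)
    ran, skipped = [], []
    for section in root.iter():
        if local_name(section.tag) != "statistics":
            continue
        for tool in section:
            if local_name(tool.tag) != "tool":
                continue
            if tool.get("status") == "did not run":
                skipped.append(tool.get("toolname"))
            else:
                ran.append(tool.get("toolname"))
    return ran, skipped
```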

That really helps. Thanks.

Elin

situ...@gmail.com

Mar 12, 2018, 11:03:54 PM
to fits-users
Hi David,

I have the same kind of problem with a .doc file.
I removed all branches from fits_format_tree.xml, expecting that all tools would then be used to identify the file format and that each would be listed as a separate <identity> in <identification>. I also set <display-tool-output>true</display-tool-output> in fits.xml. The result follows. Droid doesn't appear in <identification>, but you can see Droid's result in <toolOutput>.

<?xml version="1.0" encoding="UTF-8"?>
  <identification>
    <identity format="Microsoft Word Binary File Format" mimetype="application/msword" toolname="FITS" toolversion="1.2.0">
      <tool toolname="file utility" toolversion="5.14" />
      <tool toolname="Exiftool" toolversion="10.00" />
      <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" />
      <tool toolname="ffident" toolversion="0.2" />
      <tool toolname="Tika" toolversion="1.10" />
    </identity>
  </identification>
  <fileinfo>
    <lastmodified toolname="Exiftool" toolversion="10.00" status="CONFLICT">2014:09:07 22:12:00</lastmodified>
    <lastmodified toolname="Tika" toolversion="1.10" status="CONFLICT">2014-09-07T22:12:00Z</lastmodified>
    <created toolname="Exiftool" toolversion="10.00" status="CONFLICT">2014:09:07 22:12:00</created>
    <created toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="CONFLICT">2014-09-07 22:12:00</created>
    <creatingApplicationName toolname="Exiftool" toolversion="10.00">Microsoft Office Word</creatingApplicationName>
    <creatingApplicationVersion toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">14.0000</creatingApplicationVersion>
    <filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/vagrant/M Davidson - HK at the Hocken.doc</filepath>
    <filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">M Davidson - HK at the Hocken.doc</filename>
    <size toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">76288</size>
    <md5checksum toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">7e1f0bb07230b56e9501cafbe5c56757</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1410127973000</fslastmodified>
  </fileinfo>
  <filestatus />
  <metadata>
    <document>
      <pageCount toolname="Exiftool" toolversion="10.00">13</pageCount>
      <wordCount toolname="Exiftool" toolversion="10.00">4635</wordCount>
      <characterCount toolname="Exiftool" toolversion="10.00">26422</characterCount>
      <author toolname="NLNZ Metadata Extractor" toolversion="3.6GA">Meg</author>
      <lineCount toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">220</lineCount>
      <paragraphCount toolname="Exiftool" toolversion="10.00" status="SINGLE_RESULT">61</paragraphCount>
      <isProtected toolname="Tika" toolversion="1.10" status="SINGLE_RESULT">yes</isProtected>
      <language toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="SINGLE_RESULT">U.S. English</language>
    </document>
  </metadata>
  <toolOutput>
    <tool name="Droid" version="6.3">
      <results xmlns="">
        <result>
          <filePuid>fmt/111</filePuid>
          <formatName>OLE2 Compound Document Format</formatName>
          <mimeType />
          <version>null</version>
        </result>
      </results>
    </tool>
    <tool name="file utility" version="5.14">
      <fileUtilityOutput xmlns="">
        <rawOutput>Microsoft Office Document Microsoft Word Document
application/msword; charset=binary</rawOutput>
        <mimetype>application/msword</mimetype>
        <format>Microsoft Office Document Microsoft Word Document</format>
      </fileUtilityOutput>
    </tool>
...
 
Is there anywhere else I need to configure which tools are used?

Many thanks.

Kind regards,
Elin

dne...@g.harvard.edu

Mar 14, 2018, 10:47:20 AM
to fits-users
Hi Elin,

I'd suggest not editing the fits_format_tree.xml file unless you really understand its role in FITS.
Without actually having your .doc file that you used for this output, I cannot accurately explain your output since different .doc files might provide slightly different metadata depending on platform and version.
That being said, looking at the Droid tool output you included, it provides no useful metadata besides mimetype and format, which would only help to determine <identity> information that is already being provided by 5 other tools. As an experiment, you might try reverting the fits_format_tree.xml file and, in the fits.xml file, commenting out all but the Droid tool to see what your output looks like.

Best,
David

situ...@gmail.com

Mar 14, 2018, 7:13:51 PM
to fits-users
Thanks David.

As you suggested, I commented out the 5 other tools. The result is:

<?xml version="1.0" encoding="UTF-8"?>
  <identification status="PARTIAL">
    <identity format="OLE2 Compound Document Format" mimetype="application/octet-stream" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Droid" toolversion="6.3" />
      <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/111</externalIdentifier>
    </identity>
  </identification>
...

The status="PARTIAL" might be the reason. What can we do so that this Droid <identity> still appears in <identification>?

Kind regards,
Elin

dne...@g.harvard.edu

Mar 15, 2018, 10:01:53 AM
to fits-users
Hi Elin,

With our FITS test suite this is the result we get with .doc files. Unfortunately, with our limited resources, we are unable to look into this problem further as we do not consider it a bug. Since Droid does not supply any additional useful metadata for .doc files and we already have sufficient identity data from other tools, we are satisfied with the current output of FITS.

Thanks,
David

situ...@gmail.com

Mar 15, 2018, 4:17:36 PM
to fits-users
Thanks David.

The result is OK, but I want both of the following identities to appear in the FITS output; currently only the first one does. Is there anywhere we can configure FITS to display the identities from all tools under <identification>?

<?xml version="1.0" encoding="UTF-8"?>
  <identification>
    <identity format="Microsoft Word Binary File Format" mimetype="application/msword" toolname="FITS" toolversion="1.2.0">
      <tool toolname="file utility" toolversion="5.14" />
      <tool toolname="Exiftool" toolversion="10.00" />
      <tool toolname="NLNZ Metadata Extractor" toolversion="3.6GA" />
      <tool toolname="ffident" toolversion="0.2" />
      <tool toolname="Tika" toolversion="1.10" />
    </identity>
    <identity format="OLE2 Compound Document Format" mimetype="application/octet-stream" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Droid" toolversion="6.3" />
      <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/111</externalIdentifier>
    </identity>
  </identification>

Kind regards,
Elin

dne...@g.harvard.edu

Mar 16, 2018, 1:52:34 PM
to fits-users
Hi Elin,

More than one identity appears when the tools report conflicting identities, which is considered undesirable. When this happens, the outer <identification> element carries a status="CONFLICT" attribute. An example follows. Unfortunately, I am unable to look into this further at this time. I'm sorry.

Sincerely,
David

  <identification status="CONFLICT">
    <identity format="Microsoft Word Binary File Format" mimetype="application/msword" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Droid" toolversion="6.3" />
      <tool toolname="Exiftool" toolversion="10.00" />
      <tool toolname="Tika" toolversion="1.10" />
      <version toolname="Droid" toolversion="6.3">97-2003</version>
      <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/40</externalIdentifier>
    </identity>
    <identity format="Microsoft Excel Format" mimetype="application/vnd.ms-excel" toolname="FITS" toolversion="1.2.1">
      <tool toolname="ffident" toolversion="0.2" />
    </identity>
  </identification>
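For batch runs, the status attribute and the per-identity tool lists can be checked programmatically. A minimal Python sketch with the hypothetical helper `summarize_identification`: it returns the <identification> status (absent in some of the outputs in this thread, hence `None`) and each identity's format with its contributing tools. It also accepts a bare <identification> fragment like the example above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so matching works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def summarize_identification(fits_xml):
    """Return (status, [(format, [toolnames]), ...]) for <identification>.

    status is None when the attribute is absent.
    """
    root = ET.fromstring(fits_xml)
    for section in root.iter():  # root.iter() includes the root itself
        if local_name(section.tag) != "identification":
            continue
        identities = []
        for identity in section:
            if local_name(identity.tag) != "identity":
                continue
            tools = [t.get("toolname") for t in identity
                     if local_name(t.tag) == "tool"]
            identities.append((identity.get("format"), tools))
        return section.get("status"), identities
    return None, []
```

A status of "CONFLICT" together with more than one identity flags files worth a manual look.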