ZIP file identifies as two different format types


David Neiman

Nov 5, 2019, 10:41:25 AM
to droid-list
We are processing ZIP files (through FITS - File Information Tool Set) and are receiving two different types of identification values depending on the contents of the ZIP.
Typically we get PUID x-fmt/263 MIMEType="application/zip" Name="ZIP Format".

However, if the ZIP contains JAR files, even within embedded directories, the results are
PUID x-fmt/412 MIMEType="application/java-archive" Name="Java Archive Format".

Within this ZIP file there is no META-INF/MANIFEST.MF file at the root, as is typical in a Java Archive file.

When these embedded JAR files are removed from the ZIP, the results revert to the former, expected values.
Any idea why this might be happening?


Dclipsham

Nov 5, 2019, 11:27:31 AM
to droid-list
Hi David,

I'm struggling to replicate this locally. In my Win10 environment if I place a JAR in a ZIP via either Windows' native zip tool or 7zip, I get the results I'd expect.

I have a suspicion the files are being identified by the binary signature rather than the container mechanism. For the binary signature we're looking for this sequence: 504B0304*4D4554412D494E462F4D414E49464553542E4D46

This is basically seeking the ZIP header at the start of the file, then at any point thereafter the 'META-INF/MANIFEST.MF' string, meaning a JAR stored uncompressed within a ZIP would cause the ZIP itself to identify as a JAR.
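To make the byte-level behaviour concrete, here is a minimal Python sketch of that pattern. It is only an approximation of the idea (a fixed header followed by the manifest path at any later offset), not DROID's actual matching code, and the helper names and archive contents are made up for illustration.

```python
import io
import zipfile

# Byte sequences taken from the signature quoted above.
ZIP_HEADER = bytes.fromhex("504B0304")
MANIFEST_PATH = bytes.fromhex("4D4554412D494E462F4D414E49464553542E4D46")


def matches_jar_byte_pattern(data: bytes) -> bool:
    """True if data starts with the ZIP header and the manifest path
    occurs anywhere after it (the '*' wildcard in the signature)."""
    return data.startswith(ZIP_HEADER) and MANIFEST_PATH in data[len(ZIP_HEADER):]


def build_zip(entries: dict, compression: int) -> bytes:
    """Hypothetical helper: build an in-memory ZIP from {name: payload}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# A tiny JAR-like archive, then three wrappers around it for comparison.
inner_jar = build_zip({"META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n"},
                      zipfile.ZIP_STORED)
stored_zip = build_zip({"lib/inner.jar": inner_jar}, zipfile.ZIP_STORED)
deflated_zip = build_zip({"lib/inner.jar": inner_jar}, zipfile.ZIP_DEFLATED)
plain_zip = build_zip({"docs/readme.txt": b"no jar here"}, zipfile.ZIP_STORED)

for label, data in [("stored", stored_zip), ("deflated", deflated_zip),
                    ("plain", plain_zip)]:
    print(label, matches_jar_byte_pattern(data))
```

The stored wrapper keeps the inner JAR's bytes verbatim, including its manifest path, which is why it should match, while the deflated wrapper and the JAR-free ZIP should not.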

For the container signature we're specifically looking for the META-INF/MANIFEST.MF path from the root of the ZIP file.
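By contrast, a container-style check only looks at the archive's own entry names. A rough Python equivalent, assuming the standard zipfile module stands in for DROID's container handling and using hypothetical helper names:

```python
import io
import zipfile

MANIFEST_ENTRY = "META-INF/MANIFEST.MF"


def has_root_manifest(data: bytes) -> bool:
    """True only if the archive itself lists the manifest at its root."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return MANIFEST_ENTRY in archive.namelist()


def build_zip(entries: dict) -> bytes:
    """Hypothetical helper: build an uncompressed in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


jar_bytes = build_zip({MANIFEST_ENTRY: b"Manifest-Version: 1.0\n"})
wrapper_bytes = build_zip({"lib/inner.jar": jar_bytes})

print(has_root_manifest(jar_bytes))      # the JAR-like archive itself
print(has_root_manifest(wrapper_bytes))  # a ZIP that merely contains it
```

Because the wrapper's only entry is lib/inner.jar, a check of this kind should not treat the wrapper as a JAR even when the inner file is stored uncompressed.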

Just to check though, if you generate a DROID CSV report from the identification outcome, does the 'METHOD' column contain the value 'Container' or 'Signature'? And how are your ZIPs being created?
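If the report is large, a short script can tally the METHOD column. This is only a sketch: the sample rows below are invented for illustration, and a real export has more columns than the three shown here.

```python
import csv
import io
from collections import Counter

# Invented sample rows; a real DROID CSV export contains more columns.
SAMPLE_REPORT = '''"NAME","METHOD","PUID"
"bundle.zip","Signature","x-fmt/412"
"library.jar","Container","x-fmt/412"
'''


def count_methods(report_text: str) -> Counter:
    """Count how often each identification method appears in the report."""
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["METHOD"] for row in reader)


method_counts = count_methods(SAMPLE_REPORT)
for method, count in method_counts.items():
    print(method, count)
```

For a real report, the same function could be fed the text of the exported file instead of SAMPLE_REPORT.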

David

Dclipsham

Nov 5, 2019, 11:29:25 AM
to droid-list
Just to add, in my testing I've been using DROID 6.4 GUI....

David

klpend...@gmail.com

Jan 4, 2023, 10:48:15 AM
to droid-list
Following up to log a similar result in case anyone else runs into this.

I ran DROID on an uncompressed ZIP file containing MP3 files in a nested directory hierarchy, which resulted in a dual identification of:
x-fmt/263,application/zip,ZIP Format
AND
fmt/134,audio/mpeg,MPEG 1/2 Audio Layer 3

The Method is Signature.

I'm using DROID 6.5.2 GUI in Windows 10, and have tried the Windows zipping tool as well as 7zip to create the ZIP file, with the same result for each. I have the DROID preferences set to not scan inside of archive files.

Making a compressed ZIP file instead results in a single identification of:
x-fmt/263,application/zip,ZIP Format

Dclipsham

Jan 4, 2023, 11:11:48 AM
to droid-list
This second issue could be worked around by giving ZIP priority over MP3. Unfortunately, uncompressed ZIPs have a strong chance of unexpected clashes with other formats whose identification signatures aren't anchored to a specific offset, because the contained files' bytes appear verbatim inside the ZIP.
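As a simplified model of what such a priority would do (not DROID's implementation, and the table below is a hypothetical stand-in for whatever the signature file would declare), a winning format suppresses the hits it outranks:

```python
# Hypothetical priority table: each PUID maps to the PUIDs it outranks.
PRIORITY_OVER = {"x-fmt/263": {"fmt/134"}}


def apply_priorities(hits: list, priority_over: dict) -> list:
    """Drop any hit that is outranked by another hit in the same result."""
    present = set(hits)
    suppressed = set()
    for puid in hits:
        suppressed |= priority_over.get(puid, set()) & present
    return [puid for puid in hits if puid not in suppressed]


dual_hits = ["x-fmt/263", "fmt/134"]
resolved = apply_priorities(dual_hits, PRIORITY_OVER)
unresolved = apply_priorities(dual_hits, {})

print(resolved)
print(unresolved)
```

With the priority in place only the ZIP identification remains; without it both hits are reported, which mirrors the dual identification described above.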

I'm curious: does DROID still attempt to identify the ZIP's internal files, or does the multiple identification impede this behaviour?

David

klpend...@gmail.com

Jan 4, 2023, 12:27:54 PM
to droid-list
Identification of internal files still works with full and expected results for cases of dual-identification of the parent ZIP.