Should we reinstate a binary signature for fmt/59 - Microsoft Excel 5.0/95 Workbook (xls)?


Dclipsham

Sep 29, 2016, 11:00:01 AM
to droid-list
For v88 of the PRONOM signature release, I removed the binary signature for fmt/59. The release note states:

fmt/59: Microsoft Excel 5.0/95 Workbook (xls). Removed binary signature at request of National Library of New Zealand due to wide byte range of signature causing identification clashes with unrelated formats. PUID will now identify by Container Signature only.

The binary signature was:
BOF offset range 512-8704: 0908{2}00050500 

The problem was that the large offset range resulted in coincidental clashes with unrelated formats, e.g. with TIFF, that just happened to contain the same sequence of bytes within that range, so I felt the signature as it stood was too loose.
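To illustrate how loose that was, here is a minimal Python sketch of the old binary signature. It is not DROID's matching code; it assumes that {2} in the PRONOM syntax means "any two bytes" and that the offset range applies to where the sequence starts. The function and constant names are made up for the example.

```python
import re

# Assumption: {2} in the PRONOM sequence is a two-byte wildcard gap.
SEQUENCE = re.compile(rb"\x09\x08.{2}\x00\x05\x05\x00", re.DOTALL)
SEQUENCE_LENGTH = 8
MIN_OFFSET = 512    # lowest BOF offset at which the sequence may start
MAX_OFFSET = 8704   # highest BOF offset at which the sequence may start


def matches_old_fmt59(data: bytes) -> bool:
    """Return True if the sequence starts at a BOF offset in [512, 8704]."""
    # endpos caps where a match may END, so allow for the sequence length.
    end = MAX_OFFSET + SEQUENCE_LENGTH
    return SEQUENCE.search(data, MIN_OFFSET, end) is not None


# A made-up buffer that begins with a little-endian TIFF header but happens
# to contain the eight bytes at offset 1024.
tiff_like = b"II*\x00" + bytes(1020) + bytes.fromhex("0908100000050500") + bytes(64)
print(matches_old_fmt59(tiff_like))
```

Nothing in the check looks at the start of the file, so any format whose data happens to include those bytes in the first 8 KB or so will match.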

Meanwhile the container signature is a lot more specific, seeking the presence of the byte sequence '0908{2}00050500' within a file called 'Book' within an OLE2 compound file (fmt/111).

I therefore took the decision to remove the binary signature so that identification of fmt/59 now relies entirely on the container signature mechanism.

We do not currently have a policy of removing old binary signatures where there is a container signature equivalent. 

Where we create new container signatures, we do not currently create a binary equivalent.

We strongly encourage users of DROID to use the most up to date version which includes container signature identification (it was introduced in version 6), and we encourage tool providers that rely on PRONOM to also implement container signature identification where possible.

An alternative approach to fmt/59 would have been to modify the binary signature. For example, we could create a signature that seeks the OLE2 BOF sequence 0xD0CF11E0A1B11AE1 as well as the 0x0908{2}00050500 sequence.
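As a rough sketch of what that tightened signature might check, written in Python rather than PRONOM syntax: the file must begin with the OLE2 header and also contain the workbook sequence. Keeping the original 512-8704 offset range is an assumption on my part, as are the function and constant names.

```python
import re

# OLE2 compound file header, expected at BOF offset 0.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Assumption: {2} in the PRONOM sequence is a two-byte wildcard gap.
BOOK_SEQUENCE = re.compile(rb"\x09\x08.{2}\x00\x05\x05\x00", re.DOTALL)
SEQUENCE_LENGTH = 8


def matches_tightened_fmt59(data: bytes,
                            min_offset: int = 512,
                            max_offset: int = 8704) -> bool:
    """Require the OLE2 header at BOF as well as the workbook sequence."""
    if not data.startswith(OLE2_MAGIC):
        return False
    # The sequence may start anywhere in [min_offset, max_offset].
    end = max_offset + SEQUENCE_LENGTH
    return BOOK_SEQUENCE.search(data, min_offset, end) is not None


sequence = bytes.fromhex("0908100000050500")
ole2_like = OLE2_MAGIC + bytes(1016) + sequence   # OLE2 header plus sequence
tiff_like = b"II*\x00" + bytes(1020) + sequence   # same bytes, TIFF header
print(matches_tightened_fmt59(ole2_like), matches_tightened_fmt59(tiff_like))
```

This would rule out clashes with non-OLE2 formats such as TIFF, but it still cannot tell which stream inside the compound file the sequence belongs to, which is what the container signature adds.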

So, I would like the community to provide feedback, not just on this suggestion [to re-implement binary ID for fmt/59], but also on container signatures generally:

Should we reinstate a binary signature for fmt/59?
Is a binary equivalent for container formats generally desirable (and note that this may not be possible with all container-type formats)? 
What PRONOM-based tools that are still in use do not or cannot employ container signature identification?
How prevalent are these tools?


N.B. container formats in the DROID/PRONOM sense are compound files that represent a single intellectual entity that employs a zip or OLE2 mechanism for storage. We are not referring to archival container type formats (.7z, .rar, .tar) that may contain many files, nor are we referring to media container formats like MPEG-2, that could contain video, audio and subtitle data within a wrapper container.


David


Lehane, Richard

Sep 29, 2016, 11:55:52 PM
to droid...@googlegroups.com

Hi David,

 

I’d be happy to see it removed permanently & for this to be applied as a general rule to all binary sigs for formats that have container sigs. I’m not aware of any PRONOM-based tools that would be impacted … & thanks for considering these tools :).

 

For the last year, siegfried has by default filtered out all binary signatures where formats have container signatures & it was for precisely this same reason that you are describing: I was getting reports of false positive identifications. This change was made in v1.3.0 (https://github.com/richardlehane/siegfried/blob/master/CHANGELOG.md#v130-2015-09-27). There is a flag (-doubleup) that can be used to override this but I’m not aware of anyone using it. During the last year I haven’t had any reports of issues arising from the feature.
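The filtering idea can be sketched in a few lines of Python. This is only an illustration of the rule, not siegfried's actual code; the data layout, the function name and the "example/tiff" key are hypothetical, and the doubleup parameter simply mirrors the flag mentioned above.

```python
def filter_binary_signatures(binary_sigs, container_puids, doubleup=False):
    """Drop binary signatures for formats that also have container signatures.

    binary_sigs maps a format identifier to its list of binary signatures;
    container_puids is the set of identifiers with a container signature.
    """
    if doubleup:
        # Override: keep every binary signature, clashes and all.
        return dict(binary_sigs)
    return {
        puid: sigs
        for puid, sigs in binary_sigs.items()
        if puid not in container_puids
    }


# Hypothetical input: fmt/59 has both kinds of signature, the other entry
# (a placeholder key, not a real PUID) has a binary signature only.
binary_sigs = {
    "fmt/59": ["0908{2}00050500"],
    "example/tiff": ["49492A00"],
}
container_puids = {"fmt/59"}

print(sorted(filter_binary_signatures(binary_sigs, container_puids)))
```

With the default setting, a file that fails the fmt/59 container match can no longer fall back to the loose binary match.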

 

I think there is an additional benefit of removing these binary signatures: if the container matching has failed for these formats (e.g. xls) I’d generally want to know (with an Unknown result) because that is a good indicator of some kind of problem with the file. If a binary result is returned, many users would not realise that the match *should* have been based on the container signature and so wouldn’t be alerted to the likely corruption of that file.

 

Cheers

Richard




Lehane, Richard

Sep 30, 2016, 12:13:09 AM
to droid...@googlegroups.com

Apologies… when considering the tools I was just thinking of fido, sf and DROID. Thinking about it further, I believe fidoo (http://www.techmaurice.com/fidoo/) may be impacted because I understand it is binary only at the moment.

Cheers

Richard

Matt Palmer

Sep 30, 2016, 4:57:42 AM
to droid-list
Hi David,

My own view is that binary signatures should not be necessary where there is a container equivalent, but with one caveat.

The original rationale for container signatures was that the container part (OLE2 or ZIP) obscures the recognisable binary byte sequences inside them. This leads to unreliable binary signatures: they are often made too loose in order to find the small fragments which aren't obscured (causing false positives), and they nevertheless have large numbers of identification failures (false negatives). Container signatures essentially unwrap the container first, making good binary byte sequences discoverable again, with the added bonus that you can limit them to particular "files" within the container, increasing the accuracy in most cases.

The only reasons I can see to continue to support these (somewhat broken) binary signatures are:
1. to continue supporting older versions of DROID which don't support container signatures at all,
2. to support other non-DROID programs which only support PRONOM binary signatures.

Interested to hear what others have to say on the matter,

Regards,

Matt.

Matt Palmer

Sep 30, 2016, 5:37:16 AM
to droid-list
I guess one way to get the best of both worlds is to publish two signature files at different URLs, one without the older binary versions.  That might be a rather tedious manual process for someone at TNA however...

One more thought: does there need to be a generic way to "deprecate" any signature, while still making it available to those who may still rely on them for whatever reason?

Matt.