PPT fmt/181 vs fmt/126

Hannah Wang

Nov 28, 2023, 3:31:14 PM
to PRONOM
Hi everyone,

At NARA, we are updating our Digital Preservation Framework and deprecating some of its Microsoft PowerPoint entries, specifically those that map to PUIDs that have been merged into fmt/126. So far, we definitely plan to deprecate both Microsoft PowerPoint for Macintosh 98 (formerly fmt/180) and Microsoft PowerPoint for Macintosh X (formerly fmt/182) for this reason.

We currently have Microsoft PowerPoint for Macintosh 2001 as its own entry, which maps to fmt/181. However, I was wondering whether this PUID should also be deprecated in favor of fmt/126.

Per the FDD on PPT, PPT for Mac 2001 falls within the range of releases that used the Microsoft Office PowerPoint 97-2003 Binary File Format (fmt/126). fmt/181 also does not have an internal signature (though its description and MIME type were recently updated, which does give me pause about this question!).
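One way to check the point about fmt/181 is to scan a DROID binary signature file for every FileFormat that claims the .ppt extension and note which ones reference an internal signature. This is a minimal stdlib-only sketch, assuming the element and attribute names visible in the fmt/126 FileFormat excerpt quoted later in this thread (FileFormat, PUID, InternalSignatureID, Extension); the function names are hypothetical and this is not a PRONOM or DROID tool. Tags are matched by local name so it works whether or not the file declares an XML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so matching works with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def formats_for_extension(binary_sig_xml: str, ext: str) -> list[tuple[str, bool]]:
    """Return (PUID, has_internal_signature) for each FileFormat that lists `ext`."""
    root = ET.fromstring(binary_sig_xml)
    results = []
    for ff in root.iter():
        if local_name(ff.tag) != "FileFormat":
            continue
        children = list(ff)
        extensions = {
            c.text.strip().lower()
            for c in children
            if local_name(c.tag) == "Extension" and c.text
        }
        if ext.lower() in extensions:
            # No InternalSignatureID: this format cannot be matched by a binary signature.
            has_sig = any(local_name(c.tag) == "InternalSignatureID" for c in children)
            results.append((ff.get("PUID"), has_sig))
    return results
```

For example, passing the text of a downloaded signature file (read with `pathlib.Path(...).read_text(encoding="utf-8")`) and `"ppt"` would list each .ppt PUID alongside whether it has a binary signature.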

Thanks in advance,
Hannah

Tyler Thorsted

Dec 7, 2023, 1:49:44 PM
to PRONOM
Hi Hannah, 

It was good to chat with you today. I think this question warrants closer inspection.

Looking more closely at fmt/126, there is some curious identification going on: both a binary signature and a container signature appear to be mapped to it.

    <ContainerSignature Id="3000" ContainerType="OLE2">
      <Description>Microsoft Powerpoint 2000 OLE2</Description>
      <Files>
        <File>
          <Path>PowerPoint Document</Path>
        </File>
...
    <!-- Microsoft PowerPoint 2000 (OLE2)-->
    <FileFormatMapping signatureId="3000" Puid="fmt/126"/>

        <FileFormat ID="135" MIMEType="application/vnd.ms-powerpoint"
            Name="Microsoft Powerpoint Presentation" PUID="fmt/126" Version="97-2003">
            <InternalSignatureID>172</InternalSignatureID>
            <Extension>ppt</Extension>
            <HasPriorityOverFileFormatID>767</HasPriorityOverFileFormatID>
        </FileFormat>

I'm not sure whether this was intentional: I have seen container formats defined in the binary signature file before, but not usually with both kinds of signature.
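To make the container signature above concrete: it matches OLE2 (Compound File Binary) files that contain a stream named "PowerPoint Document". The sketch below is a rough stdlib-only approximation of that test, not how DROID implements it. It relies on two facts about the format (every CFB file starts with the eight bytes D0 CF 11 E0 A1 B1 1A E1, and directory entry names are stored as UTF-16LE) but replaces proper directory parsing with a plain substring search. Because the excerpt is truncated at `...`, any further checks the real signature makes are not reproduced; the function and constant names are hypothetical.

```python
# Compound File Binary (OLE2) files always begin with these eight bytes.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# OLE2 directory entries store stream names as UTF-16LE.
PPT_STREAM_NAME = "PowerPoint Document".encode("utf-16-le")


def looks_like_ole2_powerpoint(data: bytes) -> bool:
    """Crude approximation of the OLE2 'PowerPoint Document' container check.

    Looks for the OLE2 header and for the UTF-16LE stream name anywhere in
    the file. It does not parse the CFB directory, so unlike DROID it can be
    fooled by the name appearing inside some other stream's content.
    """
    if not data.startswith(OLE2_MAGIC):
        return False
    return PPT_STREAM_NAME in data
```

Usage would look like `looks_like_ole2_powerpoint(Path("deck.ppt").read_bytes())`, with a hypothetical file name. A result of True only says the file is an OLE2 container with a PowerPoint stream; it says nothing about which PowerPoint version wrote it, which is exactly the question for fmt/181 vs fmt/126.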

It might be good to clean up the set of PowerPoint signatures from this period. I have lots of samples, so I can compare each major version to see whether it is necessary to break them down by version.

Tyler Thorsted

David Clipsham

Dec 12, 2023, 5:19:25 AM
to PRONOM

Interesting thread! 

A bit of historical context to start with: 

DROID version 6 introduced Container Signatures in 2010, but many formats that went on to have Container Signatures already had Binary Signatures. Because TNA had no way of tracking user take-up of DROID 6, they decided to maintain backwards compatibility with earlier versions of DROID for a time. This meant a) not removing pre-existing Binary Signatures and b), I believe, for a brief period creating both Binary and Container Signatures where this was possible. By the time I started working on PRONOM in 2012, the guidance was not to remove pre-existing Binary Signatures for formats that also had Container Signatures, but IIRC I didn't create any new Binary Signatures when I was creating a new Container Signature.

My personal view now is that it has been long enough that nobody will be using DROID 5 or earlier in any formal context (except perhaps for historical research or curiosity), so it may be time to remove the older Binary Signatures, although I also expect that they are most likely benign.

Here are all the file formats that have at least one Container Signature and at least one Binary Signature. There are some big hitters in here:

fmt/39 - Microsoft Word Document 6.0/95
fmt/40 - Microsoft Word Document 97-2003
fmt/61 - Microsoft Excel 97 Workbook (xls) 8
fmt/125 - Microsoft Powerpoint Presentation 95
fmt/126 - Microsoft Powerpoint Presentation 97-2003
fmt/136 - OpenDocument Text 1.0
fmt/137 - OpenDocument Spreadsheet 1.0
fmt/138 - OpenDocument Presentation 1.0
fmt/139 - OpenDocument Graphics 1.0
fmt/140 - OpenDocument Database Format 1.0
fmt/161 - SIARD (Software-Independent Archiving of Relational Databases) 1.0
fmt/290 - OpenDocument Text 1.1
fmt/291 - OpenDocument Text 1.2
fmt/292 - OpenDocument Presentation 1.1
fmt/293 - OpenDocument Presentation 1.2
fmt/294 - OpenDocument Spreadsheet 1.1
fmt/295 - OpenDocument Spreadsheet 1.2
fmt/296 - OpenDocument Graphics 1.1
fmt/297 - OpenDocument Graphics 1.2
fmt/429 - CorelDraw Drawing X4
fmt/430 - CorelDraw Drawing X5
fmt/482 - Apple iBook format
fmt/483 - ePub format
x-fmt/88 - Microsoft PowerPoint Presentation 4.x
x-fmt/412 - Java Archive Format
x-fmt/430 - Microsoft Outlook Email Message 97-2003
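A list like the one above can in principle be regenerated by intersecting the two signature files. This is a minimal sketch, assuming the element and attribute names seen in the excerpts earlier in the thread (FileFormat with a PUID attribute and InternalSignatureID children in the binary file; FileFormatMapping with a Puid attribute in the container file). The function names are hypothetical, and the output will depend on which signature file versions are loaded.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so matching works with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def puids_with_both(binary_sig_xml: str, container_sig_xml: str) -> set[str]:
    """PUIDs with at least one binary AND at least one container signature."""
    # Binary side: FileFormat elements that reference an InternalSignatureID.
    binary = {
        ff.get("PUID")
        for ff in ET.fromstring(binary_sig_xml).iter()
        if local_name(ff.tag) == "FileFormat"
        and any(local_name(c.tag) == "InternalSignatureID" for c in ff)
    }
    # Container side: every PUID targeted by a FileFormatMapping.
    container = {
        m.get("Puid")
        for m in ET.fromstring(container_sig_xml).iter()
        if local_name(m.tag) == "FileFormatMapping"
    }
    # Drop None in case an element is missing its PUID attribute.
    return (binary & container) - {None}
```

Each PUID in the result is a candidate for the "do the two signatures agree?" check discussed next; the code only finds the overlap and does not compare what the signatures actually test for.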

I would hope that in all cases the Binary and Container Signatures at least agree on what they are looking for or, where they differ, that the Container Signature is the more detailed of the two.

So, in terms of clean-up: if the PowerPoint formats that have existing entries can be identified unambiguously, then great, please go ahead with that. If they cannot be disambiguated and that means the version/date ranges and descriptions need to be tweaked, then I think that's fine too.

If we do encounter an instance where the Binary and Container Signatures are fundamentally different, it will warrant revisiting; I expect that in most cases the Container Signature will be the more specific and accurate one.

I hope this is useful,

David
