PDF Portfolio files


Kieron Niven

Dec 18, 2023, 12:20:15 PM
to pro...@googlegroups.com
Hi,

I have a small set of PDF Portfolio files (they look to have been created in Acrobat Pro) that DROID is identifying as fmt/276 (Acrobat PDF 1.7 - Portable Document Format). I've noticed that fmt/1451 (PDF Portfolio 1.7) also exists in PRONOM, with a higher priority than the other signature, so I'm wondering why they're identifying as fmt/276. Is it a problem with the identification/signature or with the files themselves? I've had a quick look at the signature description for fmt/1451 and, to my untrained eye, it looks to contain the right elements, but I'm not sure of the syntax (and it would be useful to see an example file that identifies successfully).

Any help gratefully appreciated!

Kieron

Kieron Niven orcid.org/0000-0002-0537-9238 

Digital Archivist: Data Standards

Archaeology Data Service

Department of Archaeology, University of York, The King’s Manor, York, YO1 7EP


David Clipsham

Dec 18, 2023, 12:42:56 PM
to PRONOM
Hi Kieron,

The PDF Portfolio signature contains some variably positioned elements, so the first thing I'd suggest checking is the DROID settings under Tools > Preferences. The default value of 'Maximum bytes to scan' is 65536, which means DROID scans only the first and last 64 KB of each file, so if any of these elements fall outside those ranges the file won't be identified accurately.

To scan the whole file instead, set the value to any negative number, e.g. -1. If you change any Preferences, then once you've saved them you will need to choose 'New' on the main DROID screen to start a new profile for the changes to take effect.
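As a rough illustration of the scan limit described above, here is a minimal Python sketch. It is not DROID's code: the function name `in_scan_window` and its parameters are hypothetical, and it simplifies by testing only the offset where a marker starts rather than the marker's full length.

```python
def in_scan_window(offset: int, file_size: int, max_bytes: int = 65536) -> bool:
    """Return True if a byte offset lies in the first or last max_bytes of a file.

    Hypothetical helper: mirrors the 'Maximum bytes to scan' behaviour as
    described in this thread, not DROID's actual implementation.
    """
    if max_bytes < 0:
        # A negative limit is treated as "scan the whole file".
        return True
    in_head = offset < max_bytes
    in_tail = offset >= file_size - max_bytes
    return in_head or in_tail


if __name__ == "__main__":
    size = 1_000_000
    for off in (100, 500_000, size - 1):
        print(off, in_scan_window(off, size), in_scan_window(off, size, -1))
```

With the default limit, a marker that starts in the middle of a large file falls outside both windows, which is why a variably positioned element can be missed unless the limit is set to a negative value.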

If you're already scanning the full file, are you in a position to share an affected file? 

David

---
Note I do not work for The National Archives and am replying in a personal capacity

Kieron Niven

Dec 19, 2023, 5:13:54 AM
to pro...@googlegroups.com
Thanks David,

Set to unlimited so full file is being scanned. Just checking about sharing the files and will message you directly.

Kieron

--
You received this message because you are subscribed to the Google Groups "PRONOM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pronom+un...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/pronom/53ab51b0-1692-422d-b622-324c99183fc4n%40googlegroups.com.

David Clipsham

Dec 19, 2023, 8:14:28 AM
to PRONOM
Thanks Kieron,

The issue with PRONOM identification is that the PDF Portfolio signature expects to see both <</Collection (Collection Dictionary) and <</CI (Collection Item Dictionary), but the files you provided offline only contain the <</Collection entry. From a cursory read through the relevant parts of the PDF 1.7 spec (https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf), I think the signature can drop the <</CI part and reliably identify PDF Portfolio, but I'll need to do a little more testing to be certain.
---
12.3.5:
A collection dictionary specifies the viewing and organizational characteristics of portable collections. 

7.11.3, Table 44 – Entries in a file specification dictionary:
A collection item dictionary, which shall be used to create the user interface for portable collections
---
My (possibly imprecise) interpretation of the above suggests that a <</CI will be present if the portable collection has a user interface, but that this isn't necessarily essential to PDF Portfolio, therefore we can hopefully rely on <</Collection alone.
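To make the difference concrete, here is a small Python sketch of a naive byte search for the two markers quoted above. It is only an approximation under stated assumptions: the function names and the `require_ci` flag are hypothetical, the marker strings are taken from this message rather than from the actual fmt/1451 signature, and a plain substring search will miss dictionaries stored inside compressed object streams and may match unrelated keys that merely begin with the same characters.

```python
# Marker strings as quoted in this thread (assumed, not the PRONOM signature itself).
COLLECTION = b"<</Collection"
COLLECTION_ITEM = b"<</CI"


def find_markers(data: bytes) -> dict:
    """Return the first offset of each marker in data, or -1 if it is absent."""
    return {
        "Collection": data.find(COLLECTION),
        "CI": data.find(COLLECTION_ITEM),
    }


def looks_like_portfolio(data: bytes, require_ci: bool = False) -> bool:
    """Naive check: require_ci=True demands both markers, False only Collection."""
    offsets = find_markers(data)
    has_collection = offsets["Collection"] >= 0
    has_ci = offsets["CI"] >= 0
    return has_collection and (has_ci or not require_ci)


if __name__ == "__main__":
    # Tiny synthetic fragment (not a valid PDF) with a collection dictionary
    # but no collection item dictionary.
    sample = b"%PDF-1.7\n1 0 obj\n<</Collection<</View/D>>/Type/Catalog>>\nendobj\n%%EOF"
    print(find_markers(sample))
    print(looks_like_portfolio(sample, require_ci=True))
    print(looks_like_portfolio(sample, require_ci=False))
```

On a file like the ones described here, the check that demands both markers fails while the Collection-only check passes, which is the behaviour a relaxed signature would be expected to show.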

David

Murray, Kate

Dec 19, 2023, 8:58:29 AM
to PRONOM

On a sort of related note, we recently released a new Library of Congress file format description (fdd) on PDF Portfolio: https://www.loc.gov/preservation/digital/formats/fdd/fdd000620.shtml.

 

All comments are welcome.

 

Best from Kate

 

Kate Murray (she/her)

Sustainability of Digital Formats

FADGI AudioVisual Working Group

Digital Collections Management and Services

Library of Congress

km...@loc.gov

 

 

 

 


Tyler Thorsted

Dec 19, 2023, 6:29:08 PM
to PRONOM
I can confirm this is an issue with a few PDF Portfolios I have. They currently identify not as fmt/1451 but as regular PDF 1.7 files. They do not contain the "<</CI<<" string, but do contain the "/Collection" string.

Had it on my list to investigate further.

Tyler Thorsted

David Clipsham

Dec 21, 2023, 6:50:38 AM
to PRONOM
Thanks Tyler, that's useful confirmation. I've submitted an update via the PRONOM mailbox.

David