PUID for encrypted PDFs

Paul Young

Nov 22, 2016, 6:47:43 AM
to droid-list

We have been looking at identifying encrypted PDFs, to help ensure their accessibility. I wanted to get opinions on whether people would like to see this capability added to PRONOM, as we already have for Office documents with fmt/754 and fmt/494.

 

Looking at the PDF specs, p. 115 of the PDF Reference 1.7 (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf) discusses encryption and says that if the trailer dictionary contains an ‘Encrypt’ entry, the document is encrypted, and if the entry is absent, it is not. p. 55 of ISO 32000-1:2008 (http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf) seems consistent with that. Let me know if you interpret this differently.
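A minimal Python sketch of the trailer check described above, assuming the file uses a classic cross-reference table so that the last `trailer` keyword introduces the current trailer dictionary. The function name `trailer_has_encrypt` is hypothetical, and the parsing is deliberately naive: there is no real tokenizer, and files that rely only on cross-reference streams (PDF 1.5+) are not handled.

```python
import re

# Whitespace and delimiter bytes from the PDF lexical rules; a name token such
# as /Encrypt ends at the first of these.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
PDF_DELIMITERS = b"()<>[]{}/%"


def trailer_has_encrypt(data: bytes) -> bool:
    """Return True if the last trailer dictionary contains an /Encrypt key.

    Naive sketch: assumes the final 'trailer' keyword introduces the current
    trailer. Files using only cross-reference streams have no 'trailer'
    keyword, so this returns False for them.
    """
    start = data.rfind(b"trailer")
    if start == -1:
        return False
    end = data.find(b"startxref", start)
    region = data[start:] if end == -1 else data[start:end]
    for match in re.finditer(rb"/Encrypt", region):
        following = region[match.end():match.end() + 1]
        # Accept only the complete name /Encrypt, not e.g. /EncryptMetadata.
        if not following or following in PDF_WHITESPACE or following in PDF_DELIMITERS:
            return True
    return False
```

Checking the token boundary matters because a raw substring match would also accept longer names that merely begin with "Encrypt".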

 

This would allow us to create a signature that picks up encrypted PDFs. I suggest searching from EOF back to an offset of around 2048 bytes for the /Encrypt sequence (2F456E6372797074) in the trailer dictionary. However, this marker appears whenever any level of PDF encryption is applied, including permission settings for copying or printing, not just password protection.
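The proposed signature amounts to a bounded byte search anchored at EOF. A rough Python equivalent, assuming the 2048-byte window suggested above; the helper names are hypothetical, and this only approximates how DROID evaluates an EOF-anchored sequence.

```python
import os

# The proposed byte sequence; 2F456E6372797074 is ASCII "/Encrypt".
ENCRYPT_MARKER = bytes.fromhex("2F456E6372797074")


def has_encrypt_marker_near_eof(data: bytes, max_offset: int = 2048) -> bool:
    """Search only the final max_offset bytes; the whole marker must fit in that window."""
    if max_offset <= 0:
        raise ValueError("max_offset must be positive")
    return ENCRYPT_MARKER in data[-max_offset:]


def file_has_encrypt_marker(path: str, max_offset: int = 2048) -> bool:
    """Read just the tail of the file instead of loading it all."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - max_offset))
        return has_encrypt_marker_near_eof(f.read(), max_offset)
```

As noted above, a hit says nothing about whether a password is needed to open the file, and a file with a lot of trailing data after %%EOF, or whose trailer sits further back, would be missed.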

 

If we added it to PRONOM, we could have one new PUID entry for encrypted PDFs (v1.1-1.7) and possibly other entries covering PDF formats that allow encryption.


 

Paul Young

Digital Archivist

+44 (0)20 8876 3444 ext 2308

The National Archives, Kew, Richmond, Surrey TW9 4DU

nationalarchives.gov.uk

Paul Wheatley

Nov 22, 2016, 10:55:38 AM
to droid-list
Hi Paul,

Isn't this confusing the core aim of what PRONOM and PUIDs are for? A PUID is used to identify a file format, and possibly a version of a file format (admittedly this in itself is a fuzzy concept). What you are suggesting (if I'm understanding correctly) is using a PUID to identify a set of files of a particular file format with a particular characteristic. In that case, how far does this go? How many characteristics will get PUIDs? Clearly this would become very confusing very quickly. What about PDF password protection? Does that qualify?

I'd argue that PRONOM and PUIDs should remain focused on doing what they do, and furthermore, continuing to do that very well. Scope creep, in a way that some users (I suspect) would not like to go along with, could well damage the fabulous work you guys have done on this core challenge.

If PUIDs are going to be used to identify the myriad ways in which files are not accessible, you're really going to have your work cut out. If you're going to tackle only a handful of these issues for particular formats, it's going to get very confusing due to the inconsistency this will create.

As regards 754 and 494: 494 makes some sense because, from an identification point of view, the encryption prevents finer-grained format ID (if the extension is deemed insufficient). So in this case, 494 would be used to identify the format in lieu of a more specific format version ID. This might be replaced at a later date if the encryption is cracked, or indeed if a non-encrypted version of the same object is obtained from the source. But this is unnecessary with PDF, so a unique PUID for an encrypted PDF would seem to be solving a completely different kind of problem and taking PRONOM/PUIDs into new territory.

Just my twopenneth

Cheers

Paul

Andy Jackson

Nov 22, 2016, 11:29:12 AM
to droid-list
Hi,

I generally agree with Paul W. on this, but I think it's a moot point, because in PDF 'Encrypted' doesn't mean what you'd think.

Prior to PDF 1.5, any PDF that had access restrictions (can't print/copy/modify/annotate) was required to be encrypted with an empty 'user' password. This means that the only way you can tell the difference between an accessible PDF and a locked PDF is to attempt to decrypt it with the shared password. All PDF parser implementations have to do this.
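The restrictions listed here are recorded as bit flags in the /P entry of the standard security handler's encryption dictionary, which is why a permissions-only PDF still carries an /Encrypt entry. A small decoding sketch; the dictionary keys are hypothetical labels, and the bit positions follow the user access permissions table in ISO 32000-1.

```python
# Bit positions (1 = low-order bit) of the /P entry in the standard security
# handler's encryption dictionary.
PERMISSION_BITS = {
    3: "print",
    4: "modify",
    5: "copy",
    6: "annotate",
    9: "fill_forms",            # revision 3 and later
    10: "extract_for_accessibility",
    11: "assemble",             # revision 3 and later
    12: "print_high_quality",   # revision 3 and later
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map each known permission bit of /P to True (allowed) or False (restricted)."""
    # /P is written as a signed 32-bit integer (e.g. -44), so mask to unsigned first.
    p &= 0xFFFFFFFF
    return {name: bool((p >> (bit - 1)) & 1) for bit, name in PERMISSION_BITS.items()}
```

For example, /P -44 allows printing and copying but not modification or annotation, while -3904 clears every bit in the table.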

Therefore, this cannot be done unless you actually implement the PDF decryption algorithm. I think this is probably out of scope for DROID.
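To illustrate what implementing the decryption algorithm involves, here is a Python sketch of the standard security handler's user-password check (ISO 32000-1, Algorithms 2, 4, 5 and 6) for revisions 2 to 4. It assumes /O, /U, /P, /R, /Length and the first /ID element have already been parsed from the file (not shown), and it does not cover revisions 5 and 6 (AES-256) or the decryption of strings and streams. A PDF that only restricts permissions typically has an empty user password, so trying the empty string is how a reader distinguishes "restricted but openable" from "password required".

```python
import hashlib
import struct

# 32-byte padding string defined for the standard security handler.
PAD = bytes.fromhex("28BF4E5E4E758A4164004E56FFFA0108" "2E2E00B6D0683E802F0CA9FE6453697A")


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def compute_key(password: bytes, o: bytes, p: int, id0: bytes, r: int,
                length_bits: int = 40, encrypt_metadata: bool = True) -> bytes:
    """Algorithm 2: derive the file encryption key from a candidate user password."""
    n = 5 if r == 2 else length_bits // 8
    h = hashlib.md5()
    h.update((password + PAD)[:32])               # pad or truncate to 32 bytes
    h.update(o[:32])                              # /O entry
    h.update(struct.pack("<I", p & 0xFFFFFFFF))   # /P as unsigned little-endian
    h.update(id0)                                 # first element of the /ID array
    if r >= 4 and not encrypt_metadata:
        h.update(b"\xff\xff\xff\xff")
    digest = h.digest()
    if r >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]


def compute_u(key: bytes, id0: bytes, r: int) -> bytes:
    """Algorithms 4 (r == 2) and 5 (r >= 3): the /U value expected for this key."""
    if r == 2:
        return rc4(key, PAD)
    data = rc4(key, hashlib.md5(PAD + id0).digest())
    for i in range(1, 20):
        data = rc4(bytes(b ^ i for b in key), data)
    return data + bytes(16)  # the last 16 bytes of /U are arbitrary padding


def authenticate_user_password(password: bytes, o: bytes, u: bytes, p: int, id0: bytes,
                               r: int, length_bits: int = 40, encrypt_metadata: bool = True):
    """Algorithm 6: return the file key if `password` is the user password, else None."""
    key = compute_key(password, o, p, id0, r, length_bits, encrypt_metadata)
    expected = compute_u(key, id0, r)
    matched = expected == u[:32] if r == 2 else expected[:16] == u[:16]
    return key if matched else None
```

Calling `authenticate_user_password(b"", o, u, p, id0, r, length_bits)` and getting a key back means the file opens without a password; None means a user password is required.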

Best wishes,
Andy Jackson

Matt Palmer

Nov 23, 2016, 5:00:10 AM
to droid-list
Hi,

We discussed this during DROID development - we had some plans to identify encrypted documents, as there are clear preservation issues when material is encrypted.  Knowing which material is (potentially) encrypted would be useful. 

However, we came to the same conclusion at the time that there are many characteristics which we may want to know about a file, and this was merely one of them, and it wasn't necessarily possible to identify all encrypted material reliably.  We did imagine an extension of DROID which would allow for adding arbitrary characteristics to a file (e.g. a format characteristics table), and signatures to identify them.  This wouldn't alter the PUID of the format - it would just add an arbitrary list of characteristics if they existed.  I still think this is a nice idea, but it's a fair amount of work to add new types of signatures and alter the underlying data model.
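The idea of characteristics that sit alongside, rather than replace, the format identification could be modelled roughly as below. This is a hypothetical Python data model, not DROID's; the class names and the `example:encrypted` identifier are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Characteristic:
    """A hypothetical file characteristic, identified separately from any PUID."""
    identifier: str  # invented identifier, e.g. "example:encrypted"
    label: str


@dataclass
class IdentificationResult:
    path: str
    puid: str  # format identification; never altered by characteristics
    characteristics: list[Characteristic] = field(default_factory=list)

    def add_characteristic(self, characteristic: Characteristic) -> None:
        # Record each characteristic once, leaving the PUID untouched.
        if characteristic not in self.characteristics:
            self.characteristics.append(characteristic)


result = IdentificationResult("report.pdf", "fmt/276")  # fmt/276: PDF 1.7
result.add_characteristic(Characteristic("example:encrypted", "Encrypted"))
print(result.puid, [c.identifier for c in result.characteristics])
# prints: fmt/276 ['example:encrypted']
```

Keeping characteristics in a separate list means a file's PUID stays the same whether or not any characteristic signatures are run.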

Of course, nothing stops particular institutions adding signatures that they find useful, and creating their own PUIDs (maybe in a slightly different format, e.g. chr/1, instead of fmt/1).   Might be nice to change DROID so it can download signature files from different locations without requiring the PRONOM web service on the other end, in the same way it currently handles container signatures as a simple HTTP file download.  Then people could easily host their own signature files for DROID, adding whichever ones they wanted to the standard PRONOM sigs.

Regards,

Matt.

Paul Young

Nov 23, 2016, 5:27:05 AM
to droid-list
Hi, 

Thanks, everyone, for the feedback. Happy to agree that this is out of scope for official PRONOM signature releases. However, I have been thinking along similar lines to Matt: we could make available signatures that we have found useful but that do not fit into PRONOM. For example, we also have a signature that can detect whether a .msg file contains an attachment. People could then download these and run them separately from the standard PRONOM signatures. We will think about where we could make them available.
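The .msg attachment signature itself is not shown in the thread. As an assumption about how such a check could work: Outlook .msg files are Compound File Binary (OLE2) containers, and MS-OXMSG stores each attachment in a storage named `__attach_version1.0_#` followed by eight hex digits, with directory entry names encoded as UTF-16LE. A crude byte-level heuristic along those lines (the function name is hypothetical):

```python
# Header signature of a Compound File Binary (OLE2) container, which .msg files use.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Attachment storages are named "__attach_version1.0_#" + 8 hex digits;
# CFB directory entry names are stored as UTF-16LE.
ATTACH_PREFIX = "__attach_version1.0_#".encode("utf-16-le")


def msg_has_attachment(data: bytes) -> bool:
    """Heuristic: a CFB file whose bytes contain an attachment storage name."""
    return data.startswith(CFB_MAGIC) and ATTACH_PREFIX in data
```

This does not parse the directory, so it could also match the name if it happened to appear inside stream data; a proper check would walk the CFB directory entries.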

Paul

Lehane, Richard

Nov 28, 2016, 4:44:40 AM
to droid...@googlegroups.com
Hi droid list,
I agree that it's probably best to keep these “characterisation”-type signatures out of PRONOM core, but they might still be worth pursuing as optional add-ins, and I like Matt’s “chr” namespace suggestion.

Paul – in terms of thinking about *where* to make such signatures available, I suggest it might be worth pursuing the ideas in Ross’s blog post re. using GitHub repos: http://openpreservation.org/blog/2015/08/11/proposal-github-to-enable-a-federated-approach-to-distributing-and-utilising-custom-droid-signatures/ & https://github.com/exponential-decay/droid-signature-files.

Cheers
Richard

--
You received this message because you are subscribed to the Google Groups "droid-list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to droid-list+...@googlegroups.com.
To post to this group, send email to droid...@googlegroups.com.
Visit this group at https://groups.google.com/group/droid-list.
For more options, visit https://groups.google.com/d/optout.

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

Andy Jackson

Nov 28, 2016, 5:06:10 AM
to droid-list
I believe the chr namespace is already taken: http://www.nationalarchives.gov.uk/PRONOM/chr/1

+1 to using GitHub.

Andy

Matt Palmer

Nov 28, 2016, 6:29:05 AM
to droid...@googlegroups.com
Hmmm, didn't know that namespace was already taken by TNA!

I wonder if external PUIDs shouldn't somehow include the organisation that creates them, to avoid naming conflicts. Something like "au.gov.nsw:chr/1".

Impossible to enforce, but would make life easier for everyone if conflicts could be avoided.  May not be enough of them to matter, of course!
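One way to make organisation-prefixed identifiers machine-checkable is a small parser. The grammar here (an optional reverse-domain organisation prefix and a colon, then `namespace/number`) is only an assumption extrapolated from the "au.gov.nsw:chr/1" example; PRONOM does not define such a syntax, and `parse_puid` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumed grammar: [reverse-domain organisation ":"] namespace "/" number
PUID_PATTERN = re.compile(
    r"(?:(?P<org>[a-z0-9-]+(?:\.[a-z0-9-]+)*):)?(?P<namespace>[a-z-]+)/(?P<number>[0-9]+)"
)


def parse_puid(text: str) -> Tuple[Optional[str], str, int]:
    """Split an identifier into (organisation or None, namespace, number)."""
    match = PUID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised identifier: {text!r}")
    return match.group("org"), match.group("namespace"), int(match.group("number"))
```

Unprefixed identifiers such as fmt/276 or x-fmt/111 still parse, with None as the organisation, so existing PRONOM PUIDs would remain valid under this scheme.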

Regards,

Matt.

