DROID and pdf/a


Antonio Dalvit

Dec 5, 2013, 3:06:26 AM
to droid...@googlegroups.com
Good morning everyone!

I have a problem (at least I think it's a problem...) with DROID v6.1.3.

We have lots of documents created with Word 2010 in PDF/A format. When we use DROID to identify them, the output is the PUID "fmt/18", which is plain PDF 1.4.

Is there a problem with PDF/A identification, or with Office 2010's PDF/A creation?

Has anyone else had the same problem?


thanks!

Graham Seaman

Dec 5, 2013, 3:58:17 AM
to droid...@googlegroups.com
Hi Antonio,

DROID needs to analyse the whole file, not just the start and end, to recognise PDF/A format. If you set the option 'maximum bytes to scan' under 'profile defaults' in the preferences to -1, it should do this. The default is to scan only part of the file; scanning the whole file does slow DROID down.
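
To make the effect of the scan limit concrete, here is a minimal Python sketch. It is a toy model and not DROID's actual scanning code: the function name `marker_visible` and the 100-byte limit are made up for illustration, and `pdfaid:part` is the XMP property that PDF/A files use to declare their part number. It shows how a marker in the middle of a file is missed when only the first and last N bytes are inspected, and found when the limit is -1.

```python
PDFA_MARKER = b"pdfaid:part"  # XMP property that declares the PDF/A part


def marker_visible(data: bytes, max_bytes: int, marker: bytes = PDFA_MARKER) -> bool:
    """Return True if the marker lies in the region a limited scan would read.

    Toy model (assumption): a limited scan reads only the first and the last
    max_bytes of the file. A negative max_bytes means "read everything".
    """
    if max_bytes < 0 or len(data) <= 2 * max_bytes:
        return marker in data
    head = data[:max_bytes]
    tail = data[len(data) - max_bytes:]
    return marker in head or marker in tail


if __name__ == "__main__":
    # A fake file with the PDF/A marker buried in the middle.
    sample = (
        b"%PDF-1.4\n"
        + b"x" * 5000
        + b"<pdfaid:part>1</pdfaid:part>"
        + b"y" * 5000
        + b"%%EOF\n"
    )
    print("limited scan sees marker:", marker_visible(sample, 100))
    print("full scan sees marker:   ", marker_visible(sample, -1))
```

In this model the limited scan misses the marker and the full scan finds it, which matches the behaviour described above.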

Hope that works for you!

Best regards
Graham 

Antonio Dalvit

Dec 5, 2013, 4:22:22 AM
to droid...@googlegroups.com
Excellent!

It works perfectly now.

thanks!

Antonio Dalvit

Dec 12, 2013, 3:04:18 AM
to droid...@googlegroups.com
First of all... sorry! I know I'm a bit of a noob here.

I'm testing DROID from the command line and, in my tests, I found a discrepancy.
DROID receives PDFs decrypted with OpenSSL (our documents are in .pdf.p7m format, so we have to decrypt them before feeding them to DROID).
With some PDFs (so far the only common element is that they were created with the "Solid PDF Creator" software):
- Adobe Reader identifies them as PDF/A compliant;
- DROID identifies them as plain PDF.

I tested the decrypt function with different formats and found no differences in the DROID output, so I would rule out a problem in the OpenSSL decrypt function.

Any ideas about the root cause? Any tests I can do? Has anyone had the same problem?


Thanks!!!!!

Dclipsham

Dec 12, 2013, 8:12:52 AM
to droid...@googlegroups.com
Hi Antonio,

I'm curious to understand what Adobe Reader is picking up that DROID isn't. Are you in a position to upload a sample so I can look at the internal byte-code more closely?

DROID's identification of PDF/A relies on the presence of two sequences: one at the beginning of the file, which detects the file as a PDF, and a second, variably positioned sequence, which describes the PDF/A conformance level (there have been several conformance levels). Assuming you have DROID set to scan the entire file, the most likely scenario is that the PDF/A conformance description is formatted differently from what we are expecting (even a rogue space, different capitalization of the conformance tag, or a null byte would mean DROID won't recognise it as PDF/A).
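
As a rough illustration of why exact byte matching is so sensitive, here is a small Python sketch. The pattern `CONFORMANCE_TAG` is a hypothetical stand-in chosen for this example, not DROID's real signature, and `exact_match_pdfa` is an invented helper name. It checks for a PDF header at the start and an exact conformance tag anywhere in the file, then runs that check against three variants.

```python
PDF_HEADER = b"%PDF-"

# Hypothetical exact pattern, for illustration only (not DROID's signature).
CONFORMANCE_TAG = b"<pdfaid:conformance>B</pdfaid:conformance>"


def exact_match_pdfa(data: bytes) -> bool:
    """Two-sequence check: header at the start, exact tag anywhere after."""
    return data.startswith(PDF_HEADER) and CONFORMANCE_TAG in data


if __name__ == "__main__":
    body = b"%PDF-1.4\n<pdfaid:part>1</pdfaid:part>"
    variants = {
        "expected form": body + b"<pdfaid:conformance>B</pdfaid:conformance>",
        "rogue space": body + b"<pdfaid:conformance> B</pdfaid:conformance>",
        "lowercase level": body + b"<pdfaid:conformance>b</pdfaid:conformance>",
    }
    for name, data in variants.items():
        print(name, "->", exact_match_pdfa(data))
```

Only the first variant matches; a single extra space or a change of case is enough to make the exact comparison fail.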

I should add that DROID has no sense of the validity of a pdf file, so whereas tools like Adobe Preflight will be checking such things as embedded fonts and other pre-requisites for creating conformant pdf/a files, DROID is for simple (and efficient) first-pass identification and won't pick up on non-valid elements or file corruption.

I hope to hear from you soon.

David

Antonio Dalvit

Dec 12, 2013, 8:54:12 AM
to droid...@googlegroups.com
I'm checking the PDF/A fingerprint. I can send you one of the problematic PDFs by PM; I want to understand this difference, or whether the PDFs are malformed.

Thanks for the help.

(Clearly DROID is meant for fast scanning and identification. But if we can improve it, so much the better!)

Dclipsham

Dec 12, 2013, 9:01:54 AM
to droid...@googlegroups.com
No problem, Antonio. You can email any samples to pro...@nationalarchives.gsi.gov.uk (also, I note your submission for .p7m, and a colleague will respond shortly).

Tyler Thorsted

Jan 6, 2014, 4:17:19 PM
to droid...@googlegroups.com
We have found that Acrobat 11 Pro writes the XMP in a way that causes DROID not to see the PDF as a PDF/A. All other versions of Acrobat, and other PDF/A writers, place the part and conformance level immediately after the namespace. Acrobat 11 Pro separates them with other XMP data in between.
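
For anyone who needs a stopgap, one possible approach is to look for the part and the conformance level independently, so that it does not matter what sits between them and the namespace declaration. The Python sketch below is only that: `sniff_pdfa` is an invented helper, and it assumes the XMP packet is stored uncompressed and in UTF-8, that the PDF header sits at byte 0, and that the values appear either as elements or as attributes. It does not validate the file in any way.

```python
import re
from typing import Optional

# Match either <pdfaid:part>1</pdfaid:part> or pdfaid:part="1".
PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*([1-4])")
CONF_RE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def sniff_pdfa(data: bytes) -> Optional[str]:
    """Return a label such as 'PDF/A-1b', or None if no PDF/A claim is found.

    The part and conformance values are searched for separately, so other
    XMP data between the namespace and the values does not matter.
    """
    if not data.startswith(b"%PDF-"):
        return None
    part = PART_RE.search(data)
    if part is None:
        return None
    conf = CONF_RE.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level


if __name__ == "__main__":
    separated = (
        b"%PDF-1.4\n"
        b"<rdf:Description xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">"
        b"<xmp:CreatorTool>Example</xmp:CreatorTool>"
        b"<pdfaid:part>1</pdfaid:part>"
        b"<pdfaid:conformance>B</pdfaid:conformance>"
        b"</rdf:Description>"
    )
    print(sniff_pdfa(separated))
```

Because it only reads what the file claims about itself, a result from this sketch is an identification hint and not proof of PDF/A conformance.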

-Tyler

Antonio Dalvit

Jan 10, 2014, 12:35:45 PM
to droid...@googlegroups.com
OK,

I will try to write a workaround in Python to solve the problem. If I manage to solve it, I'll post the code here.

thanks!