How to remediate PDF/A-1b non-compliance

Chelcie Juliet Rowell

Oct 19, 2016, 12:57:06 PM
to digital-...@googlegroups.com
We are working with our campus's alumni magazine to develop a process for born-digital deposit of new issues to our university archives.

After receiving a PDF from the magazine, we tried to convert to PDF/A-1b using Adobe Acrobat DC. The conversion failed, and a subsequent veraPDF compliance check provided more info about the failure. Both the issue and the veraPDF report are attached.

Ideally, we'd love to remediate this PDF and others that we receive from the magazine such that they comply. We could also have conversations with editorial staff of the magazine about packaging their PDFs, but I'm concerned that there's no singular production scenario, and we might run into different compliance problems with each quarterly issue that they send to us.

What advice can you share?

Many thanks,
Chelcie

Chelcie Juliet Rowell
Digital Initiatives Librarian
Z. Smith Reynolds Library
Wake Forest University
rowe...@wfu.edu | 336.758.5477
htmlReport.html

John Scancella

Oct 20, 2016, 11:28:19 AM
to Digital Curation
Hi Juliet,

The question I would ask is: why are you converting it to PDF/A? As in, what added value do you get by doing so? And if any value is added, is it enough to warrant a change in how the magazine is created?

As an aside, being a developer I can tell you it is really hard to do PDF/A right now because so few tools/libraries are available to work with. I believe this is because the specification is very complicated and thus hard to implement correctly.

Simon Spero

Oct 20, 2016, 12:20:49 PM
to digital-...@googlegroups.com

Did you try selecting a standards compliance profile from preflight, then running analyze and fix?

If I recall, that gives you more control and feedback, and may indicate where the conversion might be failing. This can be useful if you want to report a bug to Adobe.

Simon


--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digital-curation+unsubscribe@googlegroups.com.
To post to this group, send email to digital-curation@googlegroups.com.
Visit this group at https://groups.google.com/group/digital-curation.
For more options, visit https://groups.google.com/d/optout.

Chelcie Juliet Rowell

Oct 20, 2016, 1:58:34 PM
to digital-...@googlegroups.com
Simon, yes, I did select PDF/A-1b from preflight, to no avail. Acrobat wasn't able to fix the errors that were thrown. Thank you for suggesting submitting the report to Adobe; I'll do that! My concern is that it's not an Adobe error but an error with the way the PDF was generated, although I did walk them through creating PDFs that included fonts & images.

John, thank you for the good questions you raise about added value. My aspiration is to create a digital object for the library to steward that is less fragile, more preservation-ready.

I am no expert in digital preservation — more experienced in digitization and digital content strategy — so I would welcome perspectives from digital preservationists on the relative fragility of the example issue I linked to in my original message. Is it a relatively acceptable digital object, despite not strictly complying with the PDF/A-1b standard?

Chelcie Juliet Rowell
Digital Initiatives Librarian
Z. Smith Reynolds Library
Wake Forest University
rowe...@wfu.edu | 336.758.5477

Sara Amato

Oct 20, 2016, 2:20:50 PM
to digital-...@googlegroups.com
Hi Chelcie -
I have this problem too with some campus-generated PDFs (in our case the student newspaper; I think it has something to do with the way the image layers are made, but I really have no clue).
I have been able to convert the problematic PDFs to PDF/A by first opening them in Acrobat Reader and choosing File/Print/PDFCreator (or CutePDFWriter in some versions). The resulting PDF then successfully converts to PDF/A (which I do using Preview on the Mac: File / Print / PDF / Save as Adobe PDF). I always do a quick visual review of the PDF/A to make sure the images came through correctly. So far so good!

Hope this helps. 

==============================

Sara Amato

Digital Asset Management Librarian

Hatfield Library, Willamette University

Salem, OR, 97301


John Scancella

Oct 20, 2016, 2:27:13 PM
to Digital Curation
Juliet,

I come from the perspective of a developer, since that is my day job. To me (and this is just my personal opinion), if there are few tools/libraries for creating, interacting with, and verifying a file format, then the format might as well not exist, since it will most likely not be supported in the future. Take JPEG-2000 for example: there are very few tools to deal with these files even though the format is very well known. Most of the software industry has ignored it, and I see very little reason for this to change, so why waste time and energy preserving and validating JPEG-2000s when most likely no one will even be able to open them in the future?

To answer your question about "is it relatively acceptable despite not strictly complying": I think it is OK (personal opinion), but I would also try to save it in another widely used format (like TIFF) to ensure that, if needed, I can convert it to some other format in the future. With so many tools and libraries for dealing with TIFFs, I believe it is a safe bet that you will still be able to read a TIFF in 20 years, but I am not nearly so sure about PDF/A.

kate...@umd.edu

Oct 21, 2016, 11:13:13 AM
to Digital Curation
Hi Chelcie,

You can of course refry the PDF--either using the method Sara outlines in the thread, or exporting the PostScript file, and then piping it back through Distiller (which gives you a lot of control and visibility on what's happening to your color, structure, etc). Both methods can fix your /A compliance problem, but now you have a second-generation file that needs to be re-audited, and both methods introduce risks regarding transparency layers, color management, subsetting, etc., and that goes back to John's good question about the value-add and purposes of the conversion. 
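For anyone who would rather script the refry step than click through it, here is a minimal Python sketch. Note that it swaps in Ghostscript for Distiller, so it is not the exact workflow described above. The function names and file names are made up for illustration, and `PDFA_def.ps` is the template that ships with Ghostscript, which has to be edited to point at an ICC profile before the second step will work. Treat the color and compatibility flags as a starting point rather than settings tested on this magazine.

```python
import subprocess
from pathlib import Path


def refry_commands(src_pdf, out_pdf, pdfa_def="PDFA_def.ps", gs="gs"):
    """Return two Ghostscript command lines: PDF -> PostScript -> PDF/A-1.

    Hypothetical helper; the paths and the `gs` executable name are assumptions.
    """
    src, out = Path(src_pdf), Path(out_pdf)
    ps = out.with_suffix(".ps")  # intermediate PostScript file next to the output

    # Step 1: flatten the source PDF to PostScript.
    to_ps = [gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=ps2write",
             f"-sOutputFile={ps}", str(src)]

    # Step 2: distill the PostScript back to PDF, asking for PDF/A-1 output.
    to_pdfa = [gs, "-dPDFA=1", "-dBATCH", "-dNOPAUSE",
               "-sDEVICE=pdfwrite", "-sColorConversionStrategy=RGB",
               "-dPDFACompatibilityPolicy=1",
               f"-sOutputFile={out}", pdfa_def, str(ps)]
    return [to_ps, to_pdfa]


def refry(src_pdf, out_pdf, **kwargs):
    """Run both steps in order, stopping if Ghostscript reports an error."""
    for cmd in refry_commands(src_pdf, out_pdf, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling `refry("issue.pdf", "issue-pdfa.pdf")` would need Ghostscript on the PATH. The output is still a second-generation file, so it should be re-checked with veraPDF and reviewed by eye for transparency and font problems.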

FWIW, I've never personally found PDF/A conversion workflows to come out ahead in a cost-benefit analysis (though I will happily take PDF/A files when available) when the cost is raising barriers to entry for content creators and/or doing harder-to-scale work on an already fairly good PDF. Most of the red-flag issues I see with this document are related to transparency layers, plus a couple of janky fonts on a couple of pages. But that's just my perspective on things.

Chelcie Juliet Rowell

Oct 21, 2016, 11:29:32 AM
to digital-...@googlegroups.com
Thanks, Kate! The PostScript > Distiller method you outline decreases the errors reported by veraPDF to 4 metadata errors, which we can easily live with. We'll still want to consider the time trade-offs, but now we have a possible workflow that creates a more preservation-ready digital object with minimal burden on content creators.
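Since each quarterly issue may fail in a different way, it might help to run the veraPDF check over a whole folder of issues at once. This is only a rough Python sketch, assuming the veraPDF command-line tool is installed and on the PATH as `verapdf`. The folder layout, function names, and report file suffix are hypothetical.

```python
import subprocess
from pathlib import Path


def verapdf_command(pdf, flavour="1b", exe="verapdf"):
    """Command line for validating one file against a PDF/A flavour."""
    return [exe, "--flavour", flavour, str(pdf)]


def pending_checks(folder):
    """Pair each PDF in the folder with the report file it should produce."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    # The report name is a made-up convention: issue.pdf -> issue.verapdf-report.xml
    return [(pdf, pdf.with_name(pdf.stem + ".verapdf-report.xml")) for pdf in pdfs]


def check_folder(folder):
    """Run veraPDF on every PDF and save whatever it prints as the report."""
    for pdf, report in pending_checks(folder):
        result = subprocess.run(verapdf_command(pdf),
                                capture_output=True, text=True)
        report.write_text(result.stdout)
```

Running `check_folder("magazine_issues")` after each deposit would leave one report beside each PDF, which makes it easier to see whether a new issue introduces errors beyond the metadata ones already accepted.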

Chelcie Juliet Rowell
Digital Initiatives Librarian
Z. Smith Reynolds Library
Wake Forest University
rowe...@wfu.edu | 336.758.5477