Join Digital Science for two RDM events in Philly/LA June 2nd and 4th


Stacy Konkiel

May 25, 2015, 12:40:13 PM
to rd...@asis.org, digital-...@googlegroups.com, c...@altmetric.com
* apologies for cross-posting *

Digital Science invites those in the Philadelphia and Los Angeles areas to two one-day events focused on technology trends in research data management. The events will showcase outputs and collaboration among institutions.

Scholarly communication is being rapidly transformed by innovations in digital technology, and the events bring together experts and thought leaders in this space, including:

- Christine Borgman from the University of California 
- Mike Winkler and Lauren Gala from the University of Pennsylvania

At these events you will learn how to promote the successes of your institution, find first-rate collaborators and better manage all of the assets of the research process.

To find out more and to register for free visit:


Digital Science Showcase Philadelphia, Tuesday June 2nd at the University of Pennsylvania:
http://www.digital-science.com/events/digital-science-showcase-philadelphia/

Digital Science Showcase Los Angeles, Thursday June 4th at the Luxe Hotel Sunset Boulevard:

Jody L. DeRidder

Jun 5, 2015, 9:25:46 AM
to digital-...@googlegroups.com

Hi --

I expect others on this list have struggled with obtaining valid PDFs from contributors, so I came here to ask your advice.  We have mandatory ETDs submitted to us via ProQuest, for all those graduating.  Testing with FITS (File Information Tool Set, Harvard) tells me that I have around 500 PDFs so far that are either invalid or not well-formed.  And the software doesn't even try to validate the PDF/A versions.  (I am looking at options there too, if you have recommendations.)

I'm about to meet with the head of our graduate school about this issue, in the hope that we can arrive at a solution that will ensure that I at least get valid, well-formed PDFs (and preferably PDF/A flavors) -- but I have not yet identified software that will reliably do that.  I would like to recommend *something* that the grad school can require students to use to generate their PDF prior to submission -- something that will ALWAYS produce well-formed output.

Does anyone here know of such software?  If so, please advise.  Right now, I have poorly-formed or invalid PDFs from an amazing variety of creating software applications, including:

  • Adobe Acrobat PDFMaker 9.0 and 9.1 for Word
  • Adobe PDF Library 9.0, 9.3.2, 9.3.3
  • Adobe Acrobat Pro 9.0.0
  • Adobe Acrobat Distiller 9.0.0, 9.2.0, 9.3.0, 9.3.2, 9.3.3 
  • Mac OS X 10.5.6 Quartz PDF, 10.6.2
  • Microsoft Word 2007
  • pdfTex-1.40.3/Tex (and 1.21a)
  • PScript5.dll Version 5.2.2
  • activePDF Server
  • iText 2.1.7

... and the list goes on.

Suggestions, anyone?  Thank you in advance!

 

---
Jody L. DeRidder Head, Digital Services University of Alabama Libraries Tuscaloosa, AL 35487 Phone: 205.348.0511 "Hope lies in dreams, in imagination, and in the courage of those who dare to make dreams into reality." --Jonas Salk

Miguel Ferreira

Jun 5, 2015, 10:52:49 AM
to digital-...@googlegroups.com
Dear Jody,

I would suggest you pay attention to the ongoing PREFORMA/VeraPDF project led by the Open Preservation Foundation - http://www.preforma-project.eu/pdfa-conformance-checker.html

The project aims to develop definitive conformance-checking software for PDF/A-1, PDF/A-2 and PDF/A-3 under GPLv3 or later and MPLv2 or later open source licenses, together with a substantial and enthusiastic community supporting and extending the model.

The project is now in phase 2.

Regards,
Miguel Ferreira

--
Executive Director
KEEP SOLUTIONS, LDA.
Rua Rosalvo de Almeida, nº 5
4710-429 Braga, Portugal
W www.keep.pt E in...@keep.pt
T +351 253066735 F +351 253067248

Christie Peterson

Jun 5, 2015, 11:48:12 AM
to digital-...@googlegroups.com
Hi Jody,

When we started accepting ETDs exclusively a year ago, we decided that we would accept only PDF/A. I'm not sure exactly what the workflow is, but I do know that the administrator does some kind of validation on all submissions and goes back to the submitter if a file does not validate.

Here is the documentation he developed to guide students in creating their PDF/A files, which focuses on using the software packages available at computer labs on our campus: http://guides.library.jhu.edu/etd/pdfa

I'd advise emailing David Reynolds (email in the link above) if you have specific questions.

Best,

Christie

Jody L. DeRidder

Jun 5, 2015, 12:20:15 PM
to digital-...@googlegroups.com

Thank you, Christie!

 


Andy J

Jun 5, 2015, 12:21:14 PM
to digital-...@googlegroups.com, jo...@jodyderidder.com
Hello,

I believe FITS relies on JHOVE to perform PDF validation. However, JHOVE is not actually able to reliably validate PDF and PDF/A. The implementation was never entirely complete, and the fact that it has not been updated significantly for many years means that it has fallen well behind the current state of PDF. That is, features introduced in versions > 1.6 (IIRC) will probably be marked as 'invalid' simply because they are unknown to JHOVE. See http://www.pdfa.org/2014/12/ensuring-long-term-access-pdf-validation-with-jhove/ for more details about PDF, and my own investigation of JHOVE and PDF/A at http://anjackson.net/keeping-codes/experiments/does-jhove-validate-pdfa-files.html

The deeper issue is that, for the general PDF format (i.e. not any of the ISO or otherwise standardised flavours), validation is essentially impossible. This is not simply because there are no tools capable of doing it, but because there is no standard to validate against beyond Adobe's implementation itself. Barring bugs, the Adobe tools should all be considered reference implementations of PDF, and so if they successfully create a PDF that Adobe Reader can read, that's as close to 'validation' as you are ever going to get.

As PDF/A has a meaningful standard, validation can be attempted. Like Miguel, I recommend you keep an eye on the PREFORMA project, but in the meantime, the most trustworthy open source tool I know of is Apache Preflight (see http://openpreservation.org/blog/2012/12/19/identification-pdf-preservation-risks-apache-preflight-first-impression/).

Hope that helps.

Andy Jackson

Jody L. DeRidder

Jun 5, 2015, 12:32:12 PM
to digital-...@googlegroups.com

Thank you, Andy and Miguel.  How distressing.

 


Alex Garnett

Jun 5, 2015, 12:40:49 PM
to digital-...@googlegroups.com, jo...@jodyderidder.com
For what it's worth, it's reasonably easy to batch convert a bunch of PDFs to PDF/A using Ghostscript if you have some (small) knowledge of bash scripting (and an OS X/Linux machine handy, as it's kind of a pain to get working on Windows). I did it for five years of our institution's backlog and I think the conversion failure rate was only about 1 in 1000. It's also easier to do it on our end than to try to teach students to do it.

This is the Ghostscript syntax I used, where $1 is the input file and $2 is the output file:

gs -dPDFA -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -o "$2" -dPDFACompatibilityPolicy=1 /usr/share/ghostscript/9.10/lib/PDFA_def.ps "$1"

(you may have to change the location of PDFA_def.ps on a different system)
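
For a batch run, the command above could be wrapped in a small shell script along these lines. This is only a sketch under stated assumptions: the function names out_path and convert_all, the "_pdfa" output suffix, and passing the PDFA_def.ps location as a third argument are all made up for illustration and are not the script actually used on the backlog. The Ghostscript flags are copied unchanged from the command above and may need adjusting for other Ghostscript versions.

```shell
#!/bin/sh
# Sketch: convert every PDF in a source directory to PDF/A with Ghostscript.
# Helper names and the "_pdfa" suffix are hypothetical, chosen for illustration.
set -u

# Print the output path for an input PDF: <outdir>/<name>_pdfa.pdf
out_path() {
  printf '%s/%s_pdfa.pdf\n' "$2" "$(basename "$1" .pdf)"
}

# convert_all SRCDIR OUTDIR PDFA_DEF
# Runs gs once per *.pdf in SRCDIR, reports failures on stderr,
# and prints the number of failed conversions on stdout.
convert_all() {
  ca_src="$1"; ca_out="$2"; ca_def="$3"
  ca_failed=0
  mkdir -p "$ca_out"
  for ca_file in "$ca_src"/*.pdf; do
    # Skip the literal pattern when the directory holds no PDFs.
    [ -e "$ca_file" ] || continue
    if ! gs -dPDFA -dNOOUTERSAVE -dUseCIEColor \
         -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite \
         -o "$(out_path "$ca_file" "$ca_out")" \
         -dPDFACompatibilityPolicy=1 "$ca_def" "$ca_file" >/dev/null; then
      echo "FAILED: $ca_file" >&2
      ca_failed=$((ca_failed + 1))
    fi
  done
  echo "$ca_failed"
}

# Only run when called with the three expected arguments.
if [ "$#" -eq 3 ]; then
  convert_all "$1" "$2" "$3"
fi
```

Saved under a name of your choosing (say convert.sh), it would be run as: sh convert.sh /path/to/pdfs /path/to/output /usr/share/ghostscript/9.10/lib/PDFA_def.ps. A zero exit status from gs only means the conversion finished, so checking the results with a separate PDF/A validator is still advisable.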

Hope that helps!

Jody L. DeRidder

Jun 5, 2015, 12:55:40 PM
to digital-...@googlegroups.com, Alex Garnett

Alex, this is wonderful!  Yes, I work on a SUSE Linux server most of the time.  Thanks so much!

 
