PDFs pages not showing in pdf.js viewer

kstr...@wellesley.edu

Sep 1, 2021, 3:17:16 PM
to islandora
Hi all,

Early on in our Digital Collections program (c. 2008-2010), we had a large number of publications digitized at the Internet Archive. We later collected the PDFs of these publications and added them to our repository. 

A few weeks ago, we noticed that these PDFs render blank pages wherever a page contains text. Pages that are blank or image-only in the original still show up, which suggests to us that the OCR text layer is the problem. The metadata shows that LuraDocument v2.28 (or sometimes a later version) was used to create the PDFs. If we re-enable OCR and reingest a PDF, the pages with text show up.
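To scope the problem before deciding on any reingest, it may help to list which PDFs actually name LuraDocument in their metadata. The following is a minimal sketch, assuming the PDFs are available as files on disk (for example, exported from the repository) and that the producer string is stored uncompressed, as is common in older PDFs. It is a plain byte search, not a real PDF parser; the script name and directory argument are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Optional

# Heuristic byte search, not a PDF parser. Assumption: the Info dictionary
# or XMP packet naming the producer is stored uncompressed. Producer strings
# inside compressed object streams, or encoded as hex/UTF-16, will be missed.
PRODUCER_PATTERN = re.compile(rb"LuraDocument[^)<\r\n]{0,40}")


def find_producer(data: bytes) -> Optional[str]:
    """Return the first LuraDocument producer string in the PDF bytes, or None."""
    match = PRODUCER_PATTERN.search(data)
    if match is None:
        return None
    return match.group(0).decode("latin-1").strip()


def scan_directory(root: Path) -> Dict[str, str]:
    """Map each .pdf file under root (recursive, any case) to its producer string."""
    hits: Dict[str, str] = {}
    pdf_paths = sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )
    for pdf_path in pdf_paths:
        producer = find_producer(pdf_path.read_bytes())
        if producer is not None:
            hits[str(pdf_path)] = producer
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_luradocument_pdfs.py /path/to/exported/pdfs
    root_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, producer in scan_directory(root_dir).items():
        print(f"{path}\t{producer}")
```

Files it lists are candidates for re-testing or reprocessing; files it skips may still be affected if their metadata is compressed, in which case a real PDF library (such as pypdf's `PdfReader(...).metadata`) would be the more reliable way to read the Producer field.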

Here is a good example of the problem; scroll through the first few pages: https://repository.wellesley.edu/object/wellesley608

We see it on both Mac and PC in Chrome, Firefox, and Edge, but not in Safari. (However, Safari only updates with an OS update, and we have been asked not to upgrade beyond Mojave for the time being, so we haven't been able to check it on a more recent Mac.) We are not sure whether it's a browser issue, the PDF viewer, or Islandora.

Does anyone have PDFs from the Internet Archive in their repository and notice this problem? Or has anyone used LuraDocument to make PDFs and now sees this issue?

We are hoping there is a solution that does not include re-enabling OCR on 2000+ PDFs and reingesting them!

Thanks for any advice or feedback you may have.

Kara

Graham, Clinton T

Sep 1, 2021, 3:44:11 PM
to isla...@googlegroups.com

A quick look suggests this is a failure of pdf.js to render the document; a current version of pdf.js successfully rendered one of the failed pages in my (very limited) testing.

Enjoy,

- Clinton Graham

Systems Development Lead

University of Pittsburgh | University Library System

412-383-1057

--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/3e0f477b-132b-44b8-8223-1207f86a33a5n%40googlegroups.com.

Kara Hart

Sep 2, 2021, 8:12:46 AM
to isla...@googlegroups.com
Thank you, Clinton! We've passed this info on to our hosting provider, who is checking whether we have the latest version of the viewer and whether an update will fix our problem.
Kara

~~~~~ Kara S. Hart 

Systems Librarian - Library & Technology Services 
Wellesley College
