How to detect a damaged PDF file?

Dmitry Goryachev

Dec 7, 2020, 8:15:02 AM
to pdfium

Hello,

I have a corrupted PDF that renders via PDFium without any exceptions. The resulting image contains unreadable text.

Is it possible to somehow configure PDFium to throw an exception for such files? Or maybe there is another way to identify the damaged file using PDFium?

Thanks,
Dmitry
pdfcrash.pdf
out.png

Olivia Yingst

Dec 7, 2020, 2:12:53 PM
to Dmitry Goryachev, pdfium
Hi Dmitry,

I tested opening this file in Acrobat. It warned that there was an error in the file and rendered it blank, but it did not indicate what exactly the error was.
When this PDF is opened in Okular or Zathura, the signature, name, date, rectangle, and underline are rendered, but the unreadable text is not. Neither application showed a warning message.
macOS Preview renders this PDF correctly, with no error or warning message.

Judging by the behavior of the viewers mentioned above, I think a crash message might not be necessary. However, PDFium can be improved with one of the following options:
1. Render this PDF correctly, just like Preview (if the error in this PDF is not a major concern).
2. Do not render the unreadable text at all (if the error is in text rendering and should not be tolerated according to the PDF reference).

On that note, would you be OK with creating a PDFium bug (https://bugs.chromium.org/p/pdfium/issues/entry)? I noticed that the PDF contains some information that might be sensitive. Could you generate a PDF with a similar issue but without that information? If that is not an option, we can see whether it is possible to create a new PDF file that reproduces the same issue.

Thanks,
Olivia

Dmitry Goryachev

Dec 10, 2020, 10:23:52 AM
to pdfium
Hi Olivia,

Thank you for your investigation.
It would be great if this PDF could be rendered without data loss and without exceptions. But if the choice is between silently displaying (or omitting) the unreadable text and raising an exception, we would prefer an exception. Our SDK uses PDFium to process a large number of files, and it is not always possible to check the result. In this case, silent data loss can be critical.

Unfortunately, I cannot generate a similar PDF.

Thanks,
Dmitry
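
For batch processing where results cannot be inspected by hand, one possible stopgap is to run a sanity check on the text extracted from each page and flag files that look garbled. The following is only a minimal sketch under stated assumptions: `suspicious_char_ratio`, `looks_damaged`, and the 0.2 threshold are hypothetical names and values, not part of PDFium, and the extraction step itself is not shown. It also assumes that unreadable glyphs show up as replacement, control, private-use, or unassigned characters in the extracted text, which has not been verified for the file in this thread.

```python
import unicodedata

# Unicode general categories treated as signs of a decoding failure:
# Cc = control, Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn"}


def suspicious_char_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look garbled."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        # U+FFFD is the Unicode replacement character.
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_damaged(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose garbled-character ratio exceeds the (assumed) threshold."""
    return suspicious_char_ratio(text) > threshold
```

A caller would pass the text extracted from each page to `looks_damaged` and raise its own exception for flagged files. A heuristic like this can produce both false positives and false negatives, so the threshold would need tuning on real data.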

On Monday, December 7, 2020 at 10:12:53 PM UTC+3, Olivia Yingst wrote: