Curious MalformedPDFError

John Halloran

Dec 29, 2020, 5:55:42 AM
to PDF::Reader
The pdf at:
generates a MalformedPDFError when I try to access the 'text' attribute of any page, even though it opens without problems in Acrobat Reader and in Preview (Mac).

In fact, if I open it in Preview and export it under a different filename (without making any changes), the exported file then seems to be readable.

FWIW, this is not a major problem for me. I expect to get the information I need some other way, but I wanted to provide the example. The file is generated by FINRA (www.finra.org), which creates plenty of PDFs with valuable data, so if there is something to set them straight on, it might be useful.

Thanks for a great project!

James Healy

Dec 30, 2020, 7:33:50 AM
to pdf-r...@googlegroups.com
Thanks for sending this through.

I had a look at the sample file, and Ruby's zlib bindings were
refusing to inflate some of the compressed data.

Interestingly though, the inflation works fine if I strip the final
byte from all compressed streams in that file (see
https://github.com/yob/pdf-reader/pull/341/files#diff-fc791ec29342f268860845a78fe29a4e64182317be39d93fd8c47cca19846cb4R21).
For your particular file, the extra byte is always 0x01.
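
As a rough illustration of that workaround (a minimal sketch, not the code from the PR): try to inflate the stream as-is, and if zlib rejects it, retry once with the final byte removed. The helper name `inflate_with_fallback` and its `inflater:` keyword are hypothetical and exist only for this example; `Zlib::Inflate.inflate`, `Zlib::Deflate.deflate` and `Zlib::Error` come from Ruby's standard library.

```ruby
require "zlib"

# Hypothetical helper for illustration only; not pdf-reader's implementation.
# Tries to inflate +data+ as given. If zlib raises an error, retries once
# with the final byte removed. Returns nil when both attempts fail.
# The +inflater+ keyword is an assumption added so the retry is easy to test.
def inflate_with_fallback(data, inflater: Zlib::Inflate.method(:inflate))
  candidates = [data]
  candidates << data.byteslice(0, data.bytesize - 1) if data.bytesize > 1

  candidates.each do |candidate|
    begin
      return inflater.call(candidate)
    rescue Zlib::Error
      # This candidate could not be inflated; fall through to the next one.
      next
    end
  end

  nil
end

compressed = Zlib::Deflate.deflate("hello")
puts inflate_with_fallback(compressed)
```

Returning nil instead of raising leaves it to the caller to decide whether a failed stream should become a MalformedPDFError.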

I'm not familiar enough with the zlib format to know the root cause
here. Maybe the program that wrote that file added a garbage 0x01 byte
by accident?

In any case, my general rule is that if Adobe can parse a file and
pdf-reader can't, it's a pdf-reader bug. I've merged the above PR so
your files can be read.

James

John Halloran

Dec 30, 2020, 1:27:37 PM
to PDF::Reader
Thanks so much for the quick turnaround! I will need to find some time to build from source and test, but it sounds like you've identified a straightforward issue.

Thanks again for your work on this project. Super useful!
