Got PDF::Reader::MalformedPDFError while reading pdf

Ajay Anarse

Sep 16, 2019, 1:47:39 AM

to PDF::Reader

Hi Guys,

I am getting a MalformedPDFError while reading a PDF. The error message is "xref table not found at offset 248410 (\u009C\u0087e\u009A¯$ != xref)". The PDF can be opened in Adobe Reader.

Please find the attached PDF for reference, and please let me know how to handle this.

Thanks,

Ajay.

encoding_issue.pdf

James Healy

Sep 17, 2019, 9:04:32 AM

to pdf-r...@googlegroups.com

I suspect this file is a bit non-compliant with the spec, and Adobe
Reader is doing a better job and compensating.

The trailer of the file suggests there's an additional xref table
stored in an xref stream at byte offset 248405 (try opening the file in
a text editor and searching for 248405).

However, there's no xref stream at byte offset 248405.
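Searching by hand works, but the same check can be scripted. The sketch below is a hypothetical helper, not part of pdf-reader: it assumes the trailer is stored as plain, uncompressed text (so a regex can find /XRefStm), and relies on the fact that an xref stream is an indirect object, so the bytes at its offset should begin with an object header like "12 0 obj". The helper names `xref_stm_offset` and `object_header_at?` are made up for illustration.

```ruby
# Hypothetical helpers: check whether the /XRefStm offset in a PDF trailer
# points at an indirect object. Assumes the trailer is plain (uncompressed)
# text and that `data` is the raw file, e.g. from File.binread.

# Returns the offset from the last /XRefStm entry in the file, or nil if the
# key never appears. The last trailer is usually the most recent one.
def xref_stm_offset(data)
  matches = data.b.scan(%r{/XRefStm\s+(\d+)})
  matches.empty? ? nil : matches.last.first.to_i
end

# An xref stream is an indirect object, so the bytes at its offset should
# start with an object header such as "12 0 obj". Leading whitespace is
# tolerated here as a leniency.
def object_header_at?(data, offset)
  return false if offset.nil? || offset.negative? || offset >= data.bytesize

  data.byteslice(offset, 32).b.match?(/\A\s*\d+\s+\d+\s+obj\b/)
end

# Command-line usage: ruby check_xrefstm.rb encoding_issue.pdf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  data = File.binread(ARGV[0])
  offset = xref_stm_offset(data)
  puts "XRefStm offset: #{offset.inspect}"
  puts "object header at that offset: #{object_header_at?(data, offset)}"
end
```

For a file like the one in this thread, the script should report the 248405 offset together with `false`, matching what the manual search shows.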

If I comment out the line that reads xref streams in pdf-reader, then
the file can be processed:

diff --git a/lib/pdf/reader/xref.rb b/lib/pdf/reader/xref.rb
index 9e6a56c..5e91ce3 100644
--- a/lib/pdf/reader/xref.rb
+++ b/lib/pdf/reader/xref.rb
@@ -145,7 +145,7 @@ class PDF::Reader
raise MalformedPDFError, "PDF malformed, trailer should be a dictionary"
end

- load_offsets(trailer[:XRefStm]) if trailer.has_key?(:XRefStm)
+ #load_offsets(trailer[:XRefStm]) if trailer.has_key?(:XRefStm)
load_offsets(trailer[:Prev].to_i) if trailer.has_key?(:Prev)

trailer

I'd be happy to merge a PR that attempts to continue reading the PDF if no
xref stream is found at the offset. It'd need a spec based on a real
PDF, but I can add that to the PR if that's helpful.
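One possible shape for that behaviour, sketched under stated assumptions: instead of commenting the line out, rescue the error raised while loading the /XRefStm section, log it, and carry on with /Prev. This is not pdf-reader's actual code. `load_trailer_sections` is a hypothetical name, `load_offsets` here is any callable standing in for the method in the diff, and a local `MalformedPDFError` class replaces `PDF::Reader::MalformedPDFError` so the sketch runs without the gem.

```ruby
# Stand-in for PDF::Reader::MalformedPDFError so this runs without the gem.
class MalformedPDFError < StandardError; end

# Hypothetical helper modelled loosely on the trailer handling in the diff.
# `load_offsets` is any callable that reads the xref section at a byte offset
# and raises MalformedPDFError if nothing valid is found there.
def load_trailer_sections(trailer, load_offsets, log: $stderr)
  if trailer.key?(:XRefStm)
    begin
      load_offsets.call(trailer[:XRefStm])
    rescue MalformedPDFError => e
      # Skip the missing xref stream instead of aborting; the /Prev chain
      # may still describe the objects the reader needs.
      log.puts("ignoring /XRefStm at #{trailer[:XRefStm]}: #{e.message}")
    end
  end
  load_offsets.call(trailer[:Prev].to_i) if trailer.key?(:Prev)
  trailer
end
```

Logging rather than silently swallowing the error keeps genuinely broken files visible while still letting files like this one be read.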

James

Jon Kern

Sep 19, 2019, 1:46:53 PM

to pdf-r...@googlegroups.com

I just want to say how much I appreciate your attention to the PDF-READER community, James.

THANK YOU!

Jon Kern

linkedIn: https://www.linkedin.com/in/jonkern/

blog: http://technicaldebt.com

twitter: http://twitter.com/JonKernPA