Got PDF::Reader::MalformedPDFError while reading pdf


Ajay Anarse

Sep 16, 2019, 1:47:39 AM
to PDF::Reader
Hi Guys,

     I am getting a MalformedPDFError while reading a PDF. The error message is: xref table not found at offset 248410 (\u009C\u0087e\u009A¯$ != xref). The PDF can be opened in Adobe Reader.

     Please find the attached PDF for reference, and please let me know how to handle this.


Thanks,
Ajay.
encoding_issue.pdf

James Healy

Sep 17, 2019, 9:04:32 AM
to pdf-r...@googlegroups.com
I suspect this file is not fully compliant with the spec, and Adobe
Reader is doing a better job of compensating.

The trailer of the file suggests there's an additional xref table
stored in an xref stream at byte offset 248405 (try opening the file in
a text editor and searching for 248405).

However, there's no xref stream at byte offset 248405.
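
To check this without a text editor, here is a minimal sketch in plain Ruby (no pdf-reader calls) that reads the raw bytes at a given offset and tests whether they look like the start of an xref section. The helper names `bytes_at_offset` and `xref_section_at?` are made up for illustration, and the check is a rough heuristic: a classic xref table starts with the keyword `xref`, while an xref stream starts with an object header of the form `N G obj`.

```ruby
# Read `length` raw bytes starting at `offset`, in binary mode so that
# non-text bytes are returned untouched.
def bytes_at_offset(path, offset, length = 32)
  File.open(path, "rb") do |f|
    f.seek(offset)
    f.read(length) || "".b # read returns nil when offset is past EOF
  end
end

# Heuristic only: true if the bytes start (after optional whitespace) with
# the "xref" keyword or with an indirect object header such as "12 0 obj".
def xref_section_at?(bytes)
  bytes.match?(/\A\s*(?:xref\b|\d+\s+\d+\s+obj\b)/)
end

if ARGV.length == 2
  path, offset = ARGV[0], Integer(ARGV[1])
  bytes = bytes_at_offset(path, offset)
  puts "bytes at #{offset}: #{bytes.inspect}"
  puts "looks like an xref section: #{xref_section_at?(bytes)}"
end
```

Saved as a script, it could be run as `ruby check_offset.rb encoding_issue.pdf 248405` (the script name is arbitrary).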

If I comment out the line that reads xref streams in pdf-reader, then
the file can be processed:

diff --git a/lib/pdf/reader/xref.rb b/lib/pdf/reader/xref.rb
index 9e6a56c..5e91ce3 100644
--- a/lib/pdf/reader/xref.rb
+++ b/lib/pdf/reader/xref.rb
@@ -145,7 +145,7 @@ class PDF::Reader
raise MalformedPDFError, "PDF malformed, trailer should be a dictionary"
end

- load_offsets(trailer[:XRefStm]) if trailer.has_key?(:XRefStm)
+ #load_offsets(trailer[:XRefStm]) if trailer.has_key?(:XRefStm)
load_offsets(trailer[:Prev].to_i) if trailer.has_key?(:Prev)

trailer

I'd be happy to merge a PR that attempts to continue reading the PDF if
no xref stream is found at the offset. It'd need a spec based on a real
PDF, but I can add that to the PR if that's helpful.
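
As a rough illustration of what such a change might look like, here is a self-contained sketch. It is not the actual pdf-reader code: `load_trailer_offsets`, the block-based loader, the `warnings` array, and the locally defined `MalformedPDFError` are all hypothetical stand-ins. The idea is simply to treat a failed `:XRefStm` lookup as non-fatal and still follow `:Prev`.

```ruby
# Stand-in for PDF::Reader::MalformedPDFError so the sketch runs on its own.
class MalformedPDFError < RuntimeError; end

# Hypothetical helper: the block plays the role of load_offsets and is
# expected to raise MalformedPDFError when nothing usable is at the offset.
def load_trailer_offsets(trailer, warnings: [])
  unless trailer.is_a?(Hash)
    raise MalformedPDFError, "PDF malformed, trailer should be a dictionary"
  end

  if trailer.key?(:XRefStm)
    begin
      yield trailer[:XRefStm]
    rescue MalformedPDFError => e
      # Assumption: a missing xref stream is recoverable, so record it and
      # carry on instead of aborting the whole read.
      warnings << "ignoring XRefStm at #{trailer[:XRefStm]}: #{e.message}"
    end
  end

  # The previous xref section is still loaded, as in the original code path.
  yield trailer[:Prev].to_i if trailer.key?(:Prev)

  trailer
end
```

Whether silently skipping the stream is safe for every file is an open question; a real PR would need the spec mentioned above to confirm the objects in the file are still reachable.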

James

Jon Kern

Sep 19, 2019, 1:46:53 PM
to pdf-r...@googlegroups.com
I just want to say how much I appreciate your attention to the PDF-READER community, James.
