I read on GitHub that if I get a MalformedPDFError I should send the file to the maintainers of the gem via Google Groups.
I tried passing the attached file to pdf-reader but I keep getting this error: PDF::Reader::MalformedPDFError: xref table not found at offset 46293 (R != xref)
This happens on the master branch and also on the latest stable branch :(. Please advise on how I can parse the PDF.
Thanks a lot!
Regards,
Su Yuen
On 16 January 2012 05:16, Su Yuen Chin <suy...@gmail.com> wrote:
> I read on GitHub that if I get a MalformedPDFError I should send the file to the maintainers of the gem via Google Groups.
>
> I tried passing the attached file to pdf-reader but I keep getting this error: PDF::Reader::MalformedPDFError: xref table not found at offset 46293 (R != xref)
>
> This happens on the master branch and also on the latest stable branch :(. Please advise on how I can parse the PDF.
If you open the sample file in a text editor, you'll see there's some
HTML-ish garbage at the top of the file that is confusing pdf-reader.
Other pdf reading apps probably have some smarts to handle this sort
of scenario, but pdf-reader is a little dumb.
I suggest re-saving the file and trying again.
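
To check for this without a text editor, here is a minimal Ruby sketch
that counts the bytes sitting in front of the "%PDF-" header. It is not
part of pdf-reader: leading_junk_length is a hypothetical helper name,
and the sample string only stands in for the attached file.

```ruby
# Hypothetical helper (not a pdf-reader API): returns how many bytes precede
# the "%PDF-" header, 0 if the file starts cleanly, or nil if no header exists.
def leading_junk_length(bytes)
  bytes.b.index("%PDF-".b)
end

# Synthetic stand-in for a damaged file, not the real attachment.
sample = "<html><head></head>%PDF-1.4\n1 0 obj\n<< >>\nendobj\n".b

junk = leading_junk_length(sample)
if junk.nil?
  puts "no %PDF- header found"
elsif junk.zero?
  puts "file starts with the PDF header"
else
  # Show exactly which bytes come before the header.
  puts "#{junk} byte(s) before the header: #{sample.byteslice(0, junk).inspect}"
end
```

For a real file, reading it with File.binread keeps the bytes from being
re-encoded, which matters because the rest of the file is binary data.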
cheers
James
On 17 January 2012 05:04, Su Yuen, Chin <suy...@gmail.com> wrote:
> I opened the file in a text editor and saw <html> and <head></head>.
> Are those the HTML garbage? When I removed it and saved it, the PDF is
> a blank PDF in the viewer.
>
> Sorry, not familiar with PDF standards/structures so a bit lost on
> what is HTML garbage. Hope you can point me in the right direction.
The byte offsets in a PDF file must be exact - just removing the HTML
may not be enough to fix the file. Can you re-save the file from its
original source?
A PDF file should always start with "%PDF".
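
As a rough illustration of why the offsets matter, the sketch below reads
the number stored after the last "startxref" keyword and checks whether
the "xref" keyword really sits at that byte position. The helper names and
the tiny hand-built file are assumptions for the example, and it only
covers a classic xref table, not a cross-reference stream.

```ruby
# Hypothetical helper: the byte offset declared after the last "startxref".
def declared_xref_offset(bytes)
  data = bytes.b
  pos = data.rindex("startxref".b)
  return nil unless pos
  m = data.byteslice(pos, data.bytesize - pos).match(/\Astartxref\s+(\d+)/)
  m && m[1].to_i
end

# Hypothetical helper: is the "xref" keyword located exactly at this offset?
def xref_at?(bytes, offset)
  bytes.b.byteslice(offset, 4) == "xref".b
end

# Build a tiny synthetic file whose startxref value is correct.
body = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
xref_pos = body.bytesize
pdf = (body +
       "xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n" \
       "trailer\n<< /Size 2 >>\nstartxref\n#{xref_pos}\n%%EOF\n").b

# The same file with stray bytes in front: every stored offset is now stale.
broken = "<html><head></head>".b + pdf

p xref_at?(pdf, declared_xref_offset(pdf))       # true
p xref_at?(broken, declared_xref_offset(broken)) # false
```

The second check fails for the same reason as the error above: the stored
offset no longer points at the "xref" keyword.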
James
On 17 January 2012 14:52, Su Yuen, Chin <suy...@gmail.com> wrote:
> Hmm this means that I may have to deduct the number of bytes that
> <html><head></head> is taking up from all the offset values in the
> PDF?
You could try patching the XRef#load_xref_table method to detect when
the xref table is offset by a few bytes and then add the same number
of bytes to all object offsets.
It might solve a particular case of corruption like what you're seeing
and I'd consider a patch provided it doesn't break anything in the
test suite.
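
Outside of pdf-reader, the idea could be sketched roughly like this. It
deliberately does not touch XRef#load_xref_table; locate_xref and
shifted_offsets are hypothetical names, the input is a hand-built file,
and it assumes a single classic xref table where every offset is wrong by
the same amount.

```ruby
# Hypothetical helper: where startxref says the table is, and where the last
# standalone "xref" keyword before it actually is.
def locate_xref(bytes)
  data = bytes.b
  start = data.rindex("startxref".b)
  return nil unless start
  m = data.byteslice(start, data.bytesize - start).match(/\Astartxref\s+(\d+)/)
  actual = data.rindex(/(?<!start)xref/, start)
  return nil unless m && actual
  { declared: m[1].to_i, actual: actual }
end

# Hypothetical helper: object number => corrected byte offset, assuming all
# in-use entries are off by the same number of bytes as the xref table itself.
def shifted_offsets(bytes)
  data = bytes.b
  loc = locate_xref(data)
  return nil unless loc
  shift = loc[:actual] - loc[:declared]
  tail = data.byteslice(loc[:actual], data.bytesize - loc[:actual])
  section = tail[/\Axref\s+(.*?)trailer/m, 1]
  return nil unless section

  obj_num = 0
  offsets = {}
  section.each_line do |line|
    case line
    when /\A(\d+) (\d+)\s*\z/              # subsection header: first object, count
      obj_num = $1.to_i
    when /\A(\d{10}) (\d{5}) ([nf])\s*\z/  # entry: offset, generation, n (in use) or f (free)
      offsets[obj_num] = $1.to_i + shift if $3 == "n"
      obj_num += 1
    end
  end
  offsets
end

# Synthetic file with stray bytes prepended, so stored offsets are stale.
body = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
pdf = body + "xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n" \
             "trailer\n<< /Size 2 >>\nstartxref\n#{body.bytesize}\n%%EOF\n"
broken = ("<html><head></head>" + pdf).b

fixed = shifted_offsets(broken)
p fixed
p broken.byteslice(fixed[1], 7) # bytes at the corrected offset of object 1
```

Whether this is safe for real files is an open question; a file with
several xref sections or with offsets that are wrong by different amounts
would need more care.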
James