Issue: PDFium parsing failure on corrupted PDF (works in WPS)


llewelyn LUO

Mar 6, 2026, 3:26:58 PM
to pdfium

Dear PDFium Team,

I’m running into a PDF parsing failure with a ~10 MB file that renders correctly in WPS Office but fails in PDFium-based viewers (including Chrome and Edge).

Error
unexpected non-name key pdf.keyword(x=n7n5M) parsing dictionary

What I’m seeing / diagnosis

  • The file appears to contain a corrupted Object Stream and/or Pages Tree.

  • Attempting to “rebuild” the file with qpdf also fails, suggesting structural corruption rather than a simple xref mismatch.

  • From the error pattern, it looks like the parser ends up interpreting raw Flate (zlib) data as dictionary keys while reading an object stream (i.e., it expects a Name object but encounters binary data), even though the startxref offset in the trailer points to a plausible location in the file.
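One way to test the last hypothesis is to take the bytes at the offset where the parser expected a Name key and check whether they begin with a valid zlib header (RFC 1950: compression method 8, and CMF*256 + FLG divisible by 31). A minimal Python sketch, assuming the raw bytes have already been extracted from the file; `looks_like_zlib` is a hypothetical helper, not part of PDFium or qpdf:

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Heuristic: does `data` start with a valid zlib (RFC 1950) header
    and begin to inflate without error?"""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # CM (low 4 bits of CMF) must be 8 (deflate); CINFO (high 4 bits) <= 7
    if cmf & 0x0F != 8 or cmf >> 4 > 7:
        return False
    # FCHECK: (CMF * 256 + FLG) must be a multiple of 31
    if (cmf * 256 + flg) % 31 != 0:
        return False
    try:
        # A streaming decompressor accepts truncated input, so a slice taken
        # from the middle of the file is enough to test the start of the data.
        zlib.decompressobj().decompress(data)
    except zlib.error:
        return False
    return True
```

A positive result only says the bytes could be compressed data; the two-byte header check alone can pass by chance, so treat it as a hint rather than proof.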

qpdf output (excerpt)

    qpdf --empty --pages FoodandDrinks_Extract_2.pdf -- recovered.pdf
    WARNING: ... object stream 28 ... unknown token while reading object; treating as string
    WARNING: ... expected dictionary key but found non-name object; inserting key /QPDFFake1 ...
    WARNING: ... too many errors; giving up on reading object
    WARNING: object 4 0: Pages tree includes non-dictionary object; ignoring
    qpdf: invalid vector subscript
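The warnings point at object stream 28. Per the PDF specification (ISO 32000-1, §7.5.7), the decoded data of an object stream begins with N pairs of integers (object number, byte offset relative to /First), where N and /First come from the stream dictionary. A minimal sketch that inflates such a stream and validates that header, assuming the stream uses FlateDecode only and that N and /First have been read by hand; `parse_objstm_header` is a hypothetical helper:

```python
import zlib


def parse_objstm_header(raw: bytes, n: int, first: int) -> list[tuple[int, int]]:
    """Inflate a FlateDecode object stream and return its N (objnum, offset) pairs.

    Raises ValueError if the header is not 2*N non-negative integers with
    increasing offsets, which is one symptom of object stream corruption.
    """
    data = zlib.decompress(raw)
    tokens = data[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError(f"expected {2 * n} integers before /First, got {len(tokens)}")
    pairs = []
    for i in range(n):
        num_tok, off_tok = tokens[2 * i], tokens[2 * i + 1]
        if not (num_tok.isdigit() and off_tok.isdigit()):
            raise ValueError(f"non-integer header token at pair {i}: {num_tok!r} {off_tok!r}")
        pairs.append((int(num_tok), int(off_tok)))
    # Offsets are relative to /First, must increase, and must stay inside the data.
    offsets = [off for _, off in pairs]
    if offsets != sorted(offsets) or (offsets and first + offsets[-1] > len(data)):
        raise ValueError("offsets out of order or past end of stream")
    return pairs
```

If this raises for object 28, the stream itself is damaged; if it succeeds, the problem is more likely in the /Length, /N, or /First values the parser used.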

Question
Some viewers (e.g., WPS) appear to handle this file, presumably by tolerating the broken xref/object stream and falling back to a more recovery-oriented strategy (such as scanning objects linearly).
Does PDFium expose any flags, build options, or API-level settings that enable a more lenient/recovery mode for malformed PDFs? If not, is there a recommended approach via the PDFium API to handle object stream corruption more gracefully (e.g., fail-soft behavior rather than aborting parsing)?
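For reference, the "scan objects linearly" strategy mentioned above amounts to ignoring the xref data and searching the raw file for `N G obj` headers. The sketch below illustrates the general idea only; it is not how PDFium or WPS is known to implement recovery, and `scan_objects` is a hypothetical name:

```python
import re

# Matches an indirect object header such as "12 0 obj" at a token boundary.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d{1,10})\s+(\d{1,5})\s+obj\b")


def scan_objects(pdf: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (generation, byte offset) by scanning the whole file.

    Later definitions win, mirroring incremental updates. The xref table is
    ignored entirely, so only objects stored directly in the file are found,
    not objects compressed inside object streams.
    """
    table: dict[int, tuple[int, int]] = {}
    for m in OBJ_HEADER.finditer(pdf):
        num, gen = int(m.group(1)), int(m.group(2))
        table[num] = (gen, m.start())
    return table
```

Because a regex can also match `N G obj` text inside stream data, real recovery code has to validate each hit; and objects that exist only inside a damaged object stream cannot be recovered this way.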

Note on sharing the file
Due to privacy concerns and the file size, I’m unable to share the full PDF. However, I can provide targeted artifacts if helpful (e.g., the header/trailer, the xref stream object, specific object numbers, or hex dumps around the reported offsets).
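If hex dumps would help, a small helper like the one below (hypothetical, plain Python) produces an offset/hex/ASCII listing around a reported byte offset without exposing the rest of the file:

```python
def hexdump_around(data: bytes, offset: int, radius: int = 64, width: int = 16) -> str:
    """Return an offset/hex/ASCII dump of `data` within `radius` bytes of `offset`."""
    start = max(0, offset - radius)
    start -= start % width  # align the first row to a multiple of `width`
    end = min(len(data), offset + radius)
    lines = []
    for row in range(start, end, width):
        chunk = data[row:min(row + width, end)]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as "."
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{row:08x}  {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)
```

For example, `print(hexdump_around(open("file.pdf", "rb").read(), 123456))` dumps about 128 bytes around offset 123456 (the path and offset here are placeholders).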

Any guidance would be greatly appreciated.

Best regards,

Lei Zhang

Mar 18, 2026, 4:46:42 PM
to llewelyn LUO, pdfium
Do Acrobat Reader and macOS Preview.app handle this PDF correctly?
In general, if a PDF is sufficiently malformed, there is no guarantee that it will parse correctly.