Dear PDFium Team,
I’m running into a parsing failure with a ~10 MB PDF that renders correctly in WPS Office but fails in PDFium-based viewers, including Chrome and Edge.
Error
unexpected non-name key pdf.keyword(x=n7n5M) parsing dictionary
What I’m seeing / diagnosis
The file appears to contain a corrupted object stream and/or page tree.
Attempting to “rebuild” the file with qpdf also fails, suggesting structural corruption rather than a simple xref mismatch.
From the error pattern, the parser appears to interpret raw Flate (zlib) data as dictionary keys while reading an object stream: a Name object is expected, but binary data is encountered. This happens even though the startxref offset in the trailer seems plausible (it lies within the file).
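To test this hypothesis locally, the suspect object stream can be decoded by hand. The Python sketch below uses a hypothetical helper, `parse_objstm_header`, and assumes the raw (still compressed) bytes between `stream` and `endstream` have already been extracted, and that the stream uses /FlateDecode with no /DecodeParms. It decompresses the data and validates the header of /N object-number/offset pairs that precedes /First. If decompression fails or the header holds non-numeric tokens, a parser reading this stream would presumably hit binary data where it expects names and integers.

```python
import zlib


def parse_objstm_header(raw: bytes, n: int, first: int) -> list[tuple[int, int]]:
    """Decode a FlateDecode object stream and return its (objnum, offset) pairs.

    raw   -- stream bytes between 'stream' and 'endstream' (still compressed)
    n     -- the stream dictionary's /N value (number of objects)
    first -- the stream dictionary's /First value (offset of the first object)
    """
    try:
        data = zlib.decompress(raw)
    except zlib.error as exc:
        raise ValueError(f"Flate decode failed: {exc}") from exc
    if first > len(data):
        raise ValueError(f"/First={first} exceeds decoded length {len(data)}")

    # The header before /First should be 2*N whitespace-separated integers.
    tokens = data[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError(f"header has {len(tokens)} tokens, expected {2 * n}")

    pairs = []
    prev = -1
    for i in range(n):
        num_tok, off_tok = tokens[2 * i], tokens[2 * i + 1]
        if not (num_tok.isdigit() and off_tok.isdigit()):
            raise ValueError(f"entry {i} is not numeric: {num_tok!r} {off_tok!r}")
        num, off = int(num_tok), int(off_tok)
        # Offsets are relative to /First; they should increase and stay in bounds.
        if off <= prev or first + off >= len(data):
            raise ValueError(f"entry {i} has bad offset {off}")
        pairs.append((num, off))
        prev = off
    return pairs
```

A `ValueError` from this helper would point at the specific object stream (and entry) worth sharing as a targeted artifact.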
qpdf output (excerpt)
Question
Some viewers (e.g., WPS) appear to handle this file, presumably by tolerating the broken xref/object stream and falling back to a more recovery-oriented strategy (such as scanning objects linearly).
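For illustration, a linear-scan recovery of the kind described above might look like the following Python sketch. `scan_objects` is a hypothetical helper, not part of PDFium or WPS. It searches the raw file for `N G obj` headers, skips stream bodies so compressed bytes are less likely to be misread as headers, and keeps the last definition of each object number, as an incremental update would.

```python
import re

# "N G obj" with PDF whitespace (NUL, TAB, LF, FF, CR, SP) between tokens.
OBJ_HEADER = re.compile(
    rb"(?<![0-9])(\d+)[ \t\r\n\f\x00]+(\d+)[ \t\r\n\f\x00]+obj(?![A-Za-z0-9])"
)


def scan_objects(data: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (generation, byte offset) by scanning for headers."""
    table: dict[int, tuple[int, int]] = {}
    pos = 0
    while True:
        m = OBJ_HEADER.search(data, pos)
        if m is None:
            break
        num, gen = int(m.group(1)), int(m.group(2))
        table[num] = (gen, m.start())  # later definitions win
        pos = m.end()
        # If this object has a stream before its endobj, jump past endstream
        # so binary stream data is not scanned for headers.
        stream_at = data.find(b"stream", pos)
        endobj_at = data.find(b"endobj", pos)
        if stream_at != -1 and endobj_at != -1 and stream_at < endobj_at:
            end = data.find(b"endstream", stream_at + len(b"stream"))
            if end != -1:
                pos = end + len(b"endstream")
    return table
```

One limitation worth noting: objects stored inside an object stream are compressed, so a scan like this can only find top-level objects. If the page tree lives inside the corrupted object stream, a linear scan alone would presumably not be enough to recover it.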
Does PDFium expose any flags, build options, or API-level settings that enable a more lenient/recovery mode for malformed PDFs? If not, is there a recommended approach via the PDFium API to handle object stream corruption more gracefully (e.g., fail-soft behavior rather than aborting parsing)?
Note on sharing the file
Due to privacy concerns and the file size, I’m unable to share the full PDF. However, I can provide targeted artifacts if helpful (e.g., the header/trailer, the xref stream object, specific object numbers, or hex dumps around the reported offsets).
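The artifacts listed above can be extracted without sharing the whole file. The Python sketch below uses hypothetical helpers: `read_startxref` reads the offset after the last `startxref` keyword, `describe_xref_target` reports whether that offset lands on a classic `xref` table or on an object header (as an xref stream would), and `hexdump` prints a small window of bytes around any offset.

```python
import re


def read_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    idx = data.rfind(b"startxref")
    if idx == -1:
        raise ValueError("no startxref keyword found")
    m = re.match(rb"startxref\s+(\d+)", data[idx:])
    if m is None:
        raise ValueError("startxref is not followed by an integer")
    return int(m.group(1))


def describe_xref_target(data: bytes, offset: int) -> str:
    """Classify what the startxref offset actually points at."""
    if offset >= len(data):
        return "offset is past end of file"
    head = data[offset:offset + 32]
    if head.startswith(b"xref"):
        return "classic xref table"
    if re.match(rb"\d+\s+\d+\s+obj", head):
        return "object header (likely an xref stream)"
    return "unexpected bytes: " + head[:16].hex(" ")


def hexdump(data: bytes, offset: int, radius: int = 64) -> str:
    """Format bytes within `radius` of `offset` as offset/hex/ASCII rows."""
    start = max(0, offset - radius)
    end = min(len(data), offset + radius)
    lines = []
    for row in range(start, end, 16):
        chunk = data[row:min(row + 16, end)]
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{row:08x}  {chunk.hex(' '):<47}  {text}")
    return "\n".join(lines)
```

For example, reading the file with `data = open("input.pdf", "rb").read()` and then calling `print(hexdump(data, read_startxref(data)))` would print the bytes around the recorded xref location; `input.pdf` is a placeholder. Keeping `radius` small limits how much of the file's content ends up in the dump.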
Any guidance would be greatly appreciated.
Best regards,