Subject: Request to Restore PDF Object Stream Compression Saving Functionality


尼古拉特斯拉

Jan 27, 2026, 4:57:55 AM
to pdfium


Dear PDFium Team,

I hope this message finds you well.

I'm writing to ask for clarification and help regarding a feature gap I've noticed in PDFium while working on PDF document processing.

Issue Description:

While reviewing older PDFium source code, I found that earlier versions implemented PDF object stream compression, including:

1. `CPDF_ObjectStream` class - for packing multiple PDF objects into a single object stream
2. `CPDF_XRefStream` class - for writing cross-reference streams (XRef streams)
3. Supporting compression logic - including the `CompressIndirectObject()` methods, Flate encoding, etc.
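For context, here is a minimal sketch of what an object stream holds, following the layout in the PDF specification (ISO 32000-1, section 7.5.7): N pairs of "object number, offset" followed by the serialized objects, with offsets counted from the `/First` position. This is not the old PDFium code; `ObjectStreamBody` and `BuildObjectStream()` are hypothetical names, and the Flate compression and the surrounding `N 0 obj ... stream` wrapper are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not a PDFium API): the decoded payload of an
// object stream plus the values for its /N and /First dictionary keys.
struct ObjectStreamBody {
  std::size_t n = 0;      // value for /N (number of objects)
  std::size_t first = 0;  // value for /First (offset of the first object)
  std::string data;       // decoded stream data, before any /FlateDecode
};

// Builds the uncompressed body of a /Type /ObjStm stream from objects that
// are already serialized (e.g. "<</Type/Annot>>"). The header lists
// "objnum offset" pairs; each offset is relative to /First.
ObjectStreamBody BuildObjectStream(
    const std::vector<std::pair<std::uint32_t, std::string>>& objects) {
  std::string header;
  std::string bodies;
  for (const auto& [objnum, serialized] : objects) {
    header += std::to_string(objnum) + ' ' + std::to_string(bodies.size()) + ' ';
    bodies += serialized;
    bodies += '\n';  // whitespace separator between objects
  }
  ObjectStreamBody result;
  result.n = objects.size();
  result.first = header.size();
  result.data = header + bodies;
  assert(result.data.size() == result.first + bodies.size());
  return result;
}
```

For example, packing objects 10 (`<</A 1>>`) and 11 (`[1 2]`) yields the header `10 0 11 9 `, so `/First` is 10. A writer would then Flate-compress `data` and emit it as one indirect stream object, replacing two separate `obj ... endobj` blocks.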

However, in the current PDFium codebase (particularly in `core/fpdfapi/edit/cpdf_creator.cpp`), I found that:
- The `CPDF_Creator` class only implements traditional PDF generation methods
- Object stream compression functionality is missing
- The cross-reference section is always written as a classic `xref` table rather than as a cross-reference stream
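To illustrate the difference: a classic `xref` table uses a fixed 20-byte text line per object, while a cross-reference stream (ISO 32000-1, section 7.5.8) stores each entry as binary big-endian fields whose widths come from the `/W` array. The sketch below assumes `/W [1 4 2]`; `XRefEntry`, `AppendBigEndian()` and `EncodeXRefStreamData()` are hypothetical names, not PDFium APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One cross-reference stream entry. Type 0 = free object,
// 1 = uncompressed object at a byte offset, 2 = object inside an object stream.
struct XRefEntry {
  std::uint8_t type;
  std::uint32_t field2;  // type 1: byte offset; type 2: objnum of the ObjStm
  std::uint16_t field3;  // type 1: generation;  type 2: index in the ObjStm
};

// Appends `value` as a big-endian integer that is `width` bytes wide.
void AppendBigEndian(std::string& out, std::uint32_t value, int width) {
  for (int shift = (width - 1) * 8; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((value >> shift) & 0xFF));
}

// Encodes entries using the assumed field widths /W [1 4 2]. The result is
// the uncompressed data of a /Type /XRef stream; a real writer would also
// emit /Size, /Index, /Root, etc. and usually Flate-compress the data.
std::string EncodeXRefStreamData(const std::vector<XRefEntry>& entries) {
  std::string out;
  for (const XRefEntry& e : entries) {
    AppendBigEndian(out, e.type, 1);
    AppendBigEndian(out, e.field2, 4);
    AppendBigEndian(out, e.field3, 2);
  }
  assert(out.size() == entries.size() * 7);
  return out;
}
```

With these widths each entry takes 7 bytes before compression instead of 20, and type 2 entries are what let a reader locate objects stored inside an object stream.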

Feature Comparison:

| Feature | Old Version Support | Current Version Support |
|---------|-------------------|------------------------|
| Object Stream (ObjStm) Compression | ✅ | ❌ |
| Cross-Reference Stream (XRefStm) | ✅ | ❌ |
| PDF 1.5+ Compression Features | ✅ | ❌ |
| Object Stream for Incremental Updates | ✅ | ❌ |

Specific Code Evidence:

In the old version, these key classes existed:
```cpp
// Object stream compression in the old version (declarations only)
class CPDF_ObjectStream {
public:
    FX_INT32 CompressIndirectObject(FX_DWORD dwObjNum, const CPDF_Object* pObj);
    FX_FILESIZE End(CPDF_Creator* pCreator);
    // ... other methods
};

class CPDF_XRefStream {
public:
    FX_INT32 CompressIndirectObject(FX_DWORD dwObjNum, const CPDF_Object* pObj, CPDF_Creator* pCreator);
    FX_BOOL GenerateXRefStream(CPDF_Creator* pCreator, FX_BOOL bEOF);
    // ... other methods
};
```

But in the current version, `CPDF_Creator` only has basic writing functionality:
```cpp
bool CPDF_Creator::WriteIndirectObj(uint32_t objnum, const CPDF_Object* pObj) {
    // ... (object header and encryptor setup elided)
    // Direct writing, no compression
    if (!pObj->WriteTo(m_Archive.get(), encryptor.get()))
        return false;
    // ... No object stream compression logic (remainder elided)
}
```
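Restoring the feature would mean adding a branch before `WriteIndirectObj()` writes an object directly. Below is a minimal sketch of that decision, not PDFium code: `PendingObject`, `CanGoInObjectStream()`, `WritePlan`, `PlanWrites()` and the `max_per_stream` batch limit are all hypothetical. The exclusion rules follow ISO 32000-1, section 7.5.7 (stream objects, objects with a nonzero generation number, and the encryption dictionary cannot be placed in an object stream); further restrictions, such as those for linearized files, are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of an object about to be written; PDFium's
// real CPDF_Creator works on CPDF_Object instances instead.
struct PendingObject {
  std::uint32_t objnum;
  std::uint16_t gennum;
  bool is_stream;
  bool is_encrypt_dict;
};

// True if the spec allows the object to be stored in an object stream.
bool CanGoInObjectStream(const PendingObject& obj) {
  if (obj.is_stream) return false;        // streams can't be nested in an ObjStm
  if (obj.gennum != 0) return false;      // only generation-0 objects qualify
  if (obj.is_encrypt_dict) return false;  // must stay readable without decryption
  return true;
}

struct WritePlan {
  std::vector<std::uint32_t> direct;                    // written as "N 0 obj ... endobj"
  std::vector<std::vector<std::uint32_t>> obj_streams;  // batches packed into ObjStms
};

// Routes each object either to direct output or to the current object
// stream batch, starting a new batch once `max_per_stream` is reached.
WritePlan PlanWrites(const std::vector<PendingObject>& objects,
                     std::size_t max_per_stream) {
  assert(max_per_stream > 0);
  WritePlan plan;
  for (const PendingObject& obj : objects) {
    if (!CanGoInObjectStream(obj)) {
      plan.direct.push_back(obj.objnum);
      continue;
    }
    if (plan.obj_streams.empty() || plan.obj_streams.back().size() == max_per_stream)
      plan.obj_streams.emplace_back();
    plan.obj_streams.back().push_back(obj.objnum);
  }
  return plan;
}
```

Each batch would then become one object stream (itself a stream object, so written directly), and the cross-reference stream would record the batched objects as type 2 entries.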

Why This Feature Is Important:

1. File Size Optimization: Object stream compression can significantly reduce PDF file size, especially for documents containing many small objects
2. PDF Standard Compliance: Object streams were introduced in PDF 1.5 and are widely supported by modern PDF readers
3. Performance Improvement: Smaller files transfer faster and can load faster
4. Professional Requirements: Certain use cases (such as forms with many fields or documents with many annotations) require this feature

My Questions:

1. Reason for Removal: Why was this useful feature removed from the current version? Was it due to technical reasons or design decisions?
2. Restoration Plan: Is there a plan to restore this functionality in future versions?
3. Alternative Solutions: If there's no restoration plan, are there other ways to achieve similar object compression effects?
4. Migration Path: If we need to implement this ourselves, are there API guidelines or code examples we could reference?

Use Case Example:

In my project, I need to process PDF documents containing many small objects (such as batch-generated reports, documents with numerous form fields, etc.). Using object stream compression would:
- Reduce file size (by an estimated 30-50%)
- Improve network transfer efficiency
- Lower storage costs

Request:

Could you please:
1. Clarify the status of object stream compression functionality?
2. If possible, provide a plan or timeline for restoring this feature?
3. Or provide guidance on implementing similar functionality?



Thank you for your time and assistance!

Best regards,

Lei Zhang

Jan 27, 2026, 8:14:18 PM
to 尼古拉特斯拉, pdfium
On Tue, Jan 27, 2026 at 1:58 AM 尼古拉特斯拉 <20021...@gmail.com> wrote:
> My Questions:
>
> 1. Reason for Removal: Why was this useful feature removed from the current version? Was it due to technical reasons or design decisions?

Hi,

You can look in the git history to answer this question yourself, but
to save you time, https://pdfium-review.googlesource.com/5491 deleted
the code in question. The commit description explains why.

> 2. Restoration Plan: Is there a plan to restore this functionality in future versions?

No such plans as of now. You can file a feature request at
https://crbug.com/pdfium/new, but there's no guarantee anyone is going
to get to this in the near future.

> 3. Alternative Solutions: If there's no restoration plan, are there other ways to achieve similar object compression effects?

Run the output PDF through some other PDF optimizer?

> 4. Migration Path: If we need to implement this ourselves, are there API guidelines or code examples we could reference?

- See https://pdfium.googlesource.com/pdfium/+/main/CONTRIBUTING.md
for general contribution guidelines.
- There's no new API here. Probably just add an additional parameter
to FPDF_SaveAsCopy().
- Wouldn't the deleted code serve as a good example in this case? It
probably won't work out of the box in modern PDFium though, as the
code base has changed.