Performance issue occurs while opening the attached PDF document


CHINNAMUNIA KARTHIK C

Jun 4, 2026, 4:56:14 AM
to pdfium
Hi Team,

We are experiencing a performance issue when opening the attached PDF document. We have shared both the sample and the PDF file for your reference.

Sample Steps:
1. Run the sample (Performance.zip).
2. Download the PDF document and load it using the Choose File option. (https://drive.google.com/file/d/1lB7laAIjizgkghTZUXabk1Wvo7o5ZLAX/view?usp=drive_link)
3. Click the RenderPDFPage button.
4. Observe the console output for:
○ Image rendering time
○ Image conversion time

We need suggestions to improve performance.

Observations & Queries:
• The PDF document size is approximately 30 MB, but the generated image size ranges between 200 MB and 400 MB per page.
→ How can we reduce the generated image size?
• Currently, we render images using the <img> tag.
→ Is there a way to obtain a Base64 string directly from PDFium instead of a byte array?
• Is there any approach to reduce the image rendering time and improve overall performance?

We appreciate your guidance on optimizing this process.

geisserml

Jun 5, 2026, 5:39:12 AM
to pdfium
I can confirm that rendering this file at a typical scale such as 3 (216 DPI) is very slow.
However, the pages are also oversized. In this file, 1 px of image maps to 1 canvas unit, i.e. 72 DPI. That means a scale of 1 is more appropriate here, which reduces both bitmap size and render time.
If you generated the file with a tool such as img2pdf, try setting more appropriate DPI values so that the usual scales work.
PS: As this PDF only consists of JPEG images, `pypdfium2 extract-images 30mb-merged.pdf -o out/` works pretty much instantly, and after removing duplicate images, gives the same 32.9MiB as the input file.
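
The relation between scale and resolution can be sketched in a few lines of Python. This is only an illustration and assumes the default PDF user space of 72 units per inch; the function names are made up for this example and are not part of pypdfium2 or PDFium.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 unit = 1/72 inch


def scale_to_dpi(scale: float) -> float:
    """Effective resolution when rendering at the given scale factor."""
    return scale * POINTS_PER_INCH


def pixel_count_factor(scale: float) -> float:
    """How many times more pixels a render has compared with scale 1."""
    return scale ** 2


for scale in (1, 2, 3):
    print(f"scale {scale}: {scale_to_dpi(scale):.0f} DPI, "
          f"{pixel_count_factor(scale):.0f}x the pixels of scale 1")
```

Because the pixel count grows with the square of the scale, going from scale 1 to scale 3 means nine times as many pixels to fill and to encode afterwards.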

geisserml

Jun 5, 2026, 5:59:41 AM
to pdfium
To add some more detail, most images in this PDF have large dimensions such as 8200x3700 or 6000x4000.
Naturally, decoding and re-encoding large images takes longer than doing so for small ones.
And as mentioned, the page sizes in the PDF match the image pixel dimensions, so you definitely do not want to upscale.
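
To get a feel for the numbers, here is a rough estimate of the uncompressed bitmap size for the two page dimensions mentioned above. It is a sketch under assumptions: 4 bytes per pixel (a 32-bit BGRA/BGRx buffer) and simple rounding of the scaled dimensions. The OP's actual render settings are not known, and `raw_bitmap_bytes` is a hypothetical helper, not a PDFium function.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit BGRA/BGRx output buffer


def raw_bitmap_bytes(width_pt, height_pt, scale=1.0,
                     bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size in bytes of a page rendered at the given scale."""
    width_px = round(width_pt * scale)
    height_px = round(height_pt * scale)
    return width_px * height_px * bytes_per_pixel


# Page sizes taken from the image dimensions quoted in this thread.
for width, height in ((8200, 3700), (6000, 4000)):
    for scale in (1, 2):
        megabytes = raw_bitmap_bytes(width, height, scale) / 1e6
        print(f"{width}x{height} at scale {scale}: {megabytes:.0f} MB raw")
```

Even at scale 1 a single page is on the order of 100 MB uncompressed, and every doubling of the scale quadruples that.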

Lei Zhang

Jun 5, 2026, 1:55:38 PM
to CHINNAMUNIA KARTHIK C, pdfium
On Thu, Jun 4, 2026 at 1:56 AM CHINNAMUNIA KARTHIK C
<chinnamuniak...@gmail.com> wrote:
> Sample Steps:
> 1. Run the sample. Performance.zip

This file contains HTML, JS, and a WASM blob. PDFium is a C++ library,
and PDFium developers cannot magically decipher the WASM blob and
solve the performance issues.

> We need suggestions to improve performance.
>
> Observations & Queries:
> • The PDF document size is approximately 30 MB, but the generated image size ranges between 200 MB and 400 MB per page.

Does your sample code decode to PNG? If you are decoding JPG to PNG,
then those size increases sound plausible. This is just the reality of
how PNG works. Not sure what you expect PDFium to do about it.

> → How can we reduce the generated image size?

Have you tried to just extract the embedded images instead?

> → Is there a way to obtain a Base64 string directly from PDFium instead of a byte array?

No. PDFium is not in the business of providing a base64 encoder.
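
As a side note on the Base64 question: wherever the encoding happens, standard Base64 turns every 3 input bytes into 4 output characters, so the encoded string is about a third larger than the byte array. A minimal sketch using only the Python standard library (`base64_length` is a helper name invented for this example):

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes input bytes."""
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(1000)  # 1000 zero bytes as a stand-in for image data
encoded = base64.b64encode(payload)
print(len(payload), "bytes ->", len(encoded), "Base64 characters")

# Same formula applied to a 300 MB buffer, without allocating it.
print(f"300 MB -> {base64_length(300_000_000) / 1e6:.0f} MB as Base64")
```

So a Base64 string produced inside the library would not be smaller than the byte array; reducing the size of the image itself is what helps.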

geisserml

Jun 5, 2026, 3:04:48 PM
to pdfium
Thanks for the additional answers, Lei.

I hadn't looked at the OP's .zip before, just tested from pypdfium2 and noticed rendering was indeed somewhat slow depending on the scale set.
Now having just tried the HTML page, does the code even work? After choosing a file and clicking RenderPage, nothing seems to happen and I don't see anything in console...

@chinnamuniak: How did you obtain the WASM binary? Is it from bblanchon/pdfium-binaries or paulocoutinhox/pdfium-lib? Or maybe self-built?

geisserml

Jun 5, 2026, 3:21:14 PM
to pdfium
Ah, Firefox showed some more info in console. It seems that my browsers just refuse to load the worker.js for security reasons.
I suppose the code would work when whitelisting the worker...

CHINNAMUNIA KARTHIK C

Jun 10, 2026, 8:20:30 AM
to pdfium

Thanks for the response.

I am rendering pages inside an HTML <img> tag by converting the PDFium output (byte arrays) to Base64. However, this approach causes significant delays.

During analysis, I observed that PDFium returns image byte arrays of approximately 200–400 MB per page, even though the original PDF document is only 30 MB. This results in heavy memory usage and slow Base64 conversion.

I am currently using bblanchon/pdfium-binaries to generate pdfium.wasm.

I have the following questions:

  1. Is it possible to obtain Base64 data directly from PDFium, instead of manually converting the byte array?
  2. How can I extract embedded images from the PDF and render them efficiently inside an HTML <img> tag?

Any guidance, best practices, or examples would be greatly appreciated.
