File access lifetime with FPDFImageObj_LoadJpegFile()

geisserml

May 8, 2024, 6:00:36 PM
to pdfium
For how long does the file access passed to `FPDFImageObj_LoadJpegFile` have to be kept alive? Presumably until the PDF is saved/closed?

The thing is, when adding many images to a PDF, we want neither to load them all into memory as with `FPDFImageObj_LoadJpegFileInline`, nor to have a large number of file handles open at the same time, given OS limits. So ideally, we'd like to open each file only for as long as it is needed while saving, and then close it again.

Is this possible with the current state of the API?
Is the reading process linear/predictable, so that we could open the file on the initial read and close it once the end is reached, or something like that?

geisserml

May 8, 2024, 6:11:05 PM
to pdfium
OK, so I added some debug prints into pypdfium2 and ran its imgtopdf CLI, which yielded the following output:
```
a.jpg
before load
0 8192
0 5478889
after load

b.jpg
before load
0 8192
0 4896699
after load

c.jpg
before load
0 8192
0 8947281
after load

before save
0 5478889
0 4896699
0 8947281
after save
```

geisserml

May 8, 2024, 6:25:13 PM
to pdfium
So one possible approach would be to open the file before the load and close it afterwards, then reopen it on a later callback if it is closed, and close it again if pos+size == end.
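
As a rough illustration of that idea, here is a minimal pure-Python sketch. It reads the two numbers in the debug output above as the position and size of each read request, which is an assumption. The class name `LazyBlockReader` and its `read_block()` method are hypothetical and not part of pdfium or pypdfium2; wiring the method into the actual file access callback is left out. The sketch closes on `pos+size >= end` rather than strict equality, so a request that runs past the end of a small file also releases the handle.

```python
import os


class LazyBlockReader:
    """Hypothetical helper: opens the file on demand, closes it at end of file."""

    def __init__(self, path):
        self.path = path
        self.size = os.path.getsize(path)
        self._fh = None  # no handle is held until the first read

    @property
    def is_open(self):
        return self._fh is not None

    def read_block(self, position, size):
        # Reopen lazily if an earlier read already closed the handle.
        if self._fh is None:
            self._fh = open(self.path, "rb")
        self._fh.seek(position)
        data = self._fh.read(size)
        # Assumption: a read that reaches the end is the last one for now.
        if position + size >= self.size:
            self.close()
        return data

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

With the read pattern from the output above (a short read at the start, then one read covering the whole file), the handle would be open only between the first and second request during load, and only for the duration of the single request during save. Whether pdfium always reads in this pattern is not confirmed in this thread, so a final explicit `close()` after saving would be a sensible safety net.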

Lei Zhang

May 8, 2024, 6:27:57 PM
to geisserml, pdfium
You'll have to confirm this since I'm just eyeballing here, but I
think it should live as long as the associated FPDF_PAGEOBJECT
image_object. Once the FPDF_PAGEOBJECT is gone, then the file access
object is no longer needed.

geisserml

May 8, 2024, 6:36:01 PM
to pdfium
I doubt it, since callbacks are made during saving, when the image page object is already closed (see the output in the second post).

geisserml

May 8, 2024, 6:40:23 PM
to pdfium
Or rather, the page object is owned by pdfium in the first place, because it is part of a page, and thus it is never actually closed by the caller (see the docs for FPDFPageObj_Destroy).

Lei Zhang

May 8, 2024, 6:58:03 PM
to geisserml, pdfium
Yes, what I meant is when the FPDF_PAGEOBJECT is destroyed, and not
just when the PDFium embedder releases its handle. Unless the PDFium
embedder is explicitly destroying the FPDF_PAGEOBJECT, the only way to
guarantee it is destroyed is to close the FPDF_DOCUMENT.