Sample code to load a pdf over network

Sashrika Waidyarathna

Feb 21, 2024, 12:57:33 PM
to pdfium
Hi, I am coming from a React Native background. I have very little knowledge of C++ and Objective-C. Currently I am trying to bridge PDFium iOS code to React Native.

Can someone help me find sample code that I can use to load a PDF over the network? I could not find any documentation other than https://pdfium.googlesource.com/pdfium/+/HEAD/docs/getting-started.md

Any help or guidance is appreciated.

Lei Zhang

Feb 21, 2024, 1:43:55 PM
to Sashrika Waidyarathna, pdfium
PDFium doesn't do networking. It only takes PDF data as input and
processes it. How one gets the data to feed into PDFium is probably a
question for another forum.

geisserml

Feb 21, 2024, 7:28:06 PM
to pdfium
FWIW, the header in question is `fpdf_dataavail.h` (with API docs).
However, I agree that a sample demonstrating how to integrate these APIs might indeed be helpful.
I didn't delve into it in detail, but from skimming the docs it's also not clear to me what an actual implementation would look like.
Seems like there will be some non-trivial caller side network/cache logic to implement (and care needs to be taken with efficiency).
Anyway, I understand helping with this may be out of scope for pdfium team.

Lei Zhang

Feb 21, 2024, 7:40:02 PM
to geisserml, pdfium
If the PDFium embedder is using FPDF_LoadMemDocument() and provides
the PDF data in its entirety, then I believe there's no need to
interact with the code in fpdf_dataavail.h. FPDFAvail_GetDocument() is
the more complex API. Here, PDFium asks the embedder whether data in a
particular byte range is available. How the embedder fetches the data
and makes it available is up to the embedder.
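
The "is this byte range available?" question corresponds to the `IsDataAvail(pThis, offset, size)` callback in `FX_FILEAVAIL`. One way an embedder might answer it is to keep a set of the byte ranges downloaded so far. The following is a minimal C++ sketch under that assumption; `DownloadedRanges` and its methods are hypothetical names, not part of PDFium, and the wiring into the callback is left out.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical helper (not part of PDFium): records which byte ranges of the
// remote file have been downloaded so far.
class DownloadedRanges {
 public:
  // Marks [offset, offset + size) as downloaded, merging it with any stored
  // range that overlaps or touches it.
  void Add(std::size_t offset, std::size_t size) {
    if (size == 0) return;
    std::size_t begin = offset;
    std::size_t end = offset + size;
    // Step back to the preceding range if it reaches the new range.
    auto it = ranges_.upper_bound(begin);
    if (it != ranges_.begin()) {
      auto prev = std::prev(it);
      if (prev->second >= begin) it = prev;
    }
    // Absorb every stored range that overlaps or touches [begin, end).
    while (it != ranges_.end() && it->first <= end) {
      if (it->first < begin) begin = it->first;
      if (it->second > end) end = it->second;
      it = ranges_.erase(it);
    }
    ranges_[begin] = end;
  }

  // Returns true only if every byte in [offset, offset + size) is present.
  // Because Add() keeps ranges merged, one stored range must cover it all.
  bool Contains(std::size_t offset, std::size_t size) const {
    if (size == 0) return true;
    auto it = ranges_.upper_bound(offset);
    if (it == ranges_.begin()) return false;
    --it;
    return it->second >= offset + size;
  }

 private:
  std::map<std::size_t, std::size_t> ranges_;  // begin -> end (exclusive)
};
```

An `IsDataAvail` implementation could then return the result of `Contains(offset, size)`, and the download code would call `Add()` whenever a chunk arrives.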

Jeroen Bobbeldijk

Feb 22, 2024, 4:52:38 PM
to pdfium
I have made an implementation in go-pdfium for the test suite: https://github.com/klippa-app/go-pdfium/blob/main/shared_tests/fpdf_dataavail.go
It's not a network implementation and not a direct C implementation but the idea is the same:

- Implement FX_FILEAVAIL that reports whether the requested data section is available
- Implement FX_DOWNLOADHINTS that will tell you what to download when
- Implement FPDF_FILEACCESS on something that allows access to the downloaded data
- Call FPDFAvail_IsDocAvail with the FX_DOWNLOADHINTS until the result is either PDF_DATA_ERROR or PDF_DATA_AVAIL
- When it's PDF_DATA_AVAIL, call FPDFAvail_GetDocument to actually get a document handle
- When you want to open a page, call FPDFAvail_IsPageAvail until it returns either PDF_DATA_ERROR or PDF_DATA_AVAIL
- When it's PDF_DATA_AVAIL, you can open the page like you normally would with FPDF_LoadPage

You can basically use this in 2 ways:
 - Ability to open/show/preview the PDF while the file is downloading
 - Use FX_DOWNLOADHINTS to only download the data sections that are required at that moment, for example when the file doesn't come from disk but from a resource that supports byte range requests (S3 storage for example).
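
The "call until it returns PDF_DATA_ERROR or PDF_DATA_AVAIL" steps above can be sketched as a small polling loop. This is only an illustration: to compile without PDFium headers it uses stand-ins (`DownloadHints`, `kDataError`, `kDataNotAvail`, `kDataAvail`) that mirror the callback shape and values in `fpdf_dataavail.h`, and `HintCollector`, `PollUntilReady`, `check`, `fetch` and `max_rounds` are hypothetical names. The fetch here is synchronous for brevity; a real embedder would more likely fetch asynchronously.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-ins so the sketch compiles without PDFium. Real code would include
// fpdf_dataavail.h and use PDF_DATA_* and FX_DOWNLOADHINTS instead.
constexpr int kDataError = -1;    // mirrors PDF_DATA_ERROR
constexpr int kDataNotAvail = 0;  // mirrors PDF_DATA_NOTAVAIL
constexpr int kDataAvail = 1;     // mirrors PDF_DATA_AVAIL

struct DownloadHints {  // mirrors the shape of FX_DOWNLOADHINTS
  int version;
  void (*AddSegment)(DownloadHints* self, std::size_t offset, std::size_t size);
};

// Collects the segments requested through the C-style callback. Deriving from
// the callback struct lets the callback cast `self` back to the collector.
struct HintCollector : DownloadHints {
  std::vector<std::pair<std::size_t, std::size_t>> segments;

  HintCollector() : DownloadHints{1, &HintCollector::Record} {}

  static void Record(DownloadHints* self, std::size_t offset,
                     std::size_t size) {
    static_cast<HintCollector*>(self)->segments.emplace_back(offset, size);
  }
};

using CheckFn = std::function<int(DownloadHints*)>;
using FetchFn = std::function<void(std::size_t offset, std::size_t size)>;

// Calls `check` until it stops reporting "not available", fetching every
// segment it hinted at between calls. Returns the final status, or
// kDataError if `max_rounds` is exhausted.
int PollUntilReady(const CheckFn& check, const FetchFn& fetch,
                   int max_rounds) {
  for (int round = 0; round < max_rounds; ++round) {
    HintCollector hints;
    int status = check(&hints);
    if (status != kDataNotAvail) return status;  // available, or an error
    for (const auto& seg : hints.segments) fetch(seg.first, seg.second);
  }
  return kDataError;
}
```

With the real headers, `check` would presumably wrap `FPDFAvail_IsDocAvail(avail, hints)` for the document and `FPDFAvail_IsPageAvail(avail, page_index, hints)` for a page, and `fetch` would issue a byte-range request and store the result where the `FPDF_FILEACCESS` and `FX_FILEAVAIL` callbacks can see it.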

geisserml

Feb 22, 2024, 6:06:32 PM
to pdfium
Thanks for sharing this!

I've been wondering what the advantages are compared to FPDF_LoadCustomDocument(), but here's an attempt at a self-answer:
- Obviously, we can use download hints rather than blind forward downloading. (Esp. useful if we don't want to read from the first page onward, but move to a later page early on.)
- The non-blocking API design allows us to abort a pending operation, e.g. if the user is initially waiting on P but then switches to X, we don't have to wait for P to finish, but can proceed to X immediately.
Anything else?

Further, is downloading supposed to happen in the same thread, or in a separate thread? In the latter case, would we need any kind of mutexing?
I imagine that if the incoming spans being fed into the callback are small, it could all happen on the same thread, but that we would need the ability to interrupt the download of a span if it is larger?
Or would the downloading function let us plug in an interruption callback to periodically check in some internal loop, similar to the progressive rendering API provided by pdfium?

Lei Zhang

Feb 22, 2024, 6:22:28 PM
to geisserml, pdfium
Downloads can happen on another thread, but the interactions with
PDFium have to stay single threaded.

geisserml

Feb 22, 2024, 6:42:49 PM
to pdfium
OK, so we would start a download thread that listens for incoming requests, and let AddSegment() make non-blocking requests to that thread.
Then we would have the thread accumulate info about the already downloaded sections for use with IsDataAvail().
I'd guess that we don't need any mutexing, as we only have 1 writer and 1 reader, and the separate thread does not call pdfium functions.
Does that seem right? Sorry, I'm rather new to threading ;)
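
On the mutex question: in C++, a standard container that one thread modifies while another thread reads it is a data race, even with only one writer and one reader, so some synchronization is still needed for whatever state the two threads share. A minimal sketch of the request side, assuming a plain `std::mutex` is acceptable; `SegmentQueue`, `Push` and `TryPop` are hypothetical names, not PDFium APIs.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical helper (not part of PDFium): a queue of byte ranges shared
// between the PDFium thread and a download thread.
class SegmentQueue {
 public:
  using Segment = std::pair<std::size_t, std::size_t>;  // offset, size

  // Called on the PDFium thread, e.g. from an AddSegment() callback.
  // Only enqueues the request; no download happens here.
  void Push(std::size_t offset, std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.emplace_back(offset, size);
  }

  // Called on the download thread. Returns false if nothing is pending,
  // otherwise removes the oldest request and stores it in *out.
  bool TryPop(Segment* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (pending_.empty()) return false;
    *out = pending_.front();
    pending_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;             // guards pending_
  std::deque<Segment> pending_;  // requests not yet picked up
};
```

The record of already downloaded ranges consulted by `IsDataAvail()` would need the same kind of guard, since the download thread writes it while the PDFium thread reads it. All PDFium calls stay on the one thread, as noted above.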