Writing a HUGE pdf file (page by page if possible)

gsoucy

Jul 10, 2012, 12:44:49 PM

to lib...@googlegroups.com

Hello,

I need to write a huge PDF file (> 500 MB), and I would like to write it to disk page by page rather than creating the whole file in memory and then saving it to disk.

After going over the API, I don't see how I can do that. Is it possible with libHaru?

Thanks

Gilbert

Bill Horger

Jul 10, 2012, 1:24:05 PM

to lib...@googlegroups.com

I'm not aware of that capability.

Why are you so concerned about the size? Is the size likely to be so large it won't fit in memory? If so, then it obviously takes a large amount of resources to process the data. Would it be better to break the document into manageable pieces, perhaps avoiding a reprocessing step in the event the application had a problem?

Also, be aware that if you just keep adding pages, the library will stop with an error when you get to 8192 pages anyway.

Bill

--
---
libHaru.org development mailing list
To unsubscribe, send email to libharu-u...@googlegroups.com

Paul Harris

Jul 10, 2012, 8:55:58 PM

to lib...@googlegroups.com

I suppose it would be technically possible, as the PDF format is designed to be appended to. To quote:

http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/index.html

The Incremental Update Mechanism

The trailer, it turns out, plays an important role in the way PDF implements incremental updating. The key concept to understand here is that a PDF file is never overwritten, only added to. That goes for all portions of the PDF file - even the trailer itself, and the end-of-file marker. In other words, a multiply-updated PDF document may contain multiple trailers - and multiple end-of-file markers! (There may be numerous occurrences of %%EOF.) Each time the file is edited, an addendum is written to the tail of the file, consisting of the content objects that have changed, a new xref section, and a new trailer containing all the information that was in the previous trailer, as well as a /Prev key specifying the byte offset (from the beginning of the file) of the previous xref section. The cross-reference info will then be distributed across more than one xref section. To access all of the cross-references, the reader must walk the list of /Prev keys in all the trailers, in reverse order.

That's the incremental update feature.
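To make the /Prev walk in the quote concrete, here is a rough Python sketch (not libHaru code, and not part of the quoted article). It assumes a PDF that uses classic "xref" tables with a "trailer" keyword, not the cross-reference streams of newer PDF versions, and it finds things by plain text search instead of real parsing. The function name xref_offsets is made up for illustration.

```python
import re


def xref_offsets(pdf: bytes) -> list:
    """Return the byte offsets of all xref sections, newest first.

    Assumption: classic xref tables followed by a 'trailer' dictionary.
    """
    # The last 'startxref <offset> %%EOF' in the file names the newest xref section.
    last = None
    for last in re.finditer(rb"startxref\s+(\d+)\s+%%EOF", pdf):
        pass
    if last is None:
        raise ValueError("no startxref found")

    offsets = []
    pos = int(last.group(1))
    while pos not in offsets:          # guard against a /Prev loop
        offsets.append(pos)
        trailer_start = pdf.find(b"trailer", pos)
        if trailer_start < 0:
            break
        trailer_end = pdf.find(b"startxref", trailer_start)
        if trailer_end < 0:
            trailer_end = len(pdf)
        # Each updated trailer points back to the previous xref section.
        prev = re.search(rb"/Prev\s+(\d+)", pdf[trailer_start:trailer_end])
        if prev is None:
            break                      # reached the original, oldest section
        pos = int(prev.group(1))
    return offsets
```

For a file that has never been updated this returns a single offset; each incremental update adds one more entry to the front of the list.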

Back to more standard usage, you would also imagine that libharu could potentially just write each page's content to disk as it goes, and only remember what needs to go at the end of the file (the page tree, the cross-reference table and the trailer) for when it finally finishes the file.
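As a rough illustration of that idea, here is a toy writer in Python. It is only a sketch of the technique and is not how libHaru works or an API it offers; the class and method names are invented for the example. Each page's content stream and page object are written to the output immediately, and the only state kept in memory is one byte offset per object plus the list of page object numbers. The page tree, catalog, xref table and trailer are written at the very end.

```python
class StreamingPdfWriter:
    """Toy PDF writer (hypothetical): pages hit the output as soon as they are added."""

    def __init__(self, out):
        self.out = out            # any binary file-like object opened for writing
        self.pos = 0              # bytes written so far
        self.offsets = {}         # object number -> byte offset, needed for the xref
        self.page_ids = []        # object numbers of the page objects
        self.next_id = 3          # 1 = catalog, 2 = page tree (both written in close())
        self._write(b"%PDF-1.4\n")

    def _write(self, data):
        self.out.write(data)
        self.pos += len(data)

    def _write_obj(self, num, body):
        self.offsets[num] = self.pos
        self._write(b"%d 0 obj\n" % num + body + b"\nendobj\n")

    def add_page(self, content, width=612, height=792):
        """Write one page now; 'content' is a raw PDF content stream (bytes)."""
        content_id, page_id = self.next_id, self.next_id + 1
        self.next_id += 2
        self._write_obj(
            content_id,
            b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        )
        self._write_obj(
            page_id,
            b"<< /Type /Page /Parent 2 0 R /Resources << >> "
            b"/MediaBox [0 0 %d %d] /Contents %d 0 R >>" % (width, height, content_id),
        )
        self.page_ids.append(page_id)

    def close(self):
        """Write everything that can only be known at the end of the document."""
        kids = b" ".join(b"%d 0 R" % p for p in self.page_ids)
        self._write_obj(
            2, b"<< /Type /Pages /Count %d /Kids [%s] >>" % (len(self.page_ids), kids)
        )
        self._write_obj(1, b"<< /Type /Catalog /Pages 2 0 R >>")
        xref_pos = self.pos
        size = self.next_id       # objects 0 .. next_id - 1
        self._write(b"xref\n0 %d\n" % size)
        self._write(b"0000000000 65535 f \n")
        for num in range(1, size):
            self._write(b"%010d 00000 n \n" % self.offsets[num])
        self._write(
            b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (size, xref_pos)
        )
```

Used with a file opened in "wb" mode, memory use stays proportional to the number of pages (a few integers each), not to the size of their content. A single flat /Kids array is the simplest choice for a sketch; a real writer producing very many pages would more likely build a balanced page tree.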