Modifying PDF

Paul Slocum (SOFTOFT TECH)

Jun 24, 2012, 10:29:19 AM

to pdfhummus-in...@googlegroups.com

I need to modify objects in a PDF, and I'm using PDFWriter's parser to find the objects that I want to modify, which works great. My current plan is to write my own simple code that appends the modified objects to the PDF, followed by a new xref table and trailer. I'd get the object number, generation number, and dictionary contents of each object from the PDFWriter parser.

However, since PDFWriter already does most of what is needed to append a PDF, I wonder what it would take to modify the library to do this? Seems like I'd have to be able to offset the xref table pointers, add a /Prev entry, and prevent creation of header and anything else I don't need (like catalog)? Then I could write the modified objects with PDFWriter and append the output to the original PDF. I appreciate any feedback on this.
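The append step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PDFWriter code: `ModifiedObject` and `BuildIncrementalUpdate` are hypothetical names, the object bodies are assumed to be already serialized, and only a classic xref table is handled (a file that uses cross-reference streams would need a stream-based update instead). Entries such as /Info and /ID from the original trailer would also have to be carried over in real use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical type: one modified object, keyed elsewhere by its object number.
struct ModifiedObject {
    unsigned generation;  // generation number taken from the parser
    std::string body;     // serialized contents, e.g. "<< /Type /Annot >>"
};

// Hypothetical helper: builds the bytes to append after the original file.
//   originalSize   : byte length of the original PDF
//   prevXrefOffset : offset of the original xref section (the "startxref" value)
//   trailerSize    : /Size for the new trailer (highest object number + 1)
//   rootRef        : catalog reference copied from the original trailer, e.g. "1 0 R"
std::string BuildIncrementalUpdate(std::size_t originalSize,
                                   std::size_t prevXrefOffset,
                                   unsigned trailerSize,
                                   const std::string& rootRef,
                                   const std::map<unsigned, ModifiedObject>& objects)
{
    std::string out = "\n";  // start the update on a fresh line
    std::map<unsigned, std::size_t> offsets;

    for (const auto& kv : objects) {
        assert(kv.first < trailerSize);  // /Size must exceed every object number
        // xref offsets are absolute, so shift by the original file length.
        offsets[kv.first] = originalSize + out.size();
        out += std::to_string(kv.first) + " " +
               std::to_string(kv.second.generation) + " obj\n";
        out += kv.second.body + "\nendobj\n";
    }

    const std::size_t xrefOffset = originalSize + out.size();
    out += "xref\n";
    for (const auto& kv : objects) {
        // One single-entry subsection per object; each entry is exactly 20 bytes.
        char entry[40];
        std::snprintf(entry, sizeof entry, "%010llu %05u n\r\n",
                      static_cast<unsigned long long>(offsets[kv.first]),
                      kv.second.generation);
        out += std::to_string(kv.first) + " 1\n" + entry;
    }

    // /Prev chains the new xref section to the original one.
    out += "trailer\n<< /Size " + std::to_string(trailerSize) +
           " /Root " + rootRef +
           " /Prev " + std::to_string(prevXrefOffset) + " >>\n";
    out += "startxref\n" + std::to_string(xrefOffset) + "\n%%EOF\n";
    return out;
}
```

The returned string would be written to the end of the original file unchanged. Because nothing before the appended bytes is touched, the original header, catalog, and objects stay exactly as they were.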

Gal Kahana

Jun 27, 2012, 3:48:14 AM

to pdfhummus-in...@googlegroups.com

This makes a lot of sense. I'm thinking you might also need to add some versioning component to the objects registry.

Can you describe the usage, though?

I mean, why would you want to modify an existing PDF using the incremental changes support of PDF, instead of just writing the PDF anew by importing the existing PDF into a new one?

Gal.

Paul Slocum

Jun 27, 2012, 10:03:52 AM

to pdfhummus-in...@googlegroups.com

> can you describe the usage, though?
> i mean, why would you want to modify an existing PDF, using the incremental changes support of PDF, instead of just writing the PDF anew by importing the existing PDF to a new one?

I plan to use it to add graphics and notations, and also to fill interactive forms. Maybe I'm being way too careful, but I think that appending to the PDF is the most robust method. If I import the PDF into PDFWriter, then any oddities in the PDF are much more likely to be a showstopper, since your importer has to parse and restructure most of the PDF. I'm also worried that I may lose some metadata in the PDF file that users may need, or that in the case of large files the import could be slow. Before I started using your parser, I wrote my own parser, and when I tested it against 500 random PDFs that I downloaded from the internet, there were so many weird little problems and oddities in the files that my parser choked on! So my philosophy right now is to touch the original file as little as possible.

-paul

Gal Kahana

Jun 27, 2012, 2:04:57 PM

to pdfhummus-in...@googlegroups.com, paul....@gmail.com

I see.

Yeah, makes sense. The lib parser's primary concern is with pages and graphics. It is probably losing elements when embedding, sure.

Understood.

Gal.

Paul Slocum (SOFTOFT TECH)

Jul 4, 2012, 2:03:40 PM

to pdfhummus-in...@googlegroups.com, paul....@gmail.com

Question: is there currently a way to take a PDFObject* that I got from the parser and write it into a PDF using PDFWriter?

-paul

Gal Kahana

Jul 4, 2012, 2:16:04 PM

to pdfhummus-in...@googlegroups.com, paul....@gmail.com

Try working with the PDFDocumentCopyingContext.

It has a

EStatusCodeAndObjectIDTypeList CopyDirectObject(PDFObject* inObject);

that copies a direct object.

You can also use

EStatusCodeAndObjectIDType CopyObject(ObjectIDType inSourceObjectID);

which copies an object according to its ID.

"Copying" is writing, essentially, and it's recursive [unless stated otherwise in the comment for CopyDirectObject].

HTH,

Gal.

Paul Slocum (SOFTOFT TECH)

Jul 10, 2012, 1:05:13 AM

to pdfhummus-in...@googlegroups.com, paul....@gmail.com

Thanks, that's helpful, although I decided it was actually simpler just to write my own code to write to the PDF, especially since I'm a little weak at C++. I'm using your parser to get and modify the objects that I want, and then using my own code to write them out along with an xref table/stream. Working pretty well so far!

-paul
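
For hand-written append code like the approach described here, the /Prev entry of the new trailer needs the offset of the existing xref section, which a PDF records after its last `startxref` keyword. A minimal sketch of reading that value, assuming the file contents are already in memory; `FindLastStartXref` is a hypothetical helper, not part of PDFWriter:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the number that follows the last "startxref"
// keyword in the file, or -1 if the keyword or the number is missing.
long long FindLastStartXref(const std::string& pdfBytes)
{
    const std::string keyword = "startxref";
    const std::size_t pos = pdfBytes.rfind(keyword);  // search from the end
    if (pos == std::string::npos) {
        return -1;
    }

    // Skip the whitespace between the keyword and the offset.
    std::size_t i = pos + keyword.size();
    while (i < pdfBytes.size() &&
           (pdfBytes[i] == ' ' || pdfBytes[i] == '\t' ||
            pdfBytes[i] == '\r' || pdfBytes[i] == '\n')) {
        ++i;
    }
    if (i >= pdfBytes.size() || pdfBytes[i] < '0' || pdfBytes[i] > '9') {
        return -1;
    }

    // Parse the decimal offset.
    long long value = 0;
    while (i < pdfBytes.size() && pdfBytes[i] >= '0' && pdfBytes[i] <= '9') {
        value = value * 10 + (pdfBytes[i] - '0');
        ++i;
    }
    return value;
}
```

Searching from the end matters because a file that has already been updated incrementally contains several `startxref` keywords, and only the last one points at the current xref section.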
