Optimize PDF by removing duplicate image resources

Support

May 30, 2013, 6:23:22 PM
to pdfne...@googlegroups.com
Q:

Our company has a requirement to place a large image inside a PDF that will be duplicated across, say, 50,000 PDF files. We have been told that your tool can store that single image as a resource, with all PDF files referencing it, thereby reducing the final size of each PDF.

------------
A:
 
First, you need to extract the image from the source PDF. You can check out the ImageExtract project on how to do this: http://www.pdftron.com/pdfnet/samplecode.html#ImageExtract 

You can then embed the extracted image into other PDFs as a PDF resource. If you take a look at the first code sample in the ElementBuilder test sample project (http://www.pdftron.com/pdfnet/samplecode.html#ElementBuilder) you will notice that the same image instance is used three times on the same page. So, to reuse an existing PDF resource (e.g. an image, a font, a form XObject, etc.), you would do something like this:

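// Cache a single Image object so that every placement references the same resource.
// Note: the cached image belongs to the document it was created in.
static pdftron.PDF.Image my_image = null;

pdftron.PDF.Image GetImage(PDFDoc doc) {
    if (my_image == null) {
        my_image = pdftron.PDF.Image.Create(doc, "my.jpg");
    }
    return my_image;
}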

Then each time you need to place the image, you would use the same image instance. For example:

Element element = element_builder.CreateImage( GetImage(doc) , new Matrix2D(200, 0, 0, 250, 50, 500));
writer.WritePlacedElement(element);  
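
Putting the two fragments together, a minimal sketch of reusing one image across several pages of a single document might look like the following. The file names ("my.jpg", "shared_image.pdf"), the page count, and the class name are placeholders, and PDFNet.Initialize() is called without a license key on the assumption that demo mode is acceptable.

```csharp
using pdftron;
using pdftron.Common;
using pdftron.PDF;
using pdftron.SDF;

class SharedImageSketch
{
    static void Main()
    {
        // Assumption: demo mode; pass your license key here if you have one.
        PDFNet.Initialize();

        using (PDFDoc doc = new PDFDoc())
        using (ElementBuilder builder = new ElementBuilder())
        using (ElementWriter writer = new ElementWriter())
        {
            // Embed the image once. Every page below references this same object.
            Image img = Image.Create(doc, "my.jpg"); // placeholder file name

            for (int i = 0; i < 3; ++i) // placeholder page count
            {
                Page page = doc.PageCreate();
                writer.Begin(page);

                // Place the shared image; only the placement matrix is written per page.
                Element element = builder.CreateImage(img, new Matrix2D(200, 0, 0, 250, 50, 500));
                writer.WritePlacedElement(element);

                writer.End();
                doc.PagePushBack(page);
            }

            doc.Save("shared_image.pdf", SDFDoc.SaveOptions.e_linearized);
        }
    }
}
```

Because each page only references the image object, the output should grow by roughly one copy of the image data regardless of how many pages use it.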
 
As an alternative, you can use the PDFNet Optimizer add-on to automatically find and remove duplicate images and other resources. For a concrete sample, please see the Optimizer sample: 
    http://www.pdftron.com/pdfnet/samplecode.html#Optimizer
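
As a rough sketch of the Optimizer route, the simplest form runs the optimizer with its default settings on an existing file. The input and output file names and the class name are placeholders, and the exact set of Optimize overloads may vary between PDFNet versions, so treat the linked sample as the reference.

```csharp
using pdftron;
using pdftron.PDF;
using pdftron.SDF;

class OptimizeSketch
{
    static void Main()
    {
        // Assumption: demo mode; pass your license key here if you have one.
        PDFNet.Initialize();

        using (PDFDoc doc = new PDFDoc("input.pdf")) // placeholder file name
        {
            doc.InitSecurityHandler();

            // Run the optimizer with default settings.
            Optimizer.Optimize(doc);

            doc.Save("input_optimized.pdf", SDFDoc.SaveOptions.e_linearized);
        }
    }
}
```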
 

Support

May 31, 2013, 7:24:09 PM
to pdfne...@googlegroups.com
Q:

We have to generate thousands (500k) of personalized PDFs to print and mail out to customers. Each of these PDFs is currently generated from an HTML page using a PDF converter. The application requests the same URL with a different customer ID to customize the address fields used to mail out the print material. Other than these customized address fields, the content is pretty much the same.
 
The problem is that the HTML contains high-quality postcard image(s), so each PDF file is close to, say, 3 MB. Multiplied by 500k, that comes to roughly 1.5 TB that needs to be stored and transferred. Each PDF is pretty much the same except for the custom addresses in the HTML content for each customer.
 
My requirement is to reduce the size of these PDFs. Upon talking to a print vendor, we were told that PDFTron has the ability to store the large image once in a single PDF and have the other PDFs refer to this image, resulting in smaller file sizes. I believe I have some information I can experiment with right now, but I am not 100% sure this is achievable.

-----
A: Image sharing is only possible within a single PDF document (i.e. between pages). 
 
You can use PDFNet to reuse images within the same document, as well as to shrink images to a smaller size (e.g. convert pre-press quality images to screen resolution), but you can't use it to make thousands of PDF files point to the same image.
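
For the shrinking part, a sketch along the lines of the Optimizer sample shipped with recent PDFNet releases could look like this. The file names, class name, DPI values, and JPEG quality are illustrative assumptions, and older PDFNet versions may expose a different Optimize signature, so check the sample that matches your version.

```csharp
using pdftron;
using pdftron.PDF;
using pdftron.SDF;

class DownsampleSketch
{
    static void Main()
    {
        // Assumption: demo mode; pass your license key here if you have one.
        PDFNet.Initialize();

        using (PDFDoc doc = new PDFDoc("postcard.pdf")) // placeholder file name
        {
            doc.InitSecurityHandler();

            Optimizer.ImageSettings imageSettings = new Optimizer.ImageSettings();

            // Recompress images as JPEG (quality value is illustrative).
            imageSettings.SetCompressionMode(Optimizer.ImageSettings.CompressionMode.e_jpeg);
            imageSettings.SetQuality(5);

            // Downsample images above 144 DPI to 96 DPI (illustrative values).
            imageSettings.SetImageDPI(144, 96);

            // Apply the same settings to color and grayscale images.
            Optimizer.OptimizerSettings optSettings = new Optimizer.OptimizerSettings();
            optSettings.SetColorImageSettings(imageSettings);
            optSettings.SetGrayscaleImageSettings(imageSettings);

            Optimizer.Optimize(doc, optSettings);

            doc.Save("postcard_screen.pdf", SDFDoc.SaveOptions.e_linearized);
        }
    }
}
```

This reduces the size of each file individually; it does not make separate files share image data.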
 
Please note that although the PDF format defines the /F, /FFilter, and /FDecodeParms entries in the stream dictionary as a way to reference external files, most PDF consumers do not support the feature for security reasons. Even Acrobat does not allow access to external streams by default.
 
You can use PDFNet to add /F, /FFilter, and /FDecodeParms entries to any stream, so the limitation lies with third-party PDF consumers rather than with PDFNet.