How do I remove duplicate images?

Ryan

Jan 20, 2020, 4:08:14 PM

to PDFTron PDFNet SDK

Question:

We have a PDF template, and we merge many pages from this template into a single PDF document. This currently results in the same image being embedded in the master/final document over and over again.

How do we get the final PDF to contain just a single copy of each image?

Answer:

Ideally, you import all the pages from PDF X into PDF Y in a single call to PDFDoc.ImportPages. When multiple pages reference the same image, that call takes care of importing the image only once.
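As a rough illustration of the single-call approach, here is a minimal sketch modeled on the pattern used in the PDFNet SDK page-manipulation samples. The file names "template.pdf" and "merged.pdf" are placeholders, and the exact initialization call may differ by SDK version (newer versions expect a license key), so treat this as an outline rather than a drop-in solution.

```csharp
using System.Collections;
using pdftron;
using pdftron.PDF;
using pdftron.SDF;

class SingleImportSketch
{
    static void Main()
    {
        // Assumption: older SDK versions accept a parameterless Initialize();
        // newer ones expect a license key argument.
        PDFNet.Initialize();

        using (PDFDoc template = new PDFDoc("template.pdf")) // placeholder path
        using (PDFDoc merged = new PDFDoc())
        {
            template.InitSecurityHandler();

            // Collect every page we want, then import them all at once.
            ArrayList toCopy = new ArrayList();
            for (PageIterator itr = template.GetPageIterator(); itr.HasNext(); itr.Next())
            {
                toCopy.Add(itr.Current());
            }

            // One ImportPages call, so resources shared between the pages
            // (such as a common image) are imported only once.
            ArrayList imported = merged.ImportPages(toCopy);
            foreach (Page page in imported)
            {
                merged.PagePushBack(page);
            }

            merged.Save("merged.pdf", SDFDoc.SaveOptions.e_remove_unused); // placeholder path
        }
    }
}
```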

However, based on the use case, it sounds like the same page is being imported over and over again through multiple calls to PDFDoc.ImportPages. In that case, the attached C# code can be used to consolidate the duplicate images into one.

Note that this is not a general solution. In particular, there is more to a PDF image than just the raw pixel data; for instance, different color spaces can be applied. This meta information is discarded when the images are consolidated, even though it could differ. That is, two images could have identical raw data but two different color spaces for that pixel data. In practice, this should be astronomically rare.
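To make the caveat concrete, here is a small self-contained sketch of the general idea of grouping images by their raw data. This is not the attached code and does not use PDFNet; the ImageInfo type, its fields, and FindDuplicates are hypothetical names invented for illustration. It hashes each image's bytes and optionally adds the color space name to the comparison key, which is one way to avoid merging images that differ only in color space.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class ImageDedupSketch
{
    // Hypothetical stand-in for an image stream in a PDF.
    public sealed class ImageInfo
    {
        public int ObjNum;        // object number of the image stream
        public byte[] Data;       // raw image data
        public string ColorSpace; // e.g. "DeviceRGB"
    }

    // Returns a map from each duplicate's object number to the object number
    // of the first image seen with the same key.
    public static Dictionary<int, int> FindDuplicates(
        IEnumerable<ImageInfo> images, bool includeColorSpace)
    {
        var firstSeen = new Dictionary<string, int>();
        var remap = new Dictionary<int, int>();
        using (var sha = SHA256.Create())
        {
            foreach (var img in images)
            {
                // Key on a hash of the raw bytes only, or on hash + color space.
                string key = Convert.ToBase64String(sha.ComputeHash(img.Data));
                if (includeColorSpace)
                {
                    key += "|" + img.ColorSpace;
                }

                int canonical;
                if (firstSeen.TryGetValue(key, out canonical))
                {
                    remap[img.ObjNum] = canonical; // duplicate of an earlier image
                }
                else
                {
                    firstSeen[key] = img.ObjNum;   // first occurrence becomes canonical
                }
            }
        }
        return remap;
    }
}
```

With includeColorSpace set to false, two images with identical bytes but different color spaces are treated as duplicates, which is the rare failure case described above; setting it to true keeps them separate at the cost of a slightly stricter comparison.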