Identifying color PDF pages


Norris, Chuck

Sep 11, 2008, 9:04:01 AM

Our current project has a preliminary requirement to examine each page in a PDF, export any pages with color to JPG, and export the rest to black-and-white TIFF. Do any of the PDFTron toolkits contain an easy method for making this identification?

Sep 12, 2008, 3:34:59 PM
One way to implement a function that checks whether a PDF page contains any
color information is to inspect all Elements on the page (e.g. using the
ElementReader class, as shown in the ElementReaderAdv sample project). The
main problem with this approach is that it is fairly complex to implement:
there are many cases to handle, you need to implement visibility checking, etc.
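For the element-based route, every painted element's fill or stroke color eventually has to be classified as gray or colored. The sketch below covers only that classification step. It assumes you have already extracted a color space label and component values in the range 0..1. The function name and the `"gray"`/`"rgb"`/`"cmyk"` labels are hypothetical and not part of the PDFTron API. The CMYK test uses the naive CMYK-to-RGB formula rather than an ICC-managed conversion. Spot (Separation/DeviceN), Lab, Indexed and Pattern color spaces are not handled.

```python
from typing import Sequence


def is_colored_value(space: str, components: Sequence[float], tol: float = 0.0) -> bool:
    """Return True if a color value (components in 0..1) is not a neutral gray.

    `space` is a hypothetical label ("gray", "rgb" or "cmyk"); real code would
    map the element's actual color space to one of these first.
    """
    if space == "gray":
        return False  # single-component gray is never "colored"
    if space == "rgb":
        r, g, b = components
    elif space == "cmyk":
        c, m, y, k = components
        # Naive CMYK -> RGB (assumption, no ICC profile): with K = 1 every
        # channel becomes 0, so pure black is always reported as gray.
        r, g, b = (1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)
    else:
        raise ValueError(f"unsupported color space: {space}")
    # Colored if the channels differ by more than the tolerance.
    return max(r, g, b) - min(r, g, b) > tol
```

A page would then count as colored as soon as any visible element yields `True`. Deciding what is visible is the hard part the post warns about.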

A much simpler approach is to use the PDFDraw class to obtain a page bitmap
(e.g. at 96 DPI, using pdfdraw.GetBitmap(); see
samplecode.html#PDFDraw). You can then traverse all pixels on the
page. If there is any pixel whose red, green, and blue color
components are not all equal, the PDF page contains some color
information; otherwise you can treat the page as grayscale.
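The pixel scan itself is short. Below is a minimal Python sketch. It assumes the rendered bitmap has already been copied into a tightly packed byte buffer of 8-bit, four-channel pixels with alpha last (RGBA or BGRA). The order of the three color channels does not matter for this test. The function name and the optional `tol` parameter are hypothetical. With `tol=0` you get the strict "components are not all equal" rule. A small positive value could absorb antialiasing noise. Fully transparent pixels are skipped because nothing was painted there.

```python
def page_has_color(pixels: bytes, tol: int = 0) -> bool:
    """Return True if any non-transparent pixel has unequal color channels.

    Assumes 8-bit pixels laid out as 4 bytes each, color channels first and
    alpha last (e.g. RGBA or BGRA), with no row padding.
    """
    if len(pixels) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    for i in range(0, len(pixels), 4):
        c0, c1, c2, alpha = pixels[i:i + 4]
        if alpha == 0:
            continue  # fully transparent: ignore
        if max(c0, c1, c2) - min(c0, c1, c2) > tol:
            return True  # first colored pixel is enough; stop scanning
    return False
```

For example, `page_has_color(bytes([10, 10, 10, 255, 200, 30, 30, 255]))` returns `True` because the second pixel is red. A page that returns `False` would go to the black-and-white TIFF export.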

You can use the same approach to detect if a PDF page is blank (i.e.
if all pixels are transparent or white).
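The blank-page test is the same loop with a different condition. This sketch assumes the same kind of buffer: 8-bit, four bytes per pixel, color channels first, alpha last, no row padding. The function name and the `white_level` parameter are hypothetical. The default of 255 means "pure white only", and a lower value would also accept near-white pixels.

```python
def page_is_blank(pixels: bytes, white_level: int = 255) -> bool:
    """Return True if every pixel is fully transparent or white.

    Assumes 8-bit RGBA/BGRA pixels (alpha last), 4 bytes each, no row padding.
    """
    if len(pixels) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    for i in range(0, len(pixels), 4):
        c0, c1, c2, alpha = pixels[i:i + 4]
        if alpha == 0:
            continue  # transparent pixel counts as blank
        if min(c0, c1, c2) < white_level:
            return False  # found a painted, non-white pixel
    return True
```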

Please keep in mind that there are some important differences between
the above approaches. For example, rasterizing the page will not tell
you whether the page contains colored elements that are not visible
(e.g. because they are obscured by other elements or lie outside the
page boundaries). So if you need to implement a preflight-like
tool for PDF, the ElementReader approach is the way to go.