Identifying color PDF pages


Norris, Chuck

Sep 11, 2008, 9:04:01 AM
to pdfne...@googlegroups.com

Our current project has a preliminary requirement to examine each page in a PDF and export any pages containing color to JPG and the rest to black-and-white TIFF. Do any of the PDFTron toolkits provide an easy way to make this identification?

 

Tks,

 

Chuck

Support

Sep 12, 2008, 3:34:59 PM
to PDFTron PDFNet SDK
One way to check whether a PDF page contains any color information is
to inspect all Elements on the page (e.g. using the ElementReader class,
as shown in the ElementReaderAdv sample project -
http://www.pdftron.com/net/samplecode.html#ElementReaderAdv). The main
problem with this approach is that it is fairly complex to implement:
there are many cases to handle, and you also need to implement
visibility checking, etc.

A much simpler approach is to use the PDFDraw class to obtain a page
bitmap (e.g. at 96 dpi, using pdfdraw.GetBitmap(); see
http://www.pdftron.com/net/samplecode.html#PDFDraw). You can then
traverse all pixels on the page. If there is any pixel whose RGB (red,
green, and blue) color components do not match, the PDF page contains
some color information; otherwise you can treat the page as grayscale.
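
The pixel traversal described above could be sketched roughly as follows. This is only an illustration, not PDFNet code: it assumes the bitmap has already been copied out of PDFDraw into a plain byte buffer with 4 bytes per pixel and alpha as the last byte (RGBA or BGRA). The function name page_has_color and the stride and tolerance parameters are hypothetical helpers introduced here.

```python
def page_has_color(buf, width, height, stride=None, tolerance=0):
    """Return True if any visible pixel has color channels that differ.

    Assumption: buf holds 4 bytes per pixel with alpha last (RGBA or BGRA).
    The order of the three color channels does not matter for this test.
    """
    if stride is None:
        stride = width * 4  # assume tightly packed rows
    for y in range(height):
        row = y * stride
        for x in range(width):
            i = row + x * 4
            if buf[i + 3] == 0:
                continue  # fully transparent pixel, nothing visible
            c0, c1, c2 = buf[i], buf[i + 1], buf[i + 2]
            # A gray pixel has three equal channels; allow a small slack.
            if max(c0, c1, c2) - min(c0, c1, c2) > tolerance:
                return True
    return False


# Tiny 2x2 example: three gray pixels and one reddish pixel.
gray = bytes([128, 128, 128, 255])
reddish = bytes([200, 128, 128, 255])
print(page_has_color(gray * 4, 2, 2))
print(page_has_color(gray * 3 + reddish, 2, 2))
```

A tolerance of 0 matches the strict "components do not match" rule; a small positive value may help if rendering or color conversion introduces slight channel differences on pages that are visually gray.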

You can use the same approach to detect if a PDF page is blank (i.e.
if all pixels are transparent or white).
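
The blank-page check mentioned above might look something like this. Again, this is a sketch on a plain byte buffer rather than PDFNet code: it assumes 4 bytes per pixel with alpha last, and the name page_is_blank and the white_threshold parameter are hypothetical.

```python
def page_is_blank(buf, width, height, stride=None, white_threshold=255):
    """Return True if every pixel is fully transparent or white.

    Assumption: buf holds 4 bytes per pixel with alpha last (RGBA or BGRA).
    """
    if stride is None:
        stride = width * 4  # assume tightly packed rows
    for y in range(height):
        row = y * stride
        for x in range(width):
            i = row + x * 4
            if buf[i + 3] == 0:
                continue  # transparent pixels count as blank
            # White means all three color channels reach the threshold.
            if min(buf[i], buf[i + 1], buf[i + 2]) < white_threshold:
                return False
    return True


white = bytes([255, 255, 255, 255])
clear = bytes([0, 0, 0, 0])
dark = bytes([40, 40, 40, 255])
print(page_is_blank(white * 2 + clear * 2, 2, 2))
print(page_is_blank(white * 3 + dark, 2, 2))
```

Lowering white_threshold slightly (e.g. to 250) would treat near-white pixels as blank, which may or may not be what you want for scanned pages.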

Please keep in mind that there are some important differences between
the above approaches. For example, rasterizing the page will not tell
you whether the page contains colored elements that are not visible
(e.g. because they are obscured by other elements or lie outside the
page boundaries). So if you need to implement a preflight-like
tool for PDF, the approach using ElementReader is the way to go.