19 0 obj
<<
/Type /XObject
/Subtype /Image
/Name /Image1
/Filter /CCITTFaxDecode
/Width 2537
/Height 3380
/BitsPerComponent 1
/ColorSpace /DeviceGray
/Length 80544
/DecodeParms <<
/K -1 /Columns 2537
>> /Decode [0 1]
>>
stream
(binary image data, 80544 bytes)
endstream
endobj
> I'm trying to extract a TIFF image from a PDF file using Java. I am
> able to read the file, get to the stream data, and read all the
> bytes. However, when I try to interpret the data as a TIFF (either by
> writing the bytes to a separate TIFF file, or by using Java Advanced
> Imaging to construct a TIFF in memory), it is not valid.
The image data in the PDF is not a TIFF file; it has no TIFF header or
tag directory. It's basically just a 2D array of pixels, encoded with
whatever /Filter names.
> What am I missing to make this work? If the 'filter' in the PDF
> says it is 'CCITTFaxDecode', shouldn't the data be a valid TIFF
> stream?
No. You should refer to the PDF spec, specifically pp. 263-275.
--
A PDF image stream (as you said) is essentially a 2D array of pixels.
However, if it is encoded with 'CCITTFaxDecode', then I would first need
to run the stream through a decoder of some sort, and then work with
the resulting 2D array of pixels... is that correct? If it is, I'm
assuming that the 'CCITTFaxDecode' algorithm is the same
encoding/decoding that is described in the specification for the TIFF
file format, right? But how do I find out whether the image is Group 4
or Group 3? I'm pretty sure I'm still missing something...
Thanks for the help.
-Ryan
> > > What am I missing to make this work? If the 'filter' in the PDF
> > > says it is 'CCITTFaxDecode', shouldn't the data be a valid TIFF
> > > stream?
> >
> > No. You should refer to the PDF spec, specifically pp. 263-275.
> >
> > --
> Thanks for replying!
> I have the PDF spec 1.4, and I'm assuming you mean the part under
> Graphics --> Images, right? I read it once, and after reading it again
> I think I understand a little more, but I was hoping you might be able
> to help me out again.
>
> A PDF image stream (as you said) is essentially a 2D array of pixels.
> However, if it is encoded with 'CCITTFaxDecode', then I would first need
> to run the stream through a decoder of some sort, and then work with
> the resulting 2D array of pixels... is that correct?
That would be the general idea. You can probably find Java code out
there somewhere to do the decoding.
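
One workaround people use, instead of decoding the data themselves, is to
put a minimal TIFF header in front of the raw stream bytes so that an
existing TIFF reader does the CCITT decoding. Here is a rough sketch of
that idea, not a tested solution for your file. The names
CcittTiffWrapper and wrap are made up for illustration. It assumes the
stream is Group 4 (TIFF Compression value 4), that the whole image is one
strip, and that PhotometricInterpretation 0 (WhiteIsZero) is right; that
last value may need to be flipped depending on /BlackIs1 and /Decode.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: prepends a minimal little-endian TIFF header and one
// IFD to raw CCITT data taken from a PDF image stream.
final class CcittTiffWrapper {
    private static final int SHORT = 3; // TIFF field type: 16-bit unsigned
    private static final int LONG = 4;  // TIFF field type: 32-bit unsigned

    static byte[] wrap(byte[] ccittData, int width, int height) {
        final int entryCount = 9;
        final int ifdOffset = 8; // the IFD starts right after the 8-byte header
        // IFD = 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
        final int dataOffset = ifdOffset + 2 + entryCount * 12 + 4;

        ByteBuffer buf = ByteBuffer.allocate(dataOffset + ccittData.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 'I').put((byte) 'I'); // "II" = little-endian byte order
        buf.putShort((short) 42);            // TIFF magic number
        buf.putInt(ifdOffset);

        buf.putShort((short) entryCount);
        // Entries must be written in ascending tag order.
        putEntry(buf, 256, LONG, width);             // ImageWidth
        putEntry(buf, 257, LONG, height);            // ImageLength
        putEntry(buf, 258, SHORT, 1);                // BitsPerSample
        putEntry(buf, 259, SHORT, 4);                // Compression: 4 = Group 4 (assumed)
        putEntry(buf, 262, SHORT, 0);                // Photometric: WhiteIsZero (assumed)
        putEntry(buf, 273, LONG, dataOffset);        // StripOffsets
        putEntry(buf, 277, SHORT, 1);                // SamplesPerPixel
        putEntry(buf, 278, LONG, height);            // RowsPerStrip: one strip
        putEntry(buf, 279, LONG, ccittData.length);  // StripByteCounts
        buf.putInt(0);                               // no further IFDs

        buf.put(ccittData); // the PDF stream bytes, unchanged
        return buf.array();
    }

    // Writes one 12-byte IFD entry holding a single value.
    private static void putEntry(ByteBuffer buf, int tag, int type, int value) {
        buf.putShort((short) tag);
        buf.putShort((short) type);
        buf.putInt(1); // value count
        if (type == SHORT) {
            buf.putShort((short) value);
            buf.putShort((short) 0); // pad the 4-byte value field
        } else {
            buf.putInt(value);
        }
    }
}
```

For the object quoted in this thread you would call something like
wrap(streamBytes, 2537, 3380) and write the result to a .tif file. If the
image comes out inverted, the photometric value is the first thing to
check.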
> If it is, I'm assuming that the 'CCITTFaxDecode' algorithm is the
> same encoding/decoding that is described in the specification for the TIFF
> file format, right? But how do I find out whether the image is Group 4 or
> Group 3?
> 19 0 obj
> <<
> /Type /XObject
> /Subtype /Image
> /Name /Image1
> /Filter /CCITTFaxDecode
> /Width 2537
> /Height 3380
> /BitsPerComponent 1
> /ColorSpace /DeviceGray
> /Length 80544
> /DecodeParms <<
> /K -1 /Columns 2537
"/K -1" indicates that it's Group 4.
--
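
To spell out the /K rule from the PDF spec's CCITTFaxDecode parameters: a
negative K means Group 4, K = 0 (the default when /K is absent) means
Group 3 one-dimensional, and a positive K means Group 3 mixed one- and
two-dimensional. A small sketch of that check follows; the class and
method names are made up for illustration.

```java
// Hypothetical helper: maps the /K entry of /DecodeParms to the CCITT scheme.
final class CcittScheme {
    static String describe(int k) {
        if (k < 0) {
            return "Group 4 (pure two-dimensional)";
        }
        if (k == 0) {
            return "Group 3 (pure one-dimensional)";
        }
        return "Group 3 (mixed one- and two-dimensional)";
    }
}
```

For the object above, describe(-1) gives the Group 4 case.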