How to decode a FlateDecode (PNG) stream to a PNG file


张猛

Jan 1, 2025, 10:51:41 PM
to pdfium
I use the FPDFImageObj_GetImageFilterCount and FPDFImageObj_GetImageFilter APIs to determine the type of an image object, and FPDFImageObj_GetImageDataRaw to get its compressed stream from the PDF.

PS: I use go-pdfium.

When the filter is DCTDecode, life is easy; I can use code like this:
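```go
// dataRawRes is the go-pdfium response from FPDFImageObj_GetImageDataRaw.
// image.Decode only recognizes formats whose decoders are registered,
// e.g. with a blank import of image/jpeg.
img, format, err = image.Decode(bytes.NewReader(dataRawRes.Data))
if err != nil {
	return fmt.Errorf("unable to decode image: %w", err)
}
```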

I get an image.Image and the correct format, so I can save img to a JPEG file.

But when the filter is FlateDecode (which I assume is a compressed PNG stream), it doesn't work the way DCTDecode does.

I get an "unknown format" error.

So what should I do?


Alan Screen

Jan 6, 2025, 5:40:03 PM
to 张猛, pdfium
The go-pdfium wrapper is separate from what this group supports, which is the underlying PDFium. If there is a limitation in that wrapper, you should contact the developers of that library.

If dealing with image data that has different filters applied to it is a problem, would it work for you to use FPDFImageObj_GetImageDataDecoded() instead of FPDFImageObj_GetImageDataRaw()?



张猛

Jan 6, 2025, 9:23:38 PM
to pdfium
If the filter is FlateDecode, there is no difference between FPDFImageObj_GetImageDataDecoded and FPDFImageObj_GetImageDataRaw: I can't decode either result into an image in Go, and I get the same "unknown format" error.
I also tried the FPDFImageObj_GetRenderedBitmap API, but the resolution of the resulting image is much lower.

None of these problems occur when the image is DCTDecode-encoded.

geisserml

Jan 7, 2025, 9:49:27 AM
to pdfium
> If the filter is FlateDecode, there is no difference between FPDFImageObj_GetImageDataDecoded and FPDFImageObj_GetImageDataRaw

Actually, there should be a difference in the output. FPDFImageObj_GetImageDataDecoded() should decode the flate filter as per https://crbug.com/pdfium/1203#comment8, so you'd get the raw pixel data.
Note, however, that this is not an image file, so with an image library you may have to use some kind of "raw" decoder and pass the size and pixel format so it can interpret the data.
See e.g. https://github.com/pypdfium2-team/pypdfium2/blob/9e55a45e1ad7874de6fb2b6273cf7d0edfcea624/src/pypdfium2/_helpers/pageobjects.py#L377

Also note that there's FPDFImageObj_GetBitmap(), which (unlike FPDFImageObj_GetRenderedBitmap()) will not degrade the resolution. However, it may transcode the pixel format, and, like FPDFImageObj_GetImageData{Raw,Decoded}, it does not handle alpha masks.

Jeroen Bobbeldijk

Jan 7, 2025, 11:30:19 AM
to pdfium
Maintainer of go-pdfium here. The wrapper has no built-in method for decoding image data into a Go image (yet); it mostly just lets you call PDFium, so this is not a limitation of the wrapper.
It sounds like the OP is passing the raw image data provided by PDFium straight into Go's image library, which can't understand it in all cases (DCTDecode works because it is JPEG); the FlateDecode data is probably not actually PNG. Perhaps the OP can share the PDF or the image's byte stream so we can check what it is.

However, as geisserml mentioned, alpha masks might not be handled properly, and I think that is what the OP is trying to solve.

张猛

Jan 7, 2025, 9:47:04 PM
to pdfium
So I conclude that there is no way to get a PNG image without lowering the resolution, right? Because:
1) FPDFImageObj_GetRenderedBitmap() lowers the resolution;
2) FPDFImageObj_GetBitmap() does not lower the resolution, but does not handle the alpha channel;
3) FPDFImageObj_GetImageData{Raw,Decoded} does not handle alpha masks either, as you said.

geisserml

Jan 7, 2025, 10:42:15 PM
to pdfium
I think so. Either we'd need some way to improve resolution with GetRenderedBitmap(), or maybe a new API to expose the alpha mask to the caller.

Here's an idea for a workaround though: you could try to move the image onto a new page (so that image and page size match), and render that at the desired resolution.

Lei Zhang

Jan 10, 2025, 8:19:00 PM
to 张猛, pdfium
Try scaling the image object to its native resolution, rendering it with FPDFImageObj_GetRenderedBitmap(), and then restoring the original resolution.

Also, just to be clear, images inside PDFs are not stored as PNGs, so it is not possible to read an image object's stream, save it out directly, and get a PNG.
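The arithmetic behind Lei's scaling suggestion fits in a few lines of Go. An image object's matrix (a b c d e f) maps the unit square onto the page. Take the on-page size from FPDFPageObj_GetBounds(), then scale the matrix in page space by (pixel width / on-page width, pixel height / on-page height). After that, one page unit corresponds to one image pixel. The new matrix would be applied with FPDFPageObj_SetMatrix() before FPDFImageObj_GetRenderedBitmap(), and the old one restored afterwards. `Matrix` and `nativeScaleMatrix` are hypothetical names, and the width/height swap for rotated images is a heuristic.

```go
package main

import "fmt"

// Matrix (hypothetical type) is a PDF transformation matrix (a b c d e f);
// a point maps as x' = a*x + c*y + e, y' = b*x + d*y + f.
type Matrix struct{ A, B, C, D, E, F float64 }

// nativeScaleMatrix (hypothetical helper) scales m in page space so that an
// image of pxW x pxH pixels with bounds (l, b, r, t) covers pxW x pxH page units.
// It swaps the pixel dimensions when the orientation suggests a 90°/270° rotation.
func nativeScaleMatrix(m Matrix, pxW, pxH, l, b, r, t float64) Matrix {
	contentW, contentH := r-l, t-b
	if (pxW < pxH) != (contentW < contentH) {
		pxW, pxH = pxH, pxW
	}
	sx, sy := pxW/contentW, pxH/contentH
	// m × diag(sx, sy): every x output is scaled by sx, every y output by sy.
	return Matrix{m.A * sx, m.B * sy, m.C * sx, m.D * sy, m.E * sx, m.F * sy}
}

func main() {
	// A 200x100 px image drawn 100x50 units large at (10, 20).
	m := Matrix{100, 0, 0, 50, 10, 20}
	scaled := nativeScaleMatrix(m, 200, 100, 10, 20, 110, 70)
	fmt.Printf("%+v\n", scaled) // {A:200 B:0 C:0 D:100 E:20 F:40}
}
```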

geisserml

Jan 15, 2025, 11:04:40 AM
to pdfium
Thanks for the suggestion, Lei. Here's a stab at an implementation (with pypdfium2):
```python
if scale:
    px_w, px_h = self.get_px_size()  # FPDFImageObj_GetImagePixelSize()
    l, b, r, t = self.get_bounds()  # FPDFPageObj_GetBounds()
    content_w, content_h = r-l, t-b
    # align pixel and content width/height relation if swapped due to rotation (e.g. 90°, 270°)
    swap = (px_w < px_h) != (content_w < content_h)
    if swap:
        px_w, px_h = px_h, px_w
    orig_mat = self.get_matrix()  # FPDFPageObj_GetMatrix()
    x_scale, y_scale = px_w/content_w, px_h/content_h
    scaled_mat = orig_mat.scale(x_scale, y_scale)
    logger.debug(
        f"Pixel size: {px_w}, {px_h} (did swap? {swap})\n"
        f"Size in page coords: {content_w}, {content_h}\n"
        f"X/Y scale: {x_scale}, {y_scale}\n"
        f"Current matrix: {orig_mat}\n"
        f"Scaled matrix: {scaled_mat}"
    )
    self.set_matrix(scaled_mat)  # FPDFPageObj_SetMatrix()

try:
    raw_bitmap = pdfium_c.FPDFImageObj_GetRenderedBitmap(self.pdf, self.page, self)
finally:
    # restore the original matrix even if rendering fails
    if scale:
        self.set_matrix(orig_mat)
```
This is better than moving the object and using FPDF_RenderPageBitmap() (which I figured might not even work, due to https://crbug.com/42271024).

geisserml

Jan 15, 2025, 11:25:45 AM
to pdfium
(Sorry, I mangled the indentation when pasting. The last try/finally block should be unindented. And BTW, `self` is the page object.)