In general, I've got some suggestions for how you might be able to improve the algorithm:
- I'd recommend using FPDFImageObj_GetImageFilter() to check the image's current filters and exclude images that already use a high-compression codec such as JPX, JBIG2 or CCITTFax. Images that are already DCT-encoded should also be excluded, to avoid [generation loss](https://en.wikipedia.org/wiki/Generation_loss).
- Another concern is 1bpp B/W images, which FPDFImageObj_GetBitmap() would convert to 8-bit grayscale, leading to a major size increase. Presumably you could check the bits per pixel with FPDFImageObj_GetImageMetadata() and exclude such images from processing. For quality reasons, I would also suggest checking the colorspace and excluding CMYK images, since GetBitmap() would transcode them to RGB.
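
The checks above could be combined into a single pre-filter along these lines. This is only a rough sketch, not tested against your code: should_recompress() and its parameters are hypothetical names, and it assumes you have already read the filter names (via FPDFImageObj_GetImageFilter()) and the bits per pixel and colorspace (via FPDFImageObj_GetImageMetadata()) into plain Python values. The colorspace is passed as a string here purely for illustration; pdfium itself reports it as an integer constant.

```python
# PDF filter names for codecs that are already highly compressed,
# plus DCTDecode itself (re-encoding JPEG would cause generation loss).
SKIP_FILTERS = {"JPXDecode", "JBIG2Decode", "CCITTFaxDecode", "DCTDecode"}


def should_recompress(filters, bits_per_pixel, colorspace):
    """Decide whether an image is a reasonable candidate for DCT re-encoding.

    All three arguments are hypothetical inputs the caller is assumed to
    have collected from pdfium beforehand:
      filters        -- list of filter name strings applied to the image stream
      bits_per_pixel -- integer from the image metadata
      colorspace     -- illustrative string label, e.g. "DeviceRGB"
    """
    # Already compressed with a strong or lossy codec: leave it alone.
    if any(name in SKIP_FILTERS for name in filters):
        return False
    # 1bpp B/W images would be blown up to 8-bit grayscale on rendering.
    if bits_per_pixel == 1:
        return False
    # CMYK would be transcoded to RGB, which changes the colors.
    if colorspace == "DeviceCMYK":
        return False
    return True
```

With that in place, an RGB Flate image (filters ["FlateDecode"], 24 bpp, "DeviceRGB") would pass, while anything DCT-encoded, 1bpp, or CMYK would be skipped.
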
However, I would expect that replacing an RGB Flate image with a correctly encoded DCT equivalent should almost always lead to higher compression, so likely none of these points explain the size increase you're experiencing.
In that case, it would be helpful if you could share a before/after sample to see what's going on.
It might even be an issue with pdfium not removing the old Flate stream from the PDF, similar to how FPDFDoc_DeleteAttachment() does not actually remove the attachment stream, but merely unlinks it from the view...