I'm using tesseract to produce PDF output from PGM input files. The images in the resulting PDF are JPEG-compressed. Is there any way to avoid this? TIA.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Which tesseract version are you using?

Zdenko
The current logic (tesseract 3.05/4.00) is that PNG input is embedded with flate[1] compression, and for all other formats the Leptonica function l_generateCIDataForPdf[2] is used, which should apply JPEG compression only to JPEG files...

[1] https://github.com/tesseract-ocr/tesseract/blob/master/api/pdfrenderer.cpp#L720
[2] https://github.com/DanBloomberg/leptonica/blob/master/src/pdfio2.c#L519

Zdenko
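Going by the logic described above, where PNG input takes the flate path, one possible workaround is to convert each PGM to PNG before running tesseract; per that description the page image should then be embedded with flate rather than JPEG. Netpbm's `pnmtopng` or ImageMagick can do the conversion. The sketch below does it with only the Python standard library so each step is visible. It is a minimal sketch, assuming binary (P5) PGM files with a maxval of at most 255; `pgm_to_png`, `_read_pgm_header` and `_chunk` are hypothetical helper names, not part of tesseract or Leptonica.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _read_pgm_header(data: bytes):
    """Parse a binary (P5) PGM header; return (width, height, maxval, raster_offset)."""
    tokens = []
    i = 0
    while len(tokens) < 4:
        if i >= len(data):
            raise ValueError("truncated PGM header")
        c = data[i:i + 1]
        if c.isspace():
            i += 1
        elif c == b"#":
            # Header comment: skip to the end of the line.
            while data[i:i + 1] not in (b"\n", b"\r", b""):
                i += 1
        else:
            start = i
            while i < len(data) and not data[i:i + 1].isspace() and data[i:i + 1] != b"#":
                i += 1
            tokens.append(data[start:i])
    if tokens[0] != b"P5":
        raise ValueError("not a binary (P5) PGM file")
    width, height, maxval = (int(t) for t in tokens[1:])
    # Exactly one whitespace byte separates maxval from the raster.
    return width, height, maxval, i + 1


def _chunk(tag: bytes, payload: bytes) -> bytes:
    # PNG chunk layout: length, type, data, CRC-32 over type + data.
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def pgm_to_png(pgm: bytes) -> bytes:
    """Convert an 8-bit binary PGM image to an 8-bit grayscale PNG (lossless)."""
    width, height, maxval, offset = _read_pgm_header(pgm)
    if not 0 < maxval < 256:
        raise ValueError("this sketch only handles maxval 1..255")
    raster = pgm[offset:offset + width * height]
    if len(raster) != width * height:
        raise ValueError("truncated PGM raster")
    if maxval != 255:
        # Stretch samples to the full 0..255 range used at bit depth 8.
        raster = bytes(v * 255 // maxval for v in raster)
    # Prefix each scanline with filter type 0 (None), then zlib/deflate-compress
    # the whole stream (the same algorithm PDF's FlateDecode uses).
    rows = b"".join(b"\x00" + raster[y * width:(y + 1) * width] for y in range(height))
    # IHDR: width, height, bit depth 8, color type 0 (gray), compression 0, filter 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(rows, 9)) + _chunk(b"IEND", b""))
```

For example, write the bytes returned by `pgm_to_png(open("a11.pgm", "rb").read())` to `a11.png`, then run `tesseract a11.png test pdf`. The sketch writes no resolution (pHYs) chunk, so tesseract may still print the invalid-resolution warning.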
10:35 ~/ > tesseract a11.pgm test pdf
Tesseract Open Source OCR Engine v3.05.00 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
10:40 ~/ > pdfimages -list test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1099  1705  gray    1   8  jpeg   no        11  0    70    71  397K  22%
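To check many output PDFs without reading the listing by eye, the `enc` column of `pdfimages -list` shown in the transcript above can be extracted programmatically. A minimal Python sketch, assuming poppler's `pdfimages` is on PATH and its header layout matches the one above; `image_encodings` and `pdf_image_encodings` are hypothetical helper names.

```python
import subprocess
from typing import List, Tuple


def image_encodings(listing: str) -> List[Tuple[int, str]]:
    """Return (page, enc) for each image row in `pdfimages -list` output."""
    lines = listing.splitlines()
    # "object ID" splits into two header tokens, and each row carries two
    # values there (object and generation number), so indices stay aligned.
    header = lines[0].split()
    page_col, enc_col = header.index("page"), header.index("enc")
    result = []
    for line in lines[1:]:
        fields = line.split()
        # Skip blank lines and the dashed separator under the header.
        if not fields or set(line.strip()) == {"-"}:
            continue
        result.append((int(fields[page_col]), fields[enc_col]))
    return result


def pdf_image_encodings(pdf_path: str) -> List[Tuple[int, str]]:
    """Run poppler's pdfimages on a PDF and return the encoding of each image."""
    proc = subprocess.run(["pdfimages", "-list", pdf_path],
                          capture_output=True, text=True, check=True)
    return image_encodings(proc.stdout)
```

For the test.pdf in the transcript, `pdf_image_encodings("test.pdf")` would be expected to report page 1 as `jpeg`; a value other than `jpeg` after switching the input format would suggest the JPEG path was avoided.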