Warning. Invalid resolution 0 dpi. Using 70 instead.

Naomi

Jan 18, 2019, 2:51:49 PM

to tesseract-ocr

I understand this question is asked a lot, but I'm getting

Warning. Invalid resolution 0 dpi. Using 70 instead.

when I set --psm to 0.

I used ImageMagick to convert the PDF to a tif, and as required, I did set the units and density:

convert -density 300 -units PixelsPerCentimeter InputPdf.pdf -depth 8 -strip -background white -alpha off file.tiff

From there, I try to run tesseract as follows:

tesseract file.tiff searchable-pdf -l eng --psm 0 pdf tesseract_parsley_config.txt

This produces the warning above. Running `magick identify` on the TIFF produces the following, which confirms that the metadata is set.

Format: TIFF (Tagged Image File Format)
Mime type: image/tiff
Class: DirectClass
Geometry: 2550x3300+0+0
Resolution: 300x300
Print size: 8.5x11
Units: PixelsPerCentimeter
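
A note on the numbers above: a page 2550 pixels wide at 300 pixels per inch is 8.5 inches wide, whereas 300 pixels per centimeter corresponds to 762 pixels per inch. The thread does not establish whether the unit label is related to the warning, but it can be useful to check what is actually stored in the file independently of `magick identify`. The following is a minimal sketch, assuming a classic (non-BigTIFF) file and looking only at the first image directory; the function names `read_tiff_resolution` and `to_dpi` are hypothetical helpers, not part of Tesseract or ImageMagick.

```python
import struct

# Tag numbers and unit codes as defined in the TIFF 6.0 specification.
TAG_X_RESOLUTION = 282
TAG_Y_RESOLUTION = 283
TAG_RESOLUTION_UNIT = 296
UNIT_CENTIMETER = 3
CM_PER_INCH = 2.54


def read_tiff_resolution(data: bytes):
    """Return (x_res, y_res, unit) from the first IFD of a classic TIFF.

    unit is 1 (none), 2 (inch) or 3 (centimeter); the specification
    says to assume 2 when the ResolutionUnit tag is absent.
    Resolutions are None when the corresponding tag is missing.
    """
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # byte-order marker
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    x_res = y_res = None
    unit = 2
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, _type, _n, value = struct.unpack(
            order + "HHI4s", data[start:start + 12]
        )
        if tag in (TAG_X_RESOLUTION, TAG_Y_RESOLUTION):
            # RATIONAL: the value field is an offset to two uint32 values.
            (offset,) = struct.unpack(order + "I", value)
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            res = num / den if den else 0.0
            if tag == TAG_X_RESOLUTION:
                x_res = res
            else:
                y_res = res
        elif tag == TAG_RESOLUTION_UNIT:
            # SHORT: stored in the first two bytes of the value field.
            (unit,) = struct.unpack(order + "H", value[:2])
    return x_res, y_res, unit


def to_dpi(resolution: float, unit: int) -> float:
    """Convert a stored resolution to pixels per inch."""
    if unit == UNIT_CENTIMETER:
        return resolution * CM_PER_INCH
    return resolution
```

Calling `read_tiff_resolution(open("file.tiff", "rb").read())` and passing the result through `to_dpi` shows the resolution in pixels per inch as implied by the stored tags.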

How can I remove the error?

Zdenko Podobny

Jan 18, 2019, 2:54:54 PM

to tesser...@googlegroups.com

Please provide a test file and info about your tesseract version.

Zdenko

On Fri, Jan 18, 2019 at 20:51, Naomi <naomi.d...@gmail.com> wrote:


Naomi

Jan 18, 2019, 3:39:01 PM

to tesseract-ocr

tesseract -v
tesseract 4.0.0
leptonica-1.77.0
libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE

Unfortunately, I can't provide the file publicly, but I can look into any specific metadata needed.

Naomi

Jan 18, 2019, 4:05:33 PM

to tesseract-ocr

Additionally, I'm getting somewhat poor output on the following scan. You can see that Tesseract is binarizing it but leaving a lot of black pixels. Is there a way to remove those stray pixels from what should be white background?

Before:

After Tesseract tries to pre-process:

Naomi

Jan 18, 2019, 4:21:58 PM

to tesseract-ocr

Looking again at the image I posted above, I realize the issue isn't the stray pixels but the white-on-black text. The whole document is black-on-white text except for this table header, which is white on black. Tesseract is picking up the black background in that header as characters and turning it into gibberish. Does anyone know how I would pre-process the image to invert only the white-on-black text?
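
One possible approach, not confirmed anywhere in this thread, is to find the bands of the page that are mostly dark and invert only those before running OCR. The sketch below assumes the page is already available as a grayscale image stored as a list of rows with values from 0 (black) to 255 (white); the function name `invert_dark_rows` and both thresholds are illustrative assumptions. A real pipeline would load the pixels with an image library and would probably restrict the inversion to the header's bounding box instead of whole rows.

```python
def invert_dark_rows(pixels, dark_threshold=128, dark_fraction=0.5):
    """Return a copy of the image with predominantly dark rows inverted.

    pixels: list of rows, each a list of grayscale values in 0..255.
    dark_threshold: values below this count as dark (assumed cutoff).
    dark_fraction: a row is inverted when more than this share of its
        pixels is dark, which is typical of a white-on-black banner.
    """
    result = []
    for row in pixels:
        dark = sum(1 for value in row if value < dark_threshold)
        if row and dark / len(row) > dark_fraction:
            # Mostly dark row: flip it so text becomes dark on light.
            result.append([255 - value for value in row])
        else:
            # Ordinary black-on-white row: leave it unchanged.
            result.append(list(row))
    return result
```

Rows of ordinary black-on-white text are mostly white, so they fall below the dark fraction and pass through untouched; only the banner rows are flipped.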
