Tesseract unable to recognize simple text when image is closely cropped

Teofilis Martisius

Sep 20, 2021, 12:50:28 PM
to tesseract-ocr
Hello,

I have tried OCRing this image (hello.png, attached), but the result comes out empty. It works if I add a border around the text (hello_with_border.png, attached).

I run:

$ tesseract hello.png stdout
Estimating resolution as 528
Empty page!!
Estimating resolution as 528
Empty page!!

$ tesseract hello_with_border.png stdout
Hello

I have the following version:
$ tesseract -v
tesseract 5.0.0-beta-20210815-6-g1d3d
 leptonica-1.79.0
  libgif 5.1.9 : libjpeg 6b (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.8

I'm on Debian/Sid Linux. I had the same issue with Tesseract v4.

I don't see the error message "Error in boxClipToRectangle", so I don't think this is issue #427.

Should this issue be reported?

Sincerely,
Teofilis Martisius

hello_with_border.png
hello.png

Zdenko Podobny

Sep 20, 2021, 12:56:03 PM
to tesser...@googlegroups.com
> Should this issue be reported?

Absolutely not. Just follow the documentation and the result will be fine.

Zdenko
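
For readers landing here: the reply does not say which part of the documentation is meant. It is most likely the image-quality guidance, which, as far as I recall, suggests leaving a small margin (on the order of 10 px) around text that has been cropped right to its edges. That matches the border workaround from the first message. Below is a minimal sketch of the padding step itself, in plain Python on an in-memory grayscale raster. The function name add_border, the 10-pixel default and the white fill are assumptions for illustration, not anything taken from Tesseract.

```python
def add_border(pixels, border=10, fill=255):
    """Return a copy of a grayscale raster padded on all four sides.

    pixels: list of equal-length rows, each a list of 0-255 ints.
    border: padding width in pixels (10 is an assumed starting value).
    fill:   padding value; 255 is white, matching a light background.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    if not pixels or not pixels[0]:
        raise ValueError("pixels must be a non-empty raster")
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same length")

    padded_width = width + 2 * border
    # Full-width blank rows above and below the original content.
    top = [[fill] * padded_width for _ in range(border)]
    bottom = [[fill] * padded_width for _ in range(border)]
    # Original rows with blank columns added on the left and right.
    middle = [[fill] * border + list(row) + [fill] * border for row in pixels]
    return top + middle + bottom


# Tiny stand-in for a tightly cropped image: dark pixels touch every edge.
cropped = [
    [0, 255, 0],
    [0, 0, 0],
]
padded = add_border(cropped, border=2)
```

With a real file you would normally let an imaging library do the loading, padding and saving (Pillow's ImageOps.expand is one option), then run tesseract on the padded file as in the hello_with_border.png example above.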

