Tesseract does not recognise these numbers


Juanjo Gómez Navarro

Jun 17, 2021, 9:51:18 AM
to tesseract-ocr
I have this simple image with a date:
test.png
Tesseract produces the output: 

$ tesseract test.png -
Estimating resolution as 233
03:41 pm

In similar images, it also mistakes 1s for 7s and vice versa. How can I help Tesseract recognise these characters correctly?

My version of Tesseract is:

$ tesseract -v
tesseract 5.0.0-alpha-20210401-130-g7a308
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511

Zdenko Podobny

Jun 18, 2021, 6:53:00 AM
to tesser...@googlegroups.com
With tessdata from [1] and --oem 0 you can get:

tesseract unnamed.png - --psm 7 --oem 0
09:41 Dm

Otherwise:

tesseract unnamed.png - --psm 7
0%:41 pm

With a little preprocessing (blur and resize, so that the letters are about 30 points high) you can get:

tesseract time.png - --psm 7
09:41 pm
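
One way to script that preprocessing step is sketched below. This is only an illustration, not necessarily how time.png was produced: it assumes Pillow is installed, reads "30 points" as roughly 30 pixels, and the helper names, the blur radius, and the resize-then-blur order are all my own assumptions.

```python
# Sketch only: rescale an image so glyphs are ~30 px tall, then blur slightly.
# Assumptions (not from the thread): Pillow is used, "30 points" is read as
# 30 pixels, and the blur radius and the resize-then-blur order are guesses.

TARGET_GLYPH_HEIGHT_PX = 30


def scale_factor(glyph_height_px, target_px=TARGET_GLYPH_HEIGHT_PX):
    """Return the resize factor that brings glyphs to about target_px tall."""
    if glyph_height_px <= 0:
        raise ValueError("glyph_height_px must be positive")
    return target_px / glyph_height_px


def scaled_size(width, height, factor):
    """Return (width, height) after scaling, rounded, never below 1 pixel."""
    return max(1, round(width * factor)), max(1, round(height * factor))


def preprocess(src_path, dst_path, glyph_height_px, blur_radius=1.0):
    """Write a grayscale, resized, lightly blurred copy of src_path to dst_path.

    glyph_height_px is the measured height of a typical character in the
    source image; you have to measure or estimate it yourself.
    """
    # Imported here so the two helpers above work without Pillow installed.
    from PIL import Image, ImageFilter

    with Image.open(src_path) as img:
        gray = img.convert("L")
        factor = scale_factor(glyph_height_px)
        new_size = scaled_size(gray.width, gray.height, factor)
        resized = gray.resize(new_size, Image.LANCZOS)
        blurred = resized.filter(ImageFilter.GaussianBlur(blur_radius))
        blurred.save(dst_path)
```

For example, if the characters in test.png were about 15 pixels tall (a made-up figure), `preprocess("test.png", "time.png", glyph_height_px=15)` would double the image size before blurring, and the result could then be passed to `tesseract time.png - --psm 7` as above. The blur radius probably needs tuning per image.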

On Thu, Jun 17, 2021 at 15:51, Juanjo Gómez Navarro <juanjo.go...@gmail.com> wrote:
time.png

Juanjo Gómez Navarro

Jun 18, 2021, 11:53:37 AM
to tesseract-ocr
Thanks for the hint. A little bit of blur and resizing indeed helped.