OCRing simple numbers unreliable


Borek Lupoměský

May 22, 2019, 5:05:20 AM
to tesseract-ocr
I am OCRing numbers from images. I do all the processing with ImageMagick to end up with a single isolated number (up to three digits). Tesseract does a good job most of the time, but sometimes it doesn't recognize the number correctly. For example, this image of the number 27 gets recognized as "2/". How can I improve the reliability? Is expecting a 100% success rate too much?

Lorenzo Bolzani

May 22, 2019, 9:04:11 AM
to tesser...@googlegroups.com
Hi,
try these (in any combination):

use page segmentation mode 6 or 7 (--psm 6 or --psm 7)
remove the white border (all or most of it)
downscale so that the characters are about 20 to 50 px tall
fine-tune a model to recognize only numbers
threshold (binarize) the image
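
A rough sketch of how some of these steps could be put together in Python, in case it helps. None of this comes from the thread itself: the helper names (threshold, crop_white_border, scale_factor, tesseract_command, ocr_number) are made up for illustration, the image is assumed to be a plain list of grayscale rows, "20/50px" is read as a 20 to 50 px range, and the digit whitelist (tessedit_char_whitelist) is an assumed stand-in for a fine-tuned digits-only model. As far as I know the whitelist is ignored by the LSTM engine in Tesseract 4.0, so treat that part as optional.

```python
import subprocess
from typing import List


def threshold(pixels: List[List[int]], cutoff: int = 128) -> List[List[int]]:
    """Binarize a grayscale image (rows of 0-255 values) to pure black/white."""
    return [[0 if value < cutoff else 255 for value in row] for row in pixels]


def crop_white_border(pixels: List[List[int]], white: int = 255) -> List[List[int]]:
    """Drop outer rows and columns that contain only white pixels."""
    rows = [i for i, row in enumerate(pixels) if any(v != white for v in row)]
    if not rows:
        return []  # the image is entirely white
    cols = [j for j in range(len(pixels[0])) if any(row[j] != white for row in pixels)]
    return [row[cols[0]:cols[-1] + 1] for row in pixels[rows[0]:rows[-1] + 1]]


def scale_factor(glyph_height: float, lo: float = 20, hi: float = 50) -> float:
    """Resize factor that brings the glyph height into [lo, hi] pixels.

    Returns 1.0 when the height is already in range. Heights below `lo`
    are scaled up, which goes beyond the "downscale" advice above.
    """
    if glyph_height <= 0:
        raise ValueError("glyph_height must be positive")
    if glyph_height > hi:
        return hi / glyph_height
    if glyph_height < lo:
        return lo / glyph_height
    return 1.0


def tesseract_command(image_path: str, psm: int = 7, digits_only: bool = True) -> List[str]:
    """Build the tesseract command line; 'stdout' prints the result instead of writing a file."""
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if digits_only:
        # Assumption: restrict output to digits instead of fine-tuning a model.
        cmd += ["-c", "tessedit_char_whitelist=0123456789"]
    return cmd


def ocr_number(image_path: str) -> str:
    """Run Tesseract (must be installed and on PATH) and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

The pixel helpers only show the idea; in practice the same thresholding, trimming and resizing would probably be done in the existing ImageMagick step, and only the command-line part would change.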

Otherwise post more details about how you are using tesseract.


Bye

Lorenzo



