Tesseract accuracy.

Kyle Zeneki

Mar 25, 2023, 3:39:08 AM

to tesseract-ocr

Hello, I have these images and I'm trying to extract their text with Tesseract. I spent 2 hours fine-tuning Tesseract for a specific font, and the resulting error rate was 0.163. I tried several font-identification websites, and the closest match was "Futura Now". However, Tesseract sometimes fails to read the "E" in "D V E O" but correctly reads the "E" in "EOPEO". It also occasionally misreads "S E G I E" as "Ss Ee G I E", and so on. Is there a way to train Tesseract on images rather than on a font? Alternatively, is there a better tool than Tesseract, such as EasyOCR?

Zdenko Podobny

Apr 1, 2023, 3:20:00 AM

to tesser...@googlegroups.com

As a first step, I would suggest you read https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md

Next: the LSTM model is trained on words and lines of text, so it can have problems with "code" like this (isolated letters separated by spaces). For images like these, legacy mode (--oem 0) is perfect, e.g.:

tesseract WCAZ.png - --psm 6 --oem 0
W C A Z

tesseract DVEO.png - --psm 6 --oem 0
D V E O
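To run the legacy-engine command above over many images at once, a small Bash wrapper could look like the sketch below. It reuses only the flags shown above (--psm 6 --oem 0); the function names and the OCR_WHITELIST variable are hypothetical. Passing the expected characters to Tesseract's tessedit_char_whitelist parameter is an extra assumption not suggested in this thread; it might suppress spurious lowercase letters such as "Ss Ee", but that is untested.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): batch OCR of short letter codes with the
# legacy engine, using the same flags as the commands in the reply.

# Print the text Tesseract recognizes in one image. The form feed that Tesseract
# appends as a page separator, and any blank lines, are removed.
# If the hypothetical OCR_WHITELIST variable is set, its characters are passed to
# Tesseract's tessedit_char_whitelist parameter (an assumption, not from the thread).
ocr_code() {
  local img="$1"
  local args=("$img" - --psm 6 --oem 0)
  if [[ -n "${OCR_WHITELIST:-}" ]]; then
    args+=(-c "tessedit_char_whitelist=${OCR_WHITELIST}")
  fi
  tesseract "${args[@]}" 2>/dev/null | tr -d '\f' | sed '/^[[:space:]]*$/d'
}

# Print "<file name>: <recognized text>" for every image passed as an argument.
read_codes() {
  local img
  for img in "$@"; do
    printf '%s: %s\n' "$(basename "$img")" "$(ocr_code "$img")"
  done
}
```

Usage, assuming the functions are saved in ocr_codes.sh and sourced into Bash:

source ocr_codes.sh
read_codes WCAZ.png DVEO.png
OCR_WHITELIST=ABCDEFGHIJKLMNOPQRSTUVWXYZ read_codes *.png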

The legacy engine model is included in the language files in the tessdata repository (https://github.com/tesseract-ocr/tessdata). Many installations use the "fast" models instead, which do not contain the legacy model, so --oem 0 will not work with them.
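If the installed language files are the fast ones, one way to try the legacy engine without touching the system installation is to download a file from the tessdata repository into a local directory and point Tesseract at it with --tessdata-dir. This is a sketch: the raw download URL (assuming the repository's default branch is main), the default language eng, and the function names are assumptions, not part of the original reply.

```bash
#!/usr/bin/env bash
# Sketch: download a language file from the tessdata repository (these files
# include the legacy engine model) and run the legacy engine against it.
# Assumptions: the raw-file URL below (branch "main") and the default language
# "eng"; the function names are hypothetical. Requires curl.
TESSDATA_URL="https://github.com/tesseract-ocr/tessdata/raw/main"

# Download <lang>.traineddata into <dir> (defaults: eng, ./tessdata).
fetch_traineddata() {
  local lang="${1:-eng}" dir="${2:-./tessdata}"
  mkdir -p "$dir" &&
    curl -fsSL -o "$dir/$lang.traineddata" "$TESSDATA_URL/$lang.traineddata"
}

# Run the legacy engine (--oem 0) on one image, using the traineddata in <dir>
# instead of the system-wide (possibly fast, LSTM-only) files.
ocr_legacy() {
  local img="$1" lang="${2:-eng}" dir="${3:-./tessdata}"
  tesseract "$img" - --tessdata-dir "$dir" -l "$lang" --psm 6 --oem 0
}
```

Usage, assuming the functions are sourced into a Bash shell:

fetch_traineddata eng ./tessdata
ocr_legacy DVEO.png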

Zdenko