Digits recognition


Yohei Sasaki

Jul 15, 2021, 9:18:29 AM
to tesseract-ocr
Hi, 

I'm new to Tesseract and wondering how I can solve my issue.

I'm trying to capture digits from gaming screenshots to track damage in a game. Here is an example image of the digits:

sample.png

Then I preprocessed the image (removed noise by thresholding), which gave this result:

processed-res.png
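
The post does not say which tool was used for thresholding. As one possible sketch, assuming ImageMagick's `convert` command is installed, a grayscale conversion followed by a fixed threshold could look like the following. The function name `threshold_image` and the 50% default are assumptions for illustration and would need tuning for real screenshots:

```shell
# Hypothetical helper: binarize an image with ImageMagick (assumed to be installed).
# Usage: threshold_image INPUT OUTPUT [LEVEL]
threshold_image() {
    # Convert to grayscale, then map pixels above LEVEL to white and the rest to black.
    # LEVEL defaults to 50%, an arbitrary starting point.
    convert "$1" -colorspace Gray -threshold "${3:-50%}" "$2"
}
```

For example, `threshold_image sample.png processed-res.png 60%` would write a black-and-white version of `sample.png`.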

Then I passed it to tesseract (tesseract 4.1.1-rc2-25-g9707), but it recognizes the image as `nS` instead of 115. With the digit whitelist it prints nothing at all:

$ tesseract -l eng --psm 6 --dpi 300 -c tessedit_char_whitelist="0123456789" processed-res.png -

$ tesseract -l eng --psm 6 --dpi 300 processed-res.png -
nS

Can anyone advise me on how to recognize the digits? I don't have a font file for the digits, so I'm wondering what other approaches I can take.

Thanks!

Ajinkya Bobade

Aug 12, 2021, 12:58:50 AM
to tesseract-ocr
Hello,

Tesseract 4 isn't designed for recognizing isolated digits. It identifies relationships between characters and then predicts the word or sentence as a whole.

Regards
Ajinkya
Creator of AI Scanner https://imagescanner-online.com/

zdenop

Aug 12, 2021, 2:17:20 AM
to tesseract-ocr
Use the legacy engine instead of LSTM (you will need a language model from https://github.com/tesseract-ocr/tessdata):

$ tesseract processed-res.png - --oem 0
Estimating resolution as 878
115
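
A minimal sketch of how the legacy engine might be combined with the digit whitelist from the original question, assuming `eng.traineddata` from the tessdata repository has been saved into a local directory. The function name `ocr_digits`, the `./tessdata` default, and the choice of `--psm 7` (treat the image as a single text line) are assumptions for illustration, not part of the commands above:

```shell
# Hypothetical helper: OCR an image of digits with the legacy engine.
# Usage: ocr_digits IMAGE [TESSDATA_DIR]
ocr_digits() {
    # --tessdata-dir points at the directory assumed to hold eng.traineddata.
    # --oem 0 selects the legacy engine; the whitelist restricts output to digits.
    tesseract "$1" - \
        --tessdata-dir "${2:-./tessdata}" \
        -l eng --oem 0 --psm 7 \
        -c tessedit_char_whitelist=0123456789
}
```

Calling `ocr_digits processed-res.png` would then print the recognized text to standard output.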

Date: Thursday, July 15, 2021, time: 15:18:29 UTC+2, sender: yss...@gmail.com