Finetuning Tesseract to detect numbers


Somayah Alharbi

Nov 1, 2022, 6:29:41 AM
to tesseract-ocr

I'm trying to fine-tune Tesseract to recognize digits only, but I'm not getting good results so far. I continued training from the Arabic language model "ara", since the digits I'm trying to recognize are Arabic numbers.
Training stops early at a 0.01 error rate, but the results on the test data are really bad.

I'm using my box/tif files and my training text with Tesstrain.h

Any recommendations on what I should do to get better results?


Alessandro Weber

Dec 13, 2022, 10:35:57 AM
to tesseract-ocr
Hi Soma,

To limit recognition to digits, you can set the parameter "tessedit_char_whitelist" to 0123456789.
For usage, see the parameter overview: https://muthu.co/all-tesseract-ocr-options/
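
As a rough illustration of the whitelist suggestion, here is a minimal Python sketch that builds the corresponding tesseract command line and runs it. It assumes the tesseract executable is on the PATH and the "ara" model is installed; the file name sample.tif and the helper names build_digits_command and recognize_digits are made up for this example. If the digits in the images are Arabic-Indic (٠ to ٩) rather than 0 to 9, the whitelist would presumably need those characters instead.

```python
import shlex
import subprocess


def build_digits_command(image_path, lang="ara", whitelist="0123456789"):
    """Build a tesseract CLI call that restricts output to `whitelist`.

    Hypothetical helper: only the tesseract options themselves
    (-l, -c tessedit_char_whitelist=...) come from Tesseract.
    """
    return [
        "tesseract", image_path, "stdout",  # write recognized text to stdout
        "-l", lang,                          # language model to load
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def recognize_digits(image_path, **kwargs):
    """Run tesseract on one image and return the recognized text."""
    cmd = build_digits_command(image_path, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where tesseract is not installed. "sample.tif" is a placeholder.
    print(shlex.join(build_digits_command("sample.tif")))
```

Calling recognize_digits("sample.tif") would then return whatever text Tesseract produces under that restriction. The whitelist only filters what the model may output; it does not by itself fix a model that was fine-tuned poorly.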

Ale