Finetuning Tesseract to detect numbers

Somayah Alharbi

Nov 1, 2022, 6:29:41 AM

to tesseract-ocr

I'm trying to fine-tune Tesseract to recognize digits only, but I'm not getting good results so far. I continued training from the Arabic model ("ara") since the digits I'm trying to recognize are Arabic numerals.

Training stops early at a 0.01 error rate, but the results on the test data are really bad.

I'm using my own box/tif files and training text with tesstrain.sh.

Any recommendations on what I should do to get better results?
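One way to see how far the test results really are from the 0.01 reported during training is to compute the character error rate (CER) on the test lines directly: the rate the trainer reports is measured on its own training/evaluation lines, so it need not reflect accuracy on separate test images. A minimal sketch, assuming you already have each test line's ground-truth text and Tesseract's output as string pairs; `levenshtein` and `character_error_rate` are hypothetical helpers, not part of Tesseract or tesstrain.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """CER over (ground_truth, prediction) pairs:
    total edit distance divided by total ground-truth characters."""
    edits = sum(levenshtein(gt, pred) for gt, pred in pairs)
    total = sum(len(gt) for gt, _ in pairs)
    return edits / total if total else 0.0
```

For example, `character_error_rate([("١٢٣", "١٢٨")])` is about 0.33 (one wrong digit out of three). A test CER far above the training figure would suggest the model is not generalizing to the test images.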

Alessandro Weber

Dec 13, 2022, 10:35:57 AM

to tesseract-ocr

Hi Soma,

To restrict recognition to digits, you can set the parameter "tessedit_char_whitelist" to 0123456789.

For usage, see this parameter overview: https://muthu.co/all-tesseract-ocr-options/
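A minimal sketch of passing the whitelist on the command line with the `tesseract` CLI's `-c VAR=VALUE` option, driven from Python. Since the question is about Arabic numerals, it also defines the Eastern Arabic (Arabic-Indic) digits U+0660 to U+0669; if your images use those, the whitelist presumably needs them instead of 0123456789, and whether the `ara` model emits them for your data is something to verify. `build_tesseract_cmd` and `ocr_digits` are hypothetical helper names, and the image path is a placeholder.

```python
import shutil
import subprocess

ASCII_DIGITS = "0123456789"
# Eastern Arabic (Arabic-Indic) digits, code points U+0660..U+0669
ARABIC_INDIC_DIGITS = "".join(chr(cp) for cp in range(0x0660, 0x066A))


def build_tesseract_cmd(image_path, lang="ara", whitelist=ASCII_DIGITS, psm=None):
    """Build a tesseract CLI call that writes to stdout and restricts
    the recognized characters to `whitelist`."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]  # optional page segmentation mode
    cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def ocr_digits(image_path, **kwargs):
    """Run tesseract (must be on PATH) and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(build_tesseract_cmd(image_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `ocr_digits("digits.png", whitelist=ARABIC_INDIC_DIGITS, psm=7)` for an image holding a single line of Eastern Arabic digits; `--psm 7` treats the image as one text line, which may or may not match your data.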

Ale