Digit recognition errors / training

Suppressed

Apr 2, 2020, 2:21:39 PM

to tesseract-ocr

I'm working on a project in which I need to read digit values from an image and then perform tasks based on the extracted values.

Because of this, mistakes aren't really acceptable. I attached a picture as an example of what the images look like.

The digits barely change. Their position and angle stay the same; only a few pixels differ from one capture to the next.

23999

29999

30999

40000

1

43000

44000

44500

This is what Tesseract extracts from the image. As you can see, it's mostly fine, but instead of 4111 it extracts 1. The result varies if I change the languages or the thresholding values, but a setting that works for this case won't work for the others.

I guess training is the only way to fix the errors, but I couldn't really get it to work. The positions and angles of the data don't change; it's just the font I would need to train, but I don't know how to generate a lot of training data.
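Since a misread would trigger the wrong task, one cheap guard is to sanity-check the extracted values before acting on them. The sketch below assumes (this is not stated in the post) that the column is always sorted in ascending order, as the sample output suggests. `parse_readings` and `flag_suspicious` are hypothetical helper names, not part of pytesseract.

```python
def parse_readings(text):
    """Turn raw OCR text into a list of ints, skipping non-numeric tokens."""
    return [int(tok) for tok in text.split() if tok.isdecimal()]


def flag_suspicious(values):
    """Return indices of readings that break the assumed ascending order.

    Assumption: the column is sorted ascending. A value smaller than the
    last accepted value is treated as a likely misread.
    """
    suspicious = []
    last_good = None
    for i, value in enumerate(values):
        if last_good is not None and value < last_good:
            suspicious.append(i)
        else:
            last_good = value
    return suspicious


readings = [23999, 29999, 30999, 40000, 1, 43000, 44000, 44500]
print(flag_suspicious(readings))  # prints [4]
```

A flagged row could then be re-read with different settings or skipped rather than acted on. This check cannot catch a misread that still fits the ordering, so it only narrows the problem.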

code:

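import cv2
import pytesseract

# Load the screenshot as grayscale, then binarize with inverted polarity
img = cv2.imread('xy.png', cv2.IMREAD_GRAYSCALE)
ret, thresh1 = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)

# Crop the column that holds the digits (rows 130-1050, columns 1280-1420)
ROI1 = thresh1[130:1050, 1280:1420]

text = pytesseract.image_to_string(ROI1, config="digits")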

I image-grab the screen and select the ROI.

Any suggestions? Maybe there's some training data with digits in it that I could adapt to my font?
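Before training, it may be worth trying to OCR each row on its own instead of the whole column at once. The sketch below is one possible approach, not something suggested in this thread: it splits the ROI into text rows with a horizontal projection profile, pads and upscales each row, and runs Tesseract in single-line mode (`--psm 7`) with a digit whitelist. It assumes the binarized ROI has black text (0) on a white background (255); if the polarity is the other way round, invert the image first. `find_row_bands` and `ocr_rows` are hypothetical helper names, and the `tessedit_char_whitelist` setting is not honored by the LSTM engine in every Tesseract 4 release.

```python
def find_row_bands(profile, min_height=3):
    """Return (start, stop) pairs for runs of non-zero entries in `profile`.

    `profile` holds the number of ink pixels in each image row. Runs
    shorter than `min_height` rows are dropped as noise.
    """
    bands = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # a text row begins
        elif ink == 0 and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # a text row ends
            start = None
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))
    return bands


def ocr_rows(binary_roi, pad=10, scale=2):
    """OCR every text row of `binary_roi` separately; returns a list of strings.

    Assumes a 2-D uint8 NumPy array with black text on a white background.
    """
    # Imported here so find_row_bands can be used without OpenCV installed.
    import cv2
    import pytesseract

    profile = (binary_roi == 0).sum(axis=1)  # black pixels per image row
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    results = []
    for start, stop in find_row_bands(profile):
        row = binary_roi[start:stop, :]
        # White margin around the row, then upscale small screen fonts.
        row = cv2.copyMakeBorder(row, pad, pad, pad, pad,
                                 cv2.BORDER_CONSTANT, value=255)
        row = cv2.resize(row, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_CUBIC)
        results.append(pytesseract.image_to_string(row, config=config).strip())
    return results


# Usage with the variables from the post above:
#   values = ocr_rows(ROI1)
```

Reading one row at a time also makes it obvious which row failed, because an empty or short string shows up at a known position in the result list.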

pic.png

Shree Devi Kumar

Apr 2, 2020, 9:54:11 PM

to tesseract-ocr

Try fine-tuning for impact using your font.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a0fd3ccf-f681-4c34-8113-7d15f3a44101%40googlegroups.com.


Suppressed

Apr 3, 2020, 2:26:58 AM

to tesseract-ocr

Do you have any guides or threads that could help me with the process? I'm kind of lost, not gonna lie.


Shree Devi Kumar

Apr 3, 2020, 7:32:11 AM

to tesseract-ocr

https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md#fine-tuning-for-impact

https://github.com/Shreeshrii/tess4training/blob/master/1-makedata.sh

https://github.com/Shreeshrii/tess4training/blob/master/4-impact_from_full.sh
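On the question of how to generate a lot of training data: fine-tuning for impact works from synthetic line images rendered from a training text in the target font, so the text itself can be generated. The sketch below is only an illustration of that idea, not taken from the linked scripts. It writes random digit strings, one per line; the function names and the output file name are hypothetical, and the length range is a guess based on the sample values in the first post.

```python
import random


def make_digit_lines(n_lines, min_len=1, max_len=6, seed=0):
    """Return `n_lines` random digit strings without leading zeros.

    A fixed seed keeps the generated text reproducible between runs.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        length = rng.randint(min_len, max_len)
        first = rng.choice("123456789")
        rest = "".join(rng.choice("0123456789") for _ in range(length - 1))
        lines.append(first + rest)
    return lines


def write_training_text(path, lines):
    """Write one digit string per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Usage (hypothetical file name):
#   write_training_text("digits.training_text", make_digit_lines(5000))
```

The resulting file could then be passed as the training text to the rendering step described in the guide (for example the `--training_text` option of `tesstrain.sh`), together with the font used on screen.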

