Digit recognition errors / training

Suppressed

Apr 2, 2020, 2:21:39 PM
to tesseract-ocr
I'm working on a project in which I need to read digit values from an image and then perform tasks based on the extracted values.
Because of this, mistakes aren't really acceptable. I attached a picture as an example of what the images look like.
The digits barely change: they don't change position or angle, and only some of them have slightly more or fewer pixels each time.
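Because the layout is fixed, one option (an assumption, not something tried in this thread) is to cut the ROI into one band per value and OCR each band separately as a single text line, which is what Tesseract's `--psm 7` mode expects. A minimal sketch of the slicing arithmetic, assuming the values sit on equally spaced rows; `row_slices` and the row count of 12 are hypothetical:

```python
from __future__ import annotations


def row_slices(height: int, row_count: int) -> list[tuple[int, int]]:
    """Split `height` pixels into `row_count` equal bands, returned as (start, end) pairs.

    Hypothetical helper: it assumes the values sit on equally spaced rows.
    """
    if height <= 0 or row_count <= 0:
        raise ValueError("height and row_count must be positive")
    # Rounded boundaries so the bands tile the ROI exactly, with no gaps or overlaps
    bounds = [round(i * height / row_count) for i in range(row_count + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(row_count)]


if __name__ == "__main__":
    # e.g. a 920-pixel-tall ROI such as thresh1[130:1050, 1280:1420] in this post,
    # assuming (hypothetically) 12 values stacked on equally spaced rows
    for y0, y1 in row_slices(1050 - 130, row_count=12):
        # each band would then be OCR'd on its own, e.g. ROI1[y0:y1, :] with --psm 7
        print(y0, y1)
```

If the rows are not evenly spaced, the same idea works with a hand-measured list of (start, end) pairs instead of the computed ones.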

23999
29999
30999
40000
40000
40000
40000
1
43000
44000

44000

44500

This is what Tesseract extracts from the image. As you can see, it's mostly fine, but instead of 4111 it extracts 1. The result changes if I switch languages or adjust the threshold values, but a setting that fixes this case breaks the other ones.
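Since mistakes aren't acceptable, it may help to reject suspicious output instead of acting on it. A minimal sketch, assuming (hypothetically) that you know how many values to expect and that every valid value has 4 to 6 digits; `parse_values` and both rules are illustrative and would need to match your real data:

```python
from __future__ import annotations

import re

# Hypothetical rule: a valid value is 4-6 digits. Adjust to your real data.
VALUE_PATTERN = re.compile(r"\d{4,6}")


def parse_values(ocr_text: str, expected_count: int) -> list[int]:
    """Turn Tesseract output into ints, raising ValueError if anything looks wrong."""
    # Tesseract often emits blank lines between values; ignore them
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    if len(lines) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(lines)}: {lines}")
    suspicious = [line for line in lines if not VALUE_PATTERN.fullmatch(line)]
    if suspicious:
        raise ValueError(f"suspicious OCR output: {suspicious}")
    return [int(line) for line in lines]


if __name__ == "__main__":
    sample = "23999\n29999\n30999\n40000\n1\n43000\n"
    try:
        parse_values(sample, expected_count=6)
    except ValueError as err:
        # The misread "1" (really 4111) is caught instead of silently used
        print("rejected:", err)
```

A rejected result could trigger a re-capture or a manual check rather than an action.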
I guess only training could fix these errors, but I couldn't get it to work. The position and angle of the data don't change; it's only the font I would need to train, but I don't know how to generate a lot of training data.
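One way to get a lot of training material for a single font is to render it synthetically. A minimal sketch, assuming the tesstrain workflow (which reads line images paired with `.gt.txt` transcriptions) and Pillow for rendering; the font path, file names, line lengths and the helper names are hypothetical:

```python
from __future__ import annotations

import random
from pathlib import Path

DIGITS = "0123456789"


def random_digit_lines(count: int, min_len: int = 1, max_len: int = 6, seed: int = 0) -> list[str]:
    """Return `count` random digit strings; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(DIGITS) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


def write_ground_truth(lines: list[str], out_dir: str, font_path: str | None = None,
                       font_size: int = 48, render: bool = True, pad: int = 10) -> None:
    """Write pairs line_NNNN.gt.txt (transcription) and, if render=True, line_NNNN.tif (image)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if render:
        if font_path is None:
            raise ValueError("font_path (your .ttf file) is required when render=True")
        # Pillow is only needed for rendering, so import it here
        from PIL import Image, ImageDraw, ImageFont
        font = ImageFont.truetype(font_path, font_size)
    for i, text in enumerate(lines):
        stem = f"line_{i:04d}"
        (out / f"{stem}.gt.txt").write_text(text + "\n", encoding="utf-8")
        if render:
            # Measure the text (textbbox needs Pillow 8+), then draw it black on white
            _, _, right, bottom = ImageDraw.Draw(Image.new("L", (1, 1))).textbbox(
                (0, 0), text, font=font)
            img = Image.new("L", (right + 2 * pad, bottom + 2 * pad), color=255)
            ImageDraw.Draw(img).text((pad, pad), text, font=font, fill=0)
            img.save(out / f"{stem}.tif")
```

For example, `write_ground_truth(random_digit_lines(1000), "data/digits-ground-truth", font_path="myfont.ttf")` would write 1000 pairs into a folder named in tesstrain's `MODEL_NAME-ground-truth` style. Real screenshots cropped per line and transcribed by hand could go into the same folder alongside the synthetic ones.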

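code:
import cv2
import pytesseract

img = cv2.imread('xy.png', cv2.IMREAD_GRAYSCALE)  # load the screenshot as grayscale
ret, thresh1 = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)  # pixels > 150 become 0, the rest 255
ROI1 = thresh1[130:1050, 1280:1420]  # rows (y) 130-1050, columns (x) 1280-1420
text = pytesseract.image_to_string(ROI1, config="digits")  # Tesseract's "digits" config (a character whitelist)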

I grab the screen with ImageGrab and select the ROI.
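When capturing the screen directly, note that NumPy crops are `[rows, columns]` (y first) while PIL's `ImageGrab.grab(bbox=...)` takes `(left, upper, right, lower)`. A minimal sketch combining the capture with an explicit page segmentation mode and digit whitelist, as an alternative to the `digits` config file; it assumes the saved screenshot was a full-screen grab at the same scale, and the helper names are hypothetical. In Tesseract 4.1 and later the LSTM engine honours `tessedit_char_whitelist`; some 4.0 builds ignored it.

```python
from __future__ import annotations

DIGIT_WHITELIST = "0123456789"


def roi_to_bbox(y0: int, y1: int, x0: int, x1: int) -> tuple[int, int, int, int]:
    """Map a NumPy crop img[y0:y1, x0:x1] to a PIL bbox (left, upper, right, lower)."""
    return (x0, y0, x1, y1)


def digit_config(psm: int = 6, whitelist: str = DIGIT_WHITELIST) -> str:
    """Tesseract options: page segmentation mode plus a character whitelist."""
    # psm 6 = a single uniform block of text; psm 7 = a single text line
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def grab_and_read(y0: int, y1: int, x0: int, x1: int, threshold: int = 150, psm: int = 6) -> str:
    """Capture only the ROI from the screen, binarize it like the post does, and OCR it."""
    # Third-party imports are local so the helpers above work without them installed
    import cv2
    import numpy as np
    import pytesseract
    from PIL import ImageGrab

    shot = ImageGrab.grab(bbox=roi_to_bbox(y0, y1, x0, x1))
    gray = np.array(shot.convert("L"))  # 8-bit grayscale array for OpenCV
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY_INV)
    return pytesseract.image_to_string(binary, config=digit_config(psm))
```

Under that assumption, `grab_and_read(130, 1050, 1280, 1420)` covers the same region as `thresh1[130:1050, 1280:1420]`.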

Any suggestions? Maybe there's some existing training data with digits in it that I could adapt to my font?

pic.png

Shree Devi Kumar

Apr 2, 2020, 9:54:11 PM
to tesseract-ocr
Try fine-tuning as in the "Fine Tuning for Impact" example from the Tesseract training documentation, but using your font.
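A fine-tuning pass could look roughly like the following. This is a sketch, not a tested recipe: it assumes the Tesseract 4 `combine_tessdata` and `lstmtraining` tools, a starting `eng.traineddata` from tessdata_best (fast/integer models can't be trained further), and a list file of `.lstmf` training files already generated for your font (for example with tesstrain). All paths, the model name, the helper name and the iteration count are placeholders. It only builds the command lines; each could be run with `subprocess.run(cmd, check=True)`.

```python
from __future__ import annotations


def finetune_commands(base_traineddata: str, train_listfile: str, work_dir: str,
                      name: str = "digits_finetuned", max_iterations: int = 400) -> list[list[str]]:
    """Build (but do not run) the three commands of a basic LSTM fine-tuning pass."""
    base_lstm = f"{work_dir}/base.lstm"
    checkpoint_prefix = f"{work_dir}/{name}"
    return [
        # 1. Extract the LSTM model from the starting traineddata
        ["combine_tessdata", "-e", base_traineddata, base_lstm],
        # 2. Continue training from it on your font's .lstmf files
        ["lstmtraining",
         "--continue_from", base_lstm,
         "--traineddata", base_traineddata,
         "--train_listfile", train_listfile,
         "--model_output", checkpoint_prefix,
         "--max_iterations", str(max_iterations)],
        # 3. Convert the latest checkpoint into a usable .traineddata file
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{checkpoint_prefix}_checkpoint",
         "--traineddata", base_traineddata,
         "--model_output", f"{work_dir}/{name}.traineddata"],
    ]
```

The resulting `.traineddata` would then go into the tessdata folder and be selected with `lang="digits_finetuned"` in `pytesseract.image_to_string`.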

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a0fd3ccf-f681-4c34-8113-7d15f3a44101%40googlegroups.com.


--

Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

Suppressed

Apr 3, 2020, 2:26:58 AM
to tesseract-ocr
Do you have any guides or threads that could help me with the process? I'm kinda lost, not gonna lie.

Shree Devi Kumar

Apr 3, 2020, 7:32:11 AM
to tesseract-ocr