Checkbox extraction as text after fine-tuning for new characters


Apoorv Khanna

Apr 3, 2018, 4:59:38 AM
to tesseract-ocr
Hi all,

I am able to extract a few check boxes after fine-tuning the English model, but Tesseract is not able to extract all of them.

Thanks in advance

Version used: Tesseract 4 beta
Font used for training: DejaVu Sans
Number of symbols inserted in the training text: 14 of each

Extracted text:
☐not reported wnot reported zpnot reported
cno Byes tno ☒yes ☐no ☑pyes
not reported not reported ☐not reported
ss.tif
ss.txt

ShreeDevi Kumar

Apr 3, 2018, 8:05:32 AM
to tesser...@googlegroups.com
Try to train with a large number of fonts and see if that improves the result.
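
As a rough illustration of that suggestion, the sketch below assembles a `tesstrain.sh` command line that lists several fonts after `--fontlist`. The paths and the second font name are placeholders rather than anything from this thread, and every font in the list would need to actually contain the checkbox glyphs (☐ ☑ ☒). The script only builds and prints the command; it does not run any training.

```python
import shlex
from typing import List, Sequence


def build_tesstrain_command(
    fonts: Sequence[str],
    fonts_dir: str,
    langdata_dir: str,
    tessdata_dir: str,
    output_dir: str,
    lang: str = "eng",
) -> List[str]:
    """Build a tesstrain.sh argument list that renders training data in several fonts."""
    if not fonts:
        raise ValueError("at least one font name is required")
    return [
        "tesstrain.sh",
        "--fonts_dir", fonts_dir,
        "--lang", lang,
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--output_dir", output_dir,
        # --fontlist goes last so that every remaining argument is read as a font name.
        "--fontlist", *fonts,
    ]


# Placeholder values: adjust the paths and font names to your own setup.
cmd = build_tesstrain_command(
    fonts=["DejaVu Sans", "FreeSerif"],
    fonts_dir="/usr/share/fonts",
    langdata_dir="langdata",
    tessdata_dir="tessdata",
    output_dir="train_out",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell once the placeholder paths point at real directories.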


Piyush Chandra

Apr 22, 2020, 7:25:26 AM
to tesseract-ocr
Hi Apoorv,

Were you able to get the 3 check boxes OCRed? Did you get any errors while training, and how did you complete the training for your model?

Thanks & Regards,
Piyush

Muhammad Shamim

Jun 4, 2020, 6:23:33 AM
to tesseract-ocr
Hi,

I think you should use image processing to detect the checkboxes and then run OCR to get the text.
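
One possible shape for the image-processing part, as a minimal sketch only: once a checkbox has been located (for example with contour detection in an image library, not shown here), its state can be guessed from how much ink lies inside the box outline. The image representation (a list of rows with 1 for dark pixels), the `box` coordinates, and the 0.15 threshold are all assumptions made for this illustration and would need tuning on real scans.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = dark (ink) pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def interior_ink_ratio(img: Image, box: Box, border: int = 1) -> float:
    """Return the fraction of dark pixels inside the box, skipping its outline."""
    top, left, bottom, right = box
    rows = img[top + border: bottom - border]
    pixels = [p for row in rows for p in row[left + border: right - border]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def is_checked(img: Image, box: Box, threshold: float = 0.15, border: int = 1) -> bool:
    """Treat the box as checked when its interior holds more ink than the threshold."""
    return interior_ink_ratio(img, box, border) > threshold


def make_box(size: int, checked: bool) -> Image:
    """Draw a synthetic square checkbox, optionally with an X through it (demo only)."""
    img = [[0] * size for _ in range(size)]
    last = size - 1
    for i in range(size):
        img[0][i] = img[last][i] = 1  # top and bottom edges
        img[i][0] = img[i][last] = 1  # left and right edges
        if checked:
            img[i][i] = img[i][last - i] = 1  # both diagonals of the X
    return img


empty = make_box(7, checked=False)
crossed = make_box(7, checked=True)
whole = (0, 0, 7, 7)
print(is_checked(empty, whole), is_checked(crossed, whole))
```

The detected state can then be attached to the label text that OCR returns for the region next to each box, so Tesseract never has to recognise the checkbox glyph itself.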

Thanks,
Shamim

Piyush Chandra

Jun 4, 2020, 12:08:16 PM
to tesseract-ocr
Thanks, Shamim. That's what I am doing now. :)