Checkbox recognition - Tesseract 4


PD

Feb 13, 2020, 9:37:59 AM
to tesseract-ocr

Hello

Is there any way to train Tesseract 4 to recognize checkboxes? I want to train Tesseract on empty checkboxes and on checkboxes marked with a cross or check sign. The default English trained data does not identify checkboxes. I tried defining a new font in jTessBoxEditor and training with that tool, but with no success.

Josh Wieder

Feb 14, 2020, 1:52:30 PM
to tesseract-ocr
You will have a better chance of a successful response if you can provide some additional information about your situation. At a minimum, please provide:

- your exact versions of jTessBoxEditor and Tesseract (e.g. 4.0.1 rather than just 4), and of all the prerequisites listed on the jTessBoxEditor website (http://vietocr.sourceforge.net/usage.html), e.g. Java
- some minimal information about your environment: Linux or Windows? Python or .NET?
- the exact error message that you receive in jTessBoxEditor and the exact steps to reproduce it
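
[Editor's sketch] One way to gather part of the information requested in the list above is a short script. This is only a minimal sketch: it assumes Python is installed and that the Tesseract executable is on the PATH under the name `tesseract`. The function names `tesseract_version` and `environment_report` are made up for this illustration. It does not detect the jTessBoxEditor version, which still has to be read from the application itself.

```python
import platform
import shutil
import subprocess
import sys


def tesseract_version(executable="tesseract"):
    """Return the output of `<executable> --version`, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()


def environment_report(executable="tesseract"):
    """Collect OS, Python and Tesseract version details into a dict."""
    return {
        "os": f"{platform.system()} {platform.release()}",
        "python": sys.version.split()[0],
        "tesseract": tesseract_version(executable) or "not found on PATH",
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The printed lines can be pasted into a post as they are, alongside the exact error message and the steps to reproduce it.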

Assuming this is occurring immediately post-install for you, a step-by-step account of how you installed jTessBoxEditor would likely also help.

Cheers,
Josh

Quan Nguyen

Feb 14, 2020, 11:45:15 PM
to tesseract-ocr
jTessBoxEditor supports training in the Tesseract 3.0x format only. For 4.0x, please consult https://github.com/tesseract-ocr/tessdoc/blob/master/TrainingTesseract-4.00.md
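
[Editor's sketch] The guide linked above is the authority on the 4.0x training procedure. As a rough illustration only, the snippet below generates text lines that mix checkbox glyphs with ordinary words, which could serve as a starting point for training text. It rests on assumptions not stated in this thread: that the three checkbox states are represented by the Unicode characters U+2610 (empty), U+2611 (checked) and U+2612 (crossed), and that the font used for training contains those glyphs. The function name `make_training_lines` and the word list are hypothetical. The script produces text only and does not run any Tesseract training tool.

```python
import random

# Assumption: these three Unicode code points stand for the checkbox states.
CHECKBOXES = ["\u2610", "\u2611", "\u2612"]  # empty, checked, crossed

# Hypothetical filler words; replace with labels taken from the real forms.
WORDS = ["Yes", "No", "Male", "Female", "Agree", "Disagree", "Other"]


def make_training_lines(n_lines, items_per_line=4, seed=0):
    """Return n_lines strings, each pairing checkbox glyphs with words."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    lines = []
    for _ in range(n_lines):
        items = [
            f"{rng.choice(CHECKBOXES)} {rng.choice(WORDS)}"
            for _ in range(items_per_line)
        ]
        # Two spaces separate the items on a line.
        lines.append("  ".join(items))
    return lines


if __name__ == "__main__":
    for line in make_training_lines(5):
        print(line)
```

Whether text generated this way actually improves checkbox recognition would need to be verified against real scanned forms.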

Josh Wieder

Feb 15, 2020, 6:59:00 AM
to tesser...@googlegroups.com
I'm not sure that the v4 incompatibility claim is accurate. The landing page of the jTessBoxEditor website lists compatibility with v2 and v3 only, but the changelog for the application itself says that the latest update supports Tesseract 4.1.1 (which is why I requested clarification on version numbering: using an earlier version with Tesseract 4 would not work).


--
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/bpxTF3vfB-I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/cf6226d5-3c88-4282-acec-b49363988f4c%40googlegroups.com.

Josh Wieder

Feb 15, 2020, 7:04:12 AM
to tesser...@googlegroups.com
Correction: my bad, guys, the previous poster is correct. The changelog on the site is a mishmash of changes for 3-4 different applications.

The latest available jTessBoxEditor version supports nothing later than Tesseract 3.05-dev. For what it's worth, I haven't made up my own mind on the best option for zone selection. My own preference is something that won't lock me into v3, though.



kamran hamid

Feb 15, 2020, 8:28:32 AM
to tesser...@googlegroups.com
I have a problem with Tesseract for the Urdu language: Tesseract did not recognize the text in the picture.


Prasanna Diwadkar

Feb 16, 2020, 8:53:23 AM
to tesser...@googlegroups.com
Thanks. So what is the procedure for training characters with a new font?
How do you train a checkbox?
I am using Windows 10.
Regards
PD
