Tesseract (4 alpha) Ambiguous Situation while Correcting Chars in Box File


srn...@gmail.com

Apr 5, 2017, 1:55:37 AM
to tesseract-ocr
I am trying to correct box files so that I can train Tesseract.

But I have run into a strange problem:

1) Tesseract recognizes some single letters as two letters. How should I edit the box file in that case? (screenshot 1)
2) Tesseract does not recognize some letters at all. How should I edit the box file in that case? (screenshot 2)
Screenshot from 2017-04-05 11:20:38.png
Screenshot from 2017-04-05 11:22:09.png

ShreeDevi Kumar

Apr 5, 2017, 3:59:40 AM
to tesser...@googlegroups.com
Have you tried using eng.traineddata directly with Tesseract 3.04, 3.05, or 4.0?

You don't need to train unless it is a very special case. With Tesseract 3.0x you can try changing the dictionary DAWG files.




ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/8acd28ca-fa7f-4be6-a293-ec3008ffd288%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Quan Nguyen

Apr 10, 2017, 11:40:14 PM
to tesseract-ocr
For Case 1, you'll need to merge the two boxes into one. For Case 2, you'll need to correct it by splitting the box.
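For anyone editing the box file by hand rather than in a GUI, the merge and split operations can be sketched in Python. This is a minimal illustration, not part of Tesseract or any box editor. It assumes the usual box file line layout of `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The names `Box`, `parse_box_line`, `format_box`, `merge_boxes`, and `split_box` are hypothetical helpers, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Box:
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def parse_box_line(line: str) -> Box:
    # Split from the right so the symbol field may hold more than one character.
    glyph, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def format_box(box: Box) -> str:
    return f"{box.glyph} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def merge_boxes(a: Box, b: Box, glyph: str) -> Box:
    # Case 1: one letter was reported as two boxes. The merged box is the
    # smallest rectangle covering both, labelled with the correct letter.
    return Box(
        glyph,
        min(a.left, b.left),
        min(a.bottom, b.bottom),
        max(a.right, b.right),
        max(a.top, b.top),
        a.page,
    )


def split_box(box: Box, split_x: int, left_glyph: str, right_glyph: str):
    # Case 2: one box covers two letters. Cut it at a vertical line x = split_x.
    if not box.left < split_x < box.right:
        raise ValueError("split_x must lie strictly inside the box")
    left_part = Box(left_glyph, box.left, box.bottom, split_x, box.top, box.page)
    right_part = Box(right_glyph, split_x, box.bottom, box.right, box.top, box.page)
    return left_part, right_part


# An "m" that was reported as "r" followed by "n" (hypothetical coordinates).
first = parse_box_line("r 100 200 112 230 0")
second = parse_box_line("n 112 200 126 230 0")
print(format_box(merge_boxes(first, second, "m")))  # m 100 200 126 230 0

# An "rn" pair that was reported as a single "m" box.
left_part, right_part = split_box(parse_box_line("m 300 200 330 230 0"), 314, "r", "n")
print(format_box(left_part))   # r 300 200 314 230 0
print(format_box(right_part))  # n 314 200 330 230 0
```

In the box file itself, a merge replaces the two original lines with the single merged line, and a split replaces the one original line with the two new lines in reading order.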

srn...@gmail.com

Apr 12, 2017, 5:53:22 AM
to tesseract-ocr
Can you please tell me how to split a box and how to merge two boxes? I am not able to find any options for this. If you can give the specifics, it will be helpful to me and to others as well.

Thank You.

ShreeDevi Kumar

Apr 12, 2017, 6:11:46 AM
to tesser...@googlegroups.com
You can use jTessBoxEditor to edit the box files. Make sure to mark EOL if you are trying to train using scanned images.

Also note that training 4.0 from pre-existing images and box files is an untested part of the code.

Ray has only explained the method for images created by text2image.
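To make the "mark EOL" step concrete, here is a rough Python sketch of doing it by hand. It rests on assumptions that should be checked against the Tesseract 4 training documentation: that the end of each text line is marked by an extra box line whose symbol is a tab character, and that reusing the text line's overall bounding box for that marker is acceptable. The function name `mark_eol` is hypothetical, and the caller is expected to have already grouped the box lines by text line.

```python
def mark_eol(text_lines):
    """Append a tab-symbol box line after each group of box lines.

    text_lines: list of text lines, each a list of box file strings of the
    form "<symbol> <left> <bottom> <right> <top> <page>".
    """
    out = []
    for line in text_lines:
        if not line:
            continue  # nothing to mark for an empty group
        out.extend(line)
        # Parse the five numeric fields of every box on this text line.
        coords = [tuple(int(v) for v in entry.rsplit(" ", 5)[1:]) for entry in line]
        left = min(c[0] for c in coords)
        bottom = min(c[1] for c in coords)
        right = max(c[2] for c in coords)
        top = max(c[3] for c in coords)
        page = coords[0][4]
        # Assumed marker format: a tab as the symbol, then the usual fields.
        out.append(f"\t {left} {bottom} {right} {top} {page}")
    return out


# Hypothetical two-line page: "ab" on the first line, "c" on the second.
grouped = [
    ["a 10 10 20 30 0", "b 22 10 32 30 0"],
    ["c 10 50 20 70 0"],
]
for entry in mark_eol(grouped):
    print(repr(entry))
```

jTessBoxEditor can do this from its GUI, so a script like this is only useful when the box files are being generated or repaired in bulk.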
