Steps to train with plenty of source files


Shawn Chen

Aug 31, 2017, 6:54:09 AM
to tesseract-ocr
Hi All,
I am new to Tesseract and want to use it to recognize a large number of image files.
Having followed the training instructions, I know how to train on a single file and generate the traineddata.
But for multiple files, I am not clear on how to automate the process based on the generated traineddata.
It seems that I have to modify the box file manually to correct the wrongly recognized characters.
Is there any way to automate this process?

Thanks.

ShreeDevi Kumar

Aug 31, 2017, 9:01:31 AM
to tesser...@googlegroups.com
Traineddata files are available for most languages, in several versions:

for 3.04/3.05
for 4.00.00alpha, the initial version from 2016
best traineddata for 4.00.00alpha, released last month
best traineddata for the script, for 4.00.00alpha, released last month


You should try training only if these do not work for you. 
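
As a rough illustration of trying the existing traineddata on many images before doing any training, here is a minimal Python sketch. It assumes the tesseract command-line program is installed and on the PATH, and that the traineddata for the chosen language ("eng" here, purely as an example) is installed. The function names, the list of image suffixes, and the directory names are hypothetical and not part of Tesseract.

```python
import subprocess
from pathlib import Path

# Assumed set of image types to pick up; adjust to what your files actually are.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_command(image, out_dir, lang="eng"):
    """Build the CLI call: tesseract <image> <outputbase> -l <lang>."""
    image = Path(image)
    # tesseract appends ".txt" to the output base on its own.
    output_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(output_base), "-l", lang]


def find_images(image_dir):
    """Return the image files directly inside image_dir, in sorted order."""
    return sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_directory(image_dir, out_dir, lang="eng", runner=subprocess.run):
    """Run tesseract once per image and return the commands that were issued.

    `runner` is injectable so the loop can be exercised without tesseract.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [build_command(p, out_dir, lang) for p in find_images(image_dir)]
    for cmd in commands:
        runner(cmd, check=True)  # raises CalledProcessError if tesseract fails
    return commands
```

Calling ocr_directory("images", "ocr_out", lang="eng") would write one .txt file per image into ocr_out. If the results are good enough with the stock traineddata, no box-file editing is needed at all.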

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/57fa0441-4410-4228-8808-73fcd743d6fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shawn Chen

Aug 31, 2017, 11:34:31 PM
to tesseract-ocr
Thanks Shree.
I will try these.

