Training Tesseract 4.0 with a large training text

john.d...@gmail.com

Mar 12, 2018, 9:24:54 PM
to tesseract-ocr
Dear all,

I'm trying to train the LSTM engine on a large training text with different fonts, colors, etc. I'm using text2image to generate my tif/box file pairs, but text2image appears to be limited to 3 pages and therefore truncates my training text. How should I solve this? Should I call text2image in a loop on the remaining training text and generate hundreds, if not thousands, of tif/box file pairs to cover all of my training text, fonts, etc.?

Thanks for the help!

John.

ShreeDevi Kumar

Mar 12, 2018, 9:57:29 PM
to tesser...@googlegroups.com
Please look at tesstrain.sh

It is setting max-pages to 3 for text2image invocation. You can change it there.
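For illustration, here is a minimal Python sketch of that change. It assumes the limit appears in the script as a `--max_pages=3` argument to text2image; the exact spelling and the sample line below are assumptions, so check your own copy of the script first. The helper name `set_max_pages` is hypothetical.

```python
import re


def set_max_pages(script_text: str, new_value: int) -> str:
    """Return script_text with every `--max_pages=<number>` set to new_value.

    Assumption: the flag is written as `--max_pages=<n>` or `--max_pages <n>`.
    """
    return re.sub(
        r"(--max_pages[= ])\d+",
        lambda m: f"{m.group(1)}{new_value}",
        script_text,
    )


# Hypothetical line, modelled on how the script might build text2image arguments.
sample = 'common_args+=" --outputbase=${outbase} --max_pages=3"'
print(set_max_pages(sample, 50))
```

Editing the value by hand in a text editor works just as well; the sketch only shows which part of the line changes.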


이경준

Mar 13, 2018, 2:52:42 AM
to tesseract-ocr
Hi Shree. I looked at the tesstrain.sh file.

But I cannot find where max-pages is set to 3. Where is it?

Could you give me more details?

On Tuesday, March 13, 2018 at 10:57:29 AM UTC+9, shree wrote:

ShreeDevi Kumar

Mar 13, 2018, 3:21:23 AM
to tesser...@googlegroups.com
You have to look in the file that tesstrain.sh calls:

tesstrain_utils.sh
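
One way to locate the setting is to search the script for the flag name. The sketch below is a small Python equivalent of grep; it assumes the flag is spelled `max_pages` inside the script, and the sample text is made up for the demonstration. The helper name `find_flag_lines` is hypothetical.

```python
def find_flag_lines(script_text: str, flag: str = "max_pages"):
    """Return (line_number, stripped_line) pairs for lines that mention flag."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if flag in line
    ]


# Made-up excerpt standing in for the real script contents.
sample = """\
local common_args="--fonts_dir=${FONTS_DIR}"
common_args+=" --outputbase=${outbase} --max_pages=3"
run_command text2image ${common_args}
"""

for number, line in find_flag_lines(sample):
    print(f"{number}: {line}")
```

To search the real file, read it first, for example with `pathlib.Path("tesstrain_utils.sh").read_text()`; the path depends on where the training scripts are installed on your system.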

이경준

Mar 13, 2018, 3:24:15 AM
to tesseract-ocr
Thank you.

On Tuesday, March 13, 2018 at 4:21:23 PM UTC+9, shree wrote:

john.d...@gmail.com

Mar 16, 2018, 8:14:29 PM
to tesseract-ocr
Thanks Shree, I managed to get this bit working.

On Tuesday, March 13, 2018 at 02:24:54 UTC+1, john.d...@gmail.com wrote: