How to train for multiple languages?


Fanatico

Apr 10, 2018, 9:49:34 PM
to tesseract-ocr
I want to train for kor+chi. How can I do it?

ShreeDevi Kumar

Apr 11, 2018, 1:51:32 AM
to tesser...@googlegroups.com
Ray has not given instructions for multi-language or script-type training.

You can try concatenating the two training texts and the word lists, merging the unicharsets (merge_unicharsets command), and then doing replace-a-layer training with your primary language as the base.
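A minimal sketch of the text-preparation part of this suggestion, in Python. The function names and file names are hypothetical placeholders, and dropping duplicate words while merging is an assumption, not something the training tools are known to require. The unicharsets themselves are not touched here; that step is left to merge_unicharsets, which takes the input unicharset files followed by the output file.

```python
from pathlib import Path


def concat_training_texts(inputs, output):
    """Concatenate several training text files into one, in the order given."""
    parts = []
    for path in inputs:
        text = Path(path).read_text(encoding="utf-8")
        # Make sure each file ends with a newline so the first line of the
        # next file is not glued onto the last line of this one.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    Path(output).write_text("".join(parts), encoding="utf-8")


def merge_word_lists(inputs, output):
    """Merge word lists, skipping blank lines and duplicates (first-seen order)."""
    seen = set()
    merged = []
    for path in inputs:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    # One word per line, each terminated by a newline.
    Path(output).write_text("".join(w + "\n" for w in merged), encoding="utf-8")
    return merged
```

For example, with placeholder names, `concat_training_texts(["kor.training_text", "chi_sim.training_text"], "korchi.training_text")` followed by `merge_word_lists(["kor.wordlist", "chi_sim.wordlist"], "korchi.wordlist")` would produce the combined inputs for the next training step.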

Also, unpack the Han and Hangul script traineddata using combine_tessdata -u and look at the unicharset, word lists, etc. in them.
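The unpacking step could be wrapped roughly as below. This is only a sketch: the helper names, the traineddata path, and the output prefix are hypothetical, and it assumes combine_tessdata from the Tesseract training tools is on the PATH. The `-u` option takes the traineddata file and an output path prefix.

```python
import shutil
import subprocess


def unpack_command(traineddata, out_prefix):
    """Build the argument list for `combine_tessdata -u <traineddata> <prefix>`."""
    return ["combine_tessdata", "-u", str(traineddata), str(out_prefix)]


def unpack(traineddata, out_prefix):
    """Run the unpack command if the tool is installed.

    Returns False without running anything when combine_tessdata is not on
    the PATH; raises subprocess.CalledProcessError if the tool itself fails.
    """
    if shutil.which("combine_tessdata") is None:
        return False
    subprocess.run(unpack_command(traineddata, out_prefix), check=True)
    return True
```

Calling `unpack("Hangul.traineddata", "unpacked/Hangul.")` (placeholder paths, with the output directory already created) should leave the extracted components next to that prefix, where the unicharset and word lists can be inspected.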


Fanatico

Apr 11, 2018, 9:45:12 AM
to tesseract-ocr
Thanks, I was going to do this. I just wanted to make sure there wasn't a way to train two traineddata files together, like the existing ones.