Training help


Mox Betex

Jun 9, 2019, 2:27:23 AM
to tesseract-ocr
Can someone explain to me how to create training data for Tesseract 4.0?
I read the tutorial on the web, but I really don't understand it.
Is there any GUI software for training?
Do I have to create training data with a single font, or with images of text lines?

ElGato ElMago

Jun 10, 2019, 10:10:27 PM
to tesseract-ocr
Did you try the tutorial at all? It's pretty good guidance, though you might need help here and there.

On Sunday, June 9, 2019 at 15:27:23 UTC+9, Mox Betex wrote:

Mox Betex

Jun 11, 2019, 3:00:51 AM
to tesseract-ocr
I have, but I have stumbled upon a problem that I can't solve.

I am trying to build training data for Tesseract 4.00.

When I execute this command:

combine_lang_model --input_unicharset data/unicharset --script_dir data/tessdata --output_dir data/output --pass_through_recoder  --lang MyModel

I get the error "Failed to load script unicharset from:data/tessdata/Latin.unicharset".

The file Latin.unicharset is in the data/tessdata folder, so I don't understand how to fix this.
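One thing worth ruling out is that the file is present but not a valid unicharset (for example, a GitHub web page saved as HTML instead of the raw file, or a file with Windows line endings). This is only a guess and is not confirmed anywhere in this thread. The sketch below is a hypothetical helper, `check_unicharset`, written for illustration; it is not part of Tesseract. It assumes a valid unicharset file begins with a line containing only the number of entries, and flags anything else.

```shell
# Hypothetical pre-flight check, not a Tesseract tool.
# Assumption: a valid unicharset starts with a line holding only the entry count.
check_unicharset() {
  f="$1"
  # The file must exist and be readable.
  if [ ! -r "$f" ]; then
    echo "missing"
    return 1
  fi
  # The first line should be a bare integer; HTML or CRLF endings will fail this.
  if head -n 1 "$f" | grep -Eq '^[0-9]+$'; then
    echo "ok"
    return 0
  fi
  echo "bad-header"
  return 2
}
```

Usage, with the path from the command above: `check_unicharset data/tessdata/Latin.unicharset`. A result of "bad-header" would suggest re-downloading the raw file; "ok" only means the first line looks plausible, not that the whole file is valid.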

Can you help me?

ElGato ElMago

Jun 12, 2019, 4:47:13 AM
to tesseract-ocr
I guess you ran tesstrain.sh and hit a problem. I had a problem there, too, though mine seems different. Anyway, I got around it using a project by someone on this board. It does the same thing as the tutorial, but without the error.

https://github.com/Shreeshrii/tess4training 

Try this one.

On Tuesday, June 11, 2019 at 16:00:51 UTC+9, Mox Betex wrote: