Tesstrain.sh not generating TrainedData


kamra....@gmail.com

Apr 24, 2021, 10:55:26 AM
to tesseract-ocr
Hi,

I am running the following command to create trained data:
tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only --fontlist "FreeMono" --noextract_font_properties --langdata_dir /home/administrator/Downloads/tesseract-4.0.0/langdata --my_boxtiff_dir /home/administrator/pooja/testImages/ --tessdata_dir /home/administrator/Downloads/tesseract-4.0.0/tessdata --output_dir /home/administrator/images/output_folder_1/

After this, it prints the following and then stops:
=== Starting training for language 'eng'
[Sat Apr 24 20:15:19 IST 2021] /usr/local/bin/text2image --fonts_dir=/usr/share/fonts --font=FreeMono --outputbase=/tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt --text=/tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.e9Fi4vFUQQ
Rendered page 0 to file /tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using FreeMono
[Sat Apr 24 20:15:23 IST 2021] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.e9Fi4vFUQQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0 --max_pages=0 --font=FreeMono --text=/home/administrator/Downloads/tesseract-4.0.0/langdata/eng/eng.training_text
Rendered page 0 to file /tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0.tif
Rendered page 1 to file /tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Sat Apr 24 20:15:25 IST 2021] /usr/local/bin/unicharset_extractor --output_unicharset /tmp/eng-2021-04-24.Y8g/eng.unicharset --norm_mode 1
Usage: /usr/local/bin/unicharset_extractor [--output_unicharset filename] [--norm_mode mode] box_or_text_file [...]
Where mode means:
 1=combine graphemes (use for Latin and other simple scripts)
 2=split graphemes (use for Indic/Khmer/Myanmar)
 3=pure unicode (use for Arabic/Hebrew/Thai/Tibetan)

As per the specification, the output should end with:
Created starter traineddata for LSTM training of language 'eng'
Run 'lstmtraining' command to continue LSTM training for language 'eng'

Please help.

Regards,
Pooja

kamra....@gmail.com

Apr 29, 2021, 1:20:17 AM
to tesseract-ocr
Resolved. There was a version mismatch in the data I was using: the installed Tesseract was version 4.1.0, but I was passing paths from a downloaded Tesseract 4.0.0 tree.
That was causing the issue.
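
For anyone hitting the same thing, here is a rough sketch of a check that might catch this kind of mismatch before running tesstrain.sh. It assumes the data directory name contains the version, as in tesseract-4.0.0 above; the helper names extract_version and versions_match are made up for illustration and are not part of Tesseract.

```shell
# Sketch: compare the installed Tesseract version with the version
# embedded in a data directory path such as .../tesseract-4.0.0/langdata.
# Assumption: the path contains "X.Y.Z" somewhere, as in this thread.

# Print the first X.Y.Z version found in a string (nothing if none).
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed only when both arguments carry the same X.Y.Z version.
versions_match() {
  installed=$(extract_version "$1")
  path_version=$(extract_version "$2")
  [ -n "$installed" ] && [ "$installed" = "$path_version" ]
}

# Default to the langdata path from this thread; pass another as $1.
DATA_DIR="${1:-/home/administrator/Downloads/tesseract-4.0.0/langdata}"

# Only run the check when tesseract is actually on PATH.
if command -v tesseract >/dev/null 2>&1; then
  INSTALLED_LINE=$(tesseract --version 2>&1 | head -n 1)
  if versions_match "$INSTALLED_LINE" "$DATA_DIR"; then
    echo "OK: '$INSTALLED_LINE' matches $DATA_DIR"
  else
    echo "Warning: '$INSTALLED_LINE' does not match $DATA_DIR"
  fi
fi
```

With Tesseract 4.1.0 installed and the 4.0.0 path above, this should take the warning branch. The same check can be repeated for the --tessdata_dir path.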
