Error in creating LSTM training data using tesstrain.sh

Shandigutt

Sep 1, 2018, 6:10:45 PM
to tesseract-ocr
Hi,

I was trying to create LSTM training data using tesstrain.sh and got the error below. Can somebody explain what has gone wrong?

Command I used:
./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ../tessdata --output_dir ../training/sintrain --fontlist "BhashitaComplex" --training_text ../langdata/sin/sin.training_text 

Extract of the output:
=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=../tessdata
[2018 සැප්තැම්බර් 1 වැනි සෙනසුරාදා 21:41:25 +0300] /usr/local/bin/tesseract /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.tif /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0 --psm 6 lstm.train ../langdata/sin/sin.config
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.4-74-gd8237 with Leptonica
Page 1
Page 2
Page 3
ERROR: /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.lstmf does not exist or is not readable

For the complete output please see the attached err.txt
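
With long output like this, the final ERROR line is often only a symptom, and an earlier line names the actual cause. The following is a minimal sketch for pulling such lines out of a log. The helper name first_failures and the default log name are assumptions made for illustration, and the two patterns are simply the ones visible in the extract above.

```shell
#!/usr/bin/env bash
# Sketch: list the lines of a log that usually point at the real failure.
# first_failures is a hypothetical helper, not part of Tesseract.
# It reads the named files, or standard input when no file is given.
first_failures() {
  grep -nE "Can't open|^ERROR:" "$@"
}

# Guarded example run; the log path is an assumption about your setup.
log="${1:-tesstrain.log}"
if [ -r "$log" ]; then
  first_failures "$log" || echo "no matching lines in $log"
fi
```

Run against the extract above, this would surface the read_params_file line ahead of the final ERROR, which is the line worth investigating first.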

After running the command, I checked the temporary directory it created. Its contents were as follows:

tharaka@tharaka-laptop-ubuntu:~$ cd /tmp/sin-2018-09-01.E4T/
tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ ll
total 776
drwx------  2 tharaka tharaka   4096 සැප්   1 21:41 ./
drwxrwxrwt 50 root    root      4096 සැප්   2 00:10 ../
-rw-r--r--  1 tharaka tharaka 249413 සැප්   1 21:41 sin.BhashitaComplex.exp0.box
-rw-r--r--  1 tharaka tharaka 436290 සැප්   1 21:41 sin.BhashitaComplex.exp0.tif
-rw-r--r--  1 tharaka tharaka   9099 සැප්   1 23:27 sin.BhashitaComplex.exp0.txt
-rw-r--r--  1 tharaka tharaka   6543 සැප්   1 21:41 sin.unicharset
-rw-r--r--  1 tharaka tharaka   3053 සැප්   1 21:41 sin.xheights
-rw-r--r--  1 tharaka tharaka  71704 සැප්   1 23:27 tesstrain.log
tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$

My Tesseract version:
tesseract 4.0.0-beta.4-74-gd8237
 leptonica-1.77.0
  libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found SSE

My OS details:
tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

I would appreciate your support on this.
Thanks
err.txt

Shree Devi Kumar

Sep 1, 2018, 11:41:28 PM
to tesser...@googlegroups.com
> read_params_file: Can't open lstm.train

lstm.train is a config file, and tesseract could not find it.

It ships in the tesseract source tree under tesseract/tessdata/configs.

Make sure it is also present in your tessdata directory (or elsewhere on a path tesseract searches) and is readable.
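
As a rough sketch, a check along these lines could confirm whether the file is where tesseract can see it. The helper name find_tess_config and the ../tessdata default are assumptions for illustration. The configs/ and tessconfigs/ subdirectory names reflect my understanding of where tesseract looks for config files and may differ between versions.

```shell
#!/usr/bin/env bash
# Sketch: report where a config such as lstm.train lives under a tessdata dir.
# find_tess_config is a hypothetical helper, not part of Tesseract.
find_tess_config() {
  local tessdata_dir="$1" name="$2" sub
  # Assumed search locations: <tessdata>/configs and <tessdata>/tessconfigs.
  for sub in configs tessconfigs; do
    if [ -r "$tessdata_dir/$sub/$name" ]; then
      printf '%s\n' "$tessdata_dir/$sub/$name"
      return 0
    fi
  done
  return 1
}

# Example run; TESSDATA_DIR and its default are assumptions about your layout.
if ! find_tess_config "${TESSDATA_DIR:-../tessdata}" lstm.train; then
  echo "lstm.train not found; copy tessdata/configs/lstm.train from the tesseract source tree" >&2
fi
```

If the helper prints nothing, copying lstm.train from the source tree's tessdata/configs into a configs directory under the tessdata directory passed to tesstrain.sh is the likely fix.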

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7d771008-c142-4302-8b5e-e1fd130cc140%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Shandigutt

Sep 2, 2018, 5:20:11 PM
to tesseract-ocr
Thank you, Shree. It works fine now.

kamra....@gmail.com

Apr 22, 2021, 2:48:43 PM
to tesseract-ocr
I am facing the same issue. I used the following command:
/tesstrain.sh --fonts_dir /usr/share/fonts/ --lang eng --linedata_only --noextract_font_properties --exposures "0" --langdata_dir /home/administrator/Downloads/tesseract-4.0.0/langdata --tessdata_dir /home/administrator/Downloads/tesseract-4.0.0/tessdata --output_dir /home/administrator/pooja/output --fontlist 'FreeMono'

It gives the same kind of error:
=== Starting training for language 'eng'
[Fri Apr 23 00:13:06 IST 2021] /usr/bin/text2image --fonts_dir=/usr/share/fonts/ --font=FreeMono --outputbase=/tmp/font_tmp.7XXGMDw4DE/sample_text.txt --text=/tmp/font_tmp.7XXGMDw4DE/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.7XXGMDw4DE
Rendered page 0 to file /tmp/font_tmp.7XXGMDw4DE/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using FreeMono
[Fri Apr 23 00:13:09 IST 2021] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.7XXGMDw4DE --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2021-04-23.RTo/eng.FreeMono.exp0 --max_pages=0 --font=FreeMono --text=/home/administrator/Downloads/tesseract-4.0.0/langdata/eng/eng.training_text
Rendered page 0 to file /tmp/eng-2021-04-23.RTo/eng.FreeMono.exp0.tif
Rendered page 1 to file /tmp/eng-2021-04-23.RTo/eng.FreeMono.exp0.tif
ERROR: /tmp/eng-2021-04-23.RTo/eng.FreeMono.exp0.tif does not exist or is not readable

I have checked for the lstm.train file, and it is present. Please help me resolve this.


kamra....@gmail.com

Apr 24, 2021, 10:39:53 AM
to tesseract-ocr
The issue is resolved. libtiff was missing from the system, so the tools could not work with .tif files.
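
One hedged way to spot this situation: the version output quoted earlier in the thread lists libtiff alongside the other image libraries, so its absence from that list is a hint that TIFF support is missing. The sketch below assumes the version output keeps that format; has_libtiff is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether version text mentions libtiff.
# has_libtiff is a hypothetical helper; it reads version text on stdin.
has_libtiff() {
  grep -qi 'libtiff'
}

# Only runs when tesseract is installed; the messages are illustrative.
if command -v tesseract >/dev/null 2>&1; then
  if tesseract --version 2>&1 | has_libtiff; then
    echo "libtiff is listed in the version output"
  else
    echo "libtiff is not listed; .tif files may not be readable" >&2
  fi
fi
```

This only inspects what tesseract reports about itself. It does not prove that every tool in the training pipeline was built with TIFF support.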