Error at training 4.0



Apr 4, 2018, 4:06:07 PM
to tesseract-ocr
Hi, I'm new to Tesseract and OCR in general, and I need some help training Tesseract.

Platform: Mac OS X 10.13.3
Tesseract Version: 4.0.0-beta.1
leptonica: 1.75.3
  libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11

Images used: [images not preserved]



Step by step
I'm trying to train (fine-tune) Tesseract to better detect quotation marks (") and periods (.) in Korean, but I'm getting some errors. Here is what I have done so far:

1 - Got the images. I'm using 2 .tif images (each has only 1 line and a few characters)
2 - Renamed the images to kor.AppleMyungjo.exp0.tif and kor.AppleMyungjo.exp1.tif
3 - Created the .box file for each image with ```tesseract [language].[fontname].exp[samplenumber].tif [language].[fontname].exp[samplenumber] -l [language] batch.nochop makebox``` (one of them came out empty)
4 - Corrected the .box files using the site (I just pasted the positions into the file)
5 - Created the .tr file for each image with ```tesseract kor.AppleMyungjo.exp0.tif kor.AppleMyungjo.exp0 -l kor box.train ``` (both images produced an empty .tr file)
6 - Created the unicharset file with ```unicharset_extractor [box file 0] [box file 1]...```
7 - Created the font_properties file, which contains only the line ```AppleMyungjo 0 0 1 0 0```
8 - Cloned the tesseract repo to my Mac, path ```~/projects/tesseract```
9 - Cloned the langdata repo to my Mac, path ```~/projects/langdata```
10 - Found the folder where brew installed my Tesseract data, path ```/usr/local/Cellar/tesseract/HEAD-f8e26ee/share/tessdata```
11 - Ran the ```~/projects/tesseract/training/``` script:

sudo ~/projects/tesseract/training/ \
  --fonts_dir /Library/Fonts  \
  --lang kor \
  --linedata_only  \
  --noextract_font_properties  \
  --exposures "0"    \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir /usr/local/Cellar/tesseract/HEAD-f8e26ee/share/tessdata \
  --output_dir ~/tesstutorial/kor \
  --fontlist "AppleMyungjo"
and got the error:
=== Starting training for language 'kor'
mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
[Wed Apr 4 13:26:24 -03 2018] /usr/local/bin/text2image --fonts_dir=/Library/Fonts --font=AppleMyungjo --outputbase=/sample_text.txt --text=/sample_text.txt --fontconfig_tmpdir=
Fontconfig error: Cannot load default config file

=== Phase I: Generating training images ===
Rendering using AppleMyungjo
[Wed Apr 4 13:26:25 -03 2018] /usr/local/bin/text2image --fontconfig_tmpdir= --fonts_dir=/Library/Fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.d1OKhvnG/kor/kor.AppleMyungjo.exp0 --max_pages=3 --font=AppleMyungjo --text=/Users/fernandogot/projects/langdata/kor/kor.training_text
Fontconfig error: Cannot load default config file
ERROR: /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.d1OKhvnG/kor/ does not exist or is not readable
ERROR: /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.d1OKhvnG/kor/ does not exist or is not readable

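I found that the ```mktemp: illegal option -- -``` error, and the ```Fontconfig error: Cannot load default config file``` that follows it, were caused by mktemp on macOS, which does not accept the ```--tmpdir``` option. I fixed it by changing this line: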

- export FONT_CONFIG_CACHE=$(mktemp -d --tmpdir font_tmp.XXXXXXXXXX)
+ export FONT_CONFIG_CACHE=$(mktemp -d -t font_tmp.XXXXXXXXXX)
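
The one-line change above works on macOS. A variant that should also keep working on Linux could try the GNU form first and fall back to the BSD form. This is only a minimal sketch: the helper name `make_tmp_dir` is made up for illustration and is not part of the Tesseract training scripts.

```shell
# Hypothetical helper (not from the Tesseract scripts): create a temporary
# directory with either GNU mktemp (Linux) or BSD mktemp (macOS).
make_tmp_dir() {
  prefix="$1"
  # GNU mktemp accepts --tmpdir; BSD mktemp rejects it with
  # "illegal option -- -", so fall back to the -t form in that case.
  mktemp -d --tmpdir "${prefix}.XXXXXXXXXX" 2>/dev/null ||
    mktemp -d -t "${prefix}.XXXXXXXXXX"
}

FONT_CONFIG_CACHE=$(make_tmp_dir font_tmp)
export FONT_CONFIG_CACHE
echo "fontconfig cache dir: ${FONT_CONFIG_CACHE}"
```

On macOS the ```-t``` form treats the argument as a prefix and appends its own random suffix, which is why the directory in the log below is named ```font_tmp.XXXXXXXXXX.X52wexDs```.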
After running the same training command again, I get:

=== Starting training for language 'kor'
[Wed Apr 4 14:13:38 -03 2018] /usr/local/bin/text2image --fonts_dir=/Library/Fonts --font=AppleMyungjo --outputbase=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/font_tmp.XXXXXXXXXX.X52wexDs/sample_text.txt --text=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/font_tmp.XXXXXXXXXX.X52wexDs/sample_text.txt --fontconfig_tmpdir=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/font_tmp.XXXXXXXXXX.X52wexDs

=== Phase I: Generating training images ===
Rendering using AppleMyungjo
[Wed Apr 4 14:13:40 -03 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/font_tmp.XXXXXXXXXX.X52wexDs --fonts_dir=/Library/Fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.pydbGWuE/kor/kor.AppleMyungjo.exp0 --max_pages=3 --font=AppleMyungjo --text=/Users/fernandogot/projects/langdata/kor/kor.training_text
ERROR: /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.pydbGWuE/kor/ does not exist or is not readable
ERROR: /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/tmp.pydbGWuE/kor/ does not exist or is not readable
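
One possible cause of the remaining error (an assumption, not something confirmed in this thread) is that text2image cannot find a font named exactly "AppleMyungjo" under ```/Library/Fonts```, so no training images are written to the temporary directory. A rough way to check is to list the fonts text2image can see and search for the name. The helper `font_is_listed` below is a made-up name for illustration, and the exact layout of the font list is not assumed, only that the font name appears somewhere in it.

```shell
# Hypothetical helper: succeed if the given font name appears anywhere
# in the font list read from stdin (fixed-string match, no regex).
font_is_listed() {
  grep -q -F -- "$1"
}

# Usage, assuming text2image is on PATH (stderr is merged in because the
# list may be printed there rather than on stdout):
#   text2image --list_available_fonts --fonts_dir=/Library/Fonts 2>&1 \
#     | font_is_listed "AppleMyungjo" && echo "font found"
```

If the name is not in the list, the value passed to ```--fontlist``` would need to match one of the listed names exactly.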

So I'm stuck at these 2 errors. I do have this file in the folder where I'm running the command, ```~/projects/ocr/trainning/```, so what can I do to make it work?

Thanks for reading all this text and for your time.

ShreeDevi Kumar

Apr 4, 2018, 9:53:21 PM
Training Tesseract 4.0.0 is different from the process for 3.0x.

Training using images is not supported for Tesseract 4.0.0.



Apr 5, 2018, 1:27:08 PM
to tesseract-ocr
Thanks for the quick response. I did not see this part in the documentation...

My problem is that in the image "kor.AppleMyungjo.exp0.tif" Tesseract recognizes nothing and the box file is empty, and in the image "kor.AppleMyungjo.exp1.tif" it does not recognize the final quotation mark (") and period (.). Can I fix this by running some tests with fonts?
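
Since some of the generated files came out empty, a small check like the one below could flag them before going further. It is only a sketch: the function name `report_empty_files` is made up for illustration and is not a Tesseract tool.

```shell
# Hypothetical helper: print every given file that is missing or has
# zero bytes, and return non-zero if any such file was found.
report_empty_files() {
  status=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      status=1
    elif [ ! -s "$f" ]; then
      echo "empty: $f"
      status=1
    fi
  done
  return "$status"
}

# Usage, run from the folder that holds the generated files:
#   report_empty_files kor.AppleMyungjo.exp0.box kor.AppleMyungjo.exp1.box
```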
