tesstrain.sh: /tmp/tmp.XXXXX/xxx/xxx.Font.exp0.box does not exist or is not readable


Dan9er

Sep 7, 2017, 2:19:09 PM
to tesseract-ocr
I'm trying to train tesseract using tesstrain and I'm getting this error: https://pastebin.com/xJj3w9jZ

Dan9er

Sep 9, 2017, 11:57:18 AM
to tesseract-ocr
I think I now know how to do it.

I have to run training/text2image --find_fonts and then set the tesstrain --fontlist flag to the file that is generated.

ShreeDevi Kumar

Sep 9, 2017, 12:28:43 PM
to tesser...@googlegroups.com
I don't think you can use that file directly, but yes, you can update the font list for your language either in language-specific.sh or via the command line.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/05711501-67dd-4008-a143-e56734c031fd%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Dan9er

Sep 9, 2017, 12:42:27 PM
to tesseract-ocr

ShreeDevi Kumar

Sep 9, 2017, 12:51:45 PM
to tesser...@googlegroups.com
Your command needs to be along the following lines:

training/tesstrain.sh \
  --fonts_dir /home/shree/.fonts \
  --tessdata_dir ./tessdata \
  --training_text ../langdata/ben/ben.training_text \
  --langdata_dir ../langdata \
  --lang ben \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0" \
  --fontlist "e-Grantamil" \
             "e-Grantha OT" \
  --output_dir ~/tesstutorial/ben

Note the --fontlist argument: it takes the quoted names of the fonts. You can put one name on each line, ending the line with \
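
With a long font list, quoting every name by hand gets error-prone. Here is a minimal sketch, assuming the font names arrive one per line on standard input and contain no double quotes. The helper name fontlist_args is hypothetical and not part of tesstrain.sh. It only prints the arguments so they can be reviewed and pasted into the command; it does not run any training.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain.sh): read font names from stdin,
# one per line, and print each one wrapped in double quotes with a leading space.
# Blank lines are skipped. Assumes names contain no double quotes.
fontlist_args() {
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    printf ' "%s"' "$name"
  done
}

# Demo with the two font names from the command above.
args=$(printf '%s\n' 'e-Grantamil' 'e-Grantha OT' | fontlist_args)
printf '%s\n' "training/tesstrain.sh --lang ben --langdata_dir ../langdata --fontlist$args"
```

The printed line can then be completed with the remaining flags (--fonts_dir, --output_dir and so on) before it is run.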



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


ShreeDevi Kumar

Sep 9, 2017, 12:59:08 PM
to tesser...@googlegroups.com

It has a command that builds a list of fonts in a format you can copy into the command line.

Please modify appropriately for your case.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Dan9er

Sep 9, 2017, 3:09:03 PM
to tesseract-ocr
OK, I made a shell script that runs tesstrain.sh with all 562 compatible fonts. But now I'm getting an error saying ./langdata/common.punc does not exist... https://pastebin.com/8aaMjH6k

ShreeDevi Kumar

Sep 9, 2017, 3:22:48 PM
to tesser...@googlegroups.com
https://github.com/tesseract-ocr/langdata/blob/master/common.punc

You should read the README.md in the langdata repo for info on the files required for training.


Dan9er

Sep 10, 2017, 12:07:27 PM
to tesseract-ocr
I added the common.punc file. But now I'm getting the box error again: https://pastebin.com/BsNL3KJv

ShreeDevi Kumar

Sep 10, 2017, 12:13:25 PM
to tesser...@googlegroups.com
Fontconfig error: line 1: no element found
Fontconfig error: Cannot load default config file
Could not find font named NanumMyeongjo Semi-Bold.
Pango suggested font NanumMyeongjo Bold.
Please correct --font arg.
ERROR: /tmp/tmp.tiLxemomPr/npn/npn.NanumMyeongjo_Semi-Bold.exp0.box does not exist or is not readable
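
With hundreds of fonts in the list, these "Could not find font" lines are easy to miss. A minimal sketch, assuming the log uses exactly the two message forms quoted above, pairs each missing font with the name Pango suggested. The function name suggest_font_fixes is hypothetical. The suggestion is only Pango's closest match, so check that it is the font you want before editing --fontlist.

```shell
#!/bin/sh
# Hypothetical helper: read a tesstrain/text2image log on stdin and print
# "missing -> suggested" for each "Could not find font named X." line that is
# followed by a "Pango suggested font Y." line.
suggest_font_fixes() {
  awk '
    /Could not find font named / {
      missing = $0
      sub(/.*Could not find font named /, "", missing)
      sub(/\.[ \t\r]*$/, "", missing)      # drop the trailing period
    }
    /Pango suggested font / && missing != "" {
      suggested = $0
      sub(/.*Pango suggested font /, "", suggested)
      sub(/\.[ \t\r]*$/, "", suggested)
      printf "%s -> %s\n", missing, suggested
      missing = ""
    }
  '
}

# Demo with the two lines quoted from the log above.
printf '%s\n' \
  'Could not find font named NanumMyeongjo Semi-Bold.' \
  'Pango suggested font NanumMyeongjo Bold.' | suggest_font_fixes
# prints: NanumMyeongjo Semi-Bold -> NanumMyeongjo Bold
```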

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


Dan9er

Sep 10, 2017, 2:08:10 PM
to tesseract-ocr
Did that, and it actually started training! It almost got to the end of my font list before...
ERROR: /tmp/tmp.YLL4mGn66F/npn/npn.Aileron_Heavy.exp0.tr does not exist or is not readable

Also, there were some ERRORs, but there weren't any FATALITYies (lol), so I think I'm good.

v-room

Sep 10, 2017, 3:19:53 PM
to tesseract-ocr
ask java

Dan9er

Sep 13, 2017, 6:29:13 PM
to tesseract-ocr
What do you mean, "Ask Java"? Tesseract is 90% C++, silly.

Dan9er

Sep 15, 2017, 7:05:27 PM
to tesseract-ocr
Bump

Dan9er

Sep 16, 2017, 7:41:39 PM
to tesseract-ocr
I ditched my 500+ font fontlist for one with just 3. It runs much faster now, and I got to Phase M before I got a ./langdata/font_properties does not exist or is not readable error.

Dan9er

Sep 17, 2017, 12:51:20 PM
to tesseract-ocr
Added that and it worked perfectly.

I'm finally done.
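
Both "does not exist or is not readable" errors for langdata files in this thread (common.punc and font_properties) only showed up partway through a run. A minimal sketch of a check to run beforehand, assuming only those two files: the list is taken from this thread and is not a complete list of what tesstrain.sh needs, and the function name check_langdata is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which of the langdata files mentioned
# in this thread are missing or unreadable in the given directory.
# Returns 0 if both are readable, 1 otherwise.
check_langdata() {
  dir=$1
  status=0
  for f in common.punc font_properties; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f"
      status=1
    fi
  done
  return $status
}
```

Run it as `check_langdata ./langdata` before starting tesstrain.sh, and fetch any reported file from the langdata repo.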