v4.1.1 - Training From Scratch got Segmentation fault

Sep 29, 2021, 2:21:26 AM

to tesseract-ocr

Hello,

I am following the "Training From Scratch" tutorial, using langdata_lstm and tesstrain.sh.

When I ran tesstrain.sh, it failed with a "Segmentation fault" error.

Error log:

=== Phase E: Generating lstmf files ===

Loaded 89754/89754 lines (1-89754) of document /tmp/chi_tra-2021-09-09.CGU/chi_tra.AR_PL_UKai_TW.exp0.lstmf

tesseract/src/training/tesstrain_utils.sh: line 73: 3787663 Segmentation fault (core dumped) "${cmd}" "$@" 2>&1

3787664 Done | tee -a "${LOG_FILE}"

ERROR: Program tesseract failed. Abort.

I have three questions about this error:

1. Was tessdata_best/lang.traineddata trained with langdata_lstm and tesstrain.sh?

2. How can I reproduce tessdata_best/lang.traineddata?

3. If the training_text is too large, how can I avoid this error?
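
On question 3, here is a minimal sketch of one workaround that could be tried, assuming the crash is related to the size of the training text (this is not confirmed by the log above). It splits a training text file into smaller files with a fixed maximum number of lines, so that each piece could be processed separately. The function name `split_training_text`, the `.partNNNN` naming scheme, and the default chunk size are hypothetical and are not part of tesstrain.sh.

```python
from pathlib import Path


def split_training_text(src, out_dir, lines_per_chunk=10000):
    """Split a UTF-8 text file into numbered chunks of at most lines_per_chunk lines.

    Hypothetical helper for illustration only; not part of tesstrain.sh.
    Returns the list of chunk paths in order.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    buffer = []

    def flush():
        # e.g. chi_tra.training_text -> chi_tra.part0000.training_text
        path = out_dir / f"{src.stem}.part{len(chunk_paths):04d}{src.suffix}"
        path.write_text("".join(buffer), encoding="utf-8")
        chunk_paths.append(path)
        buffer.clear()

    # Stream the file line by line so the whole text is never held in memory.
    with src.open(encoding="utf-8") as handle:
        for line in handle:
            buffer.append(line)
            if len(buffer) == lines_per_chunk:
                flush()

    # Write any remaining lines as a final, shorter chunk.
    if buffer:
        flush()

    return chunk_paths
```

For example, `split_training_text("chi_tra.training_text", "chunks", lines_per_chunk=10000)` would write `chunks/chi_tra.part0000.training_text` and so on. Whether training on smaller pieces actually avoids the segmentation fault is something I would still need to verify.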

Thank you in advance!

Environment:

Ubuntu 20.04

tesseract 4.1.1

leptonica-1.79.0

libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1

Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
