This time I ran the following command to try to prepare a single font for training:
src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata --output_dir ~/tesstutorial/engtrain --fontlist "Courier New" --overwrite
This gets much further than the command in the post titled "Unclear error message when running tesstrain.sh".
It ran for hours and then ended with this:
Page 3302
Loaded 171652/171652 lines (1-171652) of document /tmp/eng-2019-09-16.2SS/eng.Courier_New.exp0.lstmf
src/training/tesstrain_utils.sh: line 72: 12131 Segmentation fault (core dumped) "${cmd}" "$@" 2>&1
12132 Done | tee -a "${LOG_FILE}"
ERROR: Program tesseract failed. Abort.
Frankly, this failure to get TessTutorial to work after 2 weeks of attempts is rather unsatisfying. So are the uninformative messages.
What are the minimum system requirements for this to work? I am using Ubuntu 16.04 in VirtualBox with 8 GB RAM and 4 cores.
David