Thank you for the link!
Here are the instructions I have figured out so far for fine-tuning an existing model:
On Ubuntu 18.04, I first double-checked that the right packages were installed:
dpkg -s libtesseract-dev (I'm unsure whether this package is necessary, but I installed it a while ago)
~$ tesseract --version
tesseract 4.0.0-beta.1
cd to the tesstrain directory
Then start the training process with the following command:
make -r training START_MODEL=Fraktur TESSDATA=~/train/tessdata/script GROUND_TRUTH_DIR=~/train/data_train_2020_1_28_16_49_54 MODEL_NAME=Frak_LV_J29
so ~/train/tessdata/script/Fraktur.traineddata will be used as the starting model,
while GROUND_TRUTH_DIR holds about 6k pairs of .gt.txt and .tif files.
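Before starting a long run, it may be worth checking that the start model and the ground-truth pairs are where the command expects them. The following is only a rough sketch: `preflight` is a hypothetical helper name, not part of tesstrain, and it assumes each `NAME.gt.txt` should have a matching `NAME.tif` or `NAME.png` next to it.

```shell
# Hypothetical helper (not part of tesstrain): sanity-check inputs before calling make.
# Usage: preflight TESSDATA START_MODEL GROUND_TRUTH_DIR
preflight() {
  tessdata="$1"; start_model="$2"; gt_dir="$3"

  # The start model is expected at TESSDATA/START_MODEL.traineddata.
  if [ ! -f "$tessdata/$start_model.traineddata" ]; then
    echo "missing start model: $tessdata/$start_model.traineddata" >&2
    return 1
  fi

  paired=0
  unpaired=0
  for gt in "$gt_dir"/*.gt.txt; do
    [ -e "$gt" ] || continue            # the glob matched nothing
    base="${gt%.gt.txt}"
    # Assumption: an image with the same base name and a .tif or .png extension.
    if [ -e "$base.tif" ] || [ -e "$base.png" ]; then
      paired=$((paired + 1))
    else
      unpaired=$((unpaired + 1))
      echo "no image for: $gt" >&2
    fi
  done

  echo "$paired paired, $unpaired unpaired"
  # Succeed only if at least one pair exists and no transcription lacks an image.
  [ "$paired" -gt 0 ] && [ "$unpaired" -eq 0 ]
}
```

With the paths from the command above, this would be called as `preflight ~/train/tessdata/script Fraktur ~/train/data_train_2020_1_28_16_49_54`.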
Defaults: a 10,000-epoch run, with 10% of GROUND_TRUTH_DIR used for testing, assuming the wiki is correct.
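To make the 10% figure concrete, here is the arithmetic for roughly 6,000 pairs. This is only an illustration: `split_counts` is a hypothetical helper, the 6,000 is the approximate count mentioned above, and the rounding used here (integer division, rounding the test share down) is an assumption that may not match what the Makefile actually does.

```shell
# Hypothetical helper: split TOTAL pairs given the test share in percent.
# Assumption: integer division, so the test count is rounded down.
split_counts() {
  total="$1"; eval_percent="$2"
  eval_count=$((total * eval_percent / 100))
  train_count=$((total - eval_count))
  echo "$train_count train, $eval_count eval"
}

split_counts 6000 10   # prints "5400 train, 600 eval"
```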
My only worry is that my .tif files apparently have no DPI information, so the default of 70 is used.
Are the warnings about the lack of DPI a bad sign?
Interestingly, .png files are used when running the training, so I could perhaps have skipped the conversion to .tif, since I started with .png! :)
Now, the big question: how long will it take to run 10,000 epochs on an average 4-core Xeon v3 VM?