I have a script to train Tesseract. I ran it on Arch Linux, on Debian, and in a Docker container, and all three produce the same errors. I have also double-checked that the script itself is correct.
Bug 1:
This happens when tesstrain runs text2image. The max pages parameter has no effect: only 4 pages are rendered regardless of the value I pass for maxpages. I even tried hardcoding the value in tesstrain_utils.sh, and the result is the same.
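To check whether the limit is lost in the wrapper script or in text2image itself, text2image can be called directly. Below is a minimal sketch that builds such a command. The paths and font name are placeholders, not the values from my script, and the helper name build_text2image_cmd is made up for illustration. Note that text2image spells the flag --max_pages.

```python
import shlex


def build_text2image_cmd(text_path, outputbase, font, fonts_dir, max_pages):
    """Return the argv list for a single direct text2image run.

    All argument values are supplied by the caller; nothing here is
    taken from tesstrain_utils.sh.
    """
    return [
        "text2image",
        f"--text={text_path}",
        f"--outputbase={outputbase}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
        f"--max_pages={max_pages}",
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): adjust to the local setup.
    cmd = build_text2image_cmd(
        text_path="training_text.txt",
        outputbase="out/sample",
        font="DejaVu Sans",
        fonts_dir="/usr/share/fonts",
        max_pages=10,
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True). If the direct call also stops at 4 pages, the wrapper script is not the cause.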
Bug 2:
After it finishes producing those 4 pages, I fine-tune with lstmtraining, and the resulting output is full of "Encoding of string failed!" errors.
Bug 3:
Along with those encoding errors, it also outputs the following text:
"Image too small to scale!! (2x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable"
I will upload my script along with the Dockerfile if anyone wants to take a look.