Hi Ali,
Do you think starting and stopping at a specific line would also be possible for the actual training, just as you have done for text2image?
Today I was very surprised to find that Tesseract always restarts from the beginning of the training data every time we interrupt the process.
This is very bad: it can definitely degrade the accuracy of the training, especially for larger data sets, because the training effectively runs only on the first part of the data (later text lines are ignored).
For example, suppose you have 800,000 text lines and run your training in rounds:
Round 1: 10,000 iterations
Round 2: 100,000 iterations
Round 3: 400,000 iterations
Round 4: 400,000 iterations
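Because every round starts again from the first line, no round ever gets past line 400,000 (assuming roughly one line per iteration). In effect, you used only the first 400,000 text lines; the other 400,000 are never used for training. They are wasted.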
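To make the arithmetic concrete, here is a small simulation. It is only a sketch under a hypothetical model: each iteration consumes exactly one text line, in list order, wrapping back to the first line after the last one, and Round 2 is read as 100,000 iterations. `lines_covered` is an illustrative helper, not part of Tesseract.

```python
from typing import Sequence


def lines_covered(total_lines: int, rounds: Sequence[int], resume: bool) -> int:
    """Count how many distinct text lines are used at least once across all rounds.

    Hypothetical model: each iteration consumes one line, in list order,
    wrapping back to the first line after the last one.
    """
    used = bytearray(total_lines)  # used[i] == 1 once line i has been trained on
    start = 0
    for iterations in rounds:
        if not resume:
            start = 0  # restart behaviour: every round begins again at the first line
        for i in range(iterations):
            used[(start + i) % total_lines] = 1
        start = (start + iterations) % total_lines  # where a resumed round would continue
    return sum(used)


rounds = [10_000, 100_000, 400_000, 400_000]
print(lines_covered(800_000, rounds, resume=False))  # 400000
print(lines_covered(800_000, rounds, resume=True))   # 800000
```

With resumption the four rounds add up to 910,000 iterations, so every one of the 800,000 lines is seen at least once; with restarts, only the largest round matters.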
So it would be great if we could have a similar Python script that can stop and resume the training.
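One possible workaround, sketched here under loud assumptions, is to remember the start offset in a small state file and, before each round, write a rotated copy of the list file so the round begins where the previous one stopped. This assumes one list-file entry per text line (as when every line has its own .lstmf file) and that entries are consumed in order; if the trainer shuffles or samples entries, rotating the list will not help. All names here (`train_state.json`, `prepare_round`, `finish_round`, `next_offset`) are hypothetical. The rotated file could then be passed to lstmtraining as its `--train_listfile`.

```python
import json
from pathlib import Path
from typing import List

# Hypothetical state file that remembers where the next round should start.
STATE_FILE = Path("train_state.json")


def rotate(entries: List[str], offset: int) -> List[str]:
    """Reorder entries so the one at `offset` comes first, wrapping around."""
    offset %= len(entries)
    return entries[offset:] + entries[:offset]


def next_offset(start: int, completed_iterations: int, n_entries: int) -> int:
    """Offset for the next round, assuming one entry is consumed per iteration."""
    return (start + completed_iterations) % n_entries


def load_offset(state_file: Path = STATE_FILE) -> int:
    """Return the saved start offset, or 0 if no round has been recorded yet."""
    if state_file.exists():
        return int(json.loads(state_file.read_text())["offset"])
    return 0


def read_entries(list_file: Path) -> List[str]:
    """Read the non-empty lines of a list file (e.g. one .lstmf path per line)."""
    return [ln for ln in list_file.read_text().splitlines() if ln.strip()]


def prepare_round(list_file: Path, rotated_file: Path,
                  state_file: Path = STATE_FILE) -> int:
    """Write a rotated copy of list_file starting at the saved offset; return that offset."""
    entries = read_entries(list_file)
    start = load_offset(state_file) % len(entries)
    rotated_file.write_text("\n".join(rotate(entries, start)) + "\n")
    return start


def finish_round(list_file: Path, completed_iterations: int,
                 state_file: Path = STATE_FILE) -> int:
    """Advance and save the offset after a round, including an interrupted one."""
    n_entries = len(read_entries(list_file))
    new_offset = next_offset(load_offset(state_file), completed_iterations, n_entries)
    state_file.write_text(json.dumps({"offset": new_offset}))
    return new_offset
```

Call `prepare_round` before starting a round and `finish_round` afterwards with the number of iterations that actually completed (for example, as reported in the training output), so that an interrupted round picks up from the right line next time.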