It seems that for training we only have to supply a training_text, which is then rendered in different fonts. Tesseract creates the images itself during training, so we never give it our own images. Does this mean retraining will only help with fonts and not with page layout? In other words, is there no way to influence how Tesseract does the segmentation through training? (I understand that you can use --psm.)
I'm just wondering whether training will help you get better results for a special layout, like a tabular image, with the usual fonts.
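
To make the --psm part concrete, this is roughly how I would compare segmentation modes on a table image without any retraining. It is only a sketch: `table.png`, the output names, the helper `build_command` and the choice of modes 4, 6 and 11 are my own assumptions, and it only builds the command lines rather than running them.

```python
import shlex

# Page segmentation modes that seem worth trying on table-like images.
# Descriptions paraphrase the output of `tesseract --help-extra`.
PSM_CANDIDATES = {
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    11: "sparse text, in no particular order",
}


def build_command(image_path, output_base, psm, lang="eng"):
    """Return the argument list for one tesseract run with the given --psm.

    Hypothetical helper: it only assembles the arguments, it does not
    invoke tesseract.
    """
    return ["tesseract", image_path, output_base, "--psm", str(psm), "-l", lang]


if __name__ == "__main__":
    for psm, description in PSM_CANDIDATES.items():
        # "table.png" is a placeholder file name.
        cmd = build_command("table.png", f"out_psm{psm}", psm)
        print(f"# psm {psm}: {description}")
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, and the resulting text files compared by hand to see which mode copes best with the table.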
On the other hand, it seems we can also create our own .box files and do the training with them. I guess I got the above idea just because I was drawing conclusions from fine-tuning with a few characters.
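
For reference, this is my understanding of what one line of a .box file holds, written as a small sketch. It assumes the classic layout of one symbol per line followed by left, bottom, right, top and page number, with the origin at the bottom-left of the image. The names `Box`, `parse_box_line` and `to_top_left` are made up here for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one '<symbol> <left> <bottom> <right> <top> <page>' line."""
    # Split from the right so the symbol field is kept whole.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"not a box line: {line!r}")
    symbol, *numbers = parts
    return Box(symbol, *map(int, numbers))


def to_top_left(box, image_height):
    """Convert to (left, top, right, bottom) with the origin at the top-left,
    which is what most image libraries expect."""
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


if __name__ == "__main__":
    box = parse_box_line("a 10 20 30 40 0")
    print(box)
    print(to_top_left(box, image_height=100))
```

If that reading is right, the boxes only say where each symbol sits in an image I supply myself, which is why I am unsure whether they have any effect on layout analysis.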