Is it right that training can only help with different fonts but not page layout?

Jingjing Lin

Jun 14, 2019, 11:51:17 AM

to tesseract-ocr

It seems that for training we only have to supply a training_text, which is then trained on different fonts. Tesseract creates the images itself during training, so we don't have to give it our own images. Does this mean retraining will only help with fonts but not with page layout? In other words, is there no way to affect how Tesseract does the segmentation through training? (I understand that you can use --psm.)

I'm just wondering whether training would help to get better results for a special layout, such as a tabular image, with ordinary fonts.
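
To make the --psm route mentioned above concrete, here is a minimal sketch of varying the page segmentation mode at recognition time, with no retraining involved. The helper names build_cmd and compare_psm and the file name table.png are made up for illustration; it assumes the tesseract binary is on PATH and that the "eng" model is installed.

```python
import subprocess

# A few page segmentation modes, as listed by `tesseract --help-psm`.
PSM_MODES = {
    3: "fully automatic page segmentation, no OSD (default)",
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    11: "sparse text, in no particular order",
}


def build_cmd(image, psm, lang="eng"):
    """Build a tesseract command line that writes the recognized text to stdout."""
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def compare_psm(image, lang="eng"):
    """Run the same image through each mode.

    Only the layout analysis changes between runs; the recognition model
    (the traineddata file) stays the same.
    """
    results = {}
    for psm in PSM_MODES:
        proc = subprocess.run(
            build_cmd(image, psm, lang), capture_output=True, text=True
        )
        results[psm] = proc.stdout
    return results
```

Calling compare_psm("table.png") on a tabular image would return one text result per mode, which makes it easy to see how much of the problem is segmentation rather than character recognition.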

On the other hand, it seems we can also create our own .box files and do the training with them. I guess I arrived at the idea above only because I was drawing conclusions from fine-tuning with a few characters.
