Hi everyone,
I would like to train Tesseract on my own dataset consisting of word images. I have bounding box information, but only for whole words, not for individual characters. I referred to the documentation available on training Tesseract 4.0.
In the documentation, it is mentioned that "The boxes only need to be at the textline level. It is thus far easier to make training data from existing image data.". However, later in the wiki, the box format that allows textline-level boxes is marked as not yet implemented ("Box File Format - Second Option (NOT YET IMPLEMENTED)"). I would therefore like to know whether there is any way to train Tesseract using only word-level bounding boxes rather than character-level ones.
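To make the mismatch concrete: the character-level box format expects one line per glyph, `<symbol> <left> <bottom> <right> <top> <page>`, with y measured from the bottom of the image. Below is a minimal sketch of what that conversion would look like, assuming word boxes given in top-left-origin pixel coordinates and using a hypothetical helper `word_to_char_boxes` (not part of Tesseract). It approximates character boxes by splitting each word box evenly, which is a crude assumption and will misplace boxes for proportional fonts.

```python
from typing import List


def word_to_char_boxes(word: str, left: int, top: int, right: int, bottom: int,
                       image_height: int, page: int = 0) -> List[str]:
    """Hypothetical helper: split one word box evenly into per-character box lines.

    Input coordinates assume a top-left image origin (y grows downwards).
    Output lines follow '<symbol> <left> <bottom> <right> <top> <page>'
    with a bottom-left origin, as in character-level box files.
    """
    # A space or empty word cannot be written as a single-symbol box line.
    if not word or any(ch.isspace() for ch in word):
        raise ValueError("word must be non-empty and contain no whitespace")

    # Flip the y axis: box files measure y from the bottom of the image.
    box_bottom = image_height - bottom
    box_top = image_height - top

    n = len(word)
    width = right - left
    lines = []
    for i, ch in enumerate(word):
        # Even split (assumption): each character gets an equal slice of the width.
        ch_left = left + (width * i) // n
        ch_right = left + (width * (i + 1)) // n
        lines.append(f"{ch} {ch_left} {box_bottom} {ch_right} {box_top} {page}")
    return lines


if __name__ == "__main__":
    for line in word_to_char_boxes("cat", 10, 20, 40, 50, image_height=100):
        print(line)
```

Because real glyph widths vary, boxes produced this way are only a rough approximation of true character-level annotation.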
Thank you for your time.