Dear all,
First, I'd like to thank you for keeping the Tesseract community alive. Second, I'd like to ask some questions about the training process I am using.
Following the steps in the tutorial, I was able to create the box/tiff pairs and lstmf files from a ttf font file. The problem is that recognition was barely adequate, even for the font I trained on. I realized that the data I was testing on is degraded: many pixels are missing after I apply the filters for text segmentation. Is there any way to use OpenCV, or any other filter, to degrade the training tiff file in the same way? I can also see that Tesseract adds extra pixels around the borders of the characters. Is there a parameter to remove pixels instead of adding them?
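To make the question concrete, here is a minimal sketch of the kind of degradation I have in mind. It is only an illustration under my own assumptions: the image is a plain list of rows of grayscale values (0 = black ink, 255 = white paper), and the names `thin_strokes` and `drop_ink` are hypothetical helpers I made up, not part of Tesseract or OpenCV. As far as I understand, OpenCV's `cv2.erode` and `cv2.dilate` perform the same kind of morphological operation as the first function; for black text on a white background, it is dilating the image that thins the strokes.

```python
import random


def thin_strokes(img, radius=1):
    """Thin dark text on a light background.

    Each output pixel takes the maximum (lightest) value found in its
    (2*radius+1) x (2*radius+1) neighbourhood, so white paper grows
    into the black strokes. The neighbourhood is clipped at the borders.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lightest = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if img[yy][xx] > lightest:
                        lightest = img[yy][xx]
            out[y][x] = lightest
    return out


def drop_ink(img, drop_prob=0.1, seed=0, ink_threshold=128):
    """Randomly turn ink pixels white to imitate missing pixels.

    A pixel counts as ink when it is darker than ink_threshold
    (an assumed cut-off). The input image is not modified.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return [
        [255 if (p < ink_threshold and rng.random() < drop_prob) else p
         for p in row]
        for row in img
    ]


if __name__ == "__main__":
    # Toy 5x5 "character": a 3x3 black square on white paper.
    page = [[255] * 5 for _ in range(5)]
    for y in range(1, 4):
        for x in range(1, 4):
            page[y][x] = 0

    degraded = drop_ink(thin_strokes(page, radius=1), drop_prob=0.2)
    for row in degraded:
        print("".join("#" if p < 128 else "." for p in row))
```

The idea would be to apply something like this to the rendered training tiff before generating the lstmf files, so that the training data looks more like my segmented test images. The box coordinates should stay valid because the image size does not change.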
Thank you so much!
The training tiff file contains:
The image after processing contains:
