Hello,
I've been using Tesseract 4.1 for some time with the Sinhala language. I got good results for most of the images I tried, and I also trained Tesseract with different fonts. As the documentation suggests, I had to preprocess my images to obtain good results.
Then I tried Tesseract 5, training with line images as .tif files and their transcriptions as .gt.txt files. I then used the generated .traineddata file to extract text, but the results were not good. The line images themselves were produced by my own image-processing segmentation in Python. Is it wrong to obtain line images with Python segmentation like this?
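For context, here is a minimal sketch of the kind of projection-profile line segmentation I mean. This is an illustrative example, not my exact pipeline: `segment_lines` is a hypothetical helper, and it assumes the page is already binarized with text pixels as 1 and background as 0.

```python
# Sketch of projection-profile line segmentation (hypothetical helper,
# not the exact pipeline). Assumes a binarized page: ink = 1, background = 0.
import numpy as np

def segment_lines(binary_page, min_height=2):
    """Return (start_row, end_row) bands that likely contain text lines."""
    # Count ink pixels per row; rows inside a text line have non-zero sums.
    row_ink = binary_page.sum(axis=1)
    in_line = row_ink > 0
    lines, start = [], None
    for y, active in enumerate(in_line):
        if active and start is None:
            start = y                      # entering a text band
        elif not active and start is not None:
            if y - start >= min_height:    # skip specks thinner than min_height
                lines.append((start, y))
            start = None
    if start is not None and len(in_line) - start >= min_height:
        lines.append((start, len(in_line)))
    return lines

# Tiny synthetic page: two "text lines" of ink separated by blank rows.
page = np.zeros((10, 20), dtype=np.uint8)
page[1:3, 2:18] = 1
page[6:9, 2:18] = 1
print(segment_lines(page))  # → [(1, 3), (6, 9)]
```

When I feed single-line crops to Tesseract I run it with `--psm 7` (treat the image as a single text line); I understand tight crops can also hurt recognition, so I leave a few pixels of white border around each line.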
Could someone please explain the possible reason?
Thank you very much