Improving accuracy of Sinhala text detection

Umanda Dikwatta

Aug 12, 2022, 12:29:50 AM
I am trying to extract Sinhala text from Sinhala memes. I applied the quality improvement methods described on the Tesseract documentation page and obtained 58% accuracy on a set of 100 images, but I need to improve this to 90%. Extraction accuracy is low for some images, including the attached one. I tried several other images with the same type of font and got low accuracy for those as well. The attached image uses a different font, and Tesseract gives poor results on text in this font.
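For context on the accuracy figure, here is a minimal sketch of one way it could be measured: character-level accuracy computed from the edit distance between a hand-typed transcription and the OCR output. This is an assumption for illustration, not necessarily how the 58% above was calculated, and the function names `levenshtein` and `char_accuracy` are hypothetical. Text is normalized to NFC first, but the comparison is still per Unicode code point, so a Sinhala letter with its vowel sign counts as more than one character.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Return 1 - (edit distance / ground-truth length), clamped at 0."""
    gt = unicodedata.normalize("NFC", ground_truth.strip())
    ocr = unicodedata.normalize("NFC", ocr_text.strip())
    if not gt:
        # Empty reference: perfect only if the OCR output is empty too.
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - levenshtein(gt, ocr) / len(gt))
```

Averaging `char_accuracy` over all 100 images would give one overall figure, and the per-image scores would show which fonts pull the average down.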

From my reading of the Tesseract documentation, we can use the text regions as ground truth and train Tesseract to improve the accuracy. We can also train Tesseract for specific fonts. Which method should I use? Also, how can I find out which font is used in this image?
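As far as I understand it, the ground truth for training consists of cropped single-line images, each paired with a text file of the same base name ending in `.gt.txt` that holds the transcription. Below is a minimal sketch of writing those transcription files. The helper name `write_ground_truth`, the image file names, and the output folder are all hypothetical, and the exact layout the training tools expect is an assumption that should be checked against the training documentation.

```python
from pathlib import Path


def write_ground_truth(lines: dict[str, str], out_dir: Path) -> list[Path]:
    """Write one UTF-8 `<image stem>.gt.txt` file per line image.

    `lines` maps a line-image file name to its hand-typed transcription.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in lines.items():
        gt_path = out_dir / (Path(image_name).stem + ".gt.txt")
        # One line of text per file, terminated by a single newline.
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

For example, `write_ground_truth({"meme_001_line_01.png": "සිංහල"}, Path("sinhala-ground-truth"))` would create `sinhala-ground-truth/meme_001_line_01.gt.txt` next to where the matching line image would be placed.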

What would be the best method to improve the accuracy for the following image? Should I use Tesseract 4 or 5?