I am trying to do OCR with Tesseract on images, but I can't figure out a proper preprocessing technique for them.
The problems I am facing are:
1. Low contrast: the images contain text in several different font sizes. What approach should I take to enhance the contrast of an arbitrary image?
2. Touching characters: sometimes, after applying adaptive thresholding, two adjacent characters end up touching each other. What is the best way to handle that?
3. Non-uniform illumination: how should I proceed if I want to correct for non-uniform illumination?
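A minimal sketch of how items 3, 1 and 2 are often chained, written with NumPy only so it runs without OpenCV. It assumes an 8-bit grayscale array with dark text on a lighter background. Uneven lighting is flattened by dividing the image by a background estimate from a grayscale closing (max filter, then min filter), contrast is then stretched globally between two percentiles, and the result is binarized with Otsu's method; an optional erosion of the ink mask can break thin bridges between touching characters. All function names, kernel sizes and percentiles here are illustrative assumptions to tune, not a definitive recipe. In OpenCV the same steps map roughly to `cv2.morphologyEx(..., cv2.MORPH_CLOSE, ...)`, `cv2.createCLAHE` (an adaptive, tile-based alternative to the global stretch) and `cv2.threshold(..., cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _rank_filter(img: np.ndarray, k: int, reduce) -> np.ndarray:
    """Separable k x k max/min filter (k odd), with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    rows = reduce(sliding_window_view(padded, k, axis=1), axis=-1)  # along x
    return reduce(sliding_window_view(rows, k, axis=0), axis=-1)    # along y


def flatten_illumination(gray: np.ndarray, k: int = 31) -> np.ndarray:
    """Divide out a background estimate obtained by grayscale closing.

    Assumption: dark text on light paper, and k larger than the thickest
    dark stroke, so the closing erases the text and keeps the lighting.
    """
    g = gray.astype(np.float64)
    background = _rank_filter(_rank_filter(g, k, np.max), k, np.min)
    flat = g / np.maximum(background, 1.0)  # ~1.0 on paper, < 1.0 on ink
    return np.clip(flat * 255.0, 0, 255).astype(np.uint8)


def stretch_contrast(gray: np.ndarray, lo_pct: float = 1.0, hi_pct: float = 99.0) -> np.ndarray:
    """Global percentile stretch: map the lo/hi percentiles to 0/255."""
    lo, hi = np.percentile(gray, [lo_pct, hi_pct])
    if hi <= lo:
        return gray.copy()
    out = (gray.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's threshold for a uint8 image (class 0 = pixels <= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # weight of class 0
    mu = np.cumsum(p * np.arange(256))  # first moment of class 0
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0
    return int(np.argmax(between))


def binarize(gray: np.ndarray, erode_k: int = 1) -> np.ndarray:
    """Return an ink mask (True = text); optionally erode it."""
    ink = gray <= otsu_threshold(gray)
    if erode_k > 1:
        # Erosion thins strokes and can cut 1-2 px bridges between
        # touching characters; too large a kernel breaks letters too.
        ink = _rank_filter(ink.astype(np.uint8), erode_k, np.min).astype(bool)
    return ink


def preprocess(gray: np.ndarray, background_k: int = 31, erode_k: int = 1) -> np.ndarray:
    """Illumination flattening -> contrast stretch -> Otsu (-> erosion)."""
    return binarize(stretch_contrast(flatten_illumination(gray, background_k)), erode_k)
```

Erosion is a blunt tool: a kernel big enough to split touching characters in large text can also erase thin strokes in small text, so with mixed font sizes it may work better per region than on the whole page.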
How can image segmentation help solve my problem?
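One way segmentation can help, sketched under assumptions: label the 8-connected components of a binarized ink mask (True = text), take their bounding boxes, and group boxes of similar height, since height roughly tracks font size. Each group could then be cropped and preprocessed separately, or passed to Tesseract with settings suited to that text size. `component_boxes` and `group_by_height` are hypothetical helpers written for illustration, and the `ratio=1.5` grouping tolerance is an arbitrary assumption; for the labeling step, `scipy.ndimage.label` or `cv2.connectedComponentsWithStats` are library alternatives.

```python
from collections import deque

import numpy as np


def component_boxes(ink: np.ndarray, min_area: int = 1):
    """Bounding boxes (top, left, bottom, right; inclusive) of 8-connected blobs."""
    h, w = ink.shape
    seen = np.zeros(ink.shape, dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not ink[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill of one component.
            queue = deque([(y, x)])
            seen[y, x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if area >= min_area:  # drop specks smaller than min_area
                boxes.append((top, left, bottom, right))
    return boxes


def group_by_height(boxes, ratio: float = 1.5):
    """Greedily group boxes whose height is within `ratio` of the group's smallest box."""
    groups = []  # each entry: [smallest height in group, list of boxes]
    for box in sorted(boxes, key=lambda b: b[2] - b[0]):
        height = box[2] - box[0] + 1
        if groups and height <= ratio * groups[-1][0]:
            groups[-1][1].append(box)
        else:
            groups.append([height, [box]])
    return [members for _, members in groups]
```

A component that is much wider than it is tall may be two or more touching characters, so box aspect ratios are one possible cue for where to apply a local split or erosion instead of treating the whole page.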
I have added a sample image. Assume that the image is not rotated the way it is in the picture; however, the variety of font sizes and the text segments in the image are an exact replica of what I am asking about. Apart from the steps mentioned above, I would appreciate any other suggestions for preprocessing this image. Let me know if you have worked out a solution for something similar.
Thanks