Hi Shree,
Thank you for your suggestion. The suggested method does improve the pass percentage of our test cases, but the extraction of mixed-language text is still not consistent. Sometimes Tesseract extracts all the characters correctly, but not every time.
For example, in one scenario it correctly detects the English letters at the start of the text, but in the next text, the English letters at the end are not extracted properly.
We have also identified another problem: a few of the images contain numbers in superscript, and these superscript numbers are not extracted when we apply OCR.
Could you please suggest how we can address these issues?