Some underlines in my image are very close to the text, and for that text tesseract is unable to produce accurate results. Most of the inaccurate results occur where the text sits very close to a line. I have attached the image and text file. Is there any way I can increase the accuracy of the text?
I have tried to remove the underlines with some image processing techniques, but the lines that are close to the text are not removed, while the rest are.
Also, are there any parameters in tesseract that I can use to improve the accuracy? Thanks in advance.
Hi,
I have posted about this approach before, but I tried the line removal example [1] included with leptonica on your pdf. I have had luck before using it with tesseract for images with horizontal lines. It takes out parts of letters in this case, but I think the OCR is improved; the resulting image is here [2].
art
---
1. http://www.leptonica.com/line-removal.html
2. https://drive.google.com/file/d/0B-PK1n92dlzwa24zd1NQMC14WTQ/view?usp=sharing
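For anyone who wants to see the general idea without building leptonica, here is a minimal sketch in plain Python. It is not the leptonica algorithm from [1]. It is a much simpler stand-in that erases any horizontal run of ink pixels longer than a threshold. The function name `remove_long_horizontal_runs`, the `min_run` parameter, and the list-of-lists image format (1 = ink, 0 = background) are all assumptions made for illustration. It assumes the page is already binarized and deskewed.

```python
def remove_long_horizontal_runs(image, min_run):
    """Return a copy of a binary image (rows of 0/1, 1 = ink) in which
    every horizontal run of ink at least `min_run` pixels long is erased.

    Hypothetical helper for illustration only; not part of leptonica
    or tesseract.
    """
    cleaned = [row[:] for row in image]  # leave the input untouched
    for row in cleaned:
        width = len(row)
        x = 0
        while x < width:
            if row[x] == 1:
                start = x
                # Walk to the end of this run of ink pixels.
                while x < width and row[x] == 1:
                    x += 1
                # Long runs are treated as underline pixels and cleared.
                if x - start >= min_run:
                    row[start:x] = [0] * (x - start)
            else:
                x += 1
    return cleaned


if __name__ == "__main__":
    # Tiny example: two short strokes above a full-width underline.
    page = [
        [0, 1, 0, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    for row in remove_long_horizontal_runs(page, min_run=6):
        print("".join("#" if p else "." for p in row))
```

Where a letter touches the line, its pixels in that row belong to the same run and are erased too, so this sketch also takes out parts of letters. A threshold well above the widest character stroke keeps ordinary text rows intact.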
Hi Guna,
If you look for the program “lineremoval.c” in the leptonica distribution, it shows the programming for the logic described in [1]. B