Improve accuracy of underlined text


akhil katpally

Mar 20, 2016, 4:18:38 PM
to tesseract-ocr

Some underlines in my image are very close to the text, and for that text tesseract is unable to produce accurate results. In most of the places where the results are inaccurate, the text sits very close to the lines. I have attached the image and the text file. Is there any way I can increase the accuracy of the text?


I have tried to remove the underlines with some image processing techniques, but the lines that are close to the text are not removed, while the rest are.


Also, are there any parameters in tesseract that I can use to improve the accuracy? Thanks in advance.

abc.pdf
abc.txt

Art Rhyno.

Mar 23, 2016, 12:26:58 PM
to tesser...@googlegroups.com

Hi,

 

I have posted about this approach before, but I tried the line removal example [1] included with leptonica on your PDF. I have had luck using it with tesseract before on images with horizontal lines. In this case it takes out parts of letters, but I think the OCR is improved. The resulting image is here [2].

 

art

---

1. http://www.leptonica.com/line-removal.html

2. https://drive.google.com/file/d/0B-PK1n92dlzwa24zd1NQMC14WTQ/view?usp=sharing
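
For readers who want to see the general idea behind this kind of line removal, here is a minimal sketch in plain Python. It is not a port of leptonica's line removal example, which does more than this. It only illustrates one simplified approach: a horizontal morphological opening finds runs of ink that are too long to be part of a character, and those runs are then erased. The sketch assumes the page has already been binarized and deskewed into a list of rows of 0/1 values (1 = ink). The function names and the `min_len` parameter are made up for illustration.

```python
def horizontal_runs_mask(img, min_len):
    """Return a mask marking ink pixels that lie in a horizontal run of
    at least min_len pixels. This is equivalent to a morphological
    opening with a 1 x min_len horizontal structuring element."""
    mask = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        width = len(row)
        while x < width:
            if row[x]:
                start = x
                # Walk to the end of this run of ink pixels.
                while x < width and row[x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = 1
            else:
                x += 1
    return mask


def remove_horizontal_lines(img, min_len):
    """Erase every pixel that belongs to a long horizontal run.
    min_len (hypothetical parameter) should be longer than the widest
    horizontal stroke of any character, or letters will be damaged."""
    mask = horizontal_runs_mask(img, min_len)
    return [
        [1 if pixel and not marked else 0 for pixel, marked in zip(row, mask_row)]
        for row, mask_row in zip(img, mask)
    ]


# Tiny example: two vertical strokes sitting on an underline (third row).
page = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
cleaned = remove_horizontal_lines(page, min_len=5)
```

Note that where a letter touches the underline, the letter's pixels on the line's rows are erased along with the line. That is consistent with the effect described above, where parts of letters are taken out.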

Gunasekaran Velu

Mar 24, 2016, 2:47:01 AM
to tesseract-ocr
Hi Art Rhyno,

I have the same problem (underline issue).

If you can, could you share the line removal code?

Looking forward to your reply.

Regards
Guna

Art Rhyno.

Mar 24, 2016, 9:11:47 AM
to tesser...@googlegroups.com

Hi Guna,

 

If you look for the program “lineremoval.c” in the leptonica distribution, it shows the programming for the logic described in [1]. B

akhil katpally

Mar 25, 2016, 4:27:00 PM
to tesseract-ocr
Thank you so much for your help. 

Gunasekaran Velu

Mar 29, 2016, 1:41:56 AM
to tesseract-ocr
Hi

Thanks for the information.


Regards
Guna