Improve accuracy of underlined text

akhil katpally

Mar 20, 2016, 4:18:38 PM

to tesseract-ocr

Some underlines in my image are very close to the text, and for that text Tesseract is unable to produce accurate results. Most of the inaccurate results occur where the text sits right next to a line. I have attached the image and the text file. Is there any way I can increase the accuracy for this text?

I have tried to remove the underlines with some image processing techniques, but the lines that are close to the text do not get removed, while the rest do.

Also, are there any parameters in Tesseract that I can use to improve the accuracy? Thanks in advance.

abc.pdf

abc.txt

Art Rhyno.

Mar 23, 2016, 12:26:58 PM

to tesser...@googlegroups.com

Hi,

I have posted about this approach before, but I tried the line removal example [1] included with leptonica on your PDF. I have had luck using it with tesseract before on images with horizontal lines. In this case it takes out parts of some letters, but I think the OCR is improved. The resulting image is here [2].

art

---

1. http://www.leptonica.com/line-removal.html

2. https://drive.google.com/file/d/0B-PK1n92dlzwa24zd1NQMC14WTQ/view?usp=sharing

Gunasekaran Velu

Mar 24, 2016, 2:47:01 AM

to tesseract-ocr

Hi Art Rhyno,

I have the same problem (underline issue).

If you can, could you share the line removal code?

Looking forward to your reply.

Regards

Guna

Art Rhyno.

Mar 24, 2016, 9:11:47 AM

to tesser...@googlegroups.com

Hi Guna,

If you look for the program “lineremoval.c” in the leptonica distribution, it shows the code for the logic described in [1].

art

---

1. http://www.leptonica.com/line-removal.html
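
For anyone who wants to see the basic idea without building leptonica, here is a minimal pure-Python sketch. It is not the code from lineremoval.c. It is a simplified stand-in that assumes an already binarized image (1 = ink, 0 = background) stored as a list of rows, and it treats any horizontal run of ink at least min_len pixels long as a line. That is the same as a binary opening with a 1 x min_len horizontal structuring element, followed by subtracting the result from the image. The function names and the min_len parameter are made up for this illustration.

```python
def horizontal_runs_mask(img, min_len):
    """Mark foreground pixels that belong to horizontal runs >= min_len.

    Equivalent to a binary opening with a 1 x min_len horizontal
    structuring element. `img` is a list of rows of 0/1 values.
    """
    mask = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        width = len(row)
        while x < width:
            if row[x]:
                start = x
                # Walk to the end of this run of ink pixels.
                while x < width and row[x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = 1
            else:
                x += 1
    return mask


def remove_horizontal_lines(img, min_len):
    """Clear every pixel that the long-run mask marks as part of a line."""
    mask = horizontal_runs_mask(img, min_len)
    return [
        [0 if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(img, mask)
    ]


# Toy page: two vertical strokes, an underline in row 2, and one stroke
# that continues below the underline (like a descender).
page = [
    "..#....#..",
    "..#....#..",
    "##########",
    "..#.......",
]
img = [[1 if c == "#" else 0 for c in line] for line in page]
cleaned = remove_horizontal_lines(img, min_len=6)

for row in cleaned:
    print("".join("#" if p else "." for p in row))
```

In the toy page the underline row is cleared completely, which also cuts the stroke that crosses it. That is the same side effect mentioned above, where removing lines that touch the text takes out parts of letters. A real scan would also need binarization and probably deskewing first, since a slightly tilted line breaks into short runs that fall below min_len.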

akhil katpally

Mar 25, 2016, 4:27:00 PM

to tesseract-ocr

Thank you so much for your help.

Gunasekaran Velu

Mar 29, 2016, 1:41:56 AM

to tesseract-ocr

Hi

Thanks for the information.

Regards

Guna
