Re: [tesseract-ocr] Re: How to increase the accuracy of Tesseract OCR


Art W Rhyno

Aug 26, 2014, 11:44:40 AM
to tesser...@googlegroups.com
> Kindly advise the solution to make the image readable by Tesseract OCR.

Hi,

Since you already have leptonica installed with tesseract, you might have some luck working through its line removal example [1]. I converted your sample image to grayscale, ran the lineremoval code on it, and then put the result through tesseract. It could be refined further, but it might be useful as a starting point.
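To give a rough feel for what line removal does before OCR, here is a toy sketch in plain Python. It is not the leptonica code: as far as I know the real example works on the grayscale image with morphological operations, while this version assumes an already binarized image stored as a list of rows (1 = ink, 0 = background) and simply clears any horizontal run of ink at least `min_run` pixels long. The function name and the `min_run` parameter are made up for illustration.

```python
def remove_horizontal_lines(image, min_run=5):
    """Return a copy of a binary image with long horizontal ink runs cleared.

    image   -- list of rows, each a list of 0/1 values (1 = ink)
    min_run -- hypothetical threshold: runs this long or longer are
               treated as ruling lines rather than parts of characters
    """
    cleaned = [row[:] for row in image]  # do not modify the caller's image
    for y, row in enumerate(image):
        width = len(row)
        x = 0
        while x < width:
            if row[x] != 1:
                x += 1
                continue
            # Measure the run of consecutive ink pixels starting at x.
            start = x
            while x < width and row[x] == 1:
                x += 1
            if x - start >= min_run:
                # Long run: assume it is a line and erase it.
                for i in range(start, x):
                    cleaned[y][i] = 0
    return cleaned


# A full-width line crossing two vertical strokes.
sample = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 1, 0],
]
result = remove_horizontal_lines(sample, min_run=5)
```

Note that this naive version also erases the pixels where a character stroke crosses the line, so characters touching a line end up with small gaps. Handling that well is a large part of what a real implementation has to do.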

art
---


1. http://www.leptonica.com/line-removal.html
test.jpg
test.txt

Art W Rhyno

Aug 27, 2014, 8:16:02 AM
to tesser...@googlegroups.com
> Is it possible to do the same line removal in leptonica using Java?

Hi,

The lineremoval code is located in the "prog" subdirectory of the leptonica distribution. You could copy "lineremoval.c" into a separate subdirectory and use it as a starting point. I don't know how hard it would be to implement in Java, but you could probably use the Java Native Interface to call the code directly from a Java program. The tesseract-android-tools [1] project might have some building blocks for Java integration; alas, I have not done much with tesseract/leptonica in Java. This project [2] also mentions a Java API for accessing natively compiled Tesseract and Leptonica APIs, but I haven't looked into it. The lineremoval source code documents each step. It is not effective for all types of lines, but I think it would work well for the images you are dealing with.

art
---
1. https://code.google.com/p/tesseract-android-tools
2. https://github.com/rmtheis/tess-two

Albrecht Hilker

Sep 18, 2014, 1:07:48 PM
to tesser...@googlegroups.com
Is this really necessary?

When you switch Tesseract to PSM_AUTO mode, it detects horizontal lines automatically.
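
For anyone who wants to try this from a script, here is a minimal sketch that builds the command line for it. It assumes the `tesseract` executable is on the PATH. PSM_AUTO is value 3 in Tesseract's PageSegMode enum; the 3.x command line takes it as `-psm 3`, while newer releases spell the flag `--psm`. The helper names and the `legacy_flag` parameter are my own, not part of Tesseract.

```python
import subprocess

PSM_AUTO = 3  # PSM_AUTO in Tesseract's PageSegMode enum


def build_tesseract_command(image_path, output_base, psm=PSM_AUTO, legacy_flag=True):
    """Build the argument list for a tesseract command-line run.

    legacy_flag -- hypothetical switch: True uses the 3.x spelling "-psm",
                   False uses the newer "--psm"
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


def run_ocr(image_path, output_base, legacy_flag=True):
    """Run tesseract; the text is written to output_base + ".txt"."""
    command = build_tesseract_command(image_path, output_base, legacy_flag=legacy_flag)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure


# Example (not executed here): run_ocr("test.jpg", "test")
```

Whether this is enough on its own depends on the image; it is worth comparing the output with and without a line removal pass.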