Reading text from a table

max.r...@auxilion.de

May 4, 2016, 5:26:56 AM
to tesseract-ocr
Hi,

I am trying to use Tesseract to extract the text from the attached image.
I am particularly interested in reading the values from the table. Unfortunately, the table is not recognized at all, while everything else is read fine.
I have tried cropping the image so that it contains only the table, but that does not help. When reading just a single row of the table, however, it works fine.
So I assume the problem is caused by the thick lines of the table. What can I try to solve this?

Thanks and Best Regards,
Max

test_row.png
test_table.png
test.png

Quan Nguyen

May 4, 2016, 11:18:55 PM
to tesseract-ocr
You'll need to remove the lines first. Try the algorithm outlined in the following Leptonica example:


Or use OpenCV:
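One common way to remove table lines (not necessarily the exact method in the linked examples) is to erase every long horizontal or vertical stroke of ink, since character strokes are short and table rules are long. Below is a minimal sketch in plain Python, assuming the image has already been binarized into a list of rows with 1 for ink and 0 for background. `remove_table_lines` and the `min_len` threshold are hypothetical names chosen for illustration.

```python
from typing import List

# Binary image: one list per row, 1 = ink, 0 = background (assumed already binarized).
Image = List[List[int]]


def transpose(img: Image) -> Image:
    """Swap rows and columns so vertical runs can be scanned as rows."""
    return [list(col) for col in zip(*img)]


def horizontal_line_mask(img: Image, min_len: int) -> Image:
    """Mark every pixel that lies in a horizontal run of ink >= min_len long.

    This equals a morphological opening with a 1 x min_len line element.
    """
    mask = []
    for row in img:
        out = [0] * len(row)
        j = 0
        while j < len(row):
            if row[j]:
                start = j
                while j < len(row) and row[j]:
                    j += 1
                if j - start >= min_len:  # long run: treat it as part of a table rule
                    for k in range(start, j):
                        out[k] = 1
            else:
                j += 1
        mask.append(out)
    return mask


def remove_table_lines(img: Image, min_len: int) -> Image:
    """Hypothetical helper: erase long horizontal and vertical runs (table rules)."""
    horiz = horizontal_line_mask(img, min_len)
    vert = transpose(horizontal_line_mask(transpose(img), min_len))
    return [
        [0 if (h or v) else p for p, h, v in zip(prow, hrow, vrow)]
        for prow, hrow, vrow in zip(img, horiz, vert)
    ]


# Tiny example: a 5x7 cell with a border and a small glyph inside.
table = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_table_lines(table, min_len=4)
for row in cleaned:
    print("".join("#" if p else "." for p in row))
```

In the example, the border is erased while the short glyph strokes inside survive. Thick rules are handled because each pixel row (or column) of a thick line is its own long run. `min_len` has to be longer than any character stroke but shorter than the table rules, and ink where a character touches a rule is erased along with the rule. In OpenCV the same idea is usually written as `cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)` with `kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (min_len, 1))` for horizontal lines (and `(1, min_len)` for vertical ones), with the detected lines then subtracted from the binary image before passing it to Tesseract.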

max.r...@auxilion.de

May 9, 2016, 10:30:45 AM
to tesseract-ocr
Thank you very much for your quick answer and the provided links.
I had hoped for a slightly simpler solution.
I had already found the first link myself, but the second one also looks interesting.
