Column based detection


temp name

Apr 7, 2014, 8:36:56 AM
to tesser...@googlegroups.com
Hello,

I have an image containing text in a tabular layout, but the borders of the table have been removed. The table has two rows and two columns.

In the first row, both the first and the second column contain text.
In the second row, the first column is empty and the second column contains some text.

When I run Tesseract on this image, it recognizes the text as " DE Abc FG".

Does anyone know how to force Tesseract to recognize the text from the first column first, and then the text from the second column?

Thanks in advance!
Capture.PNG

Eugene Shkel'

Apr 24, 2014, 3:59:52 PM
to tesser...@googlegroups.com
You can segment the image by columns. That gives you two regions, one per column. Then recognize each region separately: the text recognized from the first region is the first column, and the text recognized from the second region is the second column.

To segment by columns, you can use constant widths if you know them, or you can find the boundary by looking for a specific number of consecutive "empty" vertical lines (lines without colored pixels) in your picture. You can configure that number depending on the gap between the columns in the picture. A small gap between columns can lead to incorrect segmentation: the threshold then has to be small as well, and it may split between letters instead of between columns.
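
The "empty vertical lines" idea can be sketched roughly as follows. This is only an illustration under assumptions, not a tested recipe for your image: the function name `split_columns`, the parameter `min_gap`, and the tiny hand-made `page` are all made up for the example, and the image is assumed to be already binarized into rows of 0/1 values (1 = colored pixel).

```python
def split_columns(ink, min_gap):
    """Return (start, end) x-ranges of text regions, end exclusive.

    ink     -- list of rows, each a list of 0/1 (1 = colored pixel); assumed binarized
    min_gap -- how many consecutive empty vertical lines count as a column gap
    """
    if not ink:
        return []
    width = len(ink[0])
    # A vertical line is "empty" when no row has a colored pixel at that x.
    has_ink = [any(row[x] for row in ink) for x in range(width)]

    regions = []
    start = None     # x where the current region began
    last_ink = None  # last non-empty x seen in the current region
    for x in range(width):
        if not has_ink[x]:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The run of empty lines is wide enough: close the region.
            regions.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        regions.append((start, last_ink + 1))
    return regions


# Hypothetical 2-row "image": two letters, a wide gap, then a second column.
page = [
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
]

columns = split_columns(page, min_gap=3)     # splits only at the wide gap
too_small = split_columns(page, min_gap=1)   # also splits between letters
print(columns)
print(too_small)
```

Each returned x-range can then be cropped out of the original picture and passed to Tesseract on its own, first region first. With `min_gap=1` the same picture is cut between the two letters as well, which is the incorrect segmentation described above.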

On Monday, April 7, 2014, 15:36:56 UTC+3, temp name wrote: