V S Rawat
Aug 10, 2014, 10:17:07 AM
to tocr
We often get images or PDFs that contain tables.
The text is in several columns, which should be treated separately, and the
cells of each row should be put on the same line with a separator such as a
tab, plus quotes, to get CSV format.
However, my method of running tesseract through vietocr.Net doesn't help there.
It does recognize the separate areas and OCRs them separately, but it puts
the columns one below the other: all rows of the first column at the top,
then all rows of the second column, then all rows of the next column, and so on.
That is not much help, because it takes a lot of effort to put all the text
of one row back together.
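To show the regrouping I currently do by hand, here is a minimal Python sketch. It rests on assumptions that may not hold for real output: that the OCR text has one cell per line, that the column blocks are separated by blank lines, and that every column has the same number of rows (an empty cell would shift everything below it). The function name columns_to_csv and the sample data are made up for illustration.

```python
import csv
import io
from itertools import zip_longest


def columns_to_csv(ocr_text, delimiter="\t"):
    """Turn column-by-column OCR text into quoted, delimiter-separated rows.

    Assumption: each column is a block of lines (one cell per line) and
    blocks are separated by at least one blank line.
    """
    blocks = []
    current = []
    for line in ocr_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the current column block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # Transpose: the n-th cell of every column becomes the n-th row.
    # Shorter columns are padded with empty strings.
    rows = zip_longest(*blocks, fillvalue="")

    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample in the shape described above: first column, then second.
    sample = "Name\nAnn\nBob\n\nAge\n31\n27\n"
    print(columns_to_csv(sample), end="")
```

This only automates the cut-and-paste step; it does not make the OCR itself aware of the table.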
Is there any way to make tesseract identify tables and OCR them in
a more useful way?
Or should this problem be addressed to the developers of the vietocr.Net frontend?
Thanks.
--
Rawat