How to OCR a two- or three-column document using Tesseract


Justin erno

Jul 27, 2015, 5:40:45 AM
to tesseract-ocr
Hi all,
   My goal is to OCR a document whose text is laid out in multiple columns and get the output file in the correct format.
Is there any method to identify the columns in a document using Tesseract?


Thanks in advance :)


Helmut Wollmersdorfer

Jul 29, 2015, 3:44:18 AM
to tesseract-ocr, spkm...@gmail.com


On Monday, 27 July 2015 at 11:40:45 UTC+2, Justin erno wrote:
Hi all,
   My goal is to OCR a document whose text is laid out in multiple columns and get the output file in the correct format.
Is there any method to identify the columns in a document using Tesseract?


I did a two-column document yesterday. Tesseract recognizes the columns automatically. In plain-text output, column 2 appears after column 1.
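For anyone who wants to script this, here is a minimal sketch of calling the tesseract command line from Python. The helper names (build_tesseract_command, ocr_to_text) are made up for illustration. Page segmentation mode 3 is the fully automatic mode and is normally the default, so passing it explicitly is optional. As far as I know, older 3.x releases spell the option -psm and newer releases spell it --psm, so check tesseract --help for your version.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, psm=3, legacy_flag=False):
    """Return the argv list for a plain-text Tesseract run.

    psm=3 is fully automatic page segmentation (normally the default).
    Assumption: older 3.x builds use -psm, newer builds use --psm.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


def ocr_to_text(image_path, output_base, **kwargs):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, **kwargs)
    subprocess.run(command, check=True)
    # Tesseract appends .txt to the output base for plain-text output.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

Calling ocr_to_text("page.png", "page") on a two-column scan should then return column 1 followed by column 2, provided the layout analysis detects the columns.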

In hOCR or PDF output, you need to deal with the coordinates yourself.
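To illustrate what dealing with the coordinates can look like for hOCR, here is a rough sketch that reads the bbox of each ocrx_word span and sorts the words into a left and a right column. It is not how Tesseract does it internally. It assumes a two-column page, a gutter position you supply yourself (gutter_x), and that the class attribute comes before title in each span, as in the sample below. The function names and SAMPLE data are invented for the example.

```python
import html
import re

# Matches one hOCR word span and captures its bbox (x0 y0 x1 y1) and content.
# Assumption: class appears before title inside the tag.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def parse_words(hocr):
    """Return a list of (x0, y0, x1, y1, text) tuples, one per word."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        # Drop inline markup such as <strong> and decode entities.
        text = html.unescape(TAG_RE.sub("", inner)).strip()
        if text:
            words.append((int(x0), int(y0), int(x1), int(y1), text))
    return words


def split_columns(words, gutter_x):
    """Assign each word to the left or right column by its horizontal center."""
    left, right = [], []
    for word in words:
        center = (word[0] + word[2]) / 2
        (left if center < gutter_x else right).append(word)
    return left, right


def column_text(words, line_tolerance=10):
    """Rebuild lines: words whose top edges lie within line_tolerance pixels
    of the first word on the current line are placed on that line."""
    lines = []
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        if lines and abs(word[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return "\n".join(
        " ".join(w[4] for w in sorted(line, key=lambda w: w[0])) for line in lines
    )


# Invented sample data in the shape of hOCR word spans.
SAMPLE = """
<span class='ocrx_word' id='word_1_1' title='bbox 10 10 50 30; x_wconf 95'>Left</span>
<span class='ocrx_word' id='word_1_2' title='bbox 60 12 100 30; x_wconf 94'>one</span>
<span class='ocrx_word' id='word_1_3' title='bbox 310 10 360 30; x_wconf 93'>Right</span>
<span class='ocrx_word' id='word_1_4' title='bbox 370 11 410 30; x_wconf 92'>one</span>
<span class='ocrx_word' id='word_1_5' title='bbox 10 50 50 70; x_wconf 91'>Left</span>
<span class='ocrx_word' id='word_1_6' title='bbox 60 50 100 70; x_wconf 90'>two</span>
"""

if __name__ == "__main__":
    left, right = split_columns(parse_words(SAMPLE), gutter_x=200)
    print(column_text(left))
    print(column_text(right))
```

A fixed gutter is the simplest choice; for real scans you would probably want to derive it from the page width or from the block elements in the hOCR file instead.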