Table Detection in PDF

MUHAMMAD ADNAN

Feb 10, 2017, 8:39:51 AM

to tesseract-ocr

Hi,
I have some scanned PDF files that contain a table on each page. Some tables have borders and some have no borders or ruling lines.
I want to extract each table, with its formatting and the data in it, to Word or Excel format. I am completely new to tesseract-ocr and don't know how to use it from C++ or C#.
Proper guidance on detecting tables and saving the output using Tesseract would be highly appreciated.
Thanks

Best Regards
Adnan

John Muccigrosso

Feb 12, 2017, 2:48:49 PM

to tesseract-ocr

You might want to use Tabula instead, provided that the PDF contains the text and numbers themselves and not just images of them.
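
Since the pages in the original question are scans (images only), Tabula has no text layer to read. One rough alternative is to let Tesseract produce word bounding boxes and rebuild rows and cells from their positions. The sketch below is a minimal illustration under stated assumptions, not an established recipe. It assumes Tesseract's TSV output (the `tsv` config, available in recent versions), whose columns are level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text, with level 5 marking individual words. It groups words into rows by vertical center and starts a new cell wherever the horizontal gap between neighbouring words is large. `tsv_to_rows`, `rows_to_csv`, `row_tol` and `gap_threshold` are hypothetical names, and the pixel values are guesses that need tuning per scan resolution. It does not detect borders or preserve formatting; it only recovers cell text as CSV, which Excel can open.

```python
import csv
import io


def tsv_to_rows(tsv_text, row_tol=10, gap_threshold=40):
    """Rebuild table rows from Tesseract TSV word boxes (hypothetical helper).

    row_tol: max vertical distance (px) between word centers in one row.
    gap_threshold: min horizontal gap (px) that starts a new cell.
    Both defaults are assumptions and must be tuned for real scans.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for rec in reader:
        text = (rec.get("text") or "").strip()
        # Level 5 entries are single words; skip empty boxes.
        if rec["level"] != "5" or not text:
            continue
        left, top = int(rec["left"]), int(rec["top"])
        width, height = int(rec["width"]), int(rec["height"])
        center = top + height / 2
        words.append((int(rec["page_num"]), center, left, left + width, text))

    # Sort by page, then vertical center, and cluster into rows.
    words.sort()
    rows_of_words = []
    for word in words:
        if rows_of_words:
            first = rows_of_words[-1][0]
            if word[0] == first[0] and abs(word[1] - first[1]) <= row_tol:
                rows_of_words[-1].append(word)
                continue
        rows_of_words.append([word])

    # Within each row, order words left to right and split on wide gaps.
    table = []
    for row in rows_of_words:
        row.sort(key=lambda w: w[2])
        cells = [[row[0][4]]]
        prev_right = row[0][3]
        for _, _, left, right, text in row[1:]:
            if left - prev_right > gap_threshold:
                cells.append([text])
            else:
                cells[-1].append(text)
            prev_right = right
        table.append([" ".join(cell) for cell in cells])
    return table


def rows_to_csv(rows):
    """Serialize rows as CSV text (quotes cells containing commas)."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()
```

To use it, each scanned page would first have to be rasterized to an image (for example with poppler's `pdftoppm`), then run through something like `tesseract page.png page tsv`, and the resulting `page.tsv` read and passed to `tsv_to_rows`. Clustering on coordinates rather than on Tesseract's own line numbers is a deliberate choice: Tesseract often splits table columns into separate text blocks, so one visual row may not share a line id.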