Table Detection in PDF


MUHAMMAD ADNAN

Feb 10, 2017, 8:39:51 AM
to tesseract-ocr
Hi,
I have some scanned PDF files that contain a table on each page. Some tables have borders, and some have no borders or ruling lines.
I want to extract each table, with its layout and data, to Word or Excel format. I am totally new to tesseract-ocr and don't know how to use it from C++ or C#.
Any guidance on detecting the tables and saving the output with Tesseract would be highly appreciated.
Thanks

Best Regards
Adnan

John Muccigrosso

Feb 12, 2017, 2:48:49 PM
to tesseract-ocr
You might want to use Tabula instead, provided that the PDF contains the actual text and numbers and not just images of them.
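
For pages that really are just images, one possible approach to the borderless tables is to take per-word bounding boxes from OCR (Tesseract's TSV output reports left, top, width, height and text for each word) and rebuild rows and cells from their positions. The following is only a rough sketch of that idea in Python, not a Tesseract feature: the function names, the dictionary layout, the tolerance values and the sample boxes are all assumptions made up for illustration, and it does not handle empty cells or skewed scans.

```python
import csv
import io


def group_rows(words, row_tol=10):
    """Group word boxes into text rows by comparing their top coordinates."""
    rows = []
    for word in sorted(words, key=lambda w: (w["top"], w["left"])):
        # Same row if the top edge is close to the first word of the last row.
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order the words of each row from left to right.
    return [sorted(row, key=lambda w: w["left"]) for row in rows]


def split_cells(row, gap_tol=30):
    """Split one row into cells wherever the horizontal gap is wide."""
    cells = [[row[0]]]
    for prev, word in zip(row, row[1:]):
        gap = word["left"] - (prev["left"] + prev["width"])
        if gap > gap_tol:
            cells.append([word])      # wide gap: start a new cell
        else:
            cells[-1].append(word)    # narrow gap: same cell, e.g. "Blue widget"
    return [" ".join(w["text"] for w in cell) for cell in cells]


def words_to_csv(words, row_tol=10, gap_tol=30):
    """Return CSV text (which Excel can open) built from word boxes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in group_rows(words, row_tol):
        writer.writerow(split_cells(row, gap_tol))
    return out.getvalue()


# Hypothetical word boxes standing in for real OCR output (pixel units).
sample_words = [
    {"left": 50, "top": 100, "width": 60, "text": "Item"},
    {"left": 300, "top": 102, "width": 50, "text": "Price"},
    {"left": 50, "top": 140, "width": 40, "text": "Blue"},
    {"left": 95, "top": 141, "width": 55, "text": "widget"},
    {"left": 300, "top": 139, "width": 40, "text": "4.50"},
]
csv_text = words_to_csv(sample_words)
print(csv_text)
```

The two tolerances would need tuning to the scan resolution and font size. Tables with ruling lines may be easier to handle by detecting the lines first, which this sketch does not attempt.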
