Dear all,
I am developing a tool to extract a table from an image. It is a big undertaking, but I hope to release a beta version soon.
The tool takes a PNG, JPG, or PDF image as input and produces a CSV, ODT, or XLS table as output.
I already have some simple tables extracted from PDFs. If there are formats that the government uses often and that people frequently need or want to digitize, I would like some samples. I am thinking of census data, GIS data, etc.
I do not plan to support multi-page tables. I could use some advice on the OCR backend; for now I am using pytesseract, a Python wrapper for Tesseract, the OCR engine whose development Google sponsored.
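For anyone weighing in on the OCR side, here is a minimal sketch of one way to turn pytesseract word boxes into CSV rows. It assumes the input is the dict returned by pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT); the helper names and the row_tol/col_gap pixel heuristics are hypothetical illustrations, not the tool's actual method. It groups words into rows by vertical position and splits cells on large horizontal gaps, so it does not align columns across rows or detect empty cells.

```python
import csv
import io


def words_from_ocr(data, min_conf=0.0):
    """Keep non-empty words from an image_to_data-style dict (hypothetical helper)."""
    words = []
    for i, text in enumerate(data["text"]):
        # Tesseract reports conf = -1 for non-word entries (blocks, lines, etc.).
        if text.strip() and float(data["conf"][i]) >= min_conf:
            words.append((data["top"][i], data["left"][i], data["width"][i], text.strip()))
    return words


def group_into_rows(words, row_tol=10):
    """Cluster words whose top coordinate is within row_tol pixels of the row's first word."""
    rows = []
    for top, left, width, text in sorted(words):
        if rows and abs(top - rows[-1][0]) <= row_tol:
            rows[-1][1].append((left, width, text))
        else:
            rows.append((top, [(left, width, text)]))
    return [cells for _, cells in rows]


def split_cells(row_words, col_gap=20):
    """Join words left to right; start a new cell when the gap exceeds col_gap pixels."""
    cells = []
    prev_right = None
    for left, width, text in sorted(row_words):
        if prev_right is not None and left - prev_right <= col_gap:
            cells[-1] += " " + text
        else:
            cells.append(text)
        prev_right = left + width
    return cells


def ocr_dict_to_csv(data, row_tol=10, col_gap=20):
    """Convert OCR word boxes to CSV text, one table row per detected text row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in group_into_rows(words_from_ocr(data), row_tol):
        writer.writerow(split_cells(row, col_gap))
    return buf.getvalue()


# With a real image (requires the Tesseract binary to be installed):
# from PIL import Image
# import pytesseract
# data = pytesseract.image_to_data(Image.open("table.png"),
#                                  output_type=pytesseract.Output.DICT)
# print(ocr_dict_to_csv(data))
```

The fixed pixel thresholds are the weak point of this approach; detecting ruling lines in the image first is a common alternative for tables that have visible borders.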
best,
Dilawar