Developing tool to extract table from images

Dilawar Singh

May 26, 2021, 9:28:30 AM
to datameet
Dear all,

I am developing a tool to extract tables from images. It is a big undertaking, but I hope to release a beta version soon.

The input to the tool is a PNG/JPG/PDF image, and the output is a CSV/ODT/XLS table.
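
For readers wondering what the image-to-CSV step involves, here is a minimal sketch. It assumes the OCR backend has already returned word-level bounding boxes (text, left, top, width, height in pixels). It groups words into rows by their vertical centres and splits each row into cells wherever the horizontal gap between words is large. The `Word` class, the function names, and the `row_tolerance`/`gap` thresholds are hypothetical illustrations, not how Dilawar's tool actually works.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with its bounding box in pixel coordinates (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def group_rows(words, row_tolerance=10):
    """Cluster words into rows: a word joins the current row if its vertical
    centre is within row_tolerance pixels of the previous word's centre."""
    rows = []
    for word in sorted(words, key=lambda w: w.center_y):
        if rows and abs(rows[-1][-1].center_y - word.center_y) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order words left to right.
    return [sorted(row, key=lambda w: w.left) for row in rows]


def split_cells(row, gap=20):
    """Join neighbouring words into one cell unless the horizontal gap
    between them exceeds `gap` pixels, in which case start a new cell."""
    cells = []
    prev_right = None
    for word in row:
        if prev_right is not None and word.left - prev_right <= gap:
            cells[-1] += " " + word.text
        else:
            cells.append(word.text)
        prev_right = word.left + word.width
    return cells


def words_to_csv(words, row_tolerance=10, gap=20):
    """Turn word boxes into CSV text using the row/cell heuristics above."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default line terminator is "\r\n"
    for row in group_rows(words, row_tolerance):
        writer.writerow(split_cells(row, gap))
    return buf.getvalue()
```

With made-up boxes for a two-column layout (`Station`/`Value` on the first line, `Site`, `A`, `42` on the second), this yields the rows `Station,Value` and `Site A,42`. The per-row gap heuristic breaks on empty cells or ragged columns; a sturdier approach would derive column boundaries from all rows together, or from ruling lines in the image.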

I have some simple tables extracted from PDFs. If there are formats that the government uses often and that people need or want to digitize, I'd like to have some samples; I am thinking of census data, GIS data, etc.

I do not plan to support multi-page tables. I could use some advice on the OCR backend (for now I am using pytesseract, a Python wrapper for Google's Tesseract OCR engine).
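
On the OCR backend, here is a minimal sketch of getting word boxes out of pytesseract. It uses `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which returns parallel lists keyed by `text`, `conf`, `left`, `top`, `width` and `height` (among other keys). The helper names, the `Box` tuple, and the `min_conf=60` cutoff are assumptions chosen for illustration. The code assumes the Tesseract binary and Pillow are installed. PDF input would first have to be rasterised to an image, which this sketch does not cover.

```python
from typing import Any, Dict, List, Tuple

# (text, left, top, width, height) for one recognised word (hypothetical alias).
Box = Tuple[str, int, int, int, int]


def boxes_from_tesseract(data: Dict[str, List[Any]], min_conf: float = 60.0) -> List[Box]:
    """Keep non-empty words at or above min_conf from an image_to_data DICT result."""
    boxes = []
    for i, text in enumerate(data["text"]):
        text = str(text).strip()
        # Block/paragraph/line rows carry no text and report conf = -1.
        if not text or float(data["conf"][i]) < min_conf:
            continue
        boxes.append((text, int(data["left"][i]), int(data["top"][i]),
                      int(data["width"][i]), int(data["height"][i])))
    return boxes


def ocr_boxes(image_path: str, min_conf: float = 60.0, config: str = "") -> List[Box]:
    """Run Tesseract via pytesseract on one image file and return word boxes."""
    # Imported lazily so the parsing helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        data = pytesseract.image_to_data(
            img, config=config, output_type=pytesseract.Output.DICT
        )
    return boxes_from_tesseract(data, min_conf)
```

Passing `config="--psm 6"` (Tesseract's "assume a single uniform block of text" page segmentation mode) is one option worth trying on tables. Whether it helps depends on the layout.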

best,
    Dilawar

--
Dilawar Singh, Ph.D.

Sarath Guttikunda

May 26, 2021, 11:56:03 PM
to dilawar....@gmail.com, data...@googlegroups.com
If you could test it on these daily bulletins to automate their tabulation, that would be very helpful.
Sample file attached.

With best wishes,
Sarath

--
Dr. Sarath Guttikunda
http://www.urbanemissions.info


--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
AQI_Bulletin_20210526.pdf