So I've run into the proverbial PDF parsing problem on a project.
The PDFs mix Tamil and English, and it's getting a bit harrowing.
I've tried a couple of things like PyPDF and pdfminer.
The structure of the documents and their data points seems a bit off, and chasing them down gets tedious.
Just thought I should send a shoutout and ask if anyone knows of a solution that works well.
I'd like to know which Python libraries are good for extracting data from Unicode (Tamil + English) PDFs and for parsing Unicode and Tamil characters.
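
To make the character-handling part concrete, here is a minimal sketch of the kind of processing I mean. It assumes the text has already been extracted into a Python string by some library, and it uses only the standard library. The name split_by_script is made up for illustration and does not come from PyPDF or pdfminer. It normalizes the text to NFC and splits it into runs of Tamil and non-Tamil characters, using the Unicode Tamil block (U+0B80 to U+0BFF).

```python
import unicodedata

# Unicode Tamil block boundaries (inclusive).
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def split_by_script(text):
    """Split text into (script, run) pairs; script is 'tamil' or 'other'.

    Hypothetical helper for illustration only. Whitespace, digits and
    Latin letters all count as 'other'.
    """
    # NFC composes split vowel signs into their single-code-point forms,
    # so equivalent Tamil sequences compare equal afterwards.
    text = unicodedata.normalize("NFC", text)
    runs = []
    for ch in text:
        script = "tamil" if TAMIL_START <= ord(ch) <= TAMIL_END else "other"
        if runs and runs[-1][0] == script:
            runs[-1][1].append(ch)   # same script: extend the current run
        else:
            runs.append((script, [ch]))  # script changed: start a new run
    return [(script, "".join(chars)) for script, chars in runs]
```

Something like this would only help once extraction itself gives sane Unicode text, which is the part I'm stuck on.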
--
Regards,
Johnson Chetty