Hello all,
I am not sure if this is the right forum, but would love to get any pointers.
I am volunteering with a local Hindi newspaper and want to get their editions online in a web-searchable format. Here is the link to the site.
The biggest hurdle I am facing is converting the text from the font encoding the paper uses (APS-Priyanka) to Unicode (assuming that I can extract the text from the PDFs, and setting the formatting issues aside for the moment).
From what I gathered from web searches, APS-Priyanka is a really old font that does not follow any standard encoding such as ISCII. I tried some basic scripts and character maps, but it does not seem to be a "trivial" problem.
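To make the character-map idea concrete, here is a minimal Python sketch of the general approach. The table entries and the function name `convert` are placeholders for illustration only, not the real APS-Priyanka mapping, which would have to be built by hand by comparing the font's glyphs against the extracted bytes. It shows two steps: longest-match substitution, then moving the short-i matra, which legacy Devanagari fonts usually store before the consonant while Unicode stores it after.

```python
import re

# Hypothetical glyph-to-Unicode table. These entries are placeholders,
# NOT the actual APS-Priyanka mapping.
LEGACY_TO_UNICODE = {
    "d": "\u0915",   # KA
    "[k": "\u0916",  # KHA (a two-character legacy sequence)
    "x": "\u0917",   # GA
    "k": "\u093e",   # vowel sign AA
    "h": "\u0940",   # vowel sign II
    "f": "\u093f",   # vowel sign I (short i)
}


def convert(text, table=LEGACY_TO_UNICODE):
    """Map legacy-font text to Unicode using a lookup table."""
    # Try longer legacy sequences first so "[k" wins over plain "k".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Characters that are not in the table pass through unchanged.
    out = pattern.sub(lambda m: table[m.group(0)], text)
    # Swap "short-i matra + consonant" into "consonant + short-i matra".
    return re.sub("\u093f([\u0915-\u0939])", "\\1\u093f", out)


print(convert("fd"))   # short-i typed before KA in the legacy text
print(convert("[kk"))  # two-character sequence followed by AA
```

Even with a complete table, this only handles the simple case of a single consonant. Conjuncts, half-forms, and the reph would need additional reordering rules, which is probably where the problem stops being trivial.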
If anyone has experience in this and can help, it would be great.
best,
Rushabh