Avinash,
Those PDF files will have been generated from a text file in the first place.
And AIR broadcasts will most probably be scripted broadcasts.
The whole idea behind Portable Document Format files is that the document itself contains all required fonts for the language(s) it employs.
That is what makes it accessible on all platforms.
Given that the language of interest is Sanskrit, that font will be a Devanagari one, easily obtainable by everyone.
In short, why not simply ask the supplier of those PDFs whether you can access the source texts?
Especially if the PDF comes in a user-friendly, unprotected security mode, as is the case with those AIR PDFs.
But much more flexible than a database is an algorithm with a fuzzy speller that scans the entire text or batches of texts, digitally.
Any database then need only be a list of URLs of distributed texts that are already out there.
But again, such lists, searchable with fuzzy spelling, already exist: select वेदान्त, right-click on it, and select Search Google from the dropdown...
... and select from 202,000 odd results.
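For anyone who would rather run the fuzzy speller idea locally over a batch of source texts, here is a minimal sketch in Python. It is only an illustration: the function name fuzzy_find, the 0.75 cutoff and the two sample texts are my own assumptions, and the similarity measure is the standard library's difflib, not whatever a search engine uses internally.

```python
import difflib
import unicodedata

# Punctuation to trim from word edges, including the Devanagari dandas.
PUNCTUATION = "।॥.,;:!?\"'()"


def fuzzy_find(query, texts, cutoff=0.75):
    """Scan a batch of texts for words that resemble `query`.

    `texts` maps a label (a file name or URL, say) to the text itself.
    Returns (label, word, score) tuples, best match first.
    The cutoff of 0.75 is an arbitrary assumption; tune it to taste.
    """
    # Normalise so that equivalent Unicode encodings compare equal.
    query = unicodedata.normalize("NFC", query)
    hits = []
    for label, text in texts.items():
        for word in unicodedata.normalize("NFC", text).split():
            word = word.strip(PUNCTUATION)
            if not word:
                continue
            score = difflib.SequenceMatcher(None, query, word).ratio()
            if score >= cutoff:
                hits.append((label, word, round(score, 2)))
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical sample batch: one exact spelling, one anusvara variant.
sample_texts = {
    "text_a": "वेदान्त दर्शन",
    "text_b": "वेदांत सार।",
}

for label, word, score in fuzzy_find("वेदान्त", sample_texts):
    print(label, word, score)
```

With the sample batch, both the spelling वेदान्त and the variant वेदांत are picked up, while unrelated words fall below the cutoff. The same loop would work over texts fetched from a list of URLs.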
Regards,
Taff_Rivers,
Research & Development, Information Technology, retd.