Dear Ramakrishna Ji,
Would you be open to discussing projects to create an open-source training corpus for Sanskrit?
A training corpus is an important element in building AI-based tools. Much of the recent progress in AI and ML is attributed to the availability of better data. Such corpora have been built for major languages such as English, French, German, Russian, and even Hebrew, but nothing of that sort has been done for Sanskrit. We should take steps.
As early projects, I suggest directly importing well-known datasets such as ImageNet and tagging them in Sanskrit. This would help the general public train Sanskrit-based models, and could also support building compilers using NLP techniques.
As a disclosure, I have my own interest in such a dataset. I have been working in this direction for years and have been only marginally successful because of the lack of data. I recently built an English-to-Sanskrit translation system, which is a work in progress and about 0.02% complete.
I believe that if we unite and put together a plan, the program could be funded by the government as well, since it is essential for the betterment of language tools, especially for Sanskrit.
Looking forward to your reply.
Namo Namah,
Prabhat