maNipravALa tagging project?


विश्वासो वासुकिजः (Vishvas Vasuki)

Jun 28, 2022, 12:51:28 PM
to Vinodh Rajan, sanskrit-programmers
(shrI- GSSM to bcc)


On Tue, 28 Jun 2022 at 20:34, Vinodh Rajan <vinodh...@gmail.com> wrote:
Dear Vishvas,

Just a side note: Transliterating Tamil-ized Manipravala back to Tamil+(Grantha)Sanskrit is an ML endeavour. It is something that I want to do in the future; I just don't have the time. It would be a fun Bachelor's or Master's project :) I did offer it to some people (at the University of Hamburg, when I was working there) but it had no takers.

+ sanskrit-programmers in case someone there is motivated to take this up. (details below)

One would need to tag the individual parts as Sanskrit or Tamil and then predict the Sanskrit spelling from the defective Tamil spelling. (For example, both balam and phalam become பலம் (palam) in the Tamil script, so reversing it requires some amount of heuristics.) There is enough training material for Manipravala, though, to allow such tagging (AFAIK it is just simple PoS-style tagging at the end of the day). Even a purely heuristic, rule-based prediction should more or less work.
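
A minimal Python sketch of the rule-based idea, for anyone who wants a starting point. It is only an illustration under stated assumptions: the ambiguity table covers just the stop consonants, the lexicon is a tiny hypothetical stand-in for a real Sanskrit word list, and the names `AMBIGUOUS`, `SANSKRIT_LEXICON`, `candidates` and `restore` are made up for this example.

```python
import unicodedata
from itertools import product

def nfc(text):
    """Normalize to NFC so that letters with diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)

# Assumed ambiguity table: one romanized Tamil stop can stand for
# several Sanskrit stops (voicing and aspiration are not written).
AMBIGUOUS = {
    "k": ["k", "kh", "g", "gh"],
    "c": ["c", "ch", "j", "jh"],
    "t": ["t", "th", "d", "dh"],
    "p": ["p", "ph", "b", "bh"],
}

# Hypothetical mini-lexicon; a real system would load a proper word list.
SANSKRIT_LEXICON = {nfc(w) for w in ["balam", "phalam", "niyōga"]}

def candidates(word):
    """Expand every ambiguous consonant into all of its possible readings."""
    options = [AMBIGUOUS.get(ch, [ch]) for ch in nfc(word)]
    return {"".join(combo) for combo in product(*options)}

def restore(word, lexicon=SANSKRIT_LEXICON):
    """Return the candidate spellings that the lexicon attests, sorted."""
    return sorted(candidates(word) & lexicon)

print(restore("palam"))   # both 'balam' and 'phalam' survive the filter
print(restore("niyōka"))  # only 'niyōga' is in the toy lexicon
```

The number of candidates grows exponentially with the number of ambiguous consonants in a word, and cases like palam stay ambiguous after the lexicon filter, which is where context (or a trained model) would have to decide.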

(Have a look at: http://www.manipravala.com/wp/. This is a site maintained by me and a friend of mine, who is a Manipravala scholar, particularly Srivaishnava Manipravala. Also this: http://www.manipravala.com/wp/how-to-use-manippavalam/ if you are interested in composing in Manipravala). 

For instance:

Input:

ஆளவந்தாருடைய நியோகத்தாலே ஶ்ரீபாஷ்யகாரரை அங்கீகரித்த பூர்ணரான பெரியநம்பி
āḷavantāruṭaiya niyōkattālē śrīpāṣyakārarai aṅkīkaritta pūrṇarāṉa periyanampi

Tagging:

<ta>āḷavantāruṭaiya</ta> <sa>niyōka</sa><ta>ttālē</ta> <sa>śrīpāṣyakā</sa><ta>rarai</ta> <sa>aṅkīka</sa><ta>ritta</ta> <sa>pūrṇa</sa><ta>rāṉa</ta> <ta>periyanampi</ta>

Output:

āḷavantāruṭaiya niyōgattālē śrībhāṣyakārarai aṅgīkaritta pūrṇarāṉa periyanampi
ஆளவந்தாருடைய नियोगத்தாலே श्रीभाष्यकारரை अङ्गीकरिத்த पूर्णराன பெரியநம்பி
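
The Input → Tagging → Output steps of this example could be wired together roughly as follows. This is a sketch, not an actual implementation: the tagged line is typed in by hand (a real tagger would produce it), and `SA_FIXES` is a hypothetical lookup table standing in for whatever model or heuristic predicts the Sanskrit spelling.

```python
import re
import unicodedata

def nfc(text):
    """Normalize to NFC so that letters with diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)

# Hand-tagged line from the example (well-formed <ta>/<sa> pairs).
TAGGED = nfc(
    "<ta>āḷavantāruṭaiya</ta> <sa>niyōka</sa><ta>ttālē</ta> "
    "<sa>śrīpāṣyakā</sa><ta>rarai</ta> <sa>aṅkīka</sa><ta>ritta</ta> "
    "<sa>pūrṇa</sa><ta>rāṉa</ta> <ta>periyanampi</ta>"
)

# Hypothetical stand-in for the spelling predictor: Tamil-ized -> Sanskrit.
SA_FIXES = {
    nfc(src): nfc(dst)
    for src, dst in {
        "niyōka": "niyōga",
        "śrīpāṣyakā": "śrībhāṣyakā",
        "aṅkīka": "aṅgīka",
    }.items()
}

SEGMENT = re.compile(r"<(ta|sa)>(.*?)</\1>")

def parse_segments(tagged):
    """Split a tagged line into words, each a list of (language, text) pairs."""
    return [
        [(m.group(1), m.group(2)) for m in SEGMENT.finditer(token)]
        for token in tagged.split()
    ]

def restore_line(tagged):
    """Fix the spelling of Sanskrit segments; leave Tamil segments untouched."""
    words = []
    for segments in parse_segments(nfc(tagged)):
        words.append("".join(
            SA_FIXES.get(text, text) if lang == "sa" else text
            for lang, text in segments
        ))
    return " ".join(words)

print(restore_line(TAGGED))
```

Segments without an entry in the lookup (pūrṇa here) pass through unchanged, so the result should match the romanized Output line above. Rendering the Sanskrit segments in Devanagari or Grantha would be a separate transliteration step.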

-------

I suppose you've transcribed the Tamilized Manipravala to Devanagari. It probably rendered even Sanskrit words with Tamilized pronunciation.

In any case, dealing with Manipravala in only the Tamil script is tricky if you want to back-form the proper source text. Unfortunately, more and more publishers are printing Manipravala texts only in the Tamil script, and the nuances are being lost as many Sanskrit words become very ambiguous in the Tamil script.

(This was a presentation we did some time ago regarding the same topic: https://www.youtube.com/watch?v=ZznFNpgihvw)

Cheers,

Vinodh
