Re: छन्दोनाममूलम् (source of the metre names) [and update on sanskritmetres.appspot.com]

Shreevatsa R

Jul 7, 2014, 1:34:05 AM
to विश्वासो वासुकिजः (Vishvas Vasuki), sanskrit-programmers
2014-07-06 23:22 GMT+05:30 विश्वासो वासुकिजः (Vishvas Vasuki) <vishvas...@gmail.com>:
I am curious about this: where did you get the metre names from? Also, Aśvadhāṭī does not appear at http://sanskrit.sai.uni-heidelberg.de/Chanda/HTML/list_all.html. If you got it from another source, please tell me, so that I can use it to expand my notes.

The name is from here: https://www.youtube.com/playlist?list=PLABJEFgj0PWVXr2ERGu2xtoSXrNdBs5xS (35) -- so, "personal communication" :-) And IIRC I took a couple of names from Kolhatkar's video as well: https://www.youtube.com/watch?v=CNnUhll0zzA

Actually, now that I'm emailing the group, let me take this opportunity to 
(a) give a progress update on my metre recognizer, and 
(b) propose a metre database that we can all use. [Actually, I'll send that as a separate email]

Here are the changes made to the metre recognizer since my last update to this group (December!):

0. The current version is serving at http://1e.sanskritmetres.appspot.com/ and (as of just now) on the main page http://sanskritmetres.appspot.com/ as well.

1. The metre recognition has been made much more aggressive, for robustness against a lot of common metrical errors (though of course there are also many more false positives as a result): during indexing, for each known metre, we store not only the pattern of the full verse, but also the pattern(s) of each half (ardha) and each line/quarter (pāda). For a given verse, we attempt matching the pattern of (i) the entire input, (ii) each line of the input, (iii) each half of the input, and (iv) each quarter of the input, against the indexed patterns.

2. For vṛtta metres, the input verse is printed again on the results page, with the input's word breaks respected, and with laghus and gurus marked with non-bold and bold respectively. When the verse doesn't exactly match the metre, some sort of "best alignment" is found, with deviations highlighted in red (and underlined). You can hover over any such error to see what the right syllable there should be (either 'L' or 'G', or '-' in the case when the syllable should not be present).

3. The list of metres has been greatly expanded -- it now includes the "curated" list of 40+ metres I originally had, plus some 100+ metres from Vṛtta-ratnākara that Dhaval kindly contributed (here), plus the 1000+ metres that were scraped (again by Dhaval) from Anand Mishra's site. In the process of doing so, I uncovered many issues with this data; I'll send a separate email about it.

4. Better display -- the debugging output, instead of being dumped unceremoniously on the page, is now hidden (in modern browsers) under a "click to expand" box.
5. Various bug fixes (e.g. Unicode normalization), code cleanup, and some performance improvements (e.g. using compiled regexes) that matter when using read_gretil.py on a large text.
6. The table of recognized metres from famous texts that's on the front page has been updated a bit, but it is out of date again. Because of the bigger list of metres and the more aggressive recognition, many verses with errors in them are now "recognized" as some metre -- I still need to mark such low-confidence recognitions separately.

7. The recognition of mātrā-vṛttas (by just counting syllables in each line) has actually been weakened for now, temporarily, because there are more constraints than that, and I need to look into this more carefully. On the plus side, Arya in particular (by far the most common non-vṛtta in Sanskrit texts, besides śloka) is much better recognized.
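
To make item 1 above concrete, here is a minimal sketch of the indexing and lookup idea. It is not the recognizer's actual code: the function names, the two-metre sample table, the ASCII labels, and the choice to split the input into halves and quarters by syllable count are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each metre maps to its four pada patterns
# (L = laghu, G = guru). Names are kept ASCII-only for simplicity.
KNOWN_METRES = {
    "Indravajra": ["GGLGGLLGLGG"] * 4,
    "Upendravajra": ["LGLGGLLGLGG"] * 4,
}


def build_index(metres):
    """Index every metre by its full-verse, half and quarter patterns."""
    index = defaultdict(set)
    for name, padas in metres.items():
        index["".join(padas)].add((name, "full"))
        index["".join(padas[:2])].add((name, "half"))
        index["".join(padas[2:])].add((name, "half"))
        for pada in padas:
            index[pada].add((name, "quarter"))
    return index


def candidate_units(lines):
    """Yield the pieces of the input to look up.

    Assumption: halves and quarters are cut by syllable count, and only
    when the total length divides evenly.
    """
    whole = "".join(lines)
    yield whole              # (i) the entire input
    yield from lines         # (ii) each line of the input
    n = len(whole)
    if n % 2 == 0:           # (iii) each half
        yield whole[: n // 2]
        yield whole[n // 2:]
    if n % 4 == 0:           # (iv) each quarter
        q = n // 4
        for i in range(4):
            yield whole[i * q:(i + 1) * q]


def recognize(lines, index):
    """Return every (metre, unit kind) whose indexed pattern matches."""
    matches = set()
    for unit in candidate_units(lines):
        matches |= index.get(unit, set())
    return matches


index = build_index(KNOWN_METRES)
# A verse mixing the two pada types matches only at the quarter level.
mixed_line = "LGLGGLLGLGG" + "GGLGGLLGLGG"
print(recognize([mixed_line, mixed_line], index))
```

Exact dictionary lookup like this only finds units that match perfectly; the tolerance for metrical errors comes from trying smaller units, so that one bad pāda does not hide the other three.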


As always, please try it out, and if you encounter any issues (or regressions wrt older versions), or can think of possible improvements, please report them to me either by email or at https://github.com/shreevatsa/sanskrit/issues

Regards,
Shreevatsa

Mārcis Gasūns

Jul 7, 2014, 4:40:45 PM
to sanskrit-p...@googlegroups.com, vishvas...@gmail.com
Impossible, you're still alive!


On Monday, 7 July 2014 09:34:05 UTC+4, shreevatsa wrote:
2014-07-06 23:22 GMT+05:30 विश्वासो वासुकिजः (Vishvas Vasuki) <vishvas...@gmail.com>:
0. The current version is serving at http://1e.sanskritmetres.appspot.com/ and (as of just now) on the main page http://sanskritmetres.appspot.com/ as well.
Time to add some CSS? Can I propose?
 
1. The metre recognition has been made much more aggressive, for robustness against a lot of common metrical errors (though of course there are also many more false positives as a result): during indexing, for each known metre, we store not only the pattern of the full verse, but also the pattern(s) of each half (ardha) and each line/quarter (pāda). For a given verse, we attempt matching the pattern of (i) the entire input, (ii) each line of the input, (iii) each half of the input, and (iv) each quarter of the input, against the indexed patterns.
That's some fine math.
 

2. For vṛtta metres, the input verse is printed again on the results page, with the input's word breaks respected, and with laghus and gurus marked with non-bold and bold respectively.
For
प्रणम्य गड्गाधरमिन्दुचूडं देवञ्च मातड्गवरेण्यवक्त्रम् ।
मत्तातपादान् जननीं गुरूंश्च सेकादिकस्य क्रममद्य वक्ष्ये ।।१।।
pattern LGLGGLLGLGGGGLGGLLGLGGGGLGGLLGLGLGGLGGLLGLGG (44 syllables, 70 mātras) is Upajāti
Lines:
  Line 1: pattern LGLGGLLGLGGGGLGGLLGLGG (22 syllables, 35 mātras) is Half of Upajāti
  Line 2: pattern GGLGGLLGLGLGGLGGLLGLGG (22 syllables, 35 mātras) is Half of indravajrā
is great to have, but I miss the "non-bold and bold" version for sure. It makes the text easy to read for me.
 
When the verse doesn't exactly match the metre, some sort of "best alignment" is found, with deviations highlighted in red (and underlined). You can hover over any such error to see what the right syllable there should be (either 'L' or 'G', or '-' in the case when the syllable should not be present).
Oh my, see
मूर्द्धच्छिद्रमधोमुखमेतत् न्यस्येच्छरावमध्ये तु ।
यन्नारिकेलकर्परशरावयोश्छिद्रयुगलमस्ति कृतम् ।।१६।।

Reading (note errors in red) as prasarā (प्रसरा):

rddhacchidramadhomukhametat 
nyasye[-]ccha[-]va[-]madhye 
tu yannārikelakarparaśa 
vayośchidrayu[-]galamasti kṛtam 

Reading (note errors in red) as lolam (लोलम्):

rddhacchidramadhomukhame 
tat nyasyecchava 
madhye tu yannā 
rikelakarparaśavayośchidrayugalamasti kṛtam 

Reading (note errors in red) as rañjakam (रञ्जकम्):

rddhacchidramadhomukhame 
tat nyasyecchavamadhye tu ya 
nnārikelakarparaśava[-]yo 
śchidrayugalamasti kṛtam 
 
3. The list of metres has been greatly expanded -- it now includes the "curated" list of 40+ metres I originally had, plus some 100+ metres from Vṛtta-ratnākara that Dhaval kindly contributed (here), plus the 1000+ metres that were scraped (again by Dhaval) from Anand Mishra's site. In the process of doing so, I uncovered many issues with this data; I'll send a separate email about it.
The scrape was done not by Dhaval but by me; otherwise correct :) He's a coder, I'm a scraper.
 

4. Better display -- the debugging output, instead of being dumped unceremoniously on the page, is now hidden (in modern browsers) under a "click to expand" box.
Yeah, that works just fine.
 
5. Various bug fixes (e.g. Unicode normalization), code cleanup, and some performance improvements (e.g. using compiled regexes) that matter when using read_gretil.py on a large text.
Could you please make a video on how to use it on large texts, like the GRETIL texts?
 
6. The table of recognized metres from famous texts that's on the front page has been updated a bit, but it is again out of date. Because of the bigger list of metres and more aggressive recognizing, many verses with errors in them are now "recognized" as some metre -- I still need to mark such low-confidence recognitions separately.

The metre statistics are of the greatest interest. Please move them to a separate page, because otherwise they add scrolling to the front page.

 

7. The recognition of mātrā-vṛttas (by just counting syllables in each line) has actually been weakened for now, temporarily, because there are more constraints than that, and I need to look into this more carefully. On the plus side, Arya in particular (by far the most common non-vṛtta in Sanskrit texts, besides śloka) is much better recognized.
Hmm, are there any known samples to test on?
 
As always, please try it out, and if you encounter any issues (or regressions wrt older versions), or can think of possible improvements, please report them to me either by email or at https://github.com/shreevatsa/sanskrit/issues
Should I still report on GitHub? Great work, I love what you do, and I hope Dhaval and Vishvas will continue helping you.

M. 

Shreevatsa R

Jul 7, 2014, 10:15:04 PM
to sanskrit-programmers, विश्वासो वासुकिजः
On Tue, Jul 8, 2014 at 2:10 AM, Mārcis Gasūns <gas...@gmail.com> wrote:
Time to add some CSS? Can I propose?

Sure, we can add some CSS, but I'd prefer it to be minimal. Feel free to create a suggestion on https://github.com/shreevatsa/sanskrit/issues
 

2. For vṛtta metres, the input verse is printed again on the results page, with the input's word breaks respected, and with laghus and gurus marked with non-bold and bold respectively.
For
प्रणम्य गड्गाधरमिन्दुचूडं देवञ्च मातड्गवरेण्यवक्त्रम् ।
मत्तातपादान् जननीं गुरूंश्च सेकादिकस्य क्रममद्य वक्ष्ये ।।१।।
pattern LGLGGLLGLGGGGLGGLLGLGGGGLGGLLGLGLGGLGGLLGLGG (44 syllables, 70 mātras) is Upajāti
Lines:
  Line 1: pattern LGLGGLLGLGGGGLGGLLGLGG (22 syllables, 35 mātras) is Half of Upajāti
  Line 2: pattern GGLGGLLGLGLGGLGGLLGLGG (22 syllables, 35 mātras) is Half of indravajrā
is great to have, but I miss the "non-bold and bold" version for sure. It makes the text easy to read for me.

Yes, I've been planning to add this for these metres too -- have opened https://github.com/shreevatsa/sanskrit/issues/41 to remind myself.

[Aside: in your verse, गड्गा and मातड्ग should be गङ्गा and मातङ्ग I guess. :-) This makes no metrical difference; just something I noted.]
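
As a side note on the output quoted above: the syllable and mātrā counts follow from the L/G pattern alone, with each laghu counting as one mātrā and each guru as two. A minimal sketch (the function name is made up for illustration and is not taken from the recognizer's code):

```python
def count_syllables_and_matras(pattern):
    """Return (syllables, matras) for a pattern of 'L' and 'G' characters."""
    if set(pattern) - {"L", "G"}:
        raise ValueError("pattern must contain only 'L' and 'G'")
    laghus = pattern.count("L")   # one matra each
    gurus = pattern.count("G")    # two matras each
    return laghus + gurus, laghus + 2 * gurus


line1 = "LGLGGLLGLGGGGLGGLLGLGG"
line2 = "GGLGGLLGLGLGGLGGLLGLGG"
print(count_syllables_and_matras(line1))          # (22, 35)
print(count_syllables_and_matras(line2))          # (22, 35)
print(count_syllables_and_matras(line1 + line2))  # (44, 70)
```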

 
 
When the verse doesn't exactly match the metre, some sort of "best alignment" is found, with deviations highlighted in red (and underlined). You can hover over any such error to see what the right syllable there should be (either 'L' or 'G', or '-' in the case when the syllable should not be present).
Oh my, see
मूर्द्धच्छिद्रमधोमुखमेतत् न्यस्येच्छरावमध्ये तु ।
यन्नारिकेलकर्परशरावयोश्छिद्रयुगलमस्ति कृतम् ।।१६।।

Yes, lots of red because the input fits all known metres poorly. (I guess the verse is in gīti, of 12-18-12-18 mātrās. Improving the recognition of mātrā metres is something I still need to do.)
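
For anyone curious how a "best alignment" with red marks can come about, here is one possible sketch using plain minimum edit distance between the input's pattern and a metre's pattern. This is an assumption for illustration only: the recognizer's actual alignment method may well differ, and the function name is made up.

```python
def align(observed, expected):
    """Align two L/G patterns by minimum edit distance.

    Returns (observed_symbol, expected_symbol) pairs. A '-' in the second
    slot marks an input syllable that should not be present; a '-' in the
    first slot marks a syllable missing from the input.
    """
    m, n = len(observed), len(expected)
    # cost[i][j] = edit distance between observed[:i] and expected[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i
    for j in range(n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mismatch = observed[i - 1] != expected[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # keep or substitute
                cost[i - 1][j] + 1,             # extra input syllable
                cost[i][j - 1] + 1,             # missing input syllable
            )
    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = observed[i - 1] != expected[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                pairs.append((observed[i - 1], expected[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((observed[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", expected[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Every pair whose two symbols differ would be shown in red, with the
# second symbol as the hover text ('L', 'G' or '-').
deviations = [p for p in align("GGLLGG", "GGLGG") if p[0] != p[1]]
print(deviations)
```

With a verse in a metre that is not in the list at all, every candidate needs many edits, which is why the readings above are mostly red.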

5. Various bug fixes (e.g. Unicode normalization), code cleanup, and some performance improvements (e.g. using compiled regexes) that matter when using read_gretil.py on a large text.
Could you please make a video on how to use it on large texts, like the GRETIL texts?

I think Dhaval had made/shared one earlier? I can make a video on Mac/Linux, but I'm guessing you want one for Windows.

Should I still report at Github? 

Yes please.