formant frequency table for Sanskrit


James Hartzell

Jan 23, 2012, 3:44:23 PM
to sams...@googlegroups.com
Hi all,

I am looking for a speech formant frequency table for Sanskrit, if one
exists: such a table would list the specific vocal frequencies for
the 'sounds' of the Sanskrit language. Such tables have been created
for linguistic analysis of many other languages.

--
James Hartzell, PhD
Center for Mind/Brain Sciences (CIMeC)
The University of Trento, Italy

Eddie Hadley

Jan 23, 2012, 5:06:19 PM
to sams...@googlegroups.com, Eddie Hadley
James,

There being a bi-directional one-to-one relationship between sound and
written sign, would it not simply be a matter of selecting any
representative text, separating each letter, sorting in alphabetical order
and counting?

Then apply a dash of statistical sophistication to the result.

Regards,

Eddie


Nityanand Misra

Jan 24, 2012, 12:17:26 AM
to sams...@googlegroups.com, Eddie Hadley
While I do not know of such a table, constructing one should be easy with any one of the works available online at, for example, the Digital Corpus of Sanskrit.

http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/

If I were to do this, I would start with a prose work like Daśakumāracaritam rather than a poetic work. The reason is that the constraints of prosody may bias the frequency of long and short syllables, and in some cases also of the consonants. For example, if you choose a work like Meghadūta or Bhṛṅgadūta, which is composed entirely in the Mandākrāntā metre, you would get 10 long syllables for every 7 short syllables, which may be undesirable.
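The 10:7 ratio can be checked against the metre itself. The following is a minimal sketch, assuming the standard Mandākrāntā scheme of 17 syllables per quarter (gaṇas ma, bha, na, ta, ta followed by two long syllables), with G marking a long (guru) and L a short (laghu) syllable. The variable names are illustrative only.

```python
from collections import Counter

# One quarter (pada) of Mandakranta: ma bha na ta ta + G G
# (assumed standard scheme; the final syllable is counted as long).
MANDAKRANTA = "GGG GLL LLL GGL GGL GG".replace(" ", "")

counts = Counter(MANDAKRANTA)
long_count, short_count = counts["G"], counts["L"]

# prints: 17 10 7
print(len(MANDAKRANTA), long_count, short_count)
```

Any text composed entirely in this metre inherits the same proportion, which is why a prose source is the safer choice for frequency counts.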

Thanks, Nityanand

--
You received this message because you are subscribed to the Google Groups "samskrita" group.
To post to this group, send email to sams...@googlegroups.com.
To unsubscribe from this group, send email to samskrita+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/samskrita?hl=en.




--
Nityānanda Miśra
http://nmisra.googlepages.com

|| आत्मा तत्त्वमसि श्वेतकेतो ||
(Thou art from/for/of/in That Ātman, O Śvetaketu)
     - Ṛṣi Uddālaka to his son, Chāndogyopaniṣad 6.8.7, The Sāma Veda

murthy

Jan 24, 2012, 5:58:28 AM
to sams...@googlegroups.com
It would be possible to get the phoneme-frequency structure by copy-pasting a passage in its transliterated form into Word and obtaining the number of occurrences. For example, if one wants to find out how often ऋ occurs, all that is required is to find out how often "Ru" occurs in Baraha-transliterated Sanskrit text. I believe using the Unicode representation may not be useful for this exercise.
Regards
Murthy

Ajit Krishnan

Jan 24, 2012, 8:37:42 AM
to sams...@googlegroups.com
namaste,

I would start by contacting the author of this page:

http://www.sanskritweb.net/sansdocs/#CONJUNCTS 

I have definitely seen a frequency table on this website in the past (IIRC, it was based on the Mahabharata). I am not able to find it at the moment.

sasneham,
 
   ajit
 
 



Nityanand Misra

Jan 24, 2012, 7:46:19 AM
to sams...@googlegroups.com
That is one option, but it would involve a lot of manual counting (for all vowels and consonants), and the method below may not count the ऋ in कृ as a ऋ, depending on the script.
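To illustrate the script-dependence: in Unicode Devanagari the independent vowel ऋ (U+090B) and the dependent vowel sign in कृ (U+0943) are different code points, so a plain search for ऋ misses the second. A minimal sketch follows; the function name and the sample string are illustrative assumptions, not taken from any tool mentioned in this thread.

```python
import unicodedata

VOCALIC_R_LETTER = "\u090B"  # DEVANAGARI LETTER VOCALIC R (independent form)
VOCALIC_R_SIGN = "\u0943"    # DEVANAGARI VOWEL SIGN VOCALIC R (dependent form)


def count_vocalic_r(text: str) -> int:
    """Count the vowel whether it is written independently or as a sign."""
    text = unicodedata.normalize("NFC", text)
    return text.count(VOCALIC_R_LETTER) + text.count(VOCALIC_R_SIGN)


# Sample text "rsih krsnah" in Devanagari, spelled out as code points.
sample = "\u090B\u0937\u093F\u0903 \u0915\u0943\u0937\u094D\u0923\u0903"

naive = sample.count(VOCALIC_R_LETTER)  # only the independent letter
full = count_vocalic_r(sample)          # letter plus vowel sign
print(naive, full)
```

The same caveat applies to every vowel that has both an independent letter and a dependent sign.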

In my opinion, the right tool would be a lexer like flex on UNIX/Linux, run on the Velthuis/ITRANS/Harvard-Kyoto transliterated text. Some years ago I wrote one to detect prosodic typing errors in a large text by parsing a Velthuis transliteration of the text into long and short syllables, using the attached lex tokenizer. Something similar can be used here.
tokenizer.lex
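The attached lex file is not reproduced in this thread. As a rough illustration of the same idea (and not the author's tokenizer), here is a minimal Python sketch that splits Velthuis-transliterated text into phonemes by longest match and counts them. The token list reflects my understanding of the common Velthuis conventions and is an assumption; it is not exhaustive (no accents, avagraha or other special signs).

```python
import re
from collections import Counter

# Assumed subset of the Velthuis scheme: vowels, anusvara/visarga, consonants.
VELTHUIS_TOKENS = [
    "a", "aa", "i", "ii", "u", "uu", ".r", ".rr", ".l", "e", "ai", "o", "au",
    ".m", ".h",
    "k", "kh", "g", "gh", '"n', "c", "ch", "j", "jh", "~n",
    ".t", ".th", ".d", ".dh", ".n", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", '"s', ".s", "s", "h",
]

# Longest alternatives first, so that "kh" wins over "k" followed by "h".
PATTERN = re.compile(
    "|".join(re.escape(t) for t in sorted(VELTHUIS_TOKENS, key=len, reverse=True))
)


def phoneme_counts(text: str) -> Counter:
    """Tokenize by longest match; unknown characters (spaces etc.) are skipped."""
    return Counter(PATTERN.findall(text))


counts = phoneme_counts("dharmak.setre kuruk.setre")
for phoneme, n in counts.most_common():
    print(phoneme, n)
```

Because the aspirates and dotted letters are matched before their single-character prefixes, "dh" is counted once here and never as "d" plus "h".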

Eddie Hadley

Jan 24, 2012, 10:17:57 AM
to sams...@googlegroups.com, Eddie Hadley
For those interested, these are the two relevant documents that Ajit refers to.
 
One relating to Classical texts and the other to Vedic.
 
 
However, both documents report research that was carried out for a different purpose: to standardise a system of Romanization in the early days of Unicode.
 
The tables would need to be used in reverse, but I can confirm that they are valid for such usage, simply because I have used them that way myself.
 
That said, here are the details of these docs.
 
The author is Ulrich Stiehl.
 
His works were instrumental to the development of Itranslator 2003.
 
I am familiar with these documents through my work as a beta tester for Itranslator 2003 v 2.5.0.0.
 
 
The Classical document with statistical table:   http://www.sanskritweb.net/itrans/itmanual2003.pdf
 
The Equivalent for Vedic:    http://www.sanskritweb.net/sansdocs/tbsvaras.pdf
 
 
Please, before you look at the Vedic document, be aware it contains the rather nasty racial comment viz.
 
‘Indian Niggers’
 
Please realize that these words are NOT those of the author, who is expressing his disagreement. And they are certainly NOT part of my vocabulary.
 
 
Eddie
 
 
 
 
 
 

murthy

Jan 25, 2012, 12:27:59 AM
to sams...@googlegroups.com
Dear Misraji,
This is an amplification of my previous post. It is not necessary to count manually. If you want to count "Ru", you can use the "Find and Replace" facility. Replace "Ru" by, say, "X" throughout the file; this feature faithfully replaces "Ru" by "X" and in addition reports the number of replacements, and voila, you have the count of "Ru"!
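The same find-and-replace trick can be scripted. A minimal sketch follows, using Python's re.subn, which returns both the replaced text and the number of replacements. The sample string is a made-up illustration in a Baraha-like transliteration, and the function name is hypothetical.

```python
import re


def count_by_replacing(text: str, target: str, placeholder: str = "X"):
    """Replace every occurrence of target and report how many were replaced."""
    replaced, n = re.subn(re.escape(target), placeholder, text)
    return replaced, n


# Made-up sample; "Ru" stands for the vocalic r as described in the post.
sample = "RuShiH kRuShNaH"
replaced, n = count_by_replacing(sample, "Ru")

# prints: XShiH kXShNaH 2
print(replaced, n)
```

Looping over a list of transliteration codes gives the full table without any manual counting, provided longer codes are counted before shorter ones that they contain.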
Regards,
Murthy

Oliver Hellwig

Jan 26, 2012, 4:55:25 PM
to samskrita
An addition to the discussion about frequency of speech formants
(although I don't know whether this really answers the initial
question):
You find a dataset of all syllables in the DCS at
http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/data/syllables/syllables.htm
(syllables.dat, link at the bottom of the page)

Vowels should be easy to extract from the data.

The top part of the page displays the proportions of consonant types
in the different historical layers of the corpus (retroflexes are not
surprising, but gutturals are). The R code used to produce the plot is
also linked at the bottom of the page.

Hope it helps!

Best, Oliver

Nityanand Misra

Jan 27, 2012, 1:11:51 AM
to sams...@googlegroups.com
Herr Doktor Hellwig

Two comments and a question.

First the question. Is the frequency count in the .dat file based on all texts in the DCS? The total number seems a bit low at 7,567,475. A back-of-the-envelope calculation shows that VR, MBh and Bhagavata alone, assuming they contain only Anushtup verses, would already account for more than half of that, i.e. ( 24 + 18 + 100 ) * 1000 * 32 = 4,544,000 syllables. And that's without any Puranas, Samhitas, et cetera.
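The estimate can be reproduced in a few lines. This is only a sketch of the arithmetic above; assigning 24, 18 and 100 thousand verses to VR, Bhagavata and MBh respectively is my reading of the figures (the traditional verse counts), and 32 is the number of syllables in one Anushtup verse (4 quarters of 8 syllables).

```python
SYLLABLES_PER_ANUSHTUP = 32  # 4 quarters x 8 syllables

# Verse counts in thousands (assumed mapping of the figures in the post).
verses_in_thousands = {"VR": 24, "Bhagavata": 18, "MBh": 100}

estimate = sum(verses_in_thousands.values()) * 1000 * SYLLABLES_PER_ANUSHTUP
dcs_total = 7_567_475  # total reported in the .dat file
ratio = estimate / dcs_total

# prints: 4544000 0.6
print(estimate, round(ratio, 2))
```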

The comments now.

1) The version of the R script you pointed to does not work, as sums is undefined:
  • Error: object 'sums' not found
    The script is missing the following line before the "for" loop; you may want to make the change online:
  • sums <- colSums( data[,2:6] )
2) As the page rightly mentions, linear regression analysis is flawed here, as the time data is categorical and not continuous. So any conclusions drawn from the regression are to be taken with a grain of salt. A better way would be to use the prop.test() function in R, which uses a chi-square test to compare proportions in several groups. I do remember seeing chi-square tests for searches on DCS, though.
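For readers without R, the test behind the second comment can be sketched in plain Python. This is a minimal illustration of a Pearson chi-square test for equality of proportions, not a reimplementation of prop.test(): it applies no continuity correction (which, as far as I know, prop.test applies by default when comparing two groups), and the p-value helper only covers 1 or an even number of degrees of freedom. The counts in the example are hypothetical, not DCS data.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or even df)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not covered by this sketch")


def proportions_chi2(successes, totals):
    """Test the null hypothesis that all groups share one proportion."""
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        # Compare observed and expected counts for both outcomes.
        for observed, expected in ((s, n * pooled), (n - s, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts of one consonant class in five periods.
hits = [1200, 1350, 1500, 1480, 1510]
sizes = [10000, 10000, 10000, 10000, 10000]
print(proportions_chi2(hits, sizes))        # five-sample test, df = 4
print(proportions_chi2(hits[:2], sizes[:2]))  # two successive periods, df = 1
```

Running the five-sample test once and the two-sample test on each pair of successive periods yields one row of a p-value matrix of the kind shown in this post.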

A matrix of p-values using a five-sample chi-square test and four two-sample chi-square tests between successive periods is below. From what I can see, the evidence against the null hypothesis of constant probability for a group is statistically significant for all five groups, even though the evidence is most extreme for Gutturals, Retroflexes and Labials, which correspond to the significant linear regression F-stats. In fact, it is interesting to see that from the Medieval to the Later period, only Dentals have shown a significant change in probability with a p-value less than 0.1% (a justifiable critical value given the large sample sizes), while from Epic to Classic, the evidence shows that the probabilities for all five groups have changed. It is also interesting to observe that the frequency of Retroflexes did not change significantly between the Early (Vedic/Upanishadic) and Epic periods, but it did in the post-Epic periods.

> matpvals
              All periods      Ea & Ep       Ep & Cl       Cl & Me      Me & La
Gutturals    0.000000e+00 3.607401e-43  0.000000e+00  0.000000e+00 1.267108e-01
Palatals    1.175834e-246 8.720096e-51 1.598550e-161  9.514065e-01 2.303646e-02
Retroflexes  0.000000e+00 3.181280e-01  6.859133e-92 1.400428e-127 3.685050e-02
Dentals      3.114376e-48 1.022459e-08  6.671614e-07  2.707000e-14 3.677069e-34
Labials      0.000000e+00 2.611753e-01  4.399987e-35 1.750870e-106 6.592439e-01

(Ea = Early, Ep = Epic, Cl = Classic, Me = Medieval, La = Later)


Lastly, under the frequentist approach, large sample sizes are more likely to result in significant test statistics, so it would be useful to try a Bayesian alternative.
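One simple form such a Bayesian alternative could take is sketched below: put independent uniform Beta(1, 1) priors on the proportion in each of two periods and estimate by Monte Carlo the posterior probability that one proportion exceeds the other. This is only one of many possible Bayesian treatments; the prior, the function name and the counts are all assumptions for illustration.

```python
import random


def prob_first_larger(hits_a, n_a, hits_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(p_a > p_b) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each proportion is Beta(1 + hits, 1 + misses).
        p_a = rng.betavariate(1 + hits_a, 1 + n_a - hits_a)
        p_b = rng.betavariate(1 + hits_b, 1 + n_b - hits_b)
        wins += p_a > p_b
    return wins / draws


# Hypothetical counts of one consonant class in two periods.
print(prob_first_larger(300, 1000, 200, 1000))
print(prob_first_larger(500, 1000, 500, 1000))
```

Unlike a p-value, the result is read directly as a probability about the proportions themselves, and the posterior samples can also be used to judge whether a difference is large enough to matter.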

Thanks, Nityanand
