Frequency list of Sanskrit words


Anunad Singh

Mar 20, 2016, 4:34:18 AM
to sams...@googlegroups.com

Word frequency lists (for example, the 1000 most frequently used words of Telugu or the 500 most used words of Nepali) have many uses. The following page has links to such 'frequently used words' lists for many languages.

Wiktionary:Frequency lists

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists


But Sanskrit is missing from this list. Has anybody attempted to make such a list for Sanskrit somewhere (maybe of the 500 most frequently used words)?

-- AnunAda

S. L. Abhyankar

Mar 20, 2016, 9:19:20 AM
to anu...@gmail.com, sams...@googlegroups.com
नमस्ते श्रीमन् अनुनादसिंह-महोदय ! (Namaste, respected Shri Anunad Singh!)
What matters in Sanskrit is not the (inflected) word forms but the root words. Root words are primarily of 3 types:
  1. प्रातिपदिकानि (nominal stems) - Among these are three subgroups:
    1. Nouns (नामानि) and adjectives (विशेषणानि) - For commonly used nouns and adjectives arranged as a thesaurus (note that the thesaurus is basically an Indian/Sanskrit concept), good reference books are निघण्टु (better read with निरुक्त) and अमरकोशः. Of these, निघण्टु is more a thesaurus of words used in Vedic literature, while अमरकोशः is more a thesaurus of words used in classical Sanskrit.
    2. Pronouns (सर्वनामानि) - There are 36 pronouns listed in the गणपाठ related to the अष्टाध्यायी-सूत्रम् सर्वादीनि सर्वनामानि.
  2. धातवः (verbal roots) - The धातुपाठ lists some 2000 धातु-s. Of these, the बृहद्धातुरूपावलिः details the word formations of some 662 धातु-s; those can be considered the frequently used धातु-s.
    1. Prefixes (उपसर्गाः) either add emphasis to the basic meaning of a धातु or give it a totally different meaning.
    2. The गणपाठ has a list of 22 उपसर्ग-s related to the अष्टाध्यायी-सूत्रम् प्रादयः. Note also that उपसर्ग-s may be used singly or several together.
  3. अव्ययानि (indeclinables) - In the गणपाठ one finds two lists of अव्ययानि related to two अष्टाध्यायी-सूत्र-s: (1) स्वरादिनिपातमव्ययम् (2) चादयोऽसत्त्वे
It is all there. One only needs to have an eye for it.

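सस्नेहम् (With affection,)
अभ्यंकरकुलोत्पन्नः श्रीपादः । (Shripad, born in the Abhyankar family)
"श्रीपतेः पदयुगं स्मरणीयम् ।" ("The feet of Shripati are to be remembered.")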

Hnbhat B.R.

Mar 20, 2016, 10:17:55 AM
to sams...@googlegroups.com

It is all there. But he is asking whether there is a list of frequently used words and, out of the 3000 roots, a list of the frequently used ones. Not all the words in the dictionaries and the grammar are actually used.

For example,

वृन्दारका दैवतानि पुंसि वा देवताः स्त्रियाम् ।। १.१.१८ ।।

Amara says the word दैवत is used in the neuter and also in the masculine gender. But it is never actually used in the masculine.

Among the verbs, the root हन, listed as हन हिंसागत्योः, is given two senses, to kill and to go. But it is never seen in the second sense, as in रामो गङ्गां हन्ति meaning "Rama goes to the river Ganga."

Not all the nouns in the Ganapatha are used, and many are used very rarely.

DR Y N RAO

Mar 20, 2016, 5:57:51 PM
to sams...@googlegroups.com
In my opinion, such a list for Sanskrit is the need of the hour! It would be much appreciated if any individual, group or organization took up this task! In particular, the list should contain Sanskrit words of day-to-day usage, so that people can easily use them in conversational Sanskrit!

--
You received this message because you are subscribed to the Google Groups "samskrita" group.
To unsubscribe from this group and stop receiving emails from it, send an email to samskrita+...@googlegroups.com.
To post to this group, send email to sams...@googlegroups.com.
Visit this group at https://groups.google.com/group/samskrita.
For more options, visit https://groups.google.com/d/optout.

DR Y N RAO

Mar 20, 2016, 5:59:52 PM
to sams...@googlegroups.com

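प्रिय अभ्यंकरकुलोत्पन्नः श्रीपाद महोदय, (Dear Shripad ji of the Abhyankar family,)

उत्तमविवरणम्! धन्यवादाः!! (Excellent explanation! Thanks!!)

भवदीयः, (Yours,)
--डॉ. वाई.एन्. राव्, (Dr. Y. N. Rao)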


--

Sanju Nath

Mar 20, 2016, 7:29:55 PM
to sams...@googlegroups.com
While we wait for something to be compiled, there is already a list that can be used. The writers state:

The list of words is a compilation from various sources such as messages on sanskrit-digest, translated documents such as Bhagavadgita, atharvashiirshha, raamarakshaa et cetera, and other files accessible on the web.


धन्यवाद (Thanks),
Sanju

Anunad Singh

Mar 21, 2016, 12:04:35 AM
to sams...@googlegroups.com
If the words in this Sanskrit-English dictionary were arranged in descending order of their occurrence in common Sanskrit literature, it could be called a frequency list and would be much more useful. For example, a learner could learn just 200 or so words (which many people could do in one day) and gain roughly a 50% 'command' of Sanskrit text. The idea of word frequency is so important that somebody has claimed that just three English words (I, the, and) make up 10% of English text.
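The 'command' idea can be made concrete once a frequency list exists: sort the words by count and find how many of the top words are needed to cover a given share of all running words (tokens). A minimal Python sketch; the function name `words_needed_for_coverage` and the toy counts are illustrative assumptions, not real Sanskrit data.

```python
from collections import Counter


def words_needed_for_coverage(counts: Counter, target: float) -> int:
    """Return how many of the most frequent word types are needed so that,
    together, they account for at least `target` (a fraction 0..1) of all tokens."""
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # most_common() yields (word, count) pairs from the highest count down.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)
```

For example, with the toy counts {ca: 50, na: 30, iti: 10, rāma: 5, vana: 5}, one word covers 50% of the tokens and two words cover 80%. Real percentages depend entirely on the corpus chosen.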

There is plenty of software available for making frequency lists. For this, one has to choose the text so that it represents 'common' Sanskrit literature.
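Whatever software is used, the core of a frequency list is a counter run over the selected texts. A minimal Python sketch, assuming the chosen texts are already plain strings with sandhi undone so that words are separated by spaces; `frequency_list` and the sample sentences are illustrative.

```python
from collections import Counter
from typing import Iterable


def frequency_list(texts: Iterable[str]) -> list[tuple[str, int]]:
    """Count whitespace-separated words across every text in the selected
    corpus and return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    # Sort by descending count, then by word, so ties come out in a fixed order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Choosing which texts go into `texts` is exactly the corpus-selection question raised above; the counting itself is the easy part.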

This dictionary can be a guide for making such a list. There is another list of common words here:

http://sanskrit.samskrutam.com/en.sanskrit-english-dictionary.ashx

But again, the words are not arranged in order of frequency but alphabetically.

-- anunAda

------------------------------------------------------------------

On Mon, Mar 21, 2016 at 4:18 AM, Sanju Nath <sanji...@gmail.com> wrote:

Nagaraj Paturi

Mar 21, 2016, 1:57:31 AM
to saMskRRita-sandesha-shreNiH




--
Nagaraj Paturi
Hyderabad, Telangana, INDIA.
Former Senior Professor of Cultural Studies
FLAME School of Communication and FLAME School of Liberal Education,
(Pune, Maharashtra, INDIA)

G S S Murthy

Mar 21, 2016, 2:07:54 AM
to sams...@googlegroups.com
We could do a word-frequency analysis of the digitally available classics using the standard "Find" button. But if one is trying to develop a list of the words most needed for modern usage, such an analysis may not be of much help.
Regards,
Murthy

On Mon, Mar 21, 2016 at 9:34 AM, Anunad Singh <anu...@gmail.com> wrote:


Hnbhat B.R.

Mar 21, 2016, 2:22:07 AM
to sams...@googlegroups.com


On 21-Mar-2016 11:37 am, "G S S Murthy" <murt...@gmail.com> wrote:
>
> We could do word-frequency analysis of digitally available classics using standard "Find" button. But if one is trying to develop a list which would be needed most for modern usage, such an analysis may not be of much help.
> Regards,
> Murthy
>

In that case, the frequency list should be made from the words that modern Sanskrit speakers use day to day in the modern world, which would be useful for learners, and not from the words used in classical literature.

Anunad Singh

Mar 21, 2016, 2:58:31 AM
to sams...@googlegroups.com
There are several software tools for word-frequency analysis. Such an analysis has many aspects, and proper selection of the text is one of them.

Some relevant links-
https://en.wikipedia.org/wiki/Word_lists_by_frequency

https://en.wikipedia.org/wiki/Letter_frequency

https://en.wikipedia.org/wiki/Frequency_analysis

Frequency Analysis Tool (with source code)


-----------------------------------------------------

V Singh

Mar 21, 2016, 7:45:44 AM
to samskrita

अनुनाद महोदय (Dear Anunad ji),


I understand that you have asked for a list of the most frequently used संस्कृत words, which I too would like to have. As an amateur, I have been trying to compile the exact opposite, i.e. the most frequently used words in English or other languages and their translations into संस्कृत. My motivation and plan is to learn the way small children do: to use संस्कृत for daily practical conversation before I graduate to reading short stories.


I am attaching two lists that I have made. If the scholars on this forum find them useful, they can be expanded with their help. Please keep in mind that these lists are still a work in progress; moreover, I am a neophyte who, through स्वाध्याय (self-study), has just started learning सः तौ ते (the singular, dual and plural of "he"). These lists are therefore incomplete and bound to be full of mistakes. I would be indebted if corrections were sent.


-- वीरेन्द्रः 


P.S. To reduce its size from more than 10 MB, the verb list consists of images.

100-top-eng-nouns-with-sanskrit-equivalent-VER-2.pdf
100-top-eng-verbs-with-sanskrit-equivalent.pdf

Anunad Singh

Mar 21, 2016, 9:20:47 AM
to sams...@googlegroups.com
वीरेन्द्र महोदय (Dear Virendra ji),

To create a 'word-frequency list' in exact sense requires that the list be made by counting word-frequencies in a selected corpus.

But there is not one but many such lists for English and other languages, because the selection of the corpus can make a considerable difference. Given this practical fact that such a list cannot be unique, the procedure you followed can also be used in part for this list. I propose the following method for creating a list of the most frequently used Sanskrit words:

Suppose we want to create a list of the 2000 most frequently used Sanskrit words. Let us take the 3000 or so most frequently used English words. I feel that about half of these will have Sanskrit equivalents that might be among the most frequently used Sanskrit words. These have to be sorted out by experienced Sanskrit scholars. To this sorted list, add some Sanskrit words that are known to be used frequently but do not occur in it. This list, though not created by counting, may be as useful as one created by the exact method. But much of the work in this procedure will have to be done manually, and it will also add an element of subjectivity to the output (i.e. the list).
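Only the bookkeeping part of this method can be automated; choosing the equivalents stays manual. A minimal Python sketch, assuming the scholars' judgments have already been recorded as an English-to-Sanskrit mapping; `candidate_sanskrit_list` and the sample words are hypothetical.

```python
def candidate_sanskrit_list(
    english_by_rank: list[str],
    equivalents: dict[str, str],
    extra_words: list[str],
    size: int,
) -> list[str]:
    """Walk the English list in frequency order, keep each approved Sanskrit
    equivalent once, append known-frequent words that did not come up,
    and cut the result to `size` entries."""
    result = []
    seen = set()
    for english in english_by_rank:
        sanskrit = equivalents.get(english)  # None when no equivalent was approved
        if sanskrit and sanskrit not in seen:
            seen.add(sanskrit)
            result.append(sanskrit)
    for word in extra_words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result[:size]
```

Two English words that map to the same Sanskrit word (here "go" and "walk" both to gam in the test data) are merged, which is one reason the Sanskrit list comes out shorter than the English one.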

Today Sanskrit text of various types is available in digital form in abundance, and frequency-analysis software is also easily available. But we need to do sandhi-vichchheda on a Sanskrit corpus before its words can be counted. This can be a problem, though I know that some Sanskrit text is also available in which the sandhis have been undone.
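The effect of sandhi on counting is easy to see with a few words from the news-article report later in this thread. In the sketch below the split version was made by hand purely for illustration; no automatic sandhi splitter is assumed.

```python
from collections import Counter

# The same short passage twice: as written, with sandhi applied,
# and with the sandhi undone by hand (illustrative only).
surface_text = "rahasyamidam āsīditi samarpitamiti kimiti āsīt"
split_text = "rahasyam idam āsīt iti samarpitam iti kim iti āsīt"

surface_counts = Counter(surface_text.split())
split_counts = Counter(split_text.split())

for word in ("iti", "āsīt", "idam"):
    print(word, surface_counts[word], split_counts[word])
```

On the surface text "iti" is counted 0 times and "āsīt" once; after splitting they are counted 3 and 2 times, so an unsplit corpus systematically undercounts exactly the common words a learner needs.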

-- अनुनादः
 

dhaval patel

Mar 21, 2016, 10:02:30 AM
to samskrita

Many files in GRETIL already have sandhi-viccheda done, and the site's coverage spans many genres. That would be a good candidate for a non-subjective creation of a word-frequency list.

Oliver Hellwig

Mar 21, 2016, 11:04:35 AM
to samskrita
Dear all,

You could also use the DCS overview for single texts:
http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=corpus

Click on the "F" in the last column of the table to get the most frequent roots of a single text. If you choose the Mbh or the Ramayana, you may get a good starting point for compiling such a list.

Oliver

Anunad Singh

Mar 22, 2016, 12:04:22 AM
to sams...@googlegroups.com
Great! It has become so easy....

The frequency list output for Arthashaastra ...

Word              Frequency
ca (ind)          498
(ind)             225
tad (pron)        218
iti (ind)         185
kṛ (8. Ā.)        129
na (ind)          96
rājan (m)         86
artha (mn)        85
etad (pron)       77
yad (pron)        72
daṇḍa (mn)        72
karman (n)        68
idam (pron)       64
hi (ind)          62
sarva (pron)      57
kāray (10. Ā.)    52
para (pron)       51
yoga (m)          48
yuj (4. P.)       47
agni (m)          46
tva (n)           40
eka (pron)        39
tri (nr)          37
puruṣa (m)        37
tasmāt (ind)      36
anya (pron)       36
bhāga (m)         35
dravya (n)        34
mantra (mn)       33
sthāna (n)        33
sva (adj)         33
kārya (n)         33
hastin (m)        32
eva (ind)         32
mūla (mn)         31
vyaya (m)         31
api (ind)         31
putra (m)         30
bhū (1. Ā.)       29
amātya (m)        29
cūrṇa (mn)        29
(1. Ā.)           29
mantrin (m)       28
deśa (m)          28
udaka (n)         27
tu (ind)          27
viṣa (n)          26
sidh (4. Ā.)      26
yathā (ind)       25
vid (1. P.)       25

Anunad Singh

Mar 22, 2016, 4:12:31 AM
to sams...@googlegroups.com
Here is the list of the 2000 most frequently occurring words in the Ramayan. Please give your comments.
List of 2000 most frequently used words in Ramayan.txt

G S S Murthy

Mar 22, 2016, 7:38:45 AM
to sams...@googlegroups.com
Fine work. Congrats.
If we do a frequency analysis for each Kanda of the Ramayana, any marked variations could point to the possibility of different authors. Many scholars believe that the Balakanda and the Uttarakanda are later additions.
Regards,
Murthy
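One simple way to make "marked variations" measurable is to compare the word-count profiles of two Kandas, for example with cosine similarity. This is only a sketch under stated assumptions: `cosine_similarity` and the sample counts are illustrative, per-Kanda counts would have to be produced first, and a low similarity shows a difference in vocabulary, which may come from a change of subject as well as from a change of author.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count profiles: 1.0 means the same
    proportions, 0.0 means no words in common. Cosine ignores overall
    length, so Kandas of different sizes can be compared directly."""
    # Counter returns 0 for a missing word without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```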


Shrivathsa B

Mar 22, 2016, 2:39:57 PM
to saMskRRita-sandesha-shreNiH

I found multiple authors in the mails sent by GSS Murty. I deduced it from a word-frequency analysis of his varied mails! I, however, didn't go looking for multiple authors!

Isn't it elementary that word frequency changes with the subject being discussed?

What if the multiple authors were careful enough to fox any future frequency analysts? Would they magically become one? Or would you conclude that there were actually 1568 of them, but they were clever enough to fox the best frequency analysts?

ken p

Mar 22, 2016, 3:52:29 PM
to samskrita
One may try this tool for Roman script as well as for other scripts.

Copy these news articles, convert them into IAST text, and use the counter above. It is better to look for frequent words in prose than in poetry.


WORD COUNT REPORT

Total word count: 108 words

Primary Keywords (no common words): 99 words (91.67%)
Common Words Count: 9 words (8.33%)

Primary Keywords Frequency
iti 3
ambadkaraḥ 3
pradhānamantriṇā 2
śīghraṁ 2
bhaviṣyati 2
vimocakaḥ 2
sarveṣāṁ 2
nigūḍha 1
prayatnaṁ 1
kecana 1
kadāpi 1
smṛteḥvismṛteḥ 1
vismṛtyarthaṁ 1
rahasyamidam 1
bhāgāḥ 1
vaijñānikāḥ 1
kimiti 1
bhāvanāyaḥ 1
durgamevai 1
atbhutānāṁ 1
laṇḍan 1
prācīnānāṁ 1
nūtanānāṁ 1
modivaryaḥ 1
jīvanaṁ 1
aikyāya 1
rāṣṭraśāsanānusāraṁ 1
netṛtvaṁ 1
sāmājikodgrathanāya 1
anusmṛtam 1
samaśīrṣaḥ 1
kintu 1
āsīt 1
viśvamānavaḥ 1
varyasya 1
lūthar 1
kṛtavataḥ 1
adhikārebhyaḥ 1
amerikkāyāṁ 1
mdhye 1
bhāṣamāṇaḥ 1
navadilyāṁ 1
cedapi 1
ud‌ghāṭitam 1
smṛtipaddhatiḥ 1
eva 1
pārśvavatkṛtānāṁ 1
āsīditi 1
uktam 1
ambadkar 1
śilāsthāpanasamārohe 1
āsīdayam 1
śyāmavarṇīyānām 1
pravartanaṁ 1
mārṭin 1
kiṅ 1
navadillī 1
modī 1
kevalaṁ 1
bhārate 1
saḥ 1
kṛtavān 1
sāmājikasamatvāya 1
tasya 1
samarpitamiti 1
avadat 1
smṛtyarthaṁ 1
vismṛtiḥ 1
manuṣyamastiṣkaḥ 1
mastiṣkasya 1
kurvanti 1
cālanaṁ 1
tat‌ 1
vismriyate 1
vismṛtiṁ 1
narendramodinā 1
vinā 1
deśīyasmārakasya 1
nūtanajñānasañcayaḥ 1
dvicakrikā 1
anviṣyamānāḥ 1
dalitavibhāgīyanāṁ 1
paṭhitvā 1
nūtanāni 1
pravartananiratāḥ 1
purātanakāryeṇa 1
bhavatu 1
saha 1
nūtanā 1
badhvā 1

Common Words Frequency
na 3
ca 3
ḍo 1
ār 1
bi 1




WORD COUNT REPORT

Total word count: 110 words

Primary Keywords (no common words): 97 words (88.18%)
Common Words Count: 13 words (11.82%)

Primary Keywords Frequency
अम्बद्करः 3
प्रधानमन्त्रिणा 2
शीघ्रं 2
इति। 2
सर्वेषां 2
विमोचकः 2
प्रवर्तननिरताः 1
बध्वा 1
नूतनानि 1
नूतनज्ञानसञ्चयः 1
विस्मृतिं 1
विस्म्रियते 1
तत्‌ 1
चालनं 1
वैज्ञानिकाः 1
भागाः 1
मस्तिष्कस्य 1
रहस्यमिदम् 1
अन्विष्यमानाः 1
म्ध्ये 1
स्मृतेःविस्मृतेः 1
निगूढ 1
मनुष्यमस्तिष्कः 1
विस्मृतिः। 1
स्मृत्यर्थं 1
अवदत्। 1
समर्पितमिति 1
तस्य 1
सामाजिकसमत्वाय 1
कृतवान् 1
सामाजिकोद्ग्रथनाय 1
अनुस्मृतम्। 1
आसीत् 1
समशीर्षः 1
किङ् 1
मार्टिन् 1
प्रवर्तनं 1
श्यामवर्णीयानाम् 1
आसीदयम् 1
शिलास्थापनसमारोहे 1
दलितविभागीयनां 1
नवदिल्ली 1
उद्‌घाटितम्। 1
किन्तु 1
पार्श्ववत्कृतानां 1
केचन 1
नरेन्द्रमोदिना 1
।नवदिल्यां 1
देशीयस्मारकस्य 1
भाषमाणः 1
अमेरिक्कायां 1
अधिकारेभ्यः 1
कृतवतः 1
लूथर् 1
वर्यस्य 1
विश्वमानवः 1
इति 1
भारते 1
नेतृत्वं 1
राष्ट्रशासनानुसारं 1
ऐक्याय 1
जीवनं 1
मोदिवर्यः 1
नूतनानां 1
प्राचीनानां 1
लण्डन् 1
अत्भुतानां 1
दुर्गमेवi 1
भावनायः 1
किमिति 1
आर् 1
प्रयत्नं 1
उक्तम् 1
कुर्वन्ति 1
अम्बद्कर् 1
विस्मृत्यर्थं 1
द्विचक्रिका 1
पठित्वा 1
कदापि 1
चेदपि 1
विना 1
भविष्यति 1
पुरातनकार्येण 1
स्मृतिपद्धतिः 1
भविष्यति। 1
केवलं 1
भवतु 1
आसीदिति 1
नूतना 1
मोदी। 1

Common Words Frequency
च 3
न 3
। 2
डो 1
सः 1
सह 1
एव 1
बि 1



Free tool from TextFixer.com: Online Word Counter
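The Devanagari report above counts "इति।" separately from "इति", "भविष्यति।" separately from "भविष्यति", and even the danda "।" itself as a word, because the counter splits only on spaces. A minimal Python sketch of a tokenizer that strips the danda (U+0964), the double danda (U+0965) and common ASCII punctuation from the edges of each word before counting; the exact punctuation set is an assumption and may need extending for a given corpus.

```python
from collections import Counter

# Stripped from both ends of every token: common ASCII punctuation plus
# the Devanagari danda (U+0964) and double danda (U+0965).
PUNCTUATION = ".,;:!?\"'()[]{}" + "\u0964\u0965"


def tokenize(text: str) -> list[str]:
    """Split on whitespace and strip punctuation from both ends of each token."""
    tokens = []
    for raw in text.split():
        token = raw.strip(PUNCTUATION)
        if token:  # a bare "।" strips to "" and is dropped instead of counted
            tokens.append(token)
    return tokens


def count_words(text: str) -> Counter:
    return Counter(tokenize(text))
```

Splitting on whitespace and stripping characters avoids regular-expression word classes such as `\w`, which in Python do not match Devanagari vowel signs and the virama and would therefore cut words apart.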

Anunad Singh

Mar 23, 2016, 3:43:11 AM
to sams...@googlegroups.com
I took 2000 most frequent Sanskrit words each from about 15 books (Ramayan, Kumarasambhavam, Hitopadesh, Amarushatak, Buddhacharita, Chhandogya upanishad, Dhanurveda, Kaamsuutra, Kavyalankar, Krishiparashara, Manusmriti, Nyayasutra, Rasahridayatantra, Samkhyakarika, Tarkasangraha, Yogasutra etc) and computed a cumulative frequency list.

The resulting file is attached.
Frequency list of Sanskrit words.txt
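Merging per-book lists into one cumulative list amounts to adding up the counts word by word. A minimal Python sketch, assuming each book's list has already been read into a dict of word to count; `cumulative_frequency`, the book names and the numbers in the test are invented. Because only the top 2000 words of each book were taken, a word that fell just below the cut-off in some book contributes nothing from that book, so the merged counts are lower bounds on the true totals.

```python
from collections import Counter


def cumulative_frequency(per_book: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Add up each word's counts across all books and return (word, total)
    pairs, most frequent first, ties broken by the word itself."""
    totals = Counter()
    for counts in per_book.values():
        totals.update(counts)  # update() adds to existing counts; it does not overwrite
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```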

Nityanand Misra

Mar 23, 2016, 3:53:19 AM
to samskrita

Thanks Anunad Singh Ji, this is interesting and can be used to create word clouds. 

Do you mind sharing the code you are using for this? Are you reading URLs or data files? If one uses a language like R, one can directly read the raw text from URLs. 

Anunad Singh

Mar 23, 2016, 4:23:15 AM
to sams...@googlegroups.com
Nityanand ji,

It did not require me to use any code. The following site, suggested by Shri Oliver Hellwig, directly gives the frequency list of up to 2000 words.

http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=corpus

One has to click on F (the last column) corresponding to any book to get the frequency list of that book. I only had to convert the IAST to Devanagari and compute the cumulative frequency, which I did with the help of LibreOffice Calc.

-- anunAda

G S S Murthy

Mar 23, 2016, 6:37:01 AM
to sams...@googlegroups.com
Many thanks, Anunadji, for this great and clever piece of work. I note that the list is sorted at two levels, by frequency and then alphabetically (in Roman script). The list contains compound words too. If proper nouns could be filtered out, we could arrive at a more meaningful list.
Regards,
Murthy
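Both observations can be reproduced in a few lines: remove words found in a proper-noun list, then sort by descending frequency with ties broken by the word. A minimal Python sketch; `without_proper_nouns` is a hypothetical helper, and the proper-noun set would have to be compiled by hand or taken from a lexicon that tags proper nouns.

```python
def without_proper_nouns(
    counts: dict[str, int], proper_nouns: set[str]
) -> list[tuple[str, int]]:
    """Drop words found in a hand-made proper-noun list, then sort by
    frequency (descending), breaking ties by plain code-point order."""
    kept = [(word, n) for word, n in counts.items() if word not in proper_nouns]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Note that plain code-point order is not the traditional Sanskrit alphabetical order: in IAST, for instance, "ā" sorts after "z". A proper varṇamālā ordering would need a custom sort key.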


Anunad Singh

Mar 23, 2016, 8:36:52 AM
to sams...@googlegroups.com
I plan to put this list, within four or five days, at https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists where the lists for other languages have been put.

Shri Murthy ji's suggestion to remove the proper nouns will surely be implemented. Please let me know if there are other suggestions.

We should not be content with this list alone; somebody should take this up as a project and produce a much more extensive frequency list from a much bigger, more thoughtfully selected corpus.


----------------------


On Wed, Mar 23, 2016 at 4:06 PM, G S S Murthy <murt...@gmail.com> wrote:

Anunad Singh

Mar 25, 2016, 9:35:38 AM
to sams...@googlegroups.com
Finally, the frequency list for Sanskrit words has been put here:

https://en.wiktionary.org/wiki/Appendix:Sanskrit_frequency_list_1

-- anunAda

Saleel Kulkarni

Mar 27, 2016, 6:22:11 PM
to sams...@googlegroups.com

Dear Shri Anunad Singh,

This is an important task that you have successfully carried out. Thank you.

You said:

{{I took 2000 most frequent Sanskrit words each from about 15 books (Ramayan, Kumarasambhavam, Hitopadesh, Amarushatak, Buddhacharita, Chhandogya upanishad, Dhanurveda, Kaamsuutra, Kavyalankar, Krishiparashara, Manusmriti, Nyayasutra, Rasahridayatantra, Samkhyakarika, Tarkasangraha, Yogasutra etc) and computed a cumulative frequency list. }}

This is an important piece of information vis-a-vis the Frequency List. I think you should also put up this list of books beside the Frequency List.

Saleel Kulkarni

--

Nagaraj Paturi

Apr 14, 2016, 2:50:33 PM
to saMskRRita-sandesha-shreNiH

ken p

Apr 14, 2016, 9:51:18 PM
to samskrita
Now we need to know: out of these 67000+ frequent Sanskrit words, how many share the same meaning as a given English word?

G S S Murthy

Apr 15, 2016, 1:57:25 AM
to sams...@googlegroups.com

My input is missing in this thread.
I do not follow.
Regards
Murthy

विश्वासो वासुकिजः

Apr 19, 2016, 1:13:49 AM
to samskrita, sanskrit-programmers, Martin Gluckman
+ Martin and sanskrit-programmers

There is also http://sanskritdictionary.com/frequency/ by Martin Gluckman, using data from Oliver's DCS site.

On Monday, 21 March 2016 at 9:04:22 PM UTC-7, Anunad Singh wrote: