Sanskrit Sandhi using Pure Python


shantanu oak

May 21, 2023, 6:18:07 AM
to sanskrit-programmers
Hi,
I have developed a denormalized Sanskrit sandhi builder in pure Python. It uses a simple for-loop to generate a lookup dictionary from each Panini sutra. It is a work in progress, and feedback is appreciated.

https://github.com/shantanuo/sandhi

Run all cells in the notebook, then test your words at the end of the script. For example:

sandhi_builder('पितृ उपदेश')
#returns {'पित्रुपदेश'}

If you do not want to use Python, look up the last character of the first word and the first character of the second word in the index file. For example, for the sandhi of यज् + न, look for ज् न in the index file.

!grep 'ज् न' sandhi_code_out.txt
# ज् न ज्ञ 2.1.1 श्चुत्व

You will get ज्ञ, and hence your sandhi word will be य + ज्ञ = यज्ञ.
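The index-file lookup described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository. It assumes that each index line is whitespace-separated, that the first two fields are the key, that the last two fields are the rule number and sandhi name, and that anything in between is the replacement. The names `Rule`, `parse_index_line` and `join_words` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    left: str         # ending of the first word
    right: str        # beginning of the second word
    replacement: str  # text that replaces left + right
    number: str       # rule number as printed in the index file
    name: str         # sandhi name


def parse_index_line(line: str) -> Rule:
    """Parse one line of the index file.

    Assumption: first two fields are the key, last two are the rule
    number and name, and the fields in between form the replacement.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"unexpected index line: {line!r}")
    return Rule(fields[0], fields[1], "".join(fields[2:-2]), fields[-2], fields[-1])


def join_words(first: str, second: str, rules: List[Rule]) -> Optional[str]:
    """Join two words with the first rule whose key matches, else None."""
    for rule in rules:
        if first.endswith(rule.left) and second.startswith(rule.right):
            stem = first[: len(first) - len(rule.left)]
            rest = second[len(rule.right):]
            return stem + rule.replacement + rest
    return None


# Two index lines quoted in this thread.
rules = [
    parse_index_line("ज् न ज्ञ 2.1.1 श्चुत्व"),
    parse_index_line("ृ उ ् रु 1.3.3 यण"),
]
print(join_words("यज्", "न", rules))       # यज्ञ
print(join_words("पितृ", "उपदेश", rules))  # पित्रुपदेश
```

A real index would hold many more lines, and more than one rule may match a given pair of words, so returning only the first match is a simplification.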

This is a poor man's sandhi builder. For a richer experience, you can visit:

https://sanskrit.uohyd.ac.in/scl/#

-- Shantanu

विश्वासो वासुकिजः (Vishvas Vasuki)

May 21, 2023, 6:56:49 AM
to sanskrit-p...@googlegroups.com
Make it a package that I can install with pip, and provide some usage examples in the README.

--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sanskrit-programmers/731ba55f-7635-452e-b40f-7c9f39311bf4n%40googlegroups.com.


--
--
Vishvas /विश्वासः

shantanu oak

May 23, 2023, 12:15:16 AM
to Rajeshwari Godbolé, sanskrit-p...@googlegroups.com
Hi,
I am not an expert either. I did a Google search for each sutra and wrote the code. During my research I saw so many incorrect words that I do not trust Google search anymore.

I will not be surprised if people find bugs in the script.
But the beauty is that anyone can make changes and correct it.

To re-check this example, I searched the index file with the Linux grep command and got this result:

# grep 'ृ उ' sandhi_code_out.txt
 ृ उ ् रु 1.3.3 यण

This means it is यण् sandhi, and the sutra is 'इको यणचि'. The explanation that I found on the net is:
# When ऋ is followed by a vowel, ऋ is replaced by र.
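To illustrate how a for-loop can expand this one rule into many lookup entries, here is a minimal sketch. It is not the notebook's actual code. It assumes the first word ends in a dependent vowel sign (matra) and the second word begins with an independent vowel, and it skips vowels of the same class. The names `IK_TO_SEMIVOWEL`, `VOWEL_TO_MATRA`, `SAME_CLASS` and `build_yan_table` are made up for this example.

```python
# Matra forms of the vowels covered here, mapped to virama + semivowel.
IK_TO_SEMIVOWEL = {
    "ि": "्य", "ी": "्य",
    "ु": "्व", "ू": "्व",
    "ृ": "्र",
}

# Independent vowels that may begin the second word, mapped to the matra
# that attaches to the semivowel (अ is inherent, so it maps to "").
VOWEL_TO_MATRA = {
    "अ": "", "आ": "ा", "इ": "ि", "ई": "ी", "उ": "ु", "ऊ": "ू",
    "ऋ": "ृ", "ए": "े", "ऐ": "ै", "ओ": "ो", "औ": "ौ",
}

# Following vowels of the same class are skipped in this sketch.
SAME_CLASS = {
    "ि": ("इ", "ई"), "ी": ("इ", "ई"),
    "ु": ("उ", "ऊ"), "ू": ("उ", "ऊ"),
    "ृ": ("ऋ",),
}


def build_yan_table() -> dict:
    """Generate (last sign, first vowel) -> replacement entries."""
    table = {}
    for ik, semivowel in IK_TO_SEMIVOWEL.items():
        for vowel, matra in VOWEL_TO_MATRA.items():
            if vowel in SAME_CLASS[ik]:
                continue
            table[(ik, vowel)] = semivowel + matra
    return table


table = build_yan_table()
first, second = "पितृ", "उपदेश"
joined = first[:-1] + table[(first[-1], second[0])] + second[1:]
print(joined)  # पित्रुपदेश
```

The entry for ृ followed by उ comes out as ्रु, which agrees with the grep result above.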

You will get the word पित्रोपदेश if you run this:
sandhi_builder('पित्र उपदेश') # or पितरोपदेश for 'पितर उपदेश'

Making a Python package for this is a good idea, but I do not know much about packaging. Any help will be appreciated.

-- Shantanu


On Mon, May 22, 2023 at 8:07 PM Rajeshwari Godbolé <rgod...@gmail.com> wrote:
This looks interesting! One question about the example (I'm not an expert so pardon me if this is incorrect):

sandhi_builder('पितृ उपदेश')
#returns {'पित्रुपदेश'} -- should this not be पित्रोपदेश?

Thanks,

Rajeshwari



--

विश्वासो वासुकिजः (Vishvas Vasuki)

May 23, 2023, 12:23:55 AM
to sanskrit-p...@googlegroups.com, Rajeshwari Godbolé
On Tue, 23 May 2023 at 09:45, shantanu oak <shanta...@gmail.com> wrote:


Making a python package for this is a good idea. But I do not know much about that. Any help will be appreciated.


Check out these files, and how the code is placed in the repo.

Just imitate that. Shouldn't take long for you to figure out using that as an example.

 

On Mon, May 22, 2023 at 8:07 PM Rajeshwari Godbolé <rgod...@gmail.com> wrote:
This looks interesting! One question about the example (I'm not an expert so pardon me if this is incorrect):

sandhi_builder('पितृ उपदेश')
#returns {'पित्रुपदेश'} -- should this not be पित्रोपदेश?


पित्रुपदेश is correct.

 


Hrishikesh Terdalkar

May 23, 2023, 1:15:41 AM
to sanskrit-p...@googlegroups.com

shantanu oak

Aug 14, 2023, 5:44:38 AM
to sanskrit-programmers
Here is API access to the sandhi code. Type the words you want to join after the question mark.

https://2ku5vw336655hiomtcogv4sopm0bpqdo.lambda-url.us-east-1.on.aws/?

If you type "कर्मणि एव अधिकारः ते" you will get back "कर्मण्येवाधिकारस्ते". 
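For anyone calling this from a script, here is a rough sketch using only the Python standard library. It assumes the endpoint accepts the percent-encoded words directly after the question mark (as a browser would send them) and returns UTF-8 text. Neither point is confirmed in this thread, and the endpoint may no longer be online. The function names are illustrative.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://2ku5vw336655hiomtcogv4sopm0bpqdo.lambda-url.us-east-1.on.aws/?"


def build_request_url(words: str) -> str:
    """Append the percent-encoded words after the question mark."""
    return BASE_URL + quote(words)


def fetch_sandhi(words: str, timeout: float = 10.0) -> str:
    """Call the endpoint and return the response body.

    Assumption: the response is plain UTF-8 text.
    """
    with urlopen(build_request_url(words), timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_request_url("कर्मणि एव अधिकारः ते")
print(url)
# To actually query the service:
# print(fetch_sandhi("कर्मणि एव अधिकारः ते"))
```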

Please test it and let me know the cases where it fails. Any developer can easily build an app for this :)

-- Shantanu

shantanu oak

Aug 21, 2023, 12:39:48 AM
to sanskrit-programmers
The Sanskrit Sandhi Android app is available here:

https://play.google.com/store/apps/details?id=com.myapp.marathispellcheckandsanskritsandhi

It includes Marathi spell check as well.

-- Shantanu

Akshay B

Nov 15, 2023, 7:57:43 PM
to sanskrit-programmers
Hi Shantanu

Akshay Bapaye here. Can you share your email address with me? I would like to connect with you personally.

Thanks.

shantanu oak

Feb 5, 2024, 10:49:43 PM
to sanskrit-programmers
If you use the Telegram app on your mobile, add "SanskritSandhibot" to your friends list.

https://t.me/SanskritSandhibot

Send it any 2 or more words, e.g. "कर्मणि एव अधिकारः ते", and you will get back the sandhi form, e.g. "कर्मण्येवाधिकारस्ते".

-- Shantanu

venkata raman

Feb 24, 2024, 9:40:08 AM
to sanskrit-programmers
Is there any such API that does reverse sandhi (splitting)? For example, given "कर्मण्येवाधिकारस्ते", the API would return "कर्मणि एव अधिकारः ते".

shantanu oak

Mar 19, 2024, 4:22:05 AM
to sanskrit-programmers
Use "SandhiSplitBot" on Telegram if you need to split.

https://t.me/SandhiSplitBot
If you type कर्मण्येवाधिकारस्ते, the bot will reply कर्मणि एव अधिकारः ते.

Screenshot: https://kagapa.s3.ap-south-1.amazonaws.com/spellcheck/app/sandhi_split.jpg

There are 2 bots on Telegram: one joins words, and the other splits them.

SanskritSandhibot
If you type गण ईश उत्सव, the bot will reply गणेशोत्सव.

Screenshot: https://kagapa.s3.ap-south-1.amazonaws.com/spellcheck/app/sandhi_join.jpg

kenp

Mar 22, 2024, 11:24:32 AM
to sanskrit-programmers

shantanu oak

Mar 24, 2024, 8:52:59 AM
to sanskrit-programmers
I think a browser (desktop) version is not possible because, although the spell check is based on Hunspell, the sandhi joiner and splitter are part of a complex cloud setup. You can add the bot called "SanskritOneBot" to your Telegram friends list. It is called "one" because it can do spell check, sandhi, and splitting!
It checks spelling using Hunspell. If a single word is typed, it also tries to split it (संधि विच्छेद, sandhi splitting). If 2 or more (up to 19) words are typed, it tries to merge them based on Panini's sutras.
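The choice between the two modes depends only on how many words are typed, which can be sketched as below. This is a guess at the control flow, not the bot's real code. The function name and the returned labels are hypothetical, and how the bot treats empty input or more than 19 words is not stated in this thread.

```python
MAX_WORDS = 19  # upper limit mentioned above


def choose_action(message: str) -> str:
    """Pick a mode from the number of whitespace-separated words."""
    words = message.split()
    if not words:
        return "empty"                 # assumption: nothing to do
    if len(words) == 1:
        return "spellcheck_and_split"  # single word
    if len(words) <= MAX_WORDS:
        return "join"                  # 2 to 19 words
    return "too_many_words"            # assumption: rejected


print(choose_action("कर्मण्येवाधिकारस्ते"))   # spellcheck_and_split
print(choose_action("कर्मणि एव अधिकारः ते"))  # join
```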

screenshot:
https://kagapa.s3.ap-south-1.amazonaws.com/spellcheck/app/telegram_sansone.jpg

The spell checker splits (संधि विच्छेद) each word, and if all the parts are found in the corpus, the word is considered correct. In the screenshot, "संक्षिप्तपरिचयं" is marked as incorrect. That is because "संक्षिप्त" is in the database but "परिचयं" is not ("परिचयम्" is included, however). I am not an expert, and feedback is appreciated!
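The rule that a word is correct only if every split part is in the corpus can be written down as a short sketch. The splitter and corpus below are toy stand-ins invented for this example; the real bot uses its own splitter and database.

```python
from typing import Callable, List, Set


def is_spelled_correctly(
    word: str,
    split: Callable[[str], List[str]],
    corpus: Set[str],
) -> bool:
    """A word passes only if every part returned by the splitter is known."""
    parts = split(word)
    return bool(parts) and all(part in corpus for part in parts)


# Toy data (assumption): a two-entry corpus and a table-driven splitter.
corpus = {"संक्षिप्त", "परिचयम्"}
toy_splits = {
    "संक्षिप्तपरिचयं": ["संक्षिप्त", "परिचयं"],
    "संक्षिप्तपरिचयम्": ["संक्षिप्त", "परिचयम्"],
}


def toy_split(word: str) -> List[str]:
    # Unknown words are returned unsplit.
    return toy_splits.get(word, [word])


print(is_spelled_correctly("संक्षिप्तपरिचयं", toy_split, corpus))   # False
print(is_spelled_correctly("संक्षिप्तपरिचयम्", toy_split, corpus))  # True
```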

shantanu oak

Jul 11, 2025, 6:56:01 AM
to sanskrit-programmers
According to the benchmark results, INRIA achieved the highest accuracy of 82.1%, followed by the University of Hyderabad (UoH) with 73.1% on the Bhagavad-Gita sandhi corpus (Table 2, page 4498).

http://www.lrec-conf.org/proceedings/lrec2018/pdf/755.pdf

The Python script mentioned in this thread achieved an accuracy of approximately 76%.
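For readers who want to reproduce a figure like this, here is a minimal sketch of an exact-match accuracy calculation. It assumes a test case counts as correct when the expected form appears among the candidates returned by the joiner (sandhi_builder returns a set in the example earlier in this thread). The metric used in the paper may be defined differently. `sandhi_accuracy`, `KNOWN` and `toy_join` are hypothetical names, and `toy_join` is a stand-in, not the real function.

```python
from typing import Callable, Iterable, Set, Tuple


def sandhi_accuracy(
    cases: Iterable[Tuple[str, str]],
    join: Callable[[str], Set[str]],
) -> float:
    """Fraction of cases whose expected form is among the candidates."""
    cases = list(cases)
    if not cases:
        return 0.0
    hits = sum(1 for words, expected in cases if expected in join(words))
    return hits / len(cases)


# Toy joiner: knows three examples from this thread, otherwise it just
# concatenates the words, which is usually wrong.
KNOWN = {
    "पितृ उपदेश": {"पित्रुपदेश"},
    "गण ईश उत्सव": {"गणेशोत्सव"},
    "कर्मणि एव अधिकारः ते": {"कर्मण्येवाधिकारस्ते"},
}


def toy_join(words: str) -> Set[str]:
    return KNOWN.get(words, {"".join(words.split())})


cases = [
    ("पितृ उपदेश", "पित्रुपदेश"),
    ("गण ईश उत्सव", "गणेशोत्सव"),
    ("कर्मणि एव अधिकारः ते", "कर्मण्येवाधिकारस्ते"),
    ("यज् न", "यज्ञ"),  # the toy joiner gets this one wrong
]
print(sandhi_accuracy(cases, toy_join))  # 0.75
```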
_____

It is important to note that the corpus contains several errors:

https://github.com/sanskrit-sandhi/SandhiKosh/issues/3

If these errors are corrected, the performance is likely to improve.

-- Shantanu

Avinash L Varna

Jul 22, 2025, 12:27:43 PM
to sanskrit-p...@googlegroups.com
Thanks for sharing. This paper and its results are from 2018, so almost 7 years old at this point. While doing something else this weekend, I noticed that you have a corrected version of the Bhagavad-Gita corpus in this repo, along with a sandhi implementation. I ran a test of a few Python libraries available for performing sandhi on this corrected version: sanskrit_parser; sandhi, which is a port of the UoH implementation from ~3-4 years ago; and the sandhi_builder function in the notebook available in the repo. The results are here, if anyone is interested. The summary is that all three have similar accuracy on this corpus for performing sandhi, with varying runtimes for the actual test.

Notes: top-10/100 refers to keeping only the shortest 10/100 results at each intermediate stage, in each library's native encoding (SLP1 for sanskrit_parser, WX for sandhi). This applies when iteratively performing sandhi on long sentences, since the libraries typically support sandhi only between two words. A disclaimer: I used the benchmarking code for the sandhi_builder function directly from the notebook and didn't try to optimize its runtime, so take the timings with a pinch of salt.
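The top-k pruning mentioned in the notes can be sketched roughly as follows. This is an illustration of the idea, not the benchmark code. `join_pair` stands for any two-word sandhi function that returns a set of candidates, lengths are measured with Python's `len` rather than in SLP1 or WX, and ties are broken alphabetically only to keep the output deterministic.

```python
from typing import Callable, List, Set


def join_sentence(
    words: List[str],
    join_pair: Callable[[str, str], Set[str]],
    k: int = 10,
) -> List[str]:
    """Fold a two-word joiner over a sentence, keeping the k shortest
    candidates after each step."""
    if not words:
        return []
    candidates = [words[0]]
    for word in words[1:]:
        merged: Set[str] = set()
        for candidate in candidates:
            merged |= join_pair(candidate, word)
        # Prune: shortest first, alphabetical order as a tie-breaker.
        candidates = sorted(merged, key=lambda s: (len(s), s))[:k]
    return candidates


def toy_pair(a: str, b: str) -> Set[str]:
    # Stand-in joiner (assumption): two candidates per pair.
    return {a + b, a + "-" + b}


print(join_sentence(["a", "b", "c"], toy_pair, k=1))   # ['abc']
print(join_sentence(["a", "b", "c"], toy_pair, k=10))  # ['abc', 'a-bc', 'ab-c', 'a-b-c']
```

Without pruning, the number of candidates can grow with every word in the sentence, which is presumably why a limit such as 10 or 100 is applied.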

Separately, Karthik has also made some corrections to the SandhiKosh corpus here. It might be good to combine the efforts so that we can have an improved single corpus. 

Thanks
Avinash
