Sebastian Nehrdich on Sanskrit AI at Dharmamitra.org

Mārcis Gasūns

Feb 28, 2025, 7:48:18 AM
to sanskrit-programmers
On Friday, February 28, 2025, at 7:00 pm Moscow time, the “Sanskrit Zealot's Society” will host a talk by Sebastian Nehrdich on the artificial intelligence behind https://dharmamitra.org, the most advanced language model to date for translating from Sanskrit and Tibetan into English and German. The talk will be given in English as part of the ongoing workshop “On Sanskrit Computational Linguistics”. It will be streamed from Zoom to three websites:

1) https://vk.com/samskrtamru
2) https://rutube.ru/channel/41323102/
3) https://www.youtube.com/@MarcisGasuns

Regards,
Dr. Mārcis Gasūns
Russian Federation


Anunad Singh

Feb 28, 2025, 8:42:15 AM
to sanskrit-p...@googlegroups.com
Thank you for this great news and thanks to the developers of this tool too.

I tested this tool with simple sentences. I find two different outputs for the same input.

     Input  : My name is Manohar.

     Output : (1) मम नाम मनोहरः । (translated into 'Sanskrit Devanagari')
              (2) mama nāma manohar asti. (translated into 'Sanskrit')

I wonder why this is so. It suggests that the tool follows two different paths for the two translations. Shouldn't output (2) simply be the transliteration of output (1)?

Also, the transliteration seems inconsistent. Should it not be 'manohara' instead of 'manohar'?

-- anunAda
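
As an illustrative aside on the transliteration question above, here is a minimal Devanagari-to-IAST sketch. It was written for this note and is not taken from Dharmamitra; the function name `to_iast` and the mapping tables are assumptions and cover only the basic Sanskrit letters. It shows what a purely mechanical transliteration of output (1) would look like.

```python
# Minimal Devanagari -> IAST sketch (hypothetical helper, basic letters only).
CONSONANTS = dict(zip(
    "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह",
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n "
    "p ph b bh m y r l v ś ṣ s h".split(),
))
# Independent vowels plus anusvara, visarga and avagraha.
STANDALONE = dict(zip("अआइईउऊऋएऐओऔ", "a ā i ī u ū ṛ e ai o au".split()))
STANDALONE.update({"\u0902": "ṃ", "\u0903": "ḥ", "\u093d": "'"})
# Dependent vowel signs (matras), given as code points for readability.
VOWEL_SIGNS = {
    "\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
    "\u0942": "ū", "\u0943": "ṛ", "\u0947": "e", "\u0948": "ai",
    "\u094b": "o", "\u094c": "au",
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant


def to_iast(text):
    """Transliterate Devanagari to IAST; unknown characters pass through."""
    out = []
    pending_a = False  # True while a consonant still owes its inherent 'a'
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # replaces the inherent 'a'
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # consonant without any vowel
        else:
            if pending_a:
                out.append("a")
            pending_a = False
            out.append(STANDALONE.get(ch, ch))
    if pending_a:
        out.append("a")
    return "".join(out)


print(to_iast("मम नाम मनोहरः"))
```

Under these tables, output (1) transliterates to `mama nāma manoharaḥ`, with the inherent vowel and the visarga preserved. The form `manohar` would correspond to a Devanagari spelling with a virama on the final र, so output (2) reads as a separate translation and not as a transliteration of output (1).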




विश्वासो वासुकिजः (Vishvas Vasuki)

Feb 28, 2025, 9:01:32 AM
to sanskrit-p...@googlegroups.com, nehr...@eecs.berkeley.edu

nehr...@gmail.com

Feb 28, 2025, 10:30:42 AM
to sanskrit-programmers
Hi anunAda and everybody, 

Thank you for your feedback! The translation system uses probabilistic sampling and is therefore not 100% deterministic; that is, it does not follow hard-coded translation rules. Variations such as the one you described are to be expected. Differences in conventions for applying sandhi rules and segmenting words can also show up depending on whether you ask for the output in IAST or Devanagari, since we trained the system on existing editions of Sanskrit material. I agree that these variations are not ideal, but that's where we are. Let's see if we can improve the behavior in the future.
Best, 

Sebastian 
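
To illustrate what "probabilistic sampling" means in general, here is a minimal sketch of choosing the next output from a set of scored candidates, either greedily or by sampling. It is not Dharmamitra's decoding code: the candidate strings are taken from this thread, while the scores, the function names, and the temperature values are invented for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(candidates, logits):
    """Always return the highest-scoring candidate (fully deterministic)."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return candidates[best]


def sample_pick(candidates, logits, temperature=1.0, rng=random):
    """Draw one candidate at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Invented scores for two renderings seen in this thread.
candidates = ["manoharaḥ", "manohar asti"]
logits = [2.0, 1.5]

print(greedy_pick(candidates, logits))
rng = random.Random(0)  # a fixed seed makes the sampled sequence repeatable
print([sample_pick(candidates, logits, rng=rng) for _ in range(5)])
print(softmax(logits, temperature=0.05))  # nearly all mass on one candidate
```

In this sketch, greedy picking always returns the same candidate, sampling can return either, and a very low temperature makes sampling behave almost like greedy picking. Which of these settings the actual system uses is not stated in the thread.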

Anunad Singh

Mar 1, 2025, 3:03:02 AM
to sanskrit-p...@googlegroups.com, nehr...@gmail.com
namaste Sebastian,
Thank you again.

Let me start by saying that I am fully confident the behaviour will improve, given the potential of the technology and the enthusiasm with which you have started. I am aware of my limitations: I do not know the system's exact internal structure and mechanism, so what I have said is based only on its behaviour at the interface.

What I observe is that its behaviour is 100% deterministic (repeatable) as long as the same output option, either 'Sanskrit Devanagari' or 'Sanskrit', stays selected. I do not think it will give any output other than 'मम नाम मनोहरः' for 'My name is Manohar' when the output option is 'Sanskrit Devanagari'. In that sense, the output is NOT probabilistic.

From my limited knowledge of this field, and from observing its behaviour at the interface, I guess that it has been trained on at least two separate datasets: one for 'Sanskrit Devanagari' and another for 'Sanskrit'. In other words, it seems to be using two different 'networks' for the two. If so, the question is whether this is useful or is proving to be a burden.

It is surely a useful feature if the translator has been designed to give two or more types of output, such as 'standard Sanskrit', 'simple Sanskrit', 'Sandhi-segmented Sanskrit', etc.

-- anunAda
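
The repeatability observation above can be checked more systematically by submitting the same input many times and tallying the distinct outputs. The sketch below assumes nothing about the real service: `count_outputs` is a hypothetical helper, and the two `translate` functions are stand-ins that only imitate a fixed and a sampling translator.

```python
from collections import Counter
import random


def count_outputs(translate, text, runs=20):
    """Call translate(text) `runs` times and tally each distinct output."""
    return Counter(translate(text) for _ in range(runs))


def fixed_translate(text):
    """Stand-in for a translator that always returns the same string."""
    return "मम नाम मनोहरः"


def make_sampling_translate(seed):
    """Stand-in for a translator that picks between two renderings."""
    rng = random.Random(seed)

    def translate(text):
        return rng.choice(["mama nāma manoharaḥ", "mama nāma manohar asti"])

    return translate


sentence = "My name is Manohar."
print(count_outputs(fixed_translate, sentence))
print(count_outputs(make_sampling_translate(0), sentence, runs=50))
```

A single distinct output over many runs is consistent with deterministic decoding, but it does not prove it: very sharp probabilities, a fixed seed, or cached responses would look the same from the interface. These are possible explanations only, none of them confirmed in this thread.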
