Vidyut: a high-performance Sanskrit toolkit

Arun

Nov 2, 2022, 12:02:12 AM

to sanskrit-programmers

Link: https://github.com/ambuda-org/vidyut

Summary

From the readme:

Vidyut is a lightning-fast toolkit for processing Sanskrit text. Vidyut aims to provide standard components that are fast, memory-efficient, and competitive with the state of the art.

Vidyut compiles to native code and can be bound to your language of choice. As part of our work on Ambuda, we provide first-class support for Python bindings through vidyut-py.

Vidyut is currently experimental code, and its API is not stable. If you wish to use Vidyut for your production use case, please file an issue first.

Components

From the readme:

Lexicon maps Sanskrit words to their semantics with high performance and minimal memory usage. In one recent test, we were able to store 29.5 million inflected Sanskrit words in 31 megabytes of disk space for a total cost of around 1 byte per word, and we were able to retrieve these words at around 820 ns/word, as compared to 530 ns/word for a standard in-memory hash map.

Segmenter performs a padaccheda on a Sanskrit phrase and annotates each pada with its basic morphological information.

Segmenter is not yet competitive with other options, but we are optimistic that we can improve it over time. What is quite special, however, is its sheer speed: Segmenter can process a shloka in under 10 milliseconds, and we expect it to become even faster in the future.
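As a rough sanity check on the Lexicon figures quoted above, the arithmetic can be worked out in a few lines of Python. This is only an illustrative calculation: it assumes decimal megabytes (1 MB = 1,000,000 bytes), and the variable names are made up for this sketch rather than taken from Vidyut.

```python
# Figures quoted in the readme excerpt above.
words = 29_500_000            # inflected words stored
disk_bytes = 31 * 1_000_000   # 31 MB, assuming decimal megabytes

lexicon_ns_per_word = 820     # lookup time reported for Lexicon
hash_map_ns_per_word = 530    # lookup time reported for an in-memory hash map

# Storage cost per word and relative lookup cost.
bytes_per_word = disk_bytes / words
slowdown = lexicon_ns_per_word / hash_map_ns_per_word

print(f"{bytes_per_word:.2f} bytes per word")
print(f"{slowdown:.2f}x the lookup time of the hash map")
```

So "around 1 byte per word" holds up, and the compact representation costs roughly one and a half times the lookup time of a plain hash map.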

Context

The context for this toolkit is that many of Ambuda's challenges -- word analysis for texts not in DCS, high-quality "spellcheck" for our proofing work, pedagogical tools for learners, and more flexible interfaces for our search tools -- would be better supported if there were a standard set of high-quality modules that were performant enough to run on a commodity webserver.

While all of these components exist in the broader ecosystem of Sanskrit programs, I could not find an option that was public, high performance, and high quality. Vidyut is an effort to create a set of components that meets all three of these conditions.

The main technical item of note here is that Vidyut is implemented in Rust, which compiles to native code and has a rich ecosystem of bindings to other languages. We plan to offer first-class support for Python bindings through vidyut-py, and we plan to investigate PHP and WebAssembly bindings in the future as well.

Once our padaccheda engine is stable, I plan to revive my work on Padmini so that Vidyut will also have a complete prakriya engine.

Questions

I would like to find short Sanskrit names for these components and namespaces. Lexicon might be called Rupavali, and Segmenter might be called Chedaka, but I am open to suggestions.

Les Morgan

Nov 3, 2022, 12:36:43 PM

to sanskrit-programmers

FYI, Vidyut is the name of the Windows Input Method Editor phonetic keyboard that I co-developed many years ago. The Vidyut keyboard enables direct typing of Unicode-compliant Devanāgarī and selected Sanskrit Vedic and metrical marks on Windows computers using a phonetic method. It has been available since 2010 as a free download.

https://mywhatever.com/sanskrit/vidyut/index.html

Arun

Dec 28, 2022, 1:12:30 AM

to sanskrit-programmers

Thanks for letting me know, Les. I'll update our readme to avoid confusion with your wonderful project.

~

As part of our work on Vidyut, we are releasing vidyut-prakriya, a fast Paninian word generator. If you remember my post on our Python-based generator from earlier this year, here are the major improvements from that system.

- It's much more comprehensive. We currently have reasonable support for karmani prayoga, and I'll also add support for sanAdi pratyayas by the end of the year. We have experimental support for various krdantas and basic support for subantas.

- It has much better documentation.

- It's much faster. After compilation, my computer can generate all kartari tinantas in under 5 seconds. Incremental compile + generate takes about 15 seconds.

- It can be compiled to WebAssembly, which means that with a bit of work, it can run in the browser.

My hope is that vidyut-prakriya can eventually be a comprehensive reference implementation for the Ashtadhyayi, including subantas, tinantas, krdantas, taddhitAntas, chAndasa usage, and rules for svaras. The speed of the library is an important feature here: being able to run hundreds of thousands of test cases lets us make changes with confidence.

If you like व्याकरण and want to help improve this library, please see our Contributing section and consider joining our community. I would be very grateful for your help!

Please don't circulate this post on other large lists just yet. I want to have a WebAssembly demo ready first so that people can experiment with the system in their web browsers.

Arun

विश्वासो वासुकिजः (Vishvas Vasuki)

Dec 29, 2022, 12:38:06 AM

to sanskrit-p...@googlegroups.com

On Wed, 28 Dec 2022 at 11:42, Arun <aru...@gmail.com> wrote:

- It's much more comprehensive. We currently have reasonable support for karmani prayoga, and I'll also add support for sanAdi pratyayas by the end of the year. We have experimental support for various krdantas and basic support for subantas.

Added a request https://github.com/ambuda-org/vidyut/issues/15 :

Currently, how are rule conflicts handled in prakriyA simulation? The regular interpretation of विप्रतिषेधे परं कार्यम्, augmented by a web of paribhAShA-s?

Would it be simple to implement an option to resolve such rule conflicts by means of the simpler framework described in Rishi Rajpopat's thesis, which recently entered the news and fascinated / surprised many? This will be enormously valuable in validating the claims made there, and will likely advance our understanding of what pANini intended + drawbacks therein.

--
Vishvas /विश्वासः

Arun

Dec 29, 2022, 1:17:51 AM

to sanskrit-programmers

(Replied on GitHub, cross-posting here.)

Sadly it would not be simple -- details below. My current focus is on making vidyut-prakriya fast and correct, and I think exploring this problem is a better fit for a fork of this repo. [Edit to add: I think this problem is a great one, and I am happy to help someone use our library to explore it.]

~

Right now, I use an approach similar to SanskritVerb's: I hand-code a specific rule ordering. I haven't followed any specific philosophy except "produce the correct padas with a reasonable-looking prakriya." But since this project is heavily inspired by SanskritVerb (which, I believe, draws on the work of Smt. Pushpa Dikshit), in practice I am using a पौष्पी प्रक्रिया.

Regarding rule selection, my thinking was: as long as the major sections of the Ashtadhyayi are unimplemented, modeling the resolution of rule conflicts is extra complexity and a new source of bugs and degraded performance. For my current needs, I want to focus first on generating the correct forms quickly with reasonable prakriyas.

However, I agree that modeling rule conflicts is tremendously useful for the reasons you mention.

Because these changes are substantial, I think a fork is best, but such a fork could lean on this library's rich APIs and exhaustive test suite. So perhaps something like this would be workable:

1. Refactor each rule so that it's in its own function. Then, create a new function that receives the name of a rule and runs the associated function on the current prakriya. This lets us have dynamic control flow.
2. Find a way to examine the "meta" aspects of each rule: which properties they select on, which samjnas they use, which changes they cause, etc. The only academic work I'm familiar with in this vein is here, but I don't know if it's public. Otherwise, some work might be required to create an inspectable representation of each rule. (One hacky approach might be to walk the AST we create in (1).)
3. Implement a function that accepts a prakriya and returns the rule that should be run. This is the core logic we would test.
4. Validate the implementation of (3) against our test suite.

This procedure is quite promising because each step can lean on our test suite, so the developer can always know that the overall system is in a reasonable state. While it would still involve quite a bit of effort, it's far less effort than writing a system from scratch.
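The four steps above could be sketched roughly as follows. This is a toy illustration in Python, not vidyut-prakriya's actual code (which is written in Rust): the Prakriya and Rule classes, the rule names prefixed with "toy.", and the first-match policy in choose_rule are all hypothetical. The three toy rules only imitate the general shape of a derivation such as BU + ti -> Bavati (SLP1) and are not implementations of real sutras.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

VOWELS = "aAiIuUeEoO"  # simplified SLP1 vowel set, for this sketch only


@dataclass
class Prakriya:
    """Toy derivation state: the current terms plus a log of applied rules."""
    terms: List[str]
    history: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Rule:
    """Step 2: an inspectable rule with a separate condition and effect."""
    name: str
    matches: Callable[[Prakriya], bool]
    apply: Callable[[Prakriya], None]


# Step 1: each (hypothetical) rule lives in its own function.
def _insert_a(p: Prakriya) -> None:
    p.terms.insert(1, "a")

def _guna(p: Prakriya) -> None:
    p.terms[0] = p.terms[0][:-1] + "o"

def _av(p: Prakriya) -> None:
    p.terms[0] = p.terms[0][:-1] + "av"


RULES: Dict[str, Rule] = {r.name: r for r in [
    Rule("toy.insert_a", lambda p: len(p.terms) == 2, _insert_a),
    Rule("toy.guna",
         lambda p: len(p.terms) == 3 and p.terms[0].endswith("U"), _guna),
    Rule("toy.av",
         lambda p: len(p.terms) == 3 and p.terms[0].endswith("o")
         and p.terms[1][0] in VOWELS, _av),
]}


def run_rule(name: str, p: Prakriya) -> None:
    """Step 1, continued: run a rule by name, giving dynamic control flow."""
    RULES[name].apply(p)
    p.history.append(name)


def choose_rule(p: Prakriya) -> Optional[str]:
    """Step 3: decide which rule runs next. Here: first match, a placeholder policy."""
    for name, rule in RULES.items():
        if rule.matches(p):
            return name
    return None


def derive(terms: List[str], max_steps: int = 20) -> Prakriya:
    p = Prakriya(list(terms))
    for _ in range(max_steps):
        name = choose_rule(p)
        if name is None:
            break
        run_rule(name, p)
    return p


# Step 4: validate the chooser against expected forms (a one-item "test suite").
EXPECTED = {("BU", "ti"): "Bavati"}
for start, expected in EXPECTED.items():
    result = derive(list(start))
    assert "".join(result.terms) == expected, (start, result.terms)
    print("".join(result.terms), result.history)
```

In this layout the conflict-resolution policy lives entirely in choose_rule, so an alternative policy could be swapped in and checked against the same expected forms.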

Arun

Jan 1, 2023, 8:50:15 PM

to sanskrit-programmers

Thanks to Shreevatsa, we now have an online demo of vidyut-prakriya available:

https://ambuda-org.github.io/vidyullekha/

This demo runs entirely in the browser. I've noticed that Safari is quite a bit slower than Chrome and Firefox, though.

Currently, parasmaipada / Atmanepada forms are mixed in the same table, which is confusing. Once this is fixed, I'll circulate the tool more widely.

Arun

Dhaval Patel

Jan 1, 2023, 10:55:13 PM

to sanskrit-p...@googlegroups.com

I checked the tool. Kind request to show the text of the sutra along with the sutra number.

--

Dr. Dhaval Patel
www.sanskritworld.in

Dhaval Patel

Jan 1, 2023, 10:58:18 PM

to sanskrit-p...@googlegroups.com

Also at the end of the prakriya, it would make sense to show the final form as the last step.

Arun

Jan 5, 2023, 11:07:34 PM

to sanskrit-programmers

Thanks -- we'll take this feedback into account and update our demo soon.

I have made some small changes to the demo, and although it could still use more work, I believe it is ready to share more widely. I have posted to samskrita but don't have posting rights on bvparishat, and I am sure there are other interested lists I am not aware of.

If members of the list deem this project worthy, I would be grateful if they could share the message below with anyone who might find it interesting:

~

नमो विद्वद्भ्यः --

vidyut-prakriya generates Sanskrit words by applying Paninian rules step-by-step. Our long-term goal is for the program to generate all valid Paninian words.

Summary

vidyut-prakriya is heavily indebted to the SanskritVerb generator from Dr. Dhaval Patel, I.A.S and Dr. Shivakumari Katuri, and we are grateful to the authors for their encouragement in this project.

If you are familiar with SanskritVerb, vidyut-prakriya offers the following improvements:

- It adds many more forms (जुगुप्सते etc., अतत etc., -आम्बभूव etc.).

- It fixes various small bugs.

- Its prakriyas generally have more detail, especially for it-Agama rules.

- It can run in a web browser without an internet connection.

- It is much faster. On my laptop, vidyut-prakriya can generate all kartari-tinantas in about 4 seconds. This speed is especially useful for testing and for natural language processing tools.

- It has partial support for sanAdi forms. These have not been tested very much, so please use with caution.

Code: https://github.com/ambuda-org/vidyut/tree/main/vidyut-prakriya

Demo: https://ambuda-org.github.io/vidyullekha/ -- click on a dhatu to see its tinanta padas, and click on a pada to see its prakriya.

Notes:

We are sharing our system publicly so that we can collect feedback and better discover bugs. Please share it widely so that we can collect more feedback.

- For bug reports, please use https://github.com/ambuda-org/vidyut/issues .

- We are eager to partner with scholars and experts to better test our system. If you are interested in working with us more closely, please contact us at https://ambuda.org/contact.

- We have partial support for subantas and krdantas, but these are not available in the demo yet.

- Testing was done by comparing our program to the output of SanskritVerb. Most differences have been accounted for and are (we believe) in vidyut-prakriya's favor, but a few small errors likely remain.

Regards,

Arun Prasad

Arun

Jan 8, 2023, 12:52:50 PM

to sanskrit-programmers

Now including कृदन्त prakriyas:

https://ambuda-org.github.io/vidyullekha/

Next steps for the generator:

- add support for सोपसर्ग dhatus

- add test cases from various grammar books. English-medium books tend to organize words by rule as opposed to by dhatu, which suits the kinds of tests I want to write. So I'm reading through Kale's notes on कृत्-प्रत्ययs and adding all of the words he mentions.

Next steps for the demo:

- add sutra text

- add support for shareable URLs

Next steps for Vidyut as a whole:

- update vidyut-py with better bindings for all of our new code

Next steps for Ambuda:

- use all of these words in our dictionary and padaccheda engine

Learn Sanskrit

Jan 10, 2023, 12:27:37 AM

to sanskrit-programmers

The demo link now includes sutra texts and a copy of the final word. I am now working on improving सोपसर्ग-तिङन्तानि/-कृदन्तानि, सनाद्यन्तानि, and verbs in कर्मणि/भावे prayogas. After that, I'll add सुबन्तानि to the demo and return to work on the rest of Vidyut.

I am looking for interested volunteers who can help find errors in the program. I've written up some details on getting started in this Reddit post.

Arun

Jan 22, 2023, 7:41:18 PM

to sanskrit-programmers

Our Python bindings are mature enough that they are ready for general usage. Here's our documentation:

https://vidyut.readthedocs.io/en/latest/

Please feel free to ask questions, file bugs, and make feature requests on our issues tracker: https://github.com/ambuda-org/vidyut-py/issues

For everything else, see the links in our readme: https://github.com/ambuda-org/vidyut-py

Arun
