Vidyut is a Sanskrit toolkit written in Rust. Version 0.3 of our Python bindings brings powerful new functionality to Sanskrit programmers, particularly around generating and querying Sanskrit words.
Vidyut 0.3 is not a final product. Instead, it is another big step toward providing reliable digital infrastructure for all Sanskrit programs. There are a lot of exciting features on our roadmap, and I hope to ship them to you soon.
vidyut.prakriya, a Sanskrit word generator
vidyut.prakriya, which powers many of the derivations on ashtadhyayi.com, is an interface to the Ashtadhyayi that creates words along with their derivations. It provides excellent support for tinantas, krdantas, subantas, and taddhitantas and partial support for samasas and accent.
Vidyut 0.3 brings a greatly improved and expanded API along with hundreds of small bug fixes. It also integrates a variety of performance improvements that make word generation screamingly fast. (On my laptop, I can generate all 356,961 kartari-tinantas in around 2 seconds.)
Future releases will add more rules and stronger support for accent rules.
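To put the generation speed quoted above in perspective, here is a rough back-of-the-envelope calculation. It uses only the two figures from this section (356,961 kartari-tinantas in around 2 seconds); the variable names are illustrative, and because the timing is approximate and comes from one laptop, the results are an order-of-magnitude estimate rather than a benchmark.

```python
# Figures quoted in this section of the release notes.
WORD_COUNT = 356_961     # kartari-tinantas generated
ELAPSED_SECONDS = 2.0    # "around 2 seconds", so treat the results as rough

# Throughput and average cost per generated word.
words_per_second = WORD_COUNT / ELAPSED_SECONDS
microseconds_per_word = ELAPSED_SECONDS / WORD_COUNT * 1_000_000

print(f"{words_per_second:,.0f} words per second")
print(f"{microseconds_per_word:.1f} microseconds per word")
```

That works out to roughly 178,000 words per second, or a little under 6 microseconds per word, derivation included.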
vidyut.kosha, a morphological store
vidyut.kosha provides a space-efficient morphological dictionary that stores close to 100 million Sanskrit words in less than one byte per word. Our kosha includes:
- all dhatus from the Dhatupatha on ashtadhyayi.com
- all combinations of upasarga + dhatu from the Upasargarthacandrika
- various combinations of sanadi pratyayas, including णिच्, सन्, यङ्, यङ्-लुक्, णिच्-सन्, and सन्-णिच्
- all combinations of (prayoga, lakara, purusha, vacana) for these dhatus
- all combinations of (krt, linga, vibhakti, vacana) for these dhatus, including स्य-शतृ, स्य-शानच्, and यक्-शानच्
- various pratipadikas and avyayas scraped from Sanskrit dictionary files, inflected for (linga, vibhakti, vacana)
Future releases will add more words, more data, and more API options.
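The size claim for vidyut.kosha ("close to 100 million" words at "less than one byte per word") implies an upper bound on the total size of the dictionary. The short calculation below makes that bound explicit. It rounds the word count up to exactly 100 million and the per-word cost up to exactly one byte, so the real figures should be somewhat smaller; the variable names are illustrative.

```python
# Upper bounds taken from the description of vidyut.kosha.
WORD_COUNT = 100_000_000    # "close to 100 million" words, rounded up
MAX_BYTES_PER_WORD = 1.0    # "less than one byte per word", rounded up

# Total size implied by those two bounds.
max_total_bytes = WORD_COUNT * MAX_BYTES_PER_WORD
max_megabytes = max_total_bytes / 1_000_000      # decimal megabytes (MB)
max_mebibytes = max_total_bytes / (1024 * 1024)  # binary mebibytes (MiB)

print(f"at most {max_megabytes:.0f} MB ({max_mebibytes:.1f} MiB)")
```

In other words, the whole kosha should fit in under 100 MB.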
vidyut.lipi, a new transliterator
vidyut.lipi aims to provide the correctness of Aksharamukha with the ease of use of indic_transliteration. It has a robust test suite and handles various edge cases, such as Unicode normalization, Grantha numeral transliteration, and the ISO ':' separator.
Future releases will continue to improve quality. There are many transliterators available today, but we hope to create a best-in-class transliterator available throughout the stack.
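To illustrate why Unicode normalization is an edge case worth handling in a transliterator, here is a minimal sketch that uses only Python's standard library. It is not vidyut.lipi's implementation or API: the `IAST_TO_SLP1` table and the `to_slp1` and `to_slp1_naive` functions are hypothetical toys that cover just the letters in the example. The point is that the same visible text can arrive as different code point sequences, and a lookup table keyed on one form will miss the other unless the input is normalized first.

```python
import unicodedata

# Toy IAST -> SLP1 table covering only the letters used below.
# Hypothetical illustration, not vidyut.lipi's data.
IAST_TO_SLP1 = {"r": "r", "\u0101": "A", "m": "m", "a": "a"}


def to_slp1_naive(text: str) -> str:
    """Map each code point through the table without normalizing."""
    return "".join(IAST_TO_SLP1.get(ch, ch) for ch in text)


def to_slp1(text: str) -> str:
    """Normalize to NFC first so composed and decomposed input agree."""
    return to_slp1_naive(unicodedata.normalize("NFC", text))


# The same word, "rāma", encoded two ways.
precomposed = "r\u0101ma"   # U+0101 LATIN SMALL LETTER A WITH MACRON
decomposed = "ra\u0304ma"   # "a" followed by U+0304 COMBINING MACRON

print(precomposed == decomposed)                     # different code points
print(to_slp1(precomposed) == to_slp1(decomposed))   # same result once normalized
print(to_slp1_naive(decomposed) == to_slp1(decomposed))  # naive mapping diverges
```

The naive version leaves the combining macron in its output for the decomposed input, while the normalizing version yields the same SLP1 string for both spellings.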
vidyut.chandas, a metrical classifier
vidyut.chandas classifies a variety of Sanskrit meters. It is not quite state-of-the-art, but it is easy to use and is included as an extra in this release.
Future releases will match the best-in-class solutions available today.
Regressions in vidyut.cheda
To avoid getting stuck in an even longer development cycle, we have shipped this release despite some quality regressions in vidyut.cheda. We will address them in our next release as time allows.
Now that neural segmentation engines are readily available, we think that vidyut.cheda is less important and are considering deprecating it.
Arun