Vidyut is a lightning-fast toolkit for processing Sanskrit text. Vidyut aims to provide standard components that are fast, memory-efficient, and competitive with the state of the art.
Vidyut compiles to native code and can be bound to your language of choice. As part of our work on Ambuda, we provide first-class support for Python bindings through vidyut-py.
Lexicon maps Sanskrit words to their semantics with high performance and minimal memory usage. In one recent test, we stored 29.5 million inflected Sanskrit words in 31 megabytes of disk space, or roughly 1 byte per word. Retrieval took around 820 ns/word, as compared to 530 ns/word for a standard in-memory hash map.
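To make those numbers easier to compare, here is the arithmetic behind them as a few lines of Python. This is only a back-of-the-envelope check of the figures quoted above; it assumes decimal megabytes (10^6 bytes), and the variable names are made up for the illustration.

```python
# Figures quoted in the paragraph above.
WORDS = 29_500_000        # inflected words stored
DISK_BYTES = 31 * 10**6   # 31 MB, assuming decimal megabytes
LEXICON_NS = 820          # approximate lookup cost per word in Lexicon
HASHMAP_NS = 530          # approximate lookup cost per word in a hash map

# Storage cost per word, relative lookup cost, and implied throughput.
bytes_per_word = DISK_BYTES / WORDS
slowdown = LEXICON_NS / HASHMAP_NS
lookups_per_second = 1e9 / LEXICON_NS

print(f"{bytes_per_word:.2f} bytes/word")
print(f"{slowdown:.2f}x the hash map's lookup time")
print(f"{lookups_per_second:,.0f} lookups/second")
```

In other words: about 1.05 bytes per word, lookups roughly 1.5 times slower than the hash map, and still on the order of 1.2 million lookups per second.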
Segmenter performs a padaccheda on a Sanskrit phrase and annotates each pada with its basic morphological information.
Segmenter is not yet competitive with other options, but we are optimistic that we can improve it over time. What is quite special, however, is its sheer speed: Segmenter can process a shloka in under 10 milliseconds, and we expect it to become even faster in the future.
Thanks for letting me know, Les. I'll update our readme to avoid confusion with your wonderful project.
~
As part of our work on Vidyut, we are releasing vidyut-prakriya, a fast Paninian word generator.
If you remember my post on our Python-based generator from earlier this year, here are the major improvements from that system.
- It's much more comprehensive. We currently have reasonable support for karmani prayoga, and I'll also add support for sanAdi pratyayas by the end of the year. We have experimental support for various krdantas and basic support for subantas.
Currently, how are rule conflicts handled in prakriyA simulation? The regular interpretation of विप्रतिषेधे परं कार्यम्, augmented by a web of paribhAShA-s?
Would it be simple to implement an option to resolve such rule conflicts by means of the simpler framework described in Rishi Rajpopat's thesis, which recently entered the news and fascinated / surprised many? This would be enormously valuable in validating the claims made there, and would likely advance our understanding of what pANini intended and of any drawbacks therein.
--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sanskrit-programmers/d4480e9c-6023-4671-903c-9a491d6009efn%40googlegroups.com.
Sadly it would not be simple -- details below. My current focus is on making vidyut-prakriya fast and correct, and I think exploring this problem is a better fit for a fork of this repo. [Edit to add: I think this problem is a great one, and I am happy to help someone use our library to explore it.]
~
Right now, I use a similar approach to SanskritVerb: I hand-code a specific rule ordering. I haven't followed any specific philosophy except "produce the correct padas with a reasonable-looking prakriya." But since this project is heavily inspired by SanskritVerb (which, I believe, draws on the work of Smt. Pushpa Dikshit), in practice I am using a पौष्पी प्रक्रिया.
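As a rough illustration of what "hand-coding a rule ordering" means, here is a toy sketch in Python. It is not vidyut-prakriya's code or API: the function names, the list-of-strings representation, and the drastically simplified rule conditions are all assumptions made for the example. It derives Bavati (SLP1) from BU + laT by trying four rules in a fixed order and recording each step.

```python
from typing import Callable, List, Optional, Tuple

Terms = List[str]
RuleFn = Callable[[Terms], Optional[Terms]]

VOWELS = "aAiIuUfFxXeEoO"  # SLP1 vowels

def add_tip(terms: Terms) -> Optional[Terms]:
    # 3.4.78 (simplified): replace the lakara placeholder with the ending "ti".
    return terms[:-1] + ["ti"] if terms[-1] == "laT" else None

def add_shap(terms: Terms) -> Optional[Terms]:
    # 3.1.68 (simplified): insert the vikarana "a" between root and ending.
    if len(terms) == 2 and terms[1] == "ti":
        return [terms[0], "a", terms[1]]
    return None

def guna(terms: Terms) -> Optional[Terms]:
    # 7.3.84 (simplified): final U of the root becomes o before the vikarana.
    if len(terms) == 3 and terms[0].endswith("U"):
        return [terms[0][:-1] + "o"] + terms[1:]
    return None

def ayavayava(terms: Terms) -> Optional[Terms]:
    # 6.1.78 (simplified): final o becomes av before a vowel.
    if len(terms) == 3 and terms[0].endswith("o") and terms[1][0] in VOWELS:
        return [terms[0][:-1] + "av"] + terms[1:]
    return None

# The ordering is fixed by hand; nothing here detects or resolves conflicts.
HAND_CODED_ORDER: List[Tuple[str, RuleFn]] = [
    ("3.4.78", add_tip),
    ("3.1.68", add_shap),
    ("7.3.84", guna),
    ("6.1.78", ayavayava),
]

def derive(terms: Terms, rules: List[Tuple[str, RuleFn]]):
    """Try each rule once, in order, and log the rules that applied."""
    history = []
    for code, rule in rules:
        result = rule(terms)
        if result is not None:
            terms = result
            history.append((code, "+".join(terms)))
    return "".join(terms), history

pada, history = derive(["BU", "laT"], HAND_CODED_ORDER)
print(pada)
for code, state in history:
    print(code, state)
```

The point of the sketch is that the order in `HAND_CODED_ORDER` carries all of the decision-making: the program never asks which of two applicable rules should win.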
Regarding rule selection, my thinking was: as long as the major sections of the Ashtadhyayi are unimplemented, modeling the resolution of rule conflicts is extra complexity and a new source of bugs and degraded performance. For my current needs, I want to focus first on generating the correct forms quickly with reasonable prakriyas.
However, I agree that modeling rule conflicts is tremendously useful for the reasons you mention.
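To give a sense of what "modeling rule conflicts" could look like, here is a minimal Python sketch of a pluggable conflict-resolution step. Everything in it is hypothetical and none of it exists in vidyut-prakriya. The only strategy shown is the plain "later rule wins" reading of विप्रतिषेधे परं कार्यम्; it ignores the paribhAShA-s mentioned in the question and does not attempt the alternative reading from the thesis, which would be a different `strategy` function.

```python
from typing import Callable, List, Optional, Tuple

def sutra_key(code: str) -> Tuple[int, ...]:
    """Turn "7.3.84" into (7, 3, 84) so that rules compare by position in the text."""
    return tuple(int(part) for part in code.split("."))

def resolve_para(candidates: List[str]) -> str:
    """'Later rule wins': pick the candidate that occurs latest in the text."""
    return max(candidates, key=sutra_key)

def resolve(
    candidates: List[str],
    strategy: Callable[[List[str]], str] = resolve_para,
) -> Optional[str]:
    """Choose one rule among those that currently apply, or None if none apply."""
    if not candidates:
        return None
    return strategy(candidates)

# Comparing numerically matters: as plain strings, "1.4.10" sorts before "1.4.2".
print(resolve(["1.4.2", "1.4.10"]))
print(resolve(["7.3.84", "6.1.78"]))
```

Keeping the strategy as a parameter is the design choice that matters here: a fork could swap in a different resolution scheme and rerun the same tests to see which forms change.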
Due to the substantial nature of these changes, I think a fork is best, but such a fork could lean on this library's rich APIs and exhaustive test suite. So perhaps something like this would be workable:
This procedure is quite promising because each step can lean on our test suite, so the developer can always know that the overall system is in a reasonable state. While it would still involve quite a bit of effort, it's far less effort than writing a system from scratch.