skorg-mode: a sanskrit extension for emacs org-mode


Sebastian Nehrdich

May 21, 2016, 4:08:47 PM
to sanskrit-programmers
Hey folks,
I wrote a small extension for Emacs that makes it possible to tag Sanskrit manually. It also works as a grammatical analyzer and has functionality for syntax highlighting and for saving tags across sessions, so maybe it is interesting for some of you (I do not know whether there are any Emacs folks here).
There is also transliteration functionality and sandhi dissolving, both of which can be used outside of the tool. You can check it out and see the full list of features here: http://sanskrit-db.de/blog/pages-output/skorg/
But be aware that this is in alpha and I am not a good programmer at all. :) I just thought I might post it here because it could be useful for other people as well.
I wrote it to power my website, where I upload annotated Sanskrit translations for our students at the university. You can check them out here: http://sanskrit-db.de/files (they are of course not of high quality, and most are in German anyway).
With best wishes from Germany,

Sebastian Nehrdich

Shreevatsa R

May 22, 2016, 1:37:11 AM
to sanskrit-programmers, nehr...@gmail.com
Thanks for sharing!

This is a beautiful piece of technology, and I'm really happy to see the tools for Sanskrit advancing. If I were coding this I would probably have written the whole thing in JavaScript so that it can be used in a browser (as I see in your "plans for the future"), but personally org-mode is where I live most of the time, so I'm not complaining. :-)

And I can report that it also works well on Mac.

This is what I did:
- Had to extract skorg-0.0.1.tar.bz2 to a directory in my Emacs load-path
- Added (require 'skorg) to .emacs
- (Maybe I have the auto filename thing turned off) Manually did M-x skorg-mode

I'm still getting the sdcv feature figured out -- which dictionaries do you use, personally?

The functionality seems to work well; I have some questions about the Sanskrit tagging part.

- I input dharma|kSetre and hit 7, so it looks like dharma|kṣetre in green. But the grammar tag still says:

(cp "dharmakSetre" (
(f "dharma" (iic) (s ("dharman" . ""))) 
(f "kṣetre" (iic) (s ("kṣetra" . "")))) ((na voc du neu) (na loc sg neu) (na nom du neu) (na acc du neu)) (pos 0 0) (s ("dharmakSetra" . "")))

So it still seems to have 4 options (voc du neu, loc sg neu, nom du neu, acc du neu)? Shouldn't it have been resolved into just loc sg neu?

- I am yet to figure out how to properly deal with certain kinds of sandhi. E.g. how to tag pANDavAzcaiva (pāṇḍavāścaiva)?

- It would be nice to document the syntax of these grammar tags and the abbreviations, so that more people can use it. (The cp, f, iic, s, voc, loc, neu etc.)
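
Until the tag syntax is documented, the nesting can at least be inspected mechanically. Here is a minimal Python sketch (not part of skorg; the names `tokenize` and `parse` are made up for illustration) that reads a tag like the one quoted above into nested lists. It assumes nothing about what cp, f, iic or s mean and only exposes the structure by position.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols (including the lone dot).
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def tokenize(text):
    """Split an s-expression string into a flat list of tokens."""
    return TOKEN.findall(text)

def parse(tokens):
    """Turn a token list into nested Python lists; strings lose their quotes."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if token.startswith('"'):
        return token[1:-1]
    return token

# The tag exactly as reported in this message.
TAG = '''(cp "dharmakSetre" (
(f "dharma" (iic) (s ("dharman" . "")))
(f "kṣetre" (iic) (s ("kṣetra" . "")))) ((na voc du neu) (na loc sg neu) (na nom du neu) (na acc du neu)) (pos 0 0) (s ("dharmakSetra" . "")))'''

tree = parse(tokenize(TAG))
# Positional reading only: head symbol, surface form, member list, analyses.
head, surface, members, analyses = tree[0], tree[1], tree[2], tree[3]
print(head, surface)
for member in members:
    print("  member:", member[1])
for analysis in analyses:
    print("  analysis:", " ".join(analysis))
```

Seen this way, the four competing readings are simply the fourth element of the outer list, which is where any later reduction would have to operate.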

Thanks again -- it's really amazing how functional it is with so few lines of code!





--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sebastian Nehrdich

May 22, 2016, 4:41:40 AM
to sanskrit-programmers, nehr...@gmail.com

Dear Shreevatsa,

Thank you very much for your reply; you address some important issues.
First of all, it is nice to see that it was not difficult for you to install it on OS X.
To use the sdcv-shortcuts you need to have StarDict dictionaries in HK transliteration installed. I created one from Monier Williams and one from Apte, but I am not sure about licensing, which is why I didn't upload them. As soon as I find out, I will do so. I am sure they can also be found somewhere on the internet.
About the tagging of the compound dharma|kSetre: The part of the code that reduces the tags to the valid ones works on single words, but not on compounds yet. So it is a bug and I am working on it... Thank you very much for pointing it out, it reminds me that some work still has to be done!
Also, the fine tagging is currently limited to the precision that the XML files of Gerard Huet offer. So when we, for example, tag a word as accusative and the tag in the XML includes the nominative as well (as is the case with many neuter forms), this nominative tag remains visible.
But I think this is also solvable.
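
The reduction step described above can be sketched as follows. This is a minimal Python illustration, not skorg's actual code: the function `reduce_analyses` and the digit-to-case table are assumptions. The table follows the traditional order of the eight cases, which fits the report that pressing 7 should yield the locative; of the abbreviations, only nom, acc, loc and voc appear in this thread, the rest are guesses.

```python
# Assumed mapping from the digit key to a case abbreviation, following the
# traditional order of the eight Sanskrit cases. Only nom, acc, loc and voc
# are attested in the thread; the other abbreviations are guesses.
CASE_BY_DIGIT = {
    1: "nom", 2: "acc", 3: "ins", 4: "dat",
    5: "abl", 6: "gen", 7: "loc", 8: "voc",
}

def reduce_analyses(analyses, digit):
    """Keep only the analyses whose case matches the pressed digit.

    If nothing matches, the full list is returned unchanged so that
    no information is lost.
    """
    case = CASE_BY_DIGIT[digit]
    kept = [a for a in analyses if case in a]
    return kept or list(analyses)

# The four readings reported for the compound dharma|kSetre.
compound = [
    ("na", "voc", "du", "neu"),
    ("na", "loc", "sg", "neu"),
    ("na", "nom", "du", "neu"),
    ("na", "acc", "du", "neu"),
]

print(reduce_analyses(compound, 7))
```

For a compound, the same filter would presumably be applied to the analyses of the whole compound rather than to those of its members.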
About your request for documentation of the tags used: have a look here: http://sanskrit-db.de/files/sl-morph.dtd
Everything is explained in that file, but turning that into a handy pdf, a table or something like that could be a great help for the user. So that's another item on the TODO-list.
So thank you very much for your feedback and I am glad to see that the tool is somewhat useful to you. As you pointed out some important problems I am motivated to work on them in order to improve it.

But yes, you are right: in the long run it is best to rewrite it using web technologies, as this makes the tool useful for people who are not into Emacs. I hope that I can start work on that during the summer. :)
Currently I just wish to fix the remaining issues with skorg-mode and once that's done it's time to move on.
With best wishes,

Sebastian

dhaval patel

May 22, 2016, 7:23:01 AM
to sanskrit-p...@googlegroups.com, nehr...@gmail.com
The tool is excellent.
I love it. Good work.



--
Dr. Dhaval Patel, I.A.S
Collector and District Magistrate, Anand

विश्वासो वासुकिजः (Vishvas Vasuki)

May 22, 2016, 11:25:28 AM
to sanskrit-programmers, nehr...@gmail.com

2016-05-22 1:41 GMT-07:00 'Sebastian Nehrdich' via sanskrit-programmers <sanskrit-p...@googlegroups.com>:
> To use the sdcv-shortcuts you need to have StarDict dictionaries in HK transliteration installed. I created one from Monier Williams and one from Apte, but I am not sure about licensing, which is why I didn't upload them. As soon as I find out, I will do so. I am sure they can also be found somewhere on the internet.

Indeed! Though it uses "Optitrans" transliteration rather than HK: https://github.com/sanskrit-coders/stardict-sanskrit/tree/master/sa-head/



--
Vishvas /विश्वासः

Shreevatsa R

May 23, 2016, 6:23:54 AM
to sanskrit-programmers, Sebastian Nehrdich
Thanks Sebastian, that makes sense.

I think what would be nice is a way to
(1) have the automatically generated fine tagging explicitly saved in the file or an associated file (right now I'm not sure where the persistent hashtable is), and 
(2) be able to manually edit the pos tagging, for the occasional problematic cases.
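
To make suggestion (1) concrete, here is a rough Python sketch of what an "associated file" could look like: a JSON file stored next to the document. Nothing here reflects how skorg stores its data; the file naming scheme, the function names and the shape of the stored tags are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def sidecar_path(document):
    """Hypothetical naming scheme: text.org -> text.org.tags.json"""
    document = Path(document)
    return document.with_name(document.name + ".tags.json")

def save_tags(document, tags):
    """Write the word -> tag mapping next to the document as UTF-8 JSON."""
    path = sidecar_path(document)
    text = json.dumps(tags, ensure_ascii=False, indent=2, sort_keys=True)
    path.write_text(text, encoding="utf-8")
    return path

def load_tags(document):
    """Read the mapping back; a missing sidecar file means no tags yet."""
    path = sidecar_path(document)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))

# Usage with a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "example.org"
    save_tags(doc, {"dharmakSetre": ["na", "loc", "sg", "neu"]})
    print(load_tags(doc))
```

A plain-text file like this would also cover suggestion (2) to some extent, since a wrong tag could be corrected by hand in any editor.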


Sebastian Nehrdich

May 27, 2016, 2:51:50 PM
to sanskrit-programmers, nehr...@gmail.com

Hello everybody,

I have to apologize for my late reply. At the same time I want to say thank you for the stimulating input!
I just found a little bit of time to fix the fine-grained tagging of compounds; it is now working (although I had no time to test it extensively for bugs).
I also noticed that the sandhi code is not complete yet: some of the less common rules are missing.
I just updated the tarball, so the URL remains the same: http://sanskrit-db.de/files/skorg-0.0.1.tar.bz2
Shreevatsa: I just saw your question about how to handle sandhi between words that are written together but do not form a compound.
Take for example 'mamedamiti' from the Mahābhārata; without sandhi this would be written 'mama idam iti'.
To indicate this to the tagger, use a . (dot) between the words. So the input 'mama.idam.iti' is rendered by the sandhi engine as 'mamedamiti'.
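
To make the dot notation concrete, here is a deliberately tiny Python sketch of how a sandhi engine might join dot-separated words. It is not skorg's code: `join_pair`, `join_sandhi` and the rule table are hypothetical, the input is assumed to be in HK transliteration, and only a handful of vowel rules for final a or A are covered. Everything else is simply concatenated.

```python
# A few vowel sandhi rules in HK transliteration: (final, initial) -> merged.
# This table is illustrative and far from complete.
VOWEL_RULES = {}
for final in "aA":
    for initial, merged in (("a", "A"), ("A", "A"),
                            ("i", "e"), ("I", "e"),
                            ("u", "o"), ("U", "o"),
                            ("e", "ai"), ("o", "au")):
        VOWEL_RULES[(final, initial)] = merged

def join_pair(left, right):
    """Join two words, applying a vowel rule at the boundary if one matches."""
    if not left or not right:
        return left + right
    # Final a/A before an initial diphthong ai/au is absorbed by the diphthong.
    if left[-1] in "aA" and right[:2] in ("ai", "au"):
        return left[:-1] + right
    merged = VOWEL_RULES.get((left[-1], right[0]))
    if merged is None:
        return left + right  # e.g. final consonant + initial vowel
    return left[:-1] + merged + right[1:]

def join_sandhi(dotted):
    """Render dot-separated input such as 'mama.idam.iti' as one sandhied string."""
    result = ""
    for word in dotted.split("."):
        result = join_pair(result, word)
    return result

print(join_sandhi("mama.idam.iti"))  # mamedamiti
```

A form like pANDavAzcaiva would additionally need visarga rules, which this sketch omits.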
About these questions:
> (1) have the automatically generated fine tagging explicitly saved in the file or an associated file (right now I'm not sure where the persistent hashtable is), and 
> (2) be able to manually edit the pos tagging, for the occasional problematic cases.
1. Yes, true. Currently the persistent hashtable lies within the skorg package folder and grows there as time goes by. It is not really a database, but better than nothing (about as good as it gets with pure elisp).
In the initial stages I was using an XML-based system to store the tags in the file being edited (so the edited file was the database at the same time). That turned out to be rather ugly, as I had to write my own code to 'hide away' the XML from the user, which resulted in a mass of bug-ridden code.
So I decided to just use org (as org is a great markup and very nice to use) and keep the data separately. This is much simpler and less likely to introduce new bugs.
However, how to transfer data between two skorg systems is not yet solved. I imagine simply writing a command that adds the numbers to the words in a batch run for the whole file (so a sentence would look like 'aham1 atra9 āgatavān1'), and the client can then do the fine-grained tagging when reading. :)
But to return to your question: in an ideal world we would be able to export skorg to PDF via LaTeX, including the grammar tags. I even had this functionality implemented at some point, but the code is not working currently. Being able to export to LaTeX would be great.
I think the most elegant way to do this would be to incorporate that part on the web server of sanskrit-db.de, as we cannot expect every user to have a working LaTeX installation with Devanagari running. This would also mean that once I rewrite the tagger in JavaScript I do not have to rewrite that code as well.

2. This was also possible at some point, but I dropped it again because the basic design questions of the tagger were not solved, and whenever I changed something, this feature was usually the first thing that broke.
But yes, there should be a way to tackle it, especially with regard to Vedic Sanskrit or Buddhist Sanskrit. In that field the tagger is doomed to fail, as we currently have only little reliable morphological data collected.
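
The exchange format floated under point 1, where each word carries its number as a suffix (as in 'aham1 atra9 āgatavān1'), could be read and written along these lines. This is only a Python sketch of the idea: the function names are hypothetical, and it assumes that every code is a trailing run of digits and that words themselves never end in a digit.

```python
import re

# A word followed by an optional trailing number, e.g. "aham1" or "ca".
WORD_WITH_CODE = re.compile(r"^(.*?)(\d*)$")

def split_codes(sentence):
    """Return (word, code) pairs; code is None when a word has no number."""
    pairs = []
    for token in sentence.split():
        word, digits = WORD_WITH_CODE.match(token).groups()
        pairs.append((word, int(digits) if digits else None))
    return pairs

def attach_codes(pairs):
    """Inverse of split_codes: glue each code back onto its word."""
    return " ".join(word if code is None else f"{word}{code}"
                    for word, code in pairs)

pairs = split_codes("aham1 atra9 āgatavān1")
print(pairs)
print(attach_codes(pairs))
```

The receiving side would then look up each bare word and use the number to narrow down the possible analyses, without the two installations having to share a hashtable.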
So thanks a lot for the stimulating input, that's giving me some inspiration to push the development further! :)
With best wishes,

Sebastian