Digital Corpus of Sanskrit

Oliver Hellwig

Jul 18, 2016, 3:42:49 PM
to samskrita
[Sorry for cross-posting]

Dear all,

A new version of the Digital Corpus of Sanskrit (DCS) has come out. Among other texts, it contains the complete morphological and lexical annotation of the Mahabharata, except for three prose chapters.

Although you are still redirected from the old URL, please note the new web address:
http://kjc-sv013.kjc.uni-heidelberg.de/dcs/index.php

A few notes on the new release:
(1) I find the multi-word search rather useful: you can now search for text lines that must contain two or more lemmata (click on the "Add to multi-word q." links after a search result on the query page). To start with, try something popular such as rāma and sītā; the search will display all text lines that contain an inflected form of both rāma and sītā.
(2) Global and text dictionaries have been merged into one. Unlike in former versions, the lexicographic database now contains all lemmata given in my digital dictionary, even if they don't occur in any text.
(3) You should, in principle, be able to type IAST Unicode directly in the search interface.
(4) The information contained under "Similar and related words" is only a gimmick at the moment, at least for less popular words. It displays the cosine similarity between neural embeddings built with word2vec (see https://en.wikipedia.org/wiki/Word_embedding for more information). The embeddings seem to capture some semantic similarities; check, for instance, 'rāma' or 'gam'.
(5) This release relies heavily on JavaScript. The website will appear largely unresponsive if JavaScript is disabled in your browser.
(6) Access to parts of the semantic annotation layer will be added in the coming weeks.
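
For readers who want to see what points (1) and (4) amount to, here is a minimal Python sketch. It is an illustration only and not the DCS code: the function names, the (form, lemma) line format, and the three-dimensional toy vectors are all invented for the example, and real word2vec embeddings have far more dimensions.

```python
import math

def lines_with_all_lemmata(annotated_lines, query_lemmata):
    """Multi-word search: keep lines whose lemmata include every query lemma.

    Each line is assumed to be a list of (inflected_form, lemma) pairs;
    this format is hypothetical and chosen only for the example.
    """
    wanted = set(query_lemmata)
    return [line for line in annotated_lines
            if wanted <= {lemma for _form, lemma in line}]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings, top_n=2):
    """Rank all other words by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy annotation, invented for illustration (not taken from the corpus).
toy_lines = [
    [("rāmaḥ", "rāma"), ("sītām", "sītā"), ("apaśyat", "dṛś")],
    [("rāmeṇa", "rāma"), ("gatam", "gam")],
]

# Toy embeddings, invented for illustration (not real word2vec output).
toy_embeddings = {
    "rāma": [0.9, 0.1, 0.3],
    "sītā": [0.8, 0.2, 0.4],
    "gam": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    print(lines_with_all_lemmata(toy_lines, ["rāma", "sītā"]))
    print(most_similar("rāma", toy_embeddings))
```

With these toy inputs, only the first line matches the two-lemma query, and sītā ranks above gam as a neighbour of rāma simply because its invented vector points in nearly the same direction.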

I'm quite seriously considering making this version of the DCS open source. If you are interested in collaborating, please send me your GitHub user name so that I can invite you to the project.


Oliver

---
Oliver Hellwig, University of Düsseldorf, Germany

Margie Parikh

Jul 19, 2016, 1:42:37 AM
to samskrita
Dear Professor Hellwig,

Thank you for sharing the update. I visited the site and was fascinated by the richness of its content.
Out of curiosity, I searched for tagged parallels for the Gautamadharmasutra. A list of parallels appeared (Manusmruti and Mahabharata having 16 and 14 parallels, respectively). However, the Gautamadharmasutra itself also appeared in the list, with 10 parallels. I wondered how this correspondence could have been arrived at. I would appreciate it if you or anyone else on the forum could shed light on this.

With my namaskara to everyone on Guru Purnima,
margie

Oliver Hellwig

Jul 19, 2016, 2:56:22 PM
to samskrita
Hi Margie, the program looks at lines just as sequences of words. If it finds two lines in the same text that resemble each other, it marks them as parallels.
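
The idea can be sketched roughly as follows in Python. This is an illustration only, not the actual DCS program: the similarity measure (difflib's sequence ratio over word lists), the 0.6 threshold, and the function name are assumptions made for the example, and the lines use placeholder tokens rather than real text.

```python
from difflib import SequenceMatcher
from itertools import combinations

def find_parallels(lines, threshold=0.6):
    """Return (i, j, score) for every pair of lines that resemble each other.

    Lines are compared purely as sequences of words. Nothing here checks
    which text a line comes from, so two lines of the same text can be
    reported as parallels of each other.
    """
    tokenized = [line.split() for line in lines]
    parallels = []
    for (i, a), (j, b) in combinations(enumerate(tokenized), 2):
        # ratio() is 2 * matches / total length; hypothetical choice of measure.
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            parallels.append((i, j, score))
    return parallels

# Placeholder lines from one and the same (imaginary) text.
toy_text = [
    "w1 w2 w3 w4",
    "w1 w2 w3 w5",
    "w6 w7 w8 w9",
]

if __name__ == "__main__":
    for i, j, score in find_parallels(toy_text):
        print(f"line {i} ~ line {j}: {score:.2f}")
```

Here the first two lines share three of their four words and are therefore flagged, even though both belong to the same text.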

Oliver