[Sorry for cross-posting]
Dear all,
a new version of the
Digital Corpus of Sanskrit has come out, which contains, among other
texts, the complete morphological and lexical annotation of the
Mahabharata except for three prose chapters.
Although you are still redirected from the old URL, you may note the new web address:
http://kjc-sv013.kjc.uni-heidelberg.de/dcs/index.phpA few notes on the new release:
(1)
I find the multi-word search rather useful: You can now search for text
lines that must contain two or more lemmata (click on the "Add to
multi-word q." links after a search result on the query page). To start
with, try something popular such as rāma and sītā; will display all text
lines that contain any inflected form of rāma and sītā.
(2) Global
and text dictionaries have been merged into one. Contrary to former
versions, the lexicographic database now contains all lemmata given in
my digital dictionary, even if they don't occur in a text.
(3) You should, in principle, be able to type IAST Unicode directly in the search interface.
(4)
The information contained under "Similar and related words" is only a
gimmick at the moment, at least for less popular words. It displays the
cosine similarity between neural embeddings built with word2vec (
https://en.wikipedia.org/wiki/Word_embedding for more information). They seem to capture some semantic similarites; check, for instance, 'rāma' or 'gam'.
(5) This release relies heavily on JavaScript. The website will look quite unresponsive when JS is deactivated in your browser.
(6) Access to parts of the semantic annotation layer will be added in the next weeks.
I'm
considering quite seriously to make this version of the DCS open
source. If you are interested in collaborating, please send me your
github user name, so that I can invite you to the project.
Oliver
---
Oliver Hellwig, University of Düsseldorf, Germany