हेमाब्दानुशासनलधुन्यास a short comm- entary on Hemacandra'sSabdanu- Sव्रsana written by Devendrastri. हेमाब्दानुशासनव्राते a short gloss call- ed अवचूरि also, written by a Jain grammarian नन्दसुन्दर on the ईम- इब्दानुद्भासन. _ ह्यस्तनी imperfect tense; a term used by ancient grammarians for the affixes of the immediate past tense, but not comprising the present day, corresponding to the term लङ्क of Pafini. The term is found in the Katantra and Haima- candra grammars; cf. Kt. III. 1.23, 27; cf. Hema. III. 3.9. इस्व short, a term used in connec- tion with the short vowels taking a umit of time measured by one matra for their utterance: cf. ऊकालेोज्इरस्वदीर्घप्लुत: P. I. 2.27.
This should be replaced with (note bolded letters such as श ह्र which have been fixed.):
############ हेमाशब्दानुशासनलधुन्यास a short comm- entary on Hemacandra's Sabdanu- Sव्रsana written by Devendrastri. हेमाब्दानुशासनव्राते a short gloss call- ed अवचूरि also, written by a Jain grammarian नन्दसुन्दर on the हेम- शब्दानुद्भासन. _
############ ह्यस्तनी imperfect tense; a term used by ancient grammarians for the affixes of the immediate past tense, but not comprising the present day, corresponding to the term लङ्क of Pafini. The term is found in the Katantra and Haima- candra grammars; cf. Kt. III. 1.23, 27; cf. Hema. III. 3.9.
############ ह्रस्व short, a term used in connec- tion with the short vowels taking a umit of time measured by one matra for their utterance: cf. ऊकालेोज्इरस्वदीर्घप्लुत: P. I. 2.27.
--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I observed that https://archive.org/stream/ADictionaryOfSanskritGrammarByMahamahopadhyayaKashinathVasudevAbhyankar/DictionaryOfSanskritGrammar_abhyankar_djvu.txt already has a crude OCR which does not recognize devanAgarI. Luckily, using the wonderful infrastructure and a couple of hundred machines I have access to at my workplace, I was able to cobble together something to get a better OCR in about an hour - https://raw.githubusercontent.com/sanskrit-coders/stardict-sanskrit/master/sa-head/abhyankar-grammar/abhyankar-grammar-gocr.txt . Now, all that remains is for someone to:
1] Mark new headwords with a string - say "############".
2] Fix egregious errors - especially in the headwords - to facilitate lookup. Typo errors in the meanings are more tolerable (usually the fixes are obvious to the reader).
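Once headwords are marked this way, the text could be split into lookup entries with a few lines of Python. This is only a rough sketch: the function name `split_entries` is made up for illustration, and it assumes that the headword is simply the first whitespace-delimited token after each marker, which will not hold for multi-word headwords.

```python
MARKER = "############"

def split_entries(text, marker=MARKER):
    """Return (headword, body) pairs from text in which each entry starts with marker."""
    entries = []
    # Anything before the first marker is not an entry, so skip chunk 0.
    for chunk in text.split(marker)[1:]:
        chunk = " ".join(chunk.split())  # collapse line breaks and repeated spaces
        if not chunk:
            continue
        # Assumption: the headword is the first token of the entry.
        headword, _, body = chunk.partition(" ")
        entries.append((headword, body))
    return entries

if __name__ == "__main__":
    sample = "############ ह्यस्तनी imperfect tense; a term ############ ह्रस्व short, a term"
    for headword, body in split_entries(sample):
        print(headword, "=>", body)
```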
Marking the keywords can be better done automatically using find and replace in regular expression mode.
Find: \.\n([^a-zA-Z0-9\,\;\(\)\[\]\-\{\}])
Replace with: .\n\n>>> \1
>>> तत्र सीर्यभगवतेाक्तमनिधिज्ञी वाडवः पठति |
इष्यत एव चतुर्मात्रः त: M. Bh. on
P. VIII. 2.106 Vart. 3.
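For anyone who prefers a script to an editor's find-and-replace dialog, the same substitution can be sketched in Python. The function name `mark_headwords` is hypothetical; the pattern is the one given above, with the unnecessary escapes dropped. It marks any line that follows a period and does not start with a Latin letter, digit, or one of the listed punctuation characters, so it will also mark Devanagari quotations that are not headwords.

```python
import re

# A period at end of line, followed by a line whose first character is not
# a Latin letter, a digit, or one of , ; ( ) [ ] - { }
HEADWORD_START = re.compile(r"\.\n([^a-zA-Z0-9,;()\[\]\-{}])")

def mark_headwords(text):
    """Insert a blank line and '>>> ' before each suspected headword."""
    return HEADWORD_START.sub(r".\n\n>>> \1", text)

if __name__ == "__main__":
    sample = "P. I. 2.27.\nह्रस्व short, a term.\nThe term is found."
    print(mark_headwords(sample))
```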
2016-02-25 21:34 GMT-08:00 Anunad Singh <anu...@gmail.com>:
Marking the keywords can be better done automatically using find and replace in regular expression mode.
Find: \.\n([^a-zA-Z0-9\,\;\(\)\[\]\-\{\}])
Replace with: .\n\n>>> \1
Can almost be done automatically! But see below:
>>> तत्र सीर्यभगवतेाक्तमनिधिज्ञी वाडवः पठति |
इष्यत एव चतुर्मात्रः त: M. Bh. on
P. VIII. 2.106 Vart. 3.
This is not a headword.
But if you're able to mark the headwords (mostly) through a series of regex replacements - please do so! Even with all the errors, the output of such an effort can still be used to produce an early version of a very useful stardict dictionary.
----
Vishvas /विश्वासः
I feel the 'automatic' replacements should be done centrally by one person. Regarding the error you pointed out, it (and similar errors caused by the previous step) can be undone with:
Find: cf.\n\n>>>\s
Replace with: cf.\n
There are also some systematic errors which can be cleared in a semi-automatic way. The first is replacing the visarga wherever it comes after non-Devanagari characters. The second is replacing the halanta wherever it comes after non-Devanagari characters.
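A rough Python sketch of these clean-up passes follows. The function names are made up for illustration. Two things here are assumptions rather than anything stated in the thread: the dot in `cf.` is escaped so that it only matches a literal period, and the visarga is replaced by a colon in the usage example. What the halanta should become is not specified, so the replacement is left as a parameter for the caller to choose.

```python
import re

VISARGA = "\u0903"   # ः
HALANTA = "\u094D"   # ् (virama)

def undo_cf_marks(text):
    """Remove headword marks that were wrongly inserted after 'cf.'."""
    return re.sub(r"cf\.\n\n>>>\s", "cf.\n", text)

def replace_after_non_devanagari(text, char, replacement):
    """Replace char wherever the preceding character is outside U+0900-U+097F."""
    pattern = re.compile(r"(?<=[^\u0900-\u097F])" + re.escape(char))
    # A function is used so the replacement is inserted literally.
    return pattern.sub(lambda match: replacement, text)

if __name__ == "__main__":
    print(undo_cf_marks("cf.\n\n>>> तत्र पठति"))
    # Assumption: a visarga after Latin text was meant to be a colon.
    print(replace_after_non_devanagari("utterance" + VISARGA + " cf.", VISARGA, ":"))
```

Running the passes from one script, in a fixed order, would also fit the suggestion that a single person apply the automatic replacements centrally.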
If someone has already started editing a page at a time, what is the way to pull the text out, apply the global replacements, and then continue further along? Maybe we should have a set of standard replacements for all OCRed documents before we upload them to Wikisource? (Please correct me if I am missing some step here.)
From a process standpoint, is there any way on Wikisource to distinguish between pages that an individual proofreader has proofread and pages that a subsequent second-pass reviewer has checked? I wanted to check that before marking something as "proofread".