Experience so far with corrections and proofreading

Suhas Bharadwaj

Dec 13, 2021, 6:12:02 PM
to sanskr...@googlegroups.com
Namaste everyone,

I've been working on corrections to the OCR text of Śri Pāndurang Vāman Kāṇe's History of Dharmaśāstra as part of Kalpāntaram for just over two months now, and I wanted to share my experience so far with you all.


The work was hard and time-consuming initially, but it gradually became easier once I got the hang of things. The ease of work (or lack thereof) aside, the biggest takeaway for me is the enormous knowledge I have gained about the various Dharmasūtras, along with insights into the works of Ṛṣis like Baudhāyana, Vasiṣṭha, and Āpastamba, into life in ancient times, and into the various codes of law. As I have gone from chapter to chapter, I've thoroughly enjoyed not only the Dharmasūtras themselves but also Ācārya Kāṇe's writing, his reasoning, and his construction of timelines for the various Ṛṣis and their works. On top of all this, it feels great to contribute to works that future generations of enthusiasts and scholars can use.


I initially joined because someone tweeted about the project and it seemed interesting (as I am sure was the case for some of you), but I was then blown away by the sheer breadth of material on Dharma that was out there, lying unindexed and unsearchable. So I started working on Chapter 1 right away, and once I was done, I enlisted a couple of friends and cousins onto the project too. We are all making as much progress as we can, but it'd be great to have some more, if not all, hands on deck, and I write this mail as motivation and an invitation for all of you.


To help newcomers, I thought I'd list the following tools and methods I figured out along the way that made my work way easier and faster:


1. Using the English (India) language pack and keyboard on Windows. You can find it on the Task Bar here:

[screenshot of the Task Bar]

Or install it as follows if you can't find it:

[three screenshots of the installation steps]


Once you have this, you should be able to use Ctrl and Alt keys for IAST letters. For example, Ctrl + Alt + A will give ā and Ctrl + Alt + Shift + A will give Ā. The list of keyboard shortcuts that you can use after installing the language pack can be found here.


However, this language pack lacks support for some letters. For these, I used a program called AutoHotkey, which can be downloaded from here. The simple AutoHotkey script I wrote for this purpose can be found here.


2. Using the Windows clipboard history. This can be opened with the shortcut Win + V (Windows key on the keyboard + V), and it shows the list of recently copied text, which you can reuse wherever you want instead of typing it again.


3. The IAST to Devanāgarī keyboard by Lexilogos lets you quickly type out any Sanskrit text in IAST syntax and then copy it. It can be found here.


4. sanskritdictionary.com's excellent OCR tool takes an image and outputs Devanāgarī text. It is pretty accurate, and any mistakes can be rectified by pasting the text into the Lexilogos keyboard mentioned above. It can be found here. You can use Snipping Tool or Snip & Sketch (Win + Shift + S) on Windows (or the equivalents on Mac and Linux) to snip the part of the original PDF that you want to fix and paste it into the OCR tool.


5. In some cases, I had to transliterate Devanāgarī into IAST and for that, I used this converter by ashtangayoga.info, which supports conversions between various formats of text.


6. Finally, for any typos that slip through on the first pass of reading, I use the Grammarly extension on Chrome. The OCR reader makes some common mistakes, like reading 'e' as 'o', 'o' as 'c', 'b' as 'h', etc. These are caught by Grammarly (or any spell checker).
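
For anyone who would rather script the check in item 6 than rely on a browser extension, here is a minimal Python sketch of the idea. It is only an illustration, not what Grammarly does: the function names and the tiny word list are made up for the example, and the confusion table contains only the three mix-ups mentioned above.

```python
from itertools import product

# Confusions named in the post: character seen in the OCR output -> character
# it may really be ('e' read as 'o', 'o' read as 'c', 'b' read as 'h').
OCR_CONFUSIONS = {"o": "e", "c": "o", "h": "b"}


def candidate_corrections(word, vocabulary):
    """Return vocabulary words reachable from `word` by undoing OCR confusions."""
    # Each character may stay as it is or be swapped for its likely original.
    # Note: the number of variants doubles with every confusable character,
    # so this brute-force approach is only sensible for ordinary-length words.
    options = [
        (ch, OCR_CONFUSIONS[ch]) if ch in OCR_CONFUSIONS else (ch,)
        for ch in word
    ]
    variants = ("".join(chars) for chars in product(*options))
    return sorted({v for v in variants if v in vocabulary and v != word})


def flag_suspects(text, vocabulary):
    """Map each word of `text` that is not in `vocabulary` to its candidates."""
    suspects = {}
    for raw in text.split():
        word = raw.strip(".,;:()'\"").lower()
        if word and word not in vocabulary:
            suspects[word] = candidate_corrections(word, vocabulary)
    return suspects


if __name__ == "__main__":
    # Hypothetical word list; a real run would load a full dictionary file.
    vocab = {"the", "book", "of", "law", "code"}
    print(flag_suspects("tho hook of law", vocab))
```

A word that is flagged with an empty candidate list is simply unknown to the word list (many Sanskrit terms will be), so the output is a list of places to look at, not a list of confirmed errors.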


For people on other platforms such as Linux and Mac, let me know, and I will try to make a cheatsheet/tutorial for you as well.
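
In the meantime, the Devanāgarī-to-IAST step from item 5 can be sketched in a platform-independent way with a few lines of Python. This is a rough sketch under stated assumptions, not a replacement for the converter mentioned above: the function name and tables are made up for the example, and it covers only the basic letters, vowel signs, anusvāra, visarga and avagraha (no ḷ, candrabindu, Vedic accents, nukta letters or numerals). Anything it does not recognise is passed through unchanged.

```python
import unicodedata

# Consonants carry an inherent "a" unless a vowel sign or virāma follows.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
# Dependent vowel signs replace the inherent "a" of the preceding consonant.
VOWEL_SIGNS = {
    "ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
    "ृ": "ṛ", "ॄ": "ṝ", "े": "e", "ै": "ai", "ो": "o", "ौ": "au",
}
# Independent vowels and other signs that map one-to-one.
STANDALONE = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
    "ऋ": "ṛ", "ॠ": "ṝ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
    "ं": "ṃ", "ः": "ḥ", "ऽ": "'",
}
VIRAMA = "्"


def devanagari_to_iast(text):
    """Transliterate basic Devanāgarī into IAST, character by character."""
    out = []
    pending_a = False  # True right after a consonant whose vowel is not yet known
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")  # previous consonant kept its inherent vowel
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # virāma suppresses the inherent vowel
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(STANDALONE.get(ch, ch))  # unknown characters pass through
    if pending_a:
        out.append("a")
    # Normalise so that letters like ā are single precomposed code points.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(devanagari_to_iast("धर्मसूत्र"))
```

Keeping the output in NFC form matters for proofreading, because a precomposed ā and an a followed by a combining macron look identical on screen but do not match each other in a text search.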


As a side note, for those interested in working on History of Dharmaśāstra, here are some of the useful resources I found (so far) for reference:


1. Sanskrit Wikisource: https://sa.wikisource.org/




3. Āpastamba Gṛhyasūtra (Devanāgarī and IAST text): https://www.wisdomlib.org/hinduism/book/apastamba-grihya-sutra-sanskrit




5. Gautama Dharmasūtra (Devanāgarī text, but with no text encoding, so the text cannot be copied directly; you can look passages up by śloka number and run OCR on them, though): http://www.hinduonline.co/vedicreserve/kalpa/dharma/gautama_dharma_sutra.pdf


P.S. Please feel free to reach out to me if you need any help with the project.


Thanks and regards,
Suhas Bharadwaj
