Re: Texts for OCR conversion - Invitation to edit


विश्वासो वासुकिजः (Vishvas Vasuki)

Mar 19, 2016, 12:32:11 PM
to shubha zero, sanskr...@googlegroups.com
I think this is good. I would ask that you add a column with the PDF file links. Otherwise our work becomes more convoluted.

2016-03-19 0:56 GMT-07:00 shubha zero (via Google Sheets) <shubh...@gmail.com>:
shubha zero has invited you to edit the following spreadsheet:
Respected Vishvas,
Here are the details of the texts on Sanskrit Wikisource whose index pages I have prepared. It would be a great help if you could convert them using the sanskritnlpbot bot account, which recently received its bot flag.
शुभा



--
Vishvas /विश्वासः

विश्वासो वासुकिजः (Vishvas Vasuki)

Mar 21, 2016, 2:31:37 PM
to shubha zero, dhaval patel, Shrivathsa B श्रीवत्सो ब्रह्मा, sanskr...@googlegroups.com
Good! I will be away for two weeks, though; soon after that we will start uploading the OCR text.

+dhavala, shrIvatsa: you both use sanskritocr. Would you have time to collaborate here on producing the OCR text (I will do the uploading myself)?

2016-03-21 0:16 GMT-07:00 shubha zero <shubh...@gmail.com>:
I have added the column with the PDF file links, sir.

विश्वासो वासुकिजः (Vishvas Vasuki)

Mar 21, 2016, 2:46:27 PM
to Shrivathsa B, shubha zero, sanskr...@googlegroups.com, dhaval patel
Ah, fixed the permissions.

Volunteers of the samskrita bhAratI Wikisource project want to upload many texts to Wikisource and proofread them (using the side-by-side comparison system described here). If we get OCR-ed text, I have a bot that uploads it page by page to facilitate this proofreading. I have access to some OCR tools, and to my knowledge so do you and Dhaval. I was wondering whether you might be interested in taking up the task of OCR-ing some texts (as many as you like in any given week).


2016-03-21 11:38 GMT-07:00 Shrivathsa B <shrivath...@gmail.com>:

The sheet wasn't accessible. I would be grateful if you could tell me (in English) what is expected of me. The Sanskrit-based tech discussion below has me at sea.

विश्वासो वासुकिजः (Vishvas Vasuki)

Mar 23, 2016, 1:12:15 AM
to Shrivathsa B, shubha zero, dhaval patel, sanskr...@googlegroups.com
Dear shrIvatsa, I'm very excited by your positive response. On my Linux computer, exploding the PDF into images is a simple matter of running a command, and it turns out that the situation on Windows (which you presumably use for sanskritocr) is similar: please see step 2b in this guide by Dhaval: https://github.com/sanskrit-coders/sanskrit-ocr-r0/issues/8 . That is far more convenient than having another person do the image extraction and upload to Dropbox etc. (oh, the wait!). I hope you're now convinced that owning the full PDF-to-OCR-text pipeline is the better route. Please confirm.
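For readers following along, here is a minimal Python sketch of that one-command step. The thread does not say which command is used; this assumes poppler-utils' `pdftoppm` is installed and on the PATH, and the helper names `pdftoppm_command` and `explode_pdf` are hypothetical. Rendering at 200 dpi matches the minimum resolution Shrivathsa mentions for good OCR.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=200):
    """Build the argument list that renders every PDF page to PNG at `dpi`."""
    # -r sets the rendering resolution; -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def explode_pdf(pdf_path, out_dir, dpi=200):
    """Render each page of pdf_path into out_dir and return the PNG paths.

    Hypothetical helper: assumes the pdftoppm binary is available.
    """
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm (poppler-utils) not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes one image per page, named from the prefix plus the page number.
    subprocess.run(pdftoppm_command(pdf_path, out / "page", dpi), check=True)
    return sorted(out.glob("page*.png"))


if __name__ == "__main__":
    import sys

    for path in explode_pdf(sys.argv[1], sys.argv[2]):
        print(path)
```

Saved as, say, `explode.py` (a hypothetical name), it would be run as `python explode.py book.pdf pages/`. Note that `pdftoppm` re-renders each page; poppler's `pdfimages` instead extracts the scans embedded in the PDF, which may be preferable when the original scan resolution is already adequate.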

Ok - that apart, it'd be great if you could please join https://groups.google.com/forum/#!forum/sanskrit-ocr (very low traffic) - it helps us coordinate the project and keep everyone on the same page.


2016-03-22 1:49 GMT-07:00 Shrivathsa B <shrivath...@gmail.com>:

If you upload the images onto Dropbox and share the link, I can commit to 1000 pages of OCR in a week (I can do more, but this is what I can commit to).

For this, I need the images to be named 0001, 0002, ..., 0999, 1000. If the numbering is wrong, the images may not get converted.
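Renumbering to that scheme can be scripted. A minimal Python sketch, assuming the page order is encoded in the last number in each extracted filename (an assumption; the thread does not show what the extraction tool's names look like). It sorts numerically rather than alphabetically, so `scan-10` comes after `scan-2`, then assigns sequential names 0001, 0002, and so on. The function names are hypothetical, and files are copied into a new folder so the originals are never overwritten.

```python
import re
import shutil
from pathlib import Path


def page_key(name):
    """Use the last run of digits in the filename stem as the page number."""
    numbers = re.findall(r"\d+", Path(name).stem)
    if not numbers:
        raise ValueError(f"no page number in {name!r}")
    return int(numbers[-1])


def plan_renames(filenames, width=4):
    """Pair each file with a sequential zero-padded name: 0001.png, 0002.png, ..."""
    ordered = sorted(filenames, key=page_key)  # numeric order, so 10 follows 2
    return [
        (old, f"{i:0{width}d}{Path(old).suffix.lower()}")
        for i, old in enumerate(ordered, start=1)
    ]


def copy_renamed(src_dir, dst_dir, pattern="*.png", width=4):
    """Copy images into dst_dir under their new names; originals stay untouched."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in src.glob(pattern)]
    for old, new in plan_renames(names, width):
        shutil.copy2(src / old, dst / new)
```

For example, `plan_renames(["scan-10.png", "scan-2.png", "scan-1.png"])` returns `[("scan-1.png", "0001.png"), ("scan-2.png", "0002.png"), ("scan-10.png", "0003.png")]`. Renumbering sequentially (rather than keeping the original numbers) closes any gaps, and a width of 4 covers books of up to 9999 pages.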

If the images are of good quality (min. 200 dpi), the OCR output will be good and will significantly reduce proofreading effort. This is for your info.
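Whether a scan meets the 200 dpi minimum can be checked from the image's pixel dimensions and the printed page's physical size. A minimal sketch; the function names are hypothetical, and the physical page size has to be known or measured, since the thread gives only the 200 dpi figure.

```python
def effective_dpi(pixels, inches):
    """Dots per inch along one axis: pixel count divided by physical length."""
    if inches <= 0:
        raise ValueError("physical length must be positive")
    return pixels / inches


def meets_minimum(width_px, height_px, width_in, height_in, min_dpi=200):
    """True if both axes reach min_dpi (200 dpi per the thread's rule of thumb)."""
    return min(effective_dpi(width_px, width_in),
               effective_dpi(height_px, height_in)) >= min_dpi


# A US Letter page (8.5 x 11 in) scanned at 1700 x 2200 pixels is exactly 200 dpi.
print(meets_minimum(1700, 2200, 8.5, 11))  # True
print(meets_minimum(1275, 1650, 8.5, 11))  # False: only 150 dpi
```

Taking the minimum over both axes guards against scans that were stretched or cropped unevenly.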

विश्वासो वासुकिजः (Vishvas Vasuki)

Mar 23, 2016, 5:50:32 PM
to Shrivathsa B, shubha zero, sanskr...@googlegroups.com, dhaval patel
I am delighted.

Ah, you've been trying to use Acrobat for image extraction. Going by Dhaval's report, IrfanView may be better. (I'm curious how it handles DjVu files: does one need to convert them to PDF, which IrfanView can do, before extracting images?)

For the first project, could you give ekAgnikANDaH a shot (links here: https://github.com/sanskrit-coders/sanskrit-ocr-r0/tree/master/kalpa/ekAgnikANDaH )? I am curious how well sanskritocr handles accented text.

Let me know your GitHub user ID so that I can give you access to the GitHub repository. Reasons: once you're done OCR-ing, you just sign in to GitHub and upload the text files by dragging them over here. This is better than Dropbox etc., since it makes it easy to compare outputs, store them indefinitely, and so on. Also, we can track these projects with GitHub issues.



2016-03-23 4:18 GMT-07:00 Shrivathsa B <shrivath...@gmail.com>:

hariH OM,
VV,

   The problem I face in extracting the images is that the file numbers thrown up by Acrobat are odd. If the book is from DLI, giving the barcode is enough; DLI books are numbered acceptably.

   I will do this nonetheless.

   Applied to join the group.

svasti,
       JAYA BHAVAANII BHAARATII,
                                                      shrivathsa.

Shrivathsa B

Mar 23, 2016, 8:34:35 PM
to vishvAs vAsuki, shubha zero, dhaval patel, sanskr...@googlegroups.com

Will try IrfanView.
I don't remember my GitHub ID. Will check and get back.

विश्वासो वासुकिजः (Vishvas Vasuki)

Apr 5, 2016, 9:20:31 PM
to Shrivathsa B, shubha zero, dhaval patel, sanskr...@googlegroups.com
How did the experiment go?

I'm waiting for your GitHub username.

Shrivathsa B

Apr 5, 2016, 11:58:24 PM
to vishvAs vAsuki, shubha zero, sanskr...@googlegroups.com, dhaval patel

Pl give me time till 21 Apr. I am very tied up till then.

विश्वासो वासुकिजः (Vishvas Vasuki)

Apr 6, 2016, 12:03:02 AM
to Shrivathsa B, shubha zero, sanskr...@googlegroups.com, dhaval patel
Sure, thanks for the update!

shubha zero

Apr 7, 2016, 5:57:30 AM
to विश्वासो वासुकिजः (Vishvas Vasuki), Shrivathsa B, sanskr...@googlegroups.com, dhaval patel
Respected Vishvas,
Respectful greetings.

A Wikisource workshop will be held this coming Saturday and Sunday (April 9-10) at Aksharam in Bengaluru.
About twenty people may take part. There will be instruction and hands-on practice in Wikisource work.
For that, we need a few texts that have already been OCR-converted. So, from the list I sent,
could you prepare these two: 47 (पतञ्जलिचरितम्) and 97 (बालनीतिकथामाला)? Both are small texts.
At least one is definitely needed.

I have been trying to convert them using Google OCR, but it just doesn't work. I had also wanted to demonstrate this at the
workshop (how the conversion can be done). I can't figure out what the problem is. Has the system been changed?
If you know, please let me know.

शुभा

Shrivathsa B

Apr 7, 2016, 6:30:00 AM
to shubha zero, vishvAs vAsuki, dhaval patel, sanskr...@googlegroups.com

I have already told you my programme. I am extremely sorry for not being of help. There are too many things to do before I leave Bangalore this Sat.

shubha zero

Apr 7, 2016, 7:00:48 AM
to Shrivathsa B, vishvAs vAsuki, dhaval patel, sanskr...@googlegroups.com
Namaste Shrivathsaji,

I am really sorry. I didn't mean for you to do this. Please do go according to your schedule.

I just requested Shri Vishwas to get us one small book as it is needed for the workshop
to be held on 9th and 10th of this month.

Vishwasji,
Please do it if possible. Otherwise, nothing to worry about.

shubha

shubha zero

Apr 7, 2016, 11:02:32 AM
to Shrivathsa B, vishvAs vAsuki, dhaval patel, sanskr...@googlegroups.com
It's done. There's no hurry. Thank you.
शुभा

विश्वासो वासुकिजः (Vishvas Vasuki)

Apr 7, 2016, 3:53:26 PM
to shubha zero, Shrivathsa B, dhaval patel, sanskr...@googlegroups.com
Dear Shubha,

Respectful greetings! 🙏

I'm glad the problem was solved so quickly. Best wishes for your workshop.

You said, "I have been trying to convert them using Google OCR." What kind of attempt did you make? Did you upload the file to Google Drive? Could you also show me the result?

विश्वासो वासुकिजः (Vishvas Vasuki)

May 13, 2016, 7:46:01 PM
to Shrivathsa B, shubha zero, sanskr...@googlegroups.com, dhaval patel
Namaste shrIvatsa,

Were you able to find time? Could you OCR this text: https://archive.org/details/brhad-dhatu-kusumakarah ? It is very important from the perspective of spoken Sanskrit. I already did an OCR with another tool, but I suspect sanskritocr will be better.

Shrivathsa B

May 17, 2016, 3:24:07 PM
to vishvAs vAsuki, shubha zero, dhaval patel, sanskr...@googlegroups.com

Was out of town. Will get back on this.

Shrivathsa B

May 27, 2016, 3:39:23 AM
to vishvAs vAsuki, shubha zero, dhaval patel, sanskr...@googlegroups.com
hariH OM,

   Converted bRhaddAtukusumAkaraH. I don't know the legality of doing so, especially because it has some original content by the compiler. Hence I am apprehensive about uploading it anywhere.

   I will send the text files to Vishvas' mail.

svasti,
       JAYA BHAVAANII BHAARATII,
                                                    shrivathsa.