Rashtriya Sanskrit Vidyapeetha (RSVP) encoding to unicode tool : anyone?


विश्वासो वासुकिजः (Vishvas Vasuki)

Nov 15, 2014, 10:20:01 AM
to sanskrit-p...@googlegroups.com
Incidentally, I am quite upset with the Rashtriya Sanskrit Vidyapeetha (RSVP) people for not having the sense to publish these valuable digitized documents in Unicode. If someone could just point me to a Linux command-line utility (or write one for me in any of the umpteen programming languages that work easily on Linux), I swear that I will convert all the documents listed here to Unicode and host them: http://www.wilbourhall.org/sansknet/content.htm

Any takers?


2014-11-15 7:13 GMT-08:00 विश्वासो वासुकिजः (Vishvas Vasuki) <vishvas...@gmail.com>:

2014-11-15 1:50 GMT-08:00 Anunad Singh <anu...@gmail.com>:
'DV-TTSurekhEN to Unicode Converter'

It is strange: I tried http://hindi-fonts.com/tools/DV-TTSurekhEN-to-Unicode-Converter , but it doesn't yield the sort of clean results you showed.



--
--
Vishvas /विश्वासः





Mārcis Gasūns

Nov 15, 2014, 11:32:48 AM
to sanskrit-p...@googlegroups.com
The converter has been ready for a long time. It was hosted at http://sanskritlibrary.org/tomcat/sl/TranscodeText ; ask at https://github.com/sanskrit-lexicon/Cologne/issues so that Jim can show you how to use it. I have used it before.

विश्वासो वासुकिजः (Vishvas Vasuki)

Nov 15, 2014, 5:46:25 PM
to sanskrit-p...@googlegroups.com
I would agree that it works if it produced something meaningful out of the below:
1 V��x��|ɾ�i� 2 V�R��P��|ɾ�i� 3 V�R��P��|ɽ�i� 4 {��n�����n�x� 5 E�h]�E���n��x� 6 M�i��M�i� 7 ���i���{ɪ��i� 8 +x��M�i�

But the tool you pointed to doesn't work here.
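
The U+FFFD marks (�) in the sample above are what a decoder typically emits when single-byte legacy text is read as UTF-8. A minimal Python sketch of that failure mode follows. The sample string and the choice of Latin-1 as the real byte encoding are assumptions for illustration only; nothing in this thread confirms which charset the sansknet pages declare.

```python
# Assumption: the page stores one byte per glyph code (Latin-1 style),
# and this short string stands in for a run of such legacy-font text.
legacy_text = "+Æ¶ÉÖ=+ÉnùªÉ&"
raw_bytes = legacy_text.encode("latin-1")

# Reading those bytes as UTF-8 destroys most of them: invalid sequences
# become U+FFFD, and a few byte pairs happen to form unrelated characters.
misread = raw_bytes.decode("utf-8", errors="replace")

# Reading them with the single-byte codec keeps every glyph code intact,
# which is what a font-to-Unicode converter needs as its input.
recovered = raw_bytes.decode("latin-1")

print(repr(misread))
print(recovered == legacy_text)
```

If a converter is fed the misread form, no mapping table can recover the text, because the original byte values are already lost at that point.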


--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

विश्वासो वासुकिजः (Vishvas Vasuki)

Nov 15, 2014, 5:48:13 PM
to sanskrit-p...@googlegroups.com
In the meantime I've spun up some code I found on the net ( https://github.com/vvasuki/sanskritnlpPHP/ ). Now if only I could figure out how to use it: http://sanskritnlpphp.appspot.com/unicodify/fileconverterindex.php5

विश्वासो वासुकिजः (Vishvas Vasuki)

Nov 15, 2014, 6:01:26 PM
to sanskrit-p...@googlegroups.com

Seems like it's been mirrored elsewhere already: http://www.innovatrix.co.in/unicode/fileconverterindex.php5 . I still get gibberish: " अƶ��उअ�द��� ६,२,१९३ अƶ��उअ��द� ११ अƶ�� - ".

I am beginning to wonder if something is wrong with my computers or setup. Could someone else please use the above to convert the first line of http://www.wilbourhall.org/sansknet/vyakaranam/ghanapatha.htm ?

narayan prasad

Nov 15, 2014, 8:54:59 PM
to sanskrit-p...@googlegroups.com
The first line is: +ƶÉÖ=+ÉnùªÉ& 6,2,193 +ƶÉÖ=+ÉÊnù 11 +ƶÉÖ -
and the converted text is: अंशुउआदय& ६,२,१९३ अंशुउआदि ११ अंशु -

Of course, in this case I used DV-Alankar and not DV-TTYogesh.
There were no users of the latter at that time, so I did not go ahead with corrections to the converter, if any are needed.
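
For anyone who wants to experiment on Linux, here is a rough Python sketch of how a table-driven converter of this kind can work: longest-match substitution, then moving the short-i sign behind its consonant. The table is inferred only from the single input/output pair quoted above, so it is a tiny, unverified fragment and not the real DV-Alankar or DV-TTYogesh table. The names `TABLE` and `convert` are made up for this example.

```python
import re

# Partial table guessed from the one sample line in this message;
# a real legacy-font table has hundreds of entries.
TABLE = {
    "+É": "आ", "+": "अ", "=": "उ",
    "¶É": "श", "nù": "द", "ªÉ": "य",
    "Æ": "ं", "Ö": "ु", "Ê": "ि",
}
DIGITS = str.maketrans("0123456789", "०१२३४५६७८९")

# Longest keys first, so that "+É" is tried before "+".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TABLE, key=len, reverse=True))
)

def convert(text):
    """Map legacy glyph codes to Unicode, then reorder the short-i sign."""
    out = _PATTERN.sub(lambda m: TABLE[m.group(0)], text).translate(DIGITS)
    # The legacy font stores short-i before its consonant; Unicode wants it after.
    # This handles a single consonant only, not conjuncts.
    return re.sub("\u093f([\u0915-\u0939])", "\\1\u093f", out)

print(convert("+Æ¶ÉÖ=+ÉÊnù 11"))
```

Codes that are missing from the table (such as `&` here) pass through unchanged, which matches the unconverted `&` in the output quoted above.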

Bhasha IME

Nov 16, 2014, 4:18:13 AM
to sanskrit-p...@googlegroups.com
Friends,

Many of the sansknet docs use more than one font, e.g. DV-TTYogesh and DV1-TTYogesh. How do you recognize them? Just copy and paste the text into a Word doc, place the cursor over the text, and the Font field shows the font name.

These fonts all use the same code range (0x00 - 0xFF, plus a few code points beyond it), so each character needs to be converted according to the font it is set in.

For this, the text needs to be analyzed in a font-preserving format, like HTML, RTF, DOC, DOCX, ODT...

I have done this, to some extent, in BhashaIME, which recognizes RTF. To use it, copy from sansknet (the text is available as HTML on the clipboard), paste into a Word doc, select all again, copy to the clipboard (which now holds RTF), and invoke the relevant transliteration (e.g. DV-TTYogesh -> Uni Dev).
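
Outside Windows, the same idea (keep the font name attached to each run of text) can be sketched in Python with the standard library's HTML parser. This is not how BhashaIME does it. It assumes the pages mark fonts with `<font face="...">` tags, which should be checked against the actual sansknet HTML, and the names `FontRunParser` and `font_runs` are invented for the example.

```python
from html.parser import HTMLParser

class FontRunParser(HTMLParser):
    """Collect (font_name, text) runs from <font face="..."> markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = [None]   # innermost active font name is last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            # A <font> tag without a face attribute inherits the outer font.
            face = dict(attrs).get("face") or self.stack[-1]
            self.stack.append(face)

    def handle_endtag(self, tag):
        if tag == "font" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))

def font_runs(html):
    parser = FontRunParser()
    parser.feed(html)
    parser.close()
    return parser.runs

demo = '<p><font face="DV-TTYogesh">ab</font> <font face="DV1-TTYogesh">cd</font></p>'
print(font_runs(demo))
```

Each run can then be passed to the conversion table for its own font, instead of applying one table to the whole page.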

regards
Venkatesh

anan...@gmail.com

Dec 15, 2019, 1:19:21 AM
to sanskrit-programmers



Yes, I would like to have the files in Unicode. This is the encoded text:
 
.    xŠÉÉŠÉŪúixÉĻÉÉąÉÉ*

  

   +ÉxÉxnųĻÉĻÉÞiÉĻÉ YÉÉxÉĻÉVÉÆ šÉÉĘIÉhÉĻÉŌ·ÉŪúĻÉ*

   ĨÉĀ šÉīÉĮĻÉšÉīÉĮ\SÉ1 īÉxnäų näųīÉÆ ―þËŪú |ÉĶÉÖĻÉÆ** 1**

 

When changed to Unicode: नाात्नाााा।

  

   आनन्दााृता ज्ञानाजं ााक्षणाश्वा।

   ा ाााााञ्च१ ान्दे देां िं प्राुां।। १।।

 

   ाााांाऽणााांूतः कुाााााना कृतः।

How can I get the file properly converted for use in Windows/Word?

Dhaval Patel

Dec 15, 2019, 1:30:38 AM
to sanskrit-p...@googlegroups.com
The converter I have seems to work.
Please find it attached.
surekh_unicode_converter.html

anan...@gmail.com

Dec 16, 2019, 8:30:19 AM
to sanskrit-programmers
Yes, it seems OK with the DV-TT Surekh font, but it does not work with the 'sansknet font':
  न्ŠााŠाŪत्नĻााąाा।
  
   आनन्दųĻाĻाृतĻा ज्ञानĻाजं šााĘक्षणĻाŌश्वŪĻा।
   ĨाĀ šाīाĮĻाšाīाĮञ्च1 īान्देų देųīां ―Ūिं प्रĶाुĻां।। 1।।
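
One possible explanation, inferred from the characters alone and not confirmed by anyone in this thread: the letters left unconverted here (Š, Ū, Ļ, ą, ―) are exactly what ISO-8859-10 shows for the bytes that Latin-1 shows as ª, ®, ¨, ±, ½. If the text was copied with the wrong single-byte codepage, it can be round-tripped back before it is fed to the converter. A hedged Python sketch, with `repair_codepage` as a made-up helper name:

```python
def repair_codepage(text, wrong="iso8859_10", right="latin-1"):
    """Undo a wrong single-byte decoding by round-tripping through bytes.

    Assumption: every character in `text` came from decoding raw bytes as
    `wrong` when they should have been decoded as `right`.
    """
    return text.encode(wrong).decode(right)

sample = "xŠÉÉŠÉŪúixÉĻÉÉąÉÉ"   # first line of the pasted text
print(repair_codepage(sample))
```

The call raises `UnicodeEncodeError` if the text contains a character that the assumed wrong codepage cannot represent, which is a useful sign that the guess does not fit that file.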

विश्वासो वासुकिजः (Vishvas Vasuki)

May 15, 2020, 8:25:11 AM
to sanskrit-programmers, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता
Now I am wondering how to convert https://www.wilbourhall.org/sansknet/vedas/rgveda/rgveda_khilasukta.htm . I have never seen the Rgveda khilasUkta-s digitized elsewhere on the net, so this would be very useful. If someone can convert it and send it over, that would be great too!

PS: I don't have Windows, so no BhashaIME here.


विश्वासो वासुकिजः (Vishvas Vasuki)

May 15, 2020, 8:35:15 AM
to sanskrit-programmers, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh


In case it helps: instead of आदित्ये॑न॒ सही॑यसा, I get आदिचत्येनग सचहीयगसा । with the converter attached. Seems that the svaras are misrecognized. + anunAd - have a fix?

 


--
--
Vishvas /विश्वासः

DV-TTVedicNormal ==_ यूनिकोड परिवर्तित्र.html

विश्वासो वासुकिजः (Vishvas Vasuki)

May 15, 2020, 8:38:24 AM
to sanskrit-programmers, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh

Slight correction about expected svara-s - see https://archive.org/stream/RgVedaWithSayanasCommentaryPart4/rv_sayanabhasya_part4#page/n1021/mode/2up . It seems that only udAtta is marked with a svarita-sign( ॑), with no other svaras.

 

 



--
--
Vishvas /विश्वासः

विश्वासो वासुकिजः (Vishvas Vasuki)

May 15, 2020, 8:45:38 AM
to sanskrit-programmers, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh


Another correction - It seems like the sansknet text follows the usual svara-marking system, so that what is transliterated as "आदिचत्येनग सचहीयगसा" is to be actually transliterated as "आदि॒त्येन॑ स॒हीय॑सा"। I hope this makes fixing the converter easier ...
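
Read that way, the converter seems to emit च where the anudAtta mark (U+0952) is meant and ग where the svarita stroke (U+0951) is meant. The Python sketch below only illustrates that mapping on the sample; it is not a usable fix, because a blanket replacement would also destroy every genuine च and ग. The actual correction would have to go into the converter's DV-TTVedicNormal table, whose glyph codes are not given in this thread. `fix_svaras_in_sample` is a made-up name.

```python
ANUDATTA = "\u0952"  # DEVANAGARI STRESS SIGN ANUDATTA (line below)
SVARITA = "\u0951"   # DEVANAGARI STRESS SIGN UDATTA (the vertical stroke above)

# Illustration only: valid for the sample phrase, where neither
# letter occurs as a real consonant.
SVARA_FIX = str.maketrans({"च": ANUDATTA, "ग": SVARITA})

def fix_svaras_in_sample(text):
    """Replace the two misrecognized letters with the intended svara marks."""
    return text.translate(SVARA_FIX)

print(fix_svaras_in_sample("आदिचत्येनग सचहीयगसा"))
```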


 

 

 



--
--
Vishvas /विश्वासः

prati...@gmail.com

May 15, 2020, 10:54:47 AM
to sanskrit-p...@googlegroups.com, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh
Here's a quick (and not quite perfect) Python script for conversion. Copy and paste the entire text from rgveda_khilasukta.htm into an input text file. Sample output is attached.


Script depends on this library: https://github.com/sushant354/indic2unicode 

The errors pointed out earlier in this thread appear to be present in this approach too. Python is a bit more hackable, so I'm hoping we can debug and fix them :) Also, if there's enough interest, this library could be a good candidate for inclusion in the https://github.com/sanskrit-coders org.
out.txt

विश्वासो वासुकिजः (Vishvas Vasuki)

May 15, 2020, 11:35:00 PM
to sanskrit-programmers, Arun Mahapatra, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh
Thanks, Arun! Svara-s are still missing, though. Can you fix that and contribute the code under this dir: font_converter ? There's already a Python wrapper around the tech_hindi html/js code there, but this native code will still be useful. Of course this is very useful (there is more than "enough interest"): just look at the sansknet threads on this mailing list.

विश्वासो वासुकिजः (Vishvas Vasuki)

Nov 26, 2020, 10:42:11 AM
to sanskrit-programmers, Arun Mahapatra, dhaval patel, Bhasha IME, Ajit अजितकृष्णसूनुः होता, Anunad Singh
There is some promising progress -  see shrI Arun's thread here - https://github.com/sanskrit-coders/indic_transliteration/issues/38