Minimalistic transliteration script

Vasu Srinivasan

Aug 24, 2011, 1:38:44 PM

to sanskrit-p...@googlegroups.com

This is not intended to be an important topic, but I always felt existing transliteration schemes try too hard: too many symbols (tilde, dot, apostrophe, power symbol etc.), caps, double caps (RRu etc.).

Parsing of course is extra work for such schemes, and I was thinking about a simpler script. Again, this is not intended to replace any existing scheme, nor am I too serious about it (I have not considered candrabindu, for example). It is just to show that a very minimalistic Latin script can be worked out for devanagari letters.

a A e E u U r. R. l. E. i o O. a. a:

k k. g g. n
c c. j j. n.
t t. d d. N
T T. D D. N.
p p. b b. m
y r l v s s. S h

It uses only 3 conventions

1. small letters
2. cap letters
3. a dot after the letter (except visarga, which is a colon)

At any given point of time, a devanagari letter is ONE char or ONE char + dot. (e and E sound like the name of the English letter e; i sounds like the name of the letter i.) The order is per krama of devanagari letters (first small, then small + dot, then cap, then cap + dot, or caps for dIrgha). It may be hard to read at first, but didn't we all start by guessing what a dot, tilde, apostrophe etc. mean on top of the letters :-). There are never two letters together for the same akshara. Parsing or converting to unicode etc. becomes superficially easy.
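To make the "one char or one char + dot" rule concrete, here is a minimal Python sketch of a tokenizer for it. The function name `tokenize` and the sample input are illustrative assumptions, not something from the proposal itself; the colon is treated as the visarga marker, as in the vowel row above.

```python
def tokenize(text):
    """Split text into tokens: one character, optionally followed by '.' or ':'."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        # The only decision the scheme needs: is the next character a marker?
        if ch.isalpha() and i + 1 < len(text) and text[i + 1] in ".:":
            tokens.append(ch + text[i + 1])
            i += 2
        else:
            tokens.append(ch)
            i += 1
    return tokens


print(tokenize("b.avaTa:"))  # ['b.', 'a', 'v', 'a', 'T', 'a:']
```

One consequence visible in the sketch: a full stop typed directly after a letter would be read as a marker, so running text would probably need some other sentence terminator.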

Comments?

--
Regards,
Vasu Srinivasan
-----------------------------------
vagartham.blogspot.com
vasya10.wordpress.com

vishvAs vAsuki

Aug 24, 2011, 2:02:32 PM

to sanskrit-p...@googlegroups.com

This transliteration scheme does look sound, but it is not clear that using it brings a significant gain in ease of parsing; using a finite automaton is easy anyway, after all. Still, designing a transliteration scheme from the parsing point of view is certainly an attractive idea (as a thought exercise).

--
vishvAs

vishvAs vAsuki

Aug 24, 2011, 2:12:40 PM

to sanskrit-p...@googlegroups.com

Here too, if the notation were .k instead of k. (and so on), it would be easier from the parsing point of view.
--
vishvAs

Vasu Srinivasan

Aug 24, 2011, 2:16:36 PM

to sanskrit-p...@googlegroups.com

aam yes, it will be easier, i thought about it too, but reading becomes a bit more difficult, than what it is already. (admittedly this scheme is not a natural read, partly because of our own existing impressions)

b.avaTa: nAma kiM?
.bavaTa: nAma kiM?

kind-of trade-off whether to check if the i+1 letter is dot or if i+2 letter is dot.
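As a rough illustration of the prefix-dot side of this trade-off, the Python sketch below tokenizes `.b`-style input one character at a time with no lookahead: it only remembers whether a dot is pending. The name `tokenize_prefix` is a hypothetical helper, and the colon is simply emitted as its own token here, which is an assumption about how visarga would be handled.

```python
def tokenize_prefix(chars):
    """Tokenize a prefix-dot stream (.k instead of k.) one character at a time."""
    pending = ""  # holds a '.' that has been seen but not yet attached to a letter
    for ch in chars:
        if ch == "." and not pending:
            pending = ch  # remember the marker; the next character completes the token
        else:
            yield pending + ch
            pending = ""
    if pending:
        yield pending  # a trailing dot with nothing after it


print(list(tokenize_prefix(".bavaTa:")))  # ['.b', 'a', 'v', 'a', 'T', 'a', ':']
```

With the suffix form, the same job needs a peek at the following character before a token can be emitted, which is the readability-versus-parsing trade-off being discussed.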

Vasu Srinivasan

Aug 24, 2011, 2:28:04 PM

to sanskrit-p...@googlegroups.com

It will be easy in the sense that the cyclomatic complexity will be lower (fewer if conditions etc.).

For eg to convert this text to "unicode", all you have to do is check if the letter is followed by a dot and that's it. Essentially only one if condition. Then append the corresponding unicode from a collection/map.
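A minimal Python sketch of that idea follows. The letter values are one reading of the table from the first post (each row taken in krama order), so treat the maps as assumptions; the names `tokenize` and `to_devanagari` are illustrative. Note that producing well-formed Devanagari also needs vowel signs and virama, which the sketch adds on top of the single dot check.

```python
VIRAMA = "\u094d"

# Assumed reading of the proposed table, row by row.
CONSONANTS = {
    "k": "क", "k.": "ख", "g": "ग", "g.": "घ", "n": "ङ",
    "c": "च", "c.": "छ", "j": "ज", "j.": "झ", "n.": "ञ",
    "t": "ट", "t.": "ठ", "d": "ड", "d.": "ढ", "N": "ण",
    "T": "त", "T.": "थ", "D": "द", "D.": "ध", "N.": "न",
    "p": "प", "p.": "फ", "b": "ब", "b.": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "s": "श", "s.": "ष", "S": "स", "h": "ह",
}
# token -> (independent form, dependent sign used after a consonant)
VOWELS = {
    "a": ("अ", ""), "A": ("आ", "ा"), "e": ("इ", "ि"), "E": ("ई", "ी"),
    "u": ("उ", "ु"), "U": ("ऊ", "ू"), "r.": ("ऋ", "ृ"), "R.": ("ॠ", "ॄ"),
    "l.": ("ऌ", "ॢ"), "E.": ("ए", "े"), "i": ("ऐ", "ै"), "o": ("ओ", "ो"),
    "O.": ("औ", "ौ"), "a.": ("अं", "ं"), "a:": ("अः", "ः"),
}


def tokenize(text):
    """One character, plus a following '.' or ':' if present (the single check)."""
    i = 0
    while i < len(text):
        step = 2 if text[i].isalpha() and text[i + 1:i + 2] in (".", ":") else 1
        yield text[i:i + step]
        i += step


def to_devanagari(text):
    out = []
    after_consonant = False  # True while the last consonant still lacks a vowel
    for tok in tokenize(text):
        if tok in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: suppress the inherent 'a'
            out.append(CONSONANTS[tok])
            after_consonant = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            if after_consonant:
                out.append(VIRAMA)  # consonant with no vowel before space/punctuation
            out.append(tok)
            after_consonant = False
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)


print(to_devanagari("b.avaTa: N.Ama"))  # भवतः नाम
```

The lookup itself is a plain dictionary access per token; most of the remaining logic is about the target script rather than the input scheme.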

Compare that to s/h/sh, R/u/Ru/RRu, ^ ~ etc. Also there is a 1:1 uniform mapping to letters as opposed to multiples (e.g. in baraha, aa vs A, but not uu vs U), which is (I agree) easier to write, but the question is: are multiple options really necessary? In native scripts we don't substitute one letter for another, so why do so for transliteration? --- I'm glad we don't have "Transliteration Spelling Bees" :-0)

Viswanath B

Aug 25, 2011, 9:48:17 AM

to sanskrit-p...@googlegroups.com

So far, I use baraha, since it is easy for me to type-in the vedic swara symbols. I find most transliteration schemes don't care about those, unfortunately.

Can we add it as well ?

vissu

vishvAs vAsuki

Aug 25, 2011, 10:43:29 AM

to sanskrit-p...@googlegroups.com

namaste shrI-vishvanAth,

Not sure what you mean by "add it", but you are welcome to contribute a transliterator (and point us to a baraha transliteration map) if you find it useful. (The conversation so far has been an exercise of imagination.)

--
vishvAs

Viswanath B

Aug 25, 2011, 10:48:51 AM

to sanskrit-p...@googlegroups.com

I meant to add to the proposal by shrI Vasu Srinivasan. I am trying to see additional stuff that we could add to Vasu's proposal, so that it can be more complete.

I am trying to create a 'free' editing component that would be an RTF editor for Sanskrit, to start with, which can understand these various transliteration schemes. I was happy with Baraha until it became a paid version.

I am basically an embedded programmer, with Java experience. I am planning to learn Scala now. That's my intro, so that you guys are familiar with my background. :)

Viswanath

vishvAs vAsuki

Aug 25, 2011, 10:58:30 AM

to sanskrit-p...@googlegroups.com

Hearty welcome to you shrI-vishvanAth! (and Scala is indeed a good language to learn and use ;-)

Incidentally, if you are using Linux, you can simply use ibus to type devanAgarI into any window/widget/editor which accepts text input! (By default it comes with an itrans table, but it is very simple to modify it to suit one's liking.) I routinely use it to type kannaDa and dEvanAgarI.

--

vishvAs

Vasu Srinivasan

Aug 25, 2011, 11:18:55 AM

to sanskrit-p...@googlegroups.com

namaste Viswanath,

If you are an embedded programmer, you are already half-Paninian, as you have to write economically and consciously :-). Yeah, this is a good opportunity to learn Scala, not just for the language, but for its finer capabilities....

As far as adding Vedic accents to this scheme goes, I am not sure, because as you said there is probably more to it than just the 3: udatta, anudatta, svarita.. (I don't know)

i was not even serious with this scheme as I was only focusing to make my parsing easier (i hate complex text parsing)... but if you think it is useful you can contribute to it as well.. as vishvas said, the scheme is mostly in imagination without any practice. it needs critical analysis and acceptance to become useful.

Viswanath B

Aug 25, 2011, 11:44:13 AM

to sanskrit-p...@googlegroups.com

Gone are the days when I needed to be economical; these days I write stuff that's supposed to work on a multicore (16-core, yes sixteen) with sometimes a GB of memory.

viswanath

shreevatsa

Aug 31, 2011, 12:14:59 AM

to sanskrit-programmers

[Apologies in advance for the very long rant that follows. My
intention is not to offend or be dismissive, but just to make the
point that software must be designed for humans, not computers.]

On Aug 24, 10:38 pm, Vasu Srinivasan <vasy...@gmail.com> wrote:
> This is not intended to be an important topic, but I always felt existing
> trans schemes try too hard. Too many usages of symbols (tilde, dot, apos,
> power symbol etc.), caps, double caps (RRu etc).
>
> Parsing of course is extra work for such schemes, and I was thinking about a
> simpler script. Again, this is not intended for replacing any existing
> schemes, nor Im too serious about it. (i have not considered candrabindu for
> eg) But just to show a very minimalistic latin script can be worked for
> devanagari letters.
>
> a A e E u U r. R. l. E. i o O. a. a:
>
> k k. g g. n
> c c. j j. n.
> t t. d d. N
> T T. D D. N.
> p p. b b. m
> y r l v s s. S h

Every transliteration scheme is a trade-off between ease of typing,
ease of reading, and (lack of) ambiguity. I think we should focus on
the humans; computers can transliterate between scripts easily enough.
That is what they are good at. We should pick our conventions to make
things easier for the humans, not easier for the computers.

Just for comparison, here are the other famous transliteration
schemes:

1. Harvard-Kyoto, which is an informal standard among scholars for
ASCII transliteration:
a A i I u U R RR lR e ai o au aM aH
k kh g gh G
c ch j jh J
T Th D Dh N
t th d dh n
p ph b bh m
y r l v z S s h

2. The Velthuis scheme, which used to be used especially for TeX
input:
a aa i ii u uu .r .rr .l e ai o au a.m a.h
k kh g gh "n
c ch j jh ~n
.t .th .d .dh .n
t th d dh n
p ph b bh m
y r l v "s .s s h

3. IAST, which is the standard in all printed books.
a ā i ī u ū ṛ ṝ ḷ e ai o au aṃ aḥ
k kh g gh ṅ
c ch j jh ñ
ṭ ṭh ḍ ḍh ṇ
t th d dh n
p ph b bh m
y r l v ś ṣ s h

4. ITRANS, which was an Indian software program, but whose conventions
are now used by others in India too:
a aa/A i ii/I u uu/U RRi/R^i RRI/R^I LLi/L^i e ai o au aM aH
k kh g gh ~N/N^
ch Ch j jh ~n/JN
T Th D Dh N
t th d dh n
p ph b bh m
y r l v/w sh Sh s h

Which one to use is always a matter of personal preference. I
personally prefer IAST and I have a keyboard layout that makes it easy
to type (EasyUnicode on Mac OS X), particularly because it's the
standard and is well understood in the scholarly community. In my
opinion, all the other formats (including yours, the one that started
this thread) are best suited for *input* to be converted into
Devanagari, rather than inflicting them on others and expecting them
to read it and guess what they mean.

For example, I certainly wouldn't have guessed from your format that e
and E actually meant इ and ई rather than ए. Similarly with using i not
for इ but for ऐ. I know you are going by the *name* of the letters
rather than their usual sounds, but this is not intuitive to
readers... and using O. for औ is even less so. Similarly, with
Harvard-Kyoto, even though it's very easy to remember once you learn
it -- for the last letter (anunasika) in the ka and ca vargas you use
the capitalized third letter, and you use z for श, that's the main
thing -- I *definitely* wouldn't expect others to read it.

There are too many transliteration formats already, and only IAST is
close to being standard (and that too only among scholars; it is also
confusing for people seeing it for the first time) -- IMHO we simply
should NOT display Sanskrit text in more formats and expect readers to
learn them, unless it is extremely intuitive. Experience tells me they
won't bother to look up the convention, and will instead try to muddle
along and guess (incorrectly) what is written.

To substantiate my earlier remark that all transliteration schemes are
easy for computers: I wrote my transliterator about a couple of years
ago in one afternoon+evening ( http://shreevatsa.appspot.com/sanskrit/transliterate.html
) and that includes an hour or two it took to learn JavaScript. And
the same code accommodates transliteration for other Indian languages
as well. A lot of people have similarly written transliterators; it
shows that existing schemes are not too hard to program for.
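As a small illustration of this point (and not the code behind the transliterator linked above), here is a Python sketch that converts Harvard-Kyoto to IAST using the two tables listed earlier in this message. It uses greedy longest-match lookup, so "RR" is tried before "R"; the names `HK_TO_IAST` and `hk_to_iast` are made up for the example.

```python
# Only the symbols that differ between the two schemes; everything else is shared.
HK_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ", "RR": "ṝ", "lR": "ḷ",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}
LONGEST = max(len(key) for key in HK_TO_IAST)


def hk_to_iast(text):
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that "RR" wins over "R".
        for size in range(LONGEST, 0, -1):
            chunk = text[i:i + size]
            if chunk in HK_TO_IAST:
                out.append(HK_TO_IAST[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # characters common to both schemes pass through
            i += 1
    return "".join(out)


print(hk_to_iast("saMskRtam"))  # saṃskṛtam
print(hk_to_iast("kRSNa"))      # kṛṣṇa
```

The multi-character symbols that look hard to parse reduce to one ordered lookup, which is roughly why such schemes are not much work for a program.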

Among Indians, only Devanagari is close to being common for Sanskrit
-- and different regions use their own scripts too. (A lot of Sanskrit
books in Karnataka are written in the Kannada script.) Among non-
Indians learning Sanskrit, only IAST is anywhere close to being
common. Ideally, we should design websites and software that allow the
reader to pick whichever script he or she is most comfortable with.
(For example, see the "translipi" in the left sidebar on http://sahityam.net/
and http://stotrasamhita.net/ — this is the kind of thing I mean,
though this particular translipi has a few bugs.)

-Shreevatsa

vishvAs vAsuki

Aug 31, 2011, 8:40:56 AM

to sanskrit-p...@googlegroups.com

Namaste shrI-shrIvatsa, both your letter and your script are very informative. Until now I had not properly looked at what Harvard-Kyoto is. Some suggestions for completing it further:

1] As shrI Viswanath suggested, Vedic accents could be added (probably by suitably extending the transliteration schemes).

2] For ॐ, ITRANS uses the notation 'OM'; that could be added too.

3] A table for the Kannada script could be added as well.

--
vishvAs

vishvAs vAsuki

Aug 31, 2011, 9:13:50 AM

to sanskrit-p...@googlegroups.com

Also, I just now looked at Stotra Samhita and Sahityam; the approach is wonderful!

--
vishvAs

Ramakrishna Upadrasta

Sep 2, 2011, 2:34:46 PM

to sanskrit-p...@googlegroups.com, Bhaskar

namaste members,

Very nice to see lot of ideas and discussions being floated.

I think scala is a good choice. I know that Prof. Huet's ZEN toolkit
is based on Ocaml (and hence, it is all French school languages
winning :) Did someone have a look at his tagger? It is said to parse
baala-raamayaNa and do sandhi-vichcheda, as he told me long time back.
I did not have time to pursue.

On the subject of transliteration, I was about to ask if writing
programs that do conversion from devanaagari to other indic languages
(and in the reverse direction) is on the roadmap, but it seems Shri
Shreevatsa's program already does that. Very nice to know. Is there
some plan to add it to the code base?

I am asking this because one of my friends (Shri Bhaskar in CC) is
working on making Telugu PDFs of the sanskrit docs files and
needs some help in this aspect.
http://sanskritdocuments.org/telugu/

Regards
Ramakrishna

vishvAs vAsuki

Sep 2, 2011, 4:53:17 PM

to sanskrit-p...@googlegroups.com, Bhaskar

Namaste shrI Ramakrishna!

1. So far I have studied neither OCaml nor Prof. Huet's tagger. They should be looked at in the future, but for now my enthusiasm is for other goals.

२. It was in my mind to write a simple devanAgarI to other indic languages converter - which is much simpler than transliteration from ITRANS etc. because of 1 to 1 mapping between symbols. For example, someone recently wanted RRigvEda in kannaDa-lipi, but were put off by 'complications'. I have written a sample indic transliterator - (See Changeset). So, if we could get a similar table for telugu, we could add it.
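The 1 to 1 mapping can be sketched without a hand-written table, because the Unicode blocks for Devanagari, Telugu and Kannada are laid out largely in parallel. The Python below is only an illustration of that idea and is not the sample transliterator mentioned above; `from_devanagari` and `BLOCK_START` are hypothetical names, and a checked per-script table would be more reliable, since a few code points do not correspond exactly across blocks.

```python
import unicodedata

# First code point of each Unicode block.
BLOCK_START = {"devanagari": 0x0900, "telugu": 0x0C00, "kannada": 0x0C80}
DANDAS = {"\u0964", "\u0965"}  # shared punctuation, left untouched


def from_devanagari(text, target):
    """Shift Devanagari code points into the parallel slot of the target block."""
    offset = BLOCK_START[target] - BLOCK_START["devanagari"]
    out = []
    for ch in text:
        if 0x0900 <= ord(ch) <= 0x097F and ch not in DANDAS:
            shifted = chr(ord(ch) + offset)
            # Keep the original where the target block has nothing assigned there.
            out.append(shifted if unicodedata.name(shifted, "") else ch)
        else:
            out.append(ch)
    return "".join(out)


print(from_devanagari("राम", "kannada"))
print(from_devanagari("राम", "telugu"))
```

Going the other way is the same shift with the sign flipped, which is what makes this so much simpler than parsing ITRANS-style input.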

--
vishvAs

Vasu Srinivasan

Sep 4, 2011, 8:29:56 PM

to sanskrit-p...@googlegroups.com

shreevatsa-varya

No need for apologies; objective criticism is welcome.

As to why another scheme: I have already put forward my thoughts at http://vagartham.blogspot.com/2011/08/lost-in-transliteration.html, so I won't repeat them here.

The online transliterator between scripts is pretty good.

The stotram wiki is very good, definitely useful for people who need different scripts. Wishing it grows more. I may contribute some of what I had created before.

vishvAs vAsuki

Sep 5, 2011, 10:13:32 AM

to sanskrit-p...@googlegroups.com, Bhaskar

namaste!

I was just made aware of another utility (V), which does the same job, supports many more scripts including telugu.

--
vishvAs

Mārcis Gasūns

Oct 24, 2013, 8:24:08 AM

to sanskrit-p...@googlegroups.com, Bhaskar

See http://samskrtam.ru/devanagari-vba-converter/ for one more converter solution.

ken p

Oct 26, 2013, 2:12:39 PM

to sanskrit-p...@googlegroups.com

hello,

(See Changeset) .......... This link does not open the file. Please repost the link. Thanks.

On Friday, 2 September 2011 3:53:17 pm UTC-5, विश्वासो वासुकिजः wrote:

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 26, 2013, 2:18:08 PM

to sanskrit-p...@googlegroups.com

On Sat, Oct 26, 2013 at 11:12 AM, ken p <drk...@gmail.com> wrote:

(See Changeset) .......... This link does not open the file. Please repost the link. Thanks.

It is here. But I myself don't use it anymore - there are superior solutions available (eg: learnsanskrit.org/tools/sanscript ).

--
--
Vishvas /विश्वासः

ken p

Oct 26, 2013, 3:39:23 PM

to sanskrit-p...@googlegroups.com

This is the only solution I see for all Indian Languages to revive Brahmi Script.

Please create a keyboard where you can apply matras on Roman letters for educational purposes. Maybe letter/matra fonts can be modified in future for better application.

https://groups.google.com/forum/?hl=hi&fromgroups=#!topic/sanskrit-programmers/qrU7MsUFgmk

We need to minimize use of nukta, chandrabindu and anuswar as much as we can for easy Roman transliteration. If a word has no other meaning and retains its pronunciation, then why not spell it the easy way?

See the nukta letters from Persian and Arabic used in loanwords here. Do we use these letters in writing English word pronunciation? Does the Urdu language use borrowed letters?

http://www.omniglot.com/writing/hindi.htm

In the IAST scheme, not all characters get converted in copy-and-paste applications.

I like this keyboard for modified IAST, where ā, ē, ī, ō, ū = aa, ae, ii, au, uu and the characters are uniform with a macron.

http://maori.typeit.org/

for additional characters one may use this keyboard

http://www.lexilogos.com/clavier/goujarati_latin.htm

One may learn all Indian regional languages in their state script via standard script converter!

We need this type of standard dictionary for all regional languages.

http://www.kdictionaries-online.com/

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 26, 2013, 3:50:58 PM

to sanskrit-p...@googlegroups.com

On Sat, Oct 26, 2013 at 12:39 PM, ken p <drk...@gmail.com> wrote:

We need to minimize use of nukta, chandrabindu and anuswar as much as we can for easy Roman transliteration. If a word has no other meaning and retains its pronunciation, then why not spell it the easy way?

Pronunciation is not retained. Indic scripts are mostly "what you see is what you say".

Mārcis Gasūns

Oct 26, 2013, 4:18:40 PM

to sanskrit-p...@googlegroups.com

On Saturday, 26 October 2013 23:39:23 UTC+4, ken p wrote:

I like this keyboard for modified IAST...where ā , ē ,ī ,ō, ū =aa,ae,ii,au,uu where characters are uniform

Whenever I see it I cry. It's better than dropping it away, but there is nothing better than IAST yet proposed.

ken p

Oct 26, 2013, 6:09:18 PM

to sanskrit-p...@googlegroups.com

Hello,

"what you see is what you say"..........That is what pundits say but do they read and speak and teach correctly?

mother....मां..माँ, yes...हां,हाँ..............via Google

mAM..mA.N, hAM,hA.N ..............ITRANS

māṁ..mām̐, yes...hāṁ,hām̐......IAST

mAM..mA~, hAM,hA~ ...............HK

māṁ..mām̐, hāṁ,hām̐ ............ISO

maa-n/maa*n haa-n/haa*n...........Preferred

ॅ ,ॉ.....These missing sounds existed in local dialects but pundits didn't identify them until lately.

http://www.omniglot.com/writing/sanskrit.htm

Just by borrowing words from regional languages, Roman transliteration can be improved. If words can be borrowed from Urdu, then why not from regional languages?

mother / அம்மா/Am'mā (tamil)/ મા/Mā(Gujarati)

One may visit this active site where no one agrees on sounds or symbols in English words pronunciations.

http://groups.yahoo.com/neo/groups/saundspel/conversations/topics

Thanks.

Usha Sanka

Oct 27, 2013, 2:47:05 AM

to sanskrit-p...@googlegroups.com

Namaste.

I agree. "what you see is what you say" is not always the case.

Mispronouncing is also seen with some words.. not only from computational point of view..

Like famous "brahma"- It is always pronounced as "bramha". (m--h interchange)

Hindi native speakers say- "chinham" for "chihnaM" and "vanhi" for "vahni" (n--h interchange)

There are others as well.

-Regards

Usha

विश्वासो वासुकिजः (Vishvas Vasuki)

Oct 27, 2013, 12:23:17 PM

to sanskrit-p...@googlegroups.com

On Sat, Oct 26, 2013 at 11:47 PM, Usha Sanka <usha....@gmail.com> wrote:

Like famous "brahma"- It is always pronounced as "bramha". (m--h interchange)
Hindi native speakers say- "chinham" for "chihnaM" and "vanhi" for "vahni" (n--h interchange)

Modern speakers say it that way, even some pandits. They pronounce the visarga as an 'h'. But I believe the language was not like that in Panini's time.
