Tamil Transliteration standard + Bamini Help

Srikanth Lakshmanan

Jan 23, 2012, 11:58:11 AM

to freetamil...@googlegroups.com

Hi,

Is there a common standard for Tamil transliteration key mapping in use across input tools? While writing the key mapping for a help page[1], I had some thoughts. The transliteration input scheme on Tamil Wikipedia supports doubling of trivial chars.

M gives ம்ம்

V gives வ்வ்

I tried this behavior in ibus, it didn't have this. I wanted to know whether other input tools support this, and whether there is any standardization. Doubling non trivial chars can reduce the number of keystrokes (and make typing a little more efficient), especially since there are a lot of words like மக்கள், அவ்வாறு, அம்மா.

However,

K didn't give க்க்

C did give க்க் out of the blue, while every other usage of c is for ச். Is this a bug?

Also, on a different note: if you know the Bamini input scheme, your help would be greatly appreciated here[2]. It is the one thing stopping us from adding the Bamini input method to the Tamil Wiki projects. Thanks!

[1] http://www.mediawiki.org/wiki/Help:Extension:Narayam/Tamil/Transliteration

[2] https://bugzilla.wikimedia.org/31904

--
Regards
Srikanth.L

கா. சேது | කා. සේතු | K. Sethu

Jan 24, 2012, 4:50:49 AM

to freetamil...@googlegroups.com

Dear Srikanth

//I tried this behavior in ibus, it didn't have this. //

IBUS is an IME (Input Method Engine), just as SCIM and UIM are. It is not clear what you meant by "I tried this behaviour in ibus", but my guess is that you used a keymap from m17n, which runs as a backend to IBUS. (m17n can also run as a backend to SCIM or UIM.)

In m17n there are two keymaps that can be termed "romanised phonetic", or more commonly "transliteration": 1) phonetic and 2) itrans.

One known standardized scheme is "ITRANS", but AFAIK the itrans keymap (made by m17n themselves) departs substantially from standard ITRANS in some mappings. The original "ITRANS", as I recollect, is more of a "phonetic transcription" type, in which, for example, the key sequence kee maps to கீ and koo to கூ. The itrans keymap in m17n is like most of the so-called transliteration types, in which kee -> கே and koo -> கோ.

Apparently, over the years most users of transliteration keymaps have shown a preference for repeated vowels for lengthening (நெடிலாக்கல்) in such cases, because Tamil users at large seem to prefer avoiding capital letters, which require pressing Shift.

Strictly speaking, there cannot be a full transliteration keymap for Tamil, because of the wide variability of phonemes in the language used for transliteration (here English) and the prevalence of phonemes in Tamil that are not available in the other language. Whatever type of transliteration is used, it can at best be called a quasi-transliteration, or described as "transliteration to the maximum possible extent".

As regards using the capital of a key for a consonant to map to repetition of that consonant, I think it is basically not a very sound scheme, because you cannot have it consistently for all consonants; the shifted forms of some keys are used for different consonants: l for ல but L for ள, n for ன but N for ண, etc.

For a user, as much uniformity as possible helps to maintain speed. If capital letters yield repeated consonants for only a subset of consonants, a user typing fast can make mistakes unless they stay conscious at all times of which characters cannot be obtained that way (e.g., typing ceLam and forgetting that L is not ல்ல் but ள).
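The consistency problem can be checked mechanically. The sketch below is only an illustration: `BASE` is a hypothetical fragment of a transliteration keymap built from the mappings mentioned in this thread (it is not the actual m17n or Wikipedia table), and `doubling_candidates` is a made-up helper name.

```python
# Hypothetical keymap fragment; only the l/L and n/N pairs come directly
# from the discussion, the rest are the usual single-letter consonants.
BASE = {
    "l": "ல", "L": "ள",
    "n": "ன", "N": "ண",
    "m": "ம", "v": "வ", "k": "க",
}


def doubling_candidates(keymap):
    """Return lower-case keys whose upper-case form is still unassigned."""
    return sorted(k for k in keymap if k.islower() and k.upper() not in keymap)


# Only these keys could use the capital letter for doubling without
# clashing with an existing consonant.
print(doubling_candidates(BASE))
```

With this fragment, l and n drop out because L and N already stand for different consonants, which is exactly the subset problem described above.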

Also, what do you mean by "trivial" characters? In one statement you mention "doubling of trivial chars", while in another you mention "Doubling non trivial chars"! Please clarify.

A more fundamental criticism I put forward is this: how can we say that it is more convenient to input, say, ம்ம் with the key M (which means pressing Shift, holding it, and pressing m) than with "mm"? The latter (mm) is two keys pressed sequentially, whereas the former (M) is also *two* keys, though pressed simultaneously. In fact, it is because many users preferred pressing two lower-case keys in succession that schemes for long vowels (like ee -> ஏ) were brought into such romanised phonetic keymaps, in addition to the originally planned capital letters (E -> ஏ for the same example).

Regarding support from input tools: the facility you asked about is a mapping of one key to two identical characters, each made up of a pair of code points (consonant and pulli). It is not something that any IME would hinder, be it ibus, scim or uim. One only has to include the required mappings in the keymap, here in the m17n keymap schemes phonetic and itrans (not in those keymaps themselves, of course, but in an alternative built from them).
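To make the code-point picture concrete, here is a minimal Python sketch. The `KEYMAP` dictionary and the `code_points` helper are hypothetical illustrations and not taken from m17n or any IME; only the Unicode values themselves are standard.

```python
PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA, i.e. the pulli

# Hypothetical keymap fragment: the lower-case key gives consonant + pulli,
# the upper-case key gives the same two code points twice.
KEYMAP = {
    "m": "\u0bae" + PULLI,          # ம்
    "M": ("\u0bae" + PULLI) * 2,    # ம்ம்
    "v": "\u0bb5" + PULLI,          # வ்
    "V": ("\u0bb5" + PULLI) * 2,    # வ்வ்
}


def code_points(text):
    """List the Unicode code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


for key in ("m", "M"):
    print(key, KEYMAP[key], code_points(KEYMAP[key]))
```

The doubled output is simply four code points emitted for one key, so nothing here depends on the IME; it is purely a matter of what the keymap table contains.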

After testing there, I will comment further on how this type of keymap embedded in Wikipedia pages behaves.

~Sethu

--
You received this message because you are subscribed to the Google Groups "ThamiZha! - Free Tamil Computing(FTC)" group.
To post to this group, send an email to freetamil...@googlegroups.com.
To unsubscribe from this group, send email to freetamilcomput...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/freetamilcomputing?hl=en-GB.

கா. சேது | කා. සේතු | K. Sethu

Jan 24, 2012, 7:17:05 AM

to freetamil...@googlegroups.com

Srikanth and others

I searched and found ITRANS table for Tamil here : http://www.aczoom.com/itrans/html/tamil/node5.html

My surmise that standard ITRANS was a phonetic transcription (with mappings like oo -> ஊ; ee -> ஈ) turns out to be wrong after all.

Rather, the standard ITRANS table had aa -> ஆ, ii -> ஈ, uu -> ஊ, but no such vowel lengthening with ee and oo. This is because North Indic languages have only the longer vowels ஏ and ஓ and not the shorter எ and ஒ, so for them more than one keystroke was not needed, and because ITRANS was meant for transliteration between various Indic languages.
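As a rough illustration of how repeated-vowel lengthening is usually resolved, the sketch below applies a greedy longest-match over the three long-vowel pairs mentioned above. The `VOWELS` table is only a fragment (short a, i, u plus aa, ii, uu) and `transliterate_vowels` is a hypothetical helper, not the ITRANS or m17n implementation.

```python
# Fragment of a vowel table: short vowels plus the doubled-key long vowels.
VOWELS = {
    "a": "அ", "aa": "ஆ",
    "i": "இ", "ii": "ஈ",
    "u": "உ", "uu": "ஊ",
}


def transliterate_vowels(keys):
    """Greedy longest-match: try two-key sequences before single keys."""
    out = []
    i = 0
    while i < len(keys):
        for length in (2, 1):
            chunk = keys[i:i + length]
            if chunk in VOWELS:
                out.append(VOWELS[chunk])
                i += len(chunk)
                break
        else:
            # Unmapped key: pass it through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


print(transliterate_vowels("aa"), transliterate_vowels("ai"))
```

Because the two-key sequence is tried first, "aa" yields the long vowel, while "ai" falls back to two short vowels under this fragment.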

~Sethu


Srikanth Lakshmanan

Jan 24, 2012, 10:38:15 AM

to freetamil...@googlegroups.com

Dear Sethu,

Thanks very much for the reply, which gives a detailed background for a newbie like me :).

(Modified the reply order for continuity)

2012/1/24 கா. சேது | කා. සේතු | K. Sethu <skh...@gmail.com>

Dear Srikanth

//I tried this behavior in ibus, it didn't have this. //

IBUS is an IME (Input Method Engine), just as SCIM and UIM are. It is not clear what you meant by "I tried this behaviour in ibus", but my guess is that you used a keymap from m17n, which runs as a backend to IBUS. (m17n can also run as a backend to SCIM or UIM.)

I now realize the mistake in my usage of the term :) I tried using m17n, the phonetic keymap to be precise.

Strictly speaking, there cannot be a full transliteration keymap for Tamil, because of the wide variability of phonemes in the language used for transliteration (here English) and the prevalence of phonemes in Tamil that are not available in the other language. Whatever type of transliteration is used, it can at best be called a quasi-transliteration, or described as "transliteration to the maximum possible extent".

Totally agree. That ந் can be typed only with w, and ஃ only with q, are exceptions that make Tamil input a quasi-transliteration.

Also, what do you mean by "trivial" characters? In one statement you mention "doubling of trivial chars", while in another you mention "Doubling non trivial chars"! Please clarify.

the shifted forms of some keys are used for different consonants: l for ல but L for ள, n for ன but N for ண, etc.

Sorry for the confusion. The above are the non-trivial characters, where one English key maps to two Tamil characters (ர் ற், ன் ண், ல் ள்). ழ், ந் and the other characters are trivial, since they have their own reserved keys (some, like ப் and க், have more than one).

As regards using the capital of a key for a consonant to map to repetition of that consonant, I think it is basically not a very sound scheme, because you cannot have it consistently for all consonants;

I hear this. Doubling is not possible for the non-trivial characters (ர் ற், ன் ண், ல் ள்). But for the others, I feel it might help (feel, not think; I need data to think, more below).

Apparently, over the years most users of transliteration keymaps have shown a preference for repeated vowels for lengthening (நெடிலாக்கல்) in such cases, because Tamil users at large seem to prefer avoiding capital letters, which require pressing Shift.

A more fundamental criticism I put forward is this: how can we say that it is more convenient to input, say, ம்ம் with the key M (which means pressing Shift, holding it, and pressing m) than with "mm"? The latter (mm) is two keys pressed sequentially, whereas the former (M) is also *two* keys, though pressed simultaneously. In fact, it is because many users preferred pressing two lower-case keys in succession that schemes for long vowels (like ee -> ஏ) were brought into such romanised phonetic keymaps, in addition to the originally planned capital letters (E -> ஏ for the same example).

I think the reason people do not prefer Shift for lengthening now is that it is used only for a few vowels (Uyirezhuthukkal). If upper case were used for lengthening vowels and for doubling the trivial consonants, typing patterns *may* change. One need not even use the Shift keys; we could toggle with Caps Lock and type faster, is what I feel (again feel, not think; some data would help).

For example:

Tamil - itrans - "Uppercase usage"

அம்மாப்பேட்டை - ammaappeettai - aMAPETai (T is not mapped to ட்ட் on Tamil Wikipedia, but this could be a possibility)

The above was the "best case" I could think of. As a layman, I see a lot of ஒற்றுஎழுத்துக்கள் and related consonants doubled in quite a few words (ஒற்றுஎழுத்துக்கள் itself has 3!), and these are the places where doubling helps reduce the number of keystrokes. This may not be useful for naive users who are just starting, but I wonder whether adding this doubling might help "power" users. People who know enough grammar can tell us the probability of ஒற்றுஎழுத்துக்கள் and related consonants occurring together, AND/OR we could do a statistical analysis of words to see whether such a thing might help. There are thousands of people who use transliteration-based input schemes, and a small portion might be power users who would find this option useful. Also, do people feel the same (okay to type twice) when they use transliteration on touch devices?
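A statistical analysis of this kind could start from something like the Python sketch below. It is only an assumption about how one might count: it treats "consonant, pulli, same consonant" as one doubling, uses the Unicode range U+0B95 to U+0BB9 for Tamil consonants, and runs on the four example words from this thread rather than on a real corpus. The name `doubled_consonants` is made up for the example.

```python
import re
from collections import Counter

PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA (pulli)

# A consonant, then pulli, then the same consonant again (e.g. க்க in மக்கள்).
DOUBLED = re.compile("([\u0b95-\u0bb9])" + PULLI + r"\1")


def doubled_consonants(word):
    """Count doubled consonants in a word, keyed by the consonant."""
    return Counter(DOUBLED.findall(word))


# Tiny illustrative sample; a real study would need a proper word list.
words = ["மக்கள்", "அவ்வாறு", "அம்மா", "ஒற்றுஎழுத்துக்கள்"]

totals = Counter()
for word in words:
    counts = doubled_consonants(word)
    totals.update(counts)
    print(word, sum(counts.values()))

print(totals.most_common())
```

Splitting the totals by consonant also shows how many of the doublings fall on keys whose capital is already taken (ற், ன், ண், ல், ள் and so on), which bears on whether the capital-letter shortcut would cover the frequent cases at all.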

For a user, as much uniformity as possible helps to maintain speed. If capital letters yield repeated consonants for only a subset of consonants, a user typing fast can make mistakes unless they stay conscious at all times of which characters cannot be obtained that way (e.g., typing ceLam and forgetting that L is not ல்ல் but ள).

I agree with this; in fact, it seems to be the reason why people using the existing transliteration schemes make more typos. Perhaps a more effective use of keys can be found, and transliteration should be designed to minimize spelling mistakes. This might be more important than the doubling.

Regarding support from input tools: the facility you asked about is a mapping of one key to two identical characters, each made up of a pair of code points (consonant and pulli). It is not something that any IME would hinder, be it ibus, scim or uim. One only has to include the required mappings in the keymap, here in the m17n keymap schemes phonetic and itrans (not in those keymaps themselves, of course, but in an alternative built from them).

Agreed. I was really asking whether C = க்க் should be considered a bug in the keymap.

After testing there, I will comment further on how this type of keymap embedded in Wikipedia pages behaves.

Thanks!

~Sethu

Regards,

Srikanth L

PS: Tamil99 might be the optimal solution, but that doesn't mean transliteration shouldn't improve, or that all power users need to use only T99 :)
