Issues with Google Urdu transliteration input tool


EmKay

Aug 23, 2010, 2:16:04 PM
to Google India Labs

I recently discovered this group. I am probably late to this party,
but I recently tried the desktop version of the Urdu transliteration
input tool on WinXP and found a number of issues, which are briefly
documented below. [ The "support" address listed for the tool in its
Help pages uses a "+" sign in the email address, and most mailers do
not allow sending email to it - and I am not sure how one can discuss
anything interactively through such a blanket feedback address. ]

The most recent message in this forum seems to be from 2009, so I am
not sure how active or useful it is, or whether anyone at Google reads
or responds here, but I am glad to have found it if it can be a forum
to interact. Also note that for Urdu the desktop version is more
useful than the Web version, because most people need to use Nastaliq
fonts, which are supported natively neither by Google nor by Microsoft
on its platforms. [ Not complaining though - despite the shortcomings
I am extremely happy to see this work done, and I am a great fan of
the tool! ] The details of the issues (bugs/problems/suggestions) are
below. I will be happy to answer any further questions. [ Some have
been discussed here before, but not others. ]


---------------------------------
Issue #1: Numbers typed are entered in the Unicode Arabic Number Range
066* rather than Unicode Range for 06F* (designed for Farsi and Indic
languages)

The Google Farsi Transliteration Input tool does this RIGHT and enters
them correctly (and differently from the Google Arabic Transliteration
Input tool).

The Unicode code chart for Arabic at http://unicode.org/charts/PDF/U0600.pdf
says, at the top of the Arabic digit range 066*:

“Arabic-Indic digits
These digits are used with Arabic proper; for languages of Iran,
Pakistan and India, see Eastern Arabic-Indic digits at 06F0-06F9.”

This problem is especially ACUTE because many Nastaliq fonts used for
Urdu, including popular ones, DO NOT have glyphs for the Arabic number
range 066*. E.g. the Nafees Nastaliq and Jameel Noori Nastaliq fonts
have no glyphs for the Arabic number range - typing results in blanks!
[ Faiz Lahori Nastaliq supports both ranges but shows Urdu/Farsi
glyphs in both, so it seems fine - but only because the font has a
bug/mis-feature: it should show Arabic number glyphs for the Arabic
range but shows Urdu/Farsi ones instead. ]

One workaround is to also install the Google Farsi transliteration
input tool, switch to Farsi when typing numbers, and then switch back
to Urdu!
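For text that has already been typed with the 066* digits, the mapping
between the two ranges is a fixed offset, so it can be repaired after
the fact. A minimal sketch in Python, assuming a plain string as input
(the function name to_extended_digits is made up for illustration and
has nothing to do with the Google tool itself):

```python
# Hypothetical helper, not part of any Google tool: maps the Arabic-Indic
# digits U+0660..U+0669 to the digits at U+06F0..U+06F9 that the Unicode
# chart recommends for languages of Iran, Pakistan and India.
ARABIC_TO_EXTENDED = {0x0660 + i: 0x06F0 + i for i in range(10)}


def to_extended_digits(text: str) -> str:
    """Return text with every 066* digit replaced by its 06F* counterpart."""
    return text.translate(ARABIC_TO_EXTENDED)


if __name__ == "__main__":
    typed = "\u0661\u0662\u0663"  # digits one, two, three from the 066* range
    fixed = to_extended_digits(typed)
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Everything outside the 066* range, including Latin digits and Urdu
letters, passes through unchanged.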

-----------------------------------------------------------------------------

Issue #2: Floating Pad to type has WRONG/CONFUSING images for
diacritics

The Unicode documents show each diacritic with a dotted circle
standing in for the letter, and the glyph for the diacritic
above/below/left/right of the dotted circle as the case may be. The
Google Urdu transliteration input tool shows ALL diacritics to the
left of the circle. As a result the diacritics Urdu Zabar (=Arabic
Fatha: Unicode 064E) and Urdu Zer (=Arabic Kasra: Unicode 0650) look
the same - and other diacritics are confusing too, as the ones that go
above/below are all shown to the LEFT side of the dotted circle (my
guess is that it is NOT an accidentally rotated image but a
mis-feature by design: diacritic images are shown to the left of the
glyph regardless of the actual position of the diacritic).

[ The (mis)information being said on some Urdu newsgroups is that one
cannot type a zer etc., see also Issue # 3 and 5 below ].
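For comparison, the dotted-circle convention of the Unicode charts can
be reproduced by placing the combining mark after U+25CC DOTTED CIRCLE
and letting the font position it. A small sketch, assuming a terminal
or editor with a font that covers these marks (the helper name
chart_cell is made up for illustration):

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # placeholder base used in the Unicode code charts

# Three of the short-vowel marks from the Arabic block.
MARKS = ["\u064E", "\u064F", "\u0650"]


def chart_cell(mark: str) -> str:
    """Return the mark attached to a dotted circle, as a code chart shows it."""
    return DOTTED_CIRCLE + mark


if __name__ == "__main__":
    for mark in MARKS:
        print(f"U+{ord(mark):04X}  {unicodedata.name(mark)}  {chart_cell(mark)}")
```

With a suitable font the first two marks appear above the circle and
the third below it, which is what keeps them distinguishable.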

--------------------------------------------------------------------------------------

Issue #3: How to type Diacritics ?

It seems that it is not possible to type diacritics inline with text
input. The only way I could do it was to navigate into the joined word
after typing it and then select the diacritic on the pad on the
floating icon (and, because of Issue #2, only because I knew the
Unicode number). Diacritic typing is ESSENTIAL, given the need to
disambiguate certain words (even when not typing with full diacritics)
and for the all-important IZAFAT construct in Perso-Arabic script
(same diacritic as Zer/Kasra, but used on the last letter of a word).

------------------------------------------------------------------------------------------


Issue #4: Attempting to type “Barree Yeh with Hamza” (Unicode 06D3 or
equivalent) results in incorrect Unicode with almost identical glyph
but wrong codes in text !

Urdu language very commonly uses the letter “Barree Yeh with Hamza
diacritic”. The Unicode Arabic code is 06D3 and the accompanying text
for the character says:

“Actually a ligature, not an independent letter = 06D2 (Barri Yeh) and
0654 (Hamza diacritic).”

So, two possible RIGHT answers for entering it are either Unicode 06D3
OR Unicode 06D2+Unicode 0654.

Attempts to type it, however, result in something that seems to be
influenced by rules for Arabic language input (which is DIFFERENT
from Urdu and does not use Barree Yeh with Hamza). It types in 0636
(Chotee yeh with Hamza –) with the glyph in its medial form followed
by 06D2 (Barri Yeh – without any adornment). That results in a
VISUALLY DECEIVING Glyph which looks like “Barree Yeh with Hamza” and
fools normal users but on inspection of Unicode text inside is the
WRONG thing to do for Urdu.

Note also that "Barree Yeh with Hamza" is VERY COMMON in Urdu words,
e.g. the words which, rendered in Devanagari, are approx. आए and गए
[ I say approx. because Modern Standard Hindi sometimes spells these
words with य, but the Urdu spelling always uses the troublesome
character. ]
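The difference between the encodings can be checked mechanically. The
sketch below uses Python's standard unicodedata module to show that
06D3 and 06D2+0654 are the same letter under Unicode normalization,
while Yeh with Hamza followed by Barree Yeh is not, and it locates that
look-alike sequence in a string. It uses U+0626 (ARABIC LETTER YEH WITH
HAMZA ABOVE) for Yeh with Hamza. The names same_letter and
find_lookalikes are made up for illustration, and whether such
sequences ought to be rewritten is a spelling judgment this code does
not make:

```python
import unicodedata

PRECOMPOSED = "\u06D3"        # Barree Yeh with Hamza as a single code point
DECOMPOSED = "\u06D2\u0654"   # Barree Yeh followed by combining Hamza above
LOOKALIKE = "\u0626\u06D2"    # Yeh with Hamza followed by plain Barree Yeh


def same_letter(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (compared after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_lookalikes(text: str) -> list[int]:
    """Return the start index of every U+0626 U+06D2 sequence in text."""
    found = []
    start = text.find(LOOKALIKE)
    while start != -1:
        found.append(start)
        start = text.find(LOOKALIKE, start + 1)
    return found


if __name__ == "__main__":
    print(same_letter(PRECOMPOSED, DECOMPOSED))  # the two "right answers"
    print(same_letter(PRECOMPOSED, LOOKALIKE))   # the look-alike sequence
    print(find_lookalikes("\u06AF\u0626\u06D2"))
```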

------------------------------------------------

Issue #5 (Suggestion): The typing pad on the floating tool for
typing diacritics etc. should show NAMES in addition to Unicode
numbers (the lack of names is unfortunately made worse by Issue #2
above). Since a larger image pops up on hovering over a selection, I
guess it should be possible to add the names at least in the larger
image. Often the names for the diacritics are LANGUAGE specific - in
this case they differ between Arabic (which the Unicode documents use)
and Urdu. Examples are Zabar/Zer/Pesh/Tashdid etc. in Urdu vs
Fatha/Kasra/Damma/Shaddah etc. in Arabic.
This will make the typing pad usable by normal human beings (rather
than just techies well versed in Unicode).
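As an illustration of what such a label could contain, here is a small
sketch that pairs each code point with its Urdu name and its official
Unicode (Arabic-based) name. The URDU_NAMES table and the label helper
are made up for this example and cover only the four marks named
above; the Unicode names come from Python's unicodedata module:

```python
import unicodedata

# Hypothetical lookup table: Urdu names for four common diacritics.
URDU_NAMES = {
    "\u064E": "Zabar",    # Arabic Fatha
    "\u0650": "Zer",      # Arabic Kasra
    "\u064F": "Pesh",     # Arabic Damma
    "\u0651": "Tashdid",  # Arabic Shadda
}


def label(mark: str) -> str:
    """Build a hover label: code point, Urdu name, then the Unicode name."""
    urdu = URDU_NAMES.get(mark, "?")
    return f"U+{ord(mark):04X} {urdu} ({unicodedata.name(mark)})"


if __name__ == "__main__":
    for mark in URDU_NAMES:
        print(label(mark))
```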

----------------------------------------

There are other issues too - but any progress on these 5 would be a
good start - before I report more.

Thanks for reading.

EmKay

Aug 23, 2010, 4:01:28 PM
to Google India Labs


On Aug 23, 11:16 am, EmKay <mukeshkac...@yahoo.com> wrote:

>
> from Urdu and does not use Barree Yeh with Hamza). It types in 0636
> (Chotee yeh with Hamza –)  with the glyph in its medial form followed
> by 06D2 (Barri Yeh – without any adornment). That results in a


Minor typo in what I typed above: that should be Unicode 0626 (Arabic
Yeh with Hamza) - NOT 0636.