print arabic characters

Ahmad

Dec 22, 2003, 6:05:25 AM

Hi all,

I am a python newbie, I want to print on the console UTF-8 arabic
characters. They print OK with
print text.encode("UTF-8")

BUT, the characters are printed LTR, not RTL (right to left). How can
I change the printing direction??

Thnx

Martin v. Loewis

Dec 22, 2003, 7:07:10 AM

Ahmad wrote:

> I am a python newbie, I want to print on the console UTF-8 arabic
> characters. They print OK with
> print text.encode("UTF-8")

Are you talking about console mode (i.e. the cmd.exe window)? Or are
you perhaps talking about IDLE?

In Python 2.3, they should also print correctly when you print the Unicode
object itself, without encoding it first; sys.stdout.encoding should
indicate which encoding the terminal uses.
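For readers on Python 3, where every `str` is Unicode, the same advice boils down to roughly this sketch: check that the encoding the terminal reports can represent the text, then print the string itself. `can_print` is a hypothetical helper, not a standard-library function, and the sample word is illustrative.

```python
import sys


def can_print(text, encoding=None):
    """Return True if `text` can be encoded in `encoding`.

    Defaults to the encoding the terminal reports via sys.stdout.encoding.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


arabic = "\u0645\u0631\u062d\u0628\u0627"  # an Arabic word, stored in logical order
if can_print(arabic):
    print(arabic)  # print the str directly; no manual .encode() needed
```

This only settles the encoding question; the display order is still up to the terminal.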

> BUT, the characters are printed LTR, not RTL (right to left). How can
> I change the printing direction??

Assuming it is IDLE: This is a bug in Tk, then - I don't know whether
Tk supports RTL.

Assuming it is the terminal: Does the terminal support RTL in the first
place? Python does nothing else but write the characters in logical
order to the terminal stream; it would be the terminal's job to put
them in the right display order. Perhaps printing RIGHT-TO-LEFT MARK
(U+200F) helps?

Regards,
Martin

Peter Otten

Dec 22, 2003, 1:53:23 PM

Ahmad wrote:

Are all Arabic characters 2 bytes long in UTF-16? If so, the following
RTLStream class should work in an otherwise left-to-right environment.
Call the script with a -d or --delay parameter to see it working.

<rtl.py>
import sys, time

def utfReverse(s):
    # CAVEAT: this will mess up characters that are
    # more than 2 bytes long in utf 16
    u = s.decode("utf-8")
    return u[::-1].encode("utf-8")

class RTLStream:
    """ Emulate a right-to-left printing console in a
    left-to-right environment
    """
    def __init__(self, out=sys.stdout, wrapwidth=40):
        self.out = out
        self.wrapwidth = wrapwidth
        self.curline = ""
    def _write(self, s):
        if len(s) == 0: return
        self.curline += utfReverse(s)
        self.out.write("\r")
        if len(self.curline) > self.wrapwidth:
            self.out.write(self.curline[:self.wrapwidth])
            self.out.write("\n")
            self.curline = self.curline[self.wrapwidth:]
        self.out.write(self.curline.rjust(self.wrapwidth))
    def _nl(self):
        self.out.write("\n")
        self.curline = ""
    def write(self, s):
        lines = s.split("\n")
        lines.reverse()
        for line in lines[:-1]:
            self._write(line)
            self._nl()
        self._write(lines[-1])

class SlowStream:
    """ delay the output to the target stream
    """
    def __init__(self, out=sys.stdout, delay=0.01):
        self.delay = delay
        self.out = out
    def write(self, s):
        for b in s:
            time.sleep(self.delay)
            self.out.write(b)
            self.out.flush()

if __name__ == "__main__":
    rtlstream = RTLStream(wrapwidth=36)
    args = sys.argv[1:]
    if "--delay" in args or "-d" in args:
        rtlstream.out = SlowStream()

    # always save a copy of the original stdout
    orig_stdout = sys.stdout

    # redirect stdout
    sys.stdout = rtlstream

    print "sella ow",
    print "tsieweb kc\xc3\xbclg hcrud hcis"
    print "dnu", "thcsuat",
    print "kcilb ned", "hcuregniew mi\negnir eid thcsuat dnu",
    print "egnid red hcsuar mi"
    print "med kc\xc3\xbclgnegeg med ud tsneid"

    # restore stdout
    sys.stdout = orig_stdout

    # explicit redirection with
    # print >> rtlstream, some_text
    # is usually preferable
    print
    print "back to normal"
    print >> rtlstream, "a saner way to use it"
    print "that's all folks"
</rtl.py>

Disclaimer: As I know nothing about right-to-left printing languages, it's
likely that I have got it at least partially wrong.

Can anybody point me to a way to iterate over characters with a varying
number of bytes? Something like

for c in "Gru\xc3\x9f".characters("utf-8"):
    print repr(c),
# should print 'G' 'r' 'u' '\xc3\x9f'

Peter

Ahmad

Dec 22, 2003, 2:13:33 PM

Sorry for not being very verbose

I am on Linux Redhat9. Terminal used is Konsole. I am not really sure
if the terminal should be the one re-arranging the chars?! The konsole
people say it should be the application programmer to arrange the bidi
chars correctly!

Also, I hope that my application will be cross-platform, so if there
is a way that will work with windows please tell me...

Do I need python2.3? I think the one pre-packaged is 2.2

Also, how do I print "(U+200F)"?

Thnx a lot

"Martin v. Loewis" <mar...@v.loewis.de> wrote in message news:<bs6mpb$g9d$03$1...@news.t-online.com>...

vincent wehren

Dec 22, 2003, 2:56:14 PM

"Ahmad" <eng...@link.net> wrote in message
news:3014031e.03122...@posting.google.com...

| Sorry for not being very verbose
|
| I am on Linux Redhat9. Terminal used is Konsole. I am not really sure
| if the terminal should be the one re-arranging the chars?! The konsole
| people say it should be the application programmer to arrange the bidi
| chars correctly!
|
| Also, I hope that my application will be cross-platform, so if there
| is a way that will work with windows please tell me...

The sad truth is that complex scripts (including Arabic) are not supported
on win32 console.

Vincent Wehren

Martin v. Loewis

Dec 22, 2003, 3:18:21 PM

Ahmad wrote:

> I am on Linux Redhat9. Terminal used is Konsole. I am not really sure
> if the terminal should be the one re-arranging the chars?! The konsole
> people say it should be the application programmer to arrange the bidi
> chars correctly!

I believe this is nonsense. It would mean that the program needs
to output the characters in non-logical order, which makes it impossible
to have cut-and-paste work correctly. So it *must* be the terminal
which implements RTL (it also needs to implement glyph shaping).

There are patches circulating that enhance terminals, e.g. for xterm:

http://mail.nl.linux.org/linux-utf8/2000-10/msg00140.html

> Also, I hope that my application will be cross-platform, so if there
> is a way that will work with windows please tell me...

I recommend you don't change the application at all. Instead, you should
work with the terminal application developers to make this work.
Unfortunately, there are quite a lot of terminal applications out there
to change.

> Do I need python2.3? I think the one pre-packaged is 2.2

Not really. If you *know* your terminal is UTF-8, you can output
UTF-8 directly. BTW, you can also implement RTL and glyph shaping
in your application if you know the terminal won't do it.

In Python 2.3, you could print Unicode objects directly, without
the need to encode them as UTF-8 first.

> Also, how do I print "(U+200F)"?

print u"\u200F".encode("utf-8")
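The Python 3 equivalent of that one-liner might look like the sketch below; there `print` applies the terminal's encoding by itself. Whether the mark has any visible effect still depends on the terminal. `RLM` and `with_rlm` are illustrative names.

```python
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK, an invisible formatting character


def with_rlm(text):
    """Prefix `text` with RLM so a bidi-aware renderer sees a strong RTL character first."""
    return RLM + text


line = with_rlm("\u0645\u0631\u062d\u0628\u0627")
print(line)
```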

HTH,
Martin

Martin v. Loewis

Dec 22, 2003, 3:31:35 PM

Peter Otten wrote:
> Disclaimer: As I know nothing about right-to-left printing languages, it's
> likely that I have got it at least partially wrong.

Indeed. First of all, each Unicode character has a directionality,
available as unicodedata.bidirectional; this is L, R, or AL for most
characters; some characters have weak (EN, ES, ET, ...) or neutral
(B, S, ...) directionality. You need to find runs of characters with
the same directionality, extending each run over weak or neutral
characters. Then you need to reverse only RTL runs, leaving the LTR
runs intact.
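The run-finding and reversing steps can be sketched roughly as follows, in Python 3. This is only an approximation of UAX #9, assuming a single line with a left-to-right base direction: R and AL characters start a right-to-left run, the run extends over weak and neutral characters up to the next strong L character, trailing neutrals are left outside it, and digit sequences inside the run keep their left-to-right order. Embeddings, explicit marks, separators inside numbers (the "." in "1.5"), mirroring and shaping are all ignored. `visual_order` and its helpers are hypothetical names; a real program would rather rely on a complete implementation such as GNU FriBidi.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}                      # strong right-to-left
NUMBER_CLASSES = {"EN", "AN"}                  # European and Arabic numbers
RUN_CONTENT = RTL_CLASSES | NUMBER_CLASSES     # what may end an RTL run


def _bidi(ch):
    return unicodedata.bidirectional(ch)


def _reverse_run(run):
    """Reverse an RTL run, keeping each sequence of digits in its original order."""
    units = []
    prev_digit = False
    for ch in run:
        is_digit = _bidi(ch) in NUMBER_CLASSES
        if is_digit and prev_digit:
            units[-1].append(ch)       # extend the current digit sequence
        else:
            units.append([ch])
        prev_digit = is_digit
    return [ch for unit in reversed(units) for ch in unit]


def visual_order(logical):
    """Approximate visual order of one line for a left-to-right display."""
    out = []
    i, n = 0, len(logical)
    while i < n:
        if _bidi(logical[i]) not in RTL_CLASSES:
            out.append(logical[i])
            i += 1
            continue
        # An RTL run starts here. Scan up to the next strong LTR character,
        # but end the run at its last RTL letter or digit, so trailing
        # neutrals (e.g. the space before an English word) stay outside it.
        end = j = i
        while j < n and _bidi(logical[j]) != "L":
            if _bidi(logical[j]) in RUN_CONTENT:
                end = j
            j += 1
        out.extend(_reverse_run(logical[i:end + 1]))
        i = end + 1
    return "".join(out)
```

The input stays in logical order; only the returned copy is reordered, which is why this belongs in the display layer rather than in the data.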

Next, in the process of reversing, you may need to mirror characters
for which unicodedata.mirrored is true (such as parentheses), replacing
them with their mirror-image counterparts.
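One detail worth knowing here: `unicodedata.mirrored()` only reports whether a character has the Bidi_Mirrored property (it returns 1 or 0); it does not return the mirror-image character itself. The pairs are listed in BidiMirroring.txt in the Unicode Character Database. A minimal Python 3 sketch, assuming a small hand-written table stands in for that file:

```python
import unicodedata

# Hypothetical, deliberately incomplete mirror table; the full list of
# pairs is in BidiMirroring.txt from the Unicode Character Database.
MIRROR_PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "\u00ab": "\u00bb"}
MIRROR_PAIRS.update({v: k for k, v in list(MIRROR_PAIRS.items())})


def mirror(ch):
    """Return the mirror image of `ch` if it is Bidi_Mirrored and its pair is known."""
    if unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch


def reverse_with_mirroring(run):
    """Reverse an RTL run and mirror paired punctuation such as brackets."""
    return "".join(mirror(ch) for ch in reversed(run))
```

Without the mirroring, an Arabic word in parentheses would come out as ")word(" after reversal.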

Then, for AL runs, you need to replace European numerals with Arabic
numerals (but keeping the LTR order).
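A minimal Python 3 sketch of the digit substitution, assuming "Arabic numerals" here means the ARABIC-INDIC DIGIT characters U+0660 to U+0669 (Persian and Urdu text uses the EXTENDED ARABIC-INDIC DIGITs at U+06F0 to U+06F9 instead). Only the characters change; their order stays the same. `TO_ARABIC_INDIC` and `arabic_digits` are illustrative names.

```python
# Map European digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
TO_ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + d) for d in range(10))
)


def arabic_digits(text):
    """Replace European digits with Arabic-Indic digits, keeping their order."""
    return text.translate(TO_ARABIC_INDIC)
```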

Finally, and again for Arabic characters, you need to perform glyph
shaping, replacing the first character of a word with the INITIAL
FORM, the last character with the FINAL FORM, all other characters
of a word with the MEDIAL FORM, and all remaining characters with
the ISOLATED FORM. This, of course, assumes your font has glyphs
for these available.
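A rough Python 3 sketch of the shaping step, mapping letters by name to the compatibility characters in the Arabic Presentation Forms-B block (for example ARABIC LETTER BEH INITIAL FORM) via `unicodedata.lookup`. It refines the position-in-word rule slightly by checking which forms a letter actually has: letters such as ALEF, DAL, REH and WAW never join to the following letter, so the letter after them starts a new joined group. It ignores the LAM-ALEF ligatures, TATWEEL and ZWJ/ZWNJ, lets vowel marks (wrongly) break the join, and must run on logical order, before any reversing. `shape` and its helpers are hypothetical names.

```python
import unicodedata


def _form(ch, form):
    """Look up e.g. 'ARABIC LETTER BEH INITIAL FORM'; None if no such character."""
    try:
        return unicodedata.lookup("%s %s FORM" % (unicodedata.name(ch), form))
    except (KeyError, ValueError):
        return None


def _joins_forward(ch):
    # Dual-joining letters (e.g. BEH) have an INITIAL form; right-joining
    # letters such as ALEF, DAL, REH and WAW do not.
    return _form(ch, "INITIAL") is not None


def _joins_backward(ch):
    return _form(ch, "FINAL") is not None


def shape(word):
    """Replace Arabic letters with contextual presentation forms (no ligatures)."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        joins_prev = prev is not None and _joins_forward(prev) and _joins_backward(ch)
        joins_next = nxt is not None and _joins_forward(ch) and _joins_backward(nxt)
        if joins_prev and joins_next:
            form = "MEDIAL"
        elif joins_next:
            form = "INITIAL"
        elif joins_prev:
            form = "FINAL"
        else:
            form = "ISOLATED"
        out.append(_form(ch, form) or ch)  # non-Arabic characters pass through
    return "".join(out)
```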

This is specified in more detail in

http://www.unicode.org/reports/tr9/

> Can anybody point me to a way to iterate over characters with a varying
> number of bytes?

There is no trivial algorithm. You had best decode the string into Unicode,
reverse it, then encode it again in the original encoding.
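In Python 3 (3.3 and later, where `str` indexes whole code points), this decode/reverse/encode approach, plus an answer to the iteration question above, might look like the sketch below. Both helper names are hypothetical. Reversing code points still moves combining marks, such as Arabic vowel signs, in front of their base letters, and encoding one character at a time only works cleanly for encodings without a byte-order mark, such as UTF-8.

```python
def characters(data, encoding="utf-8"):
    """Yield each character of the byte string `data` as its own encoded bytes."""
    for ch in data.decode(encoding):
        yield ch.encode(encoding)


def reverse_encoded(data, encoding="utf-8"):
    """Decode, reverse the characters, and encode again."""
    return data.decode(encoding)[::-1].encode(encoding)
```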

Regards,
Martin

Ahmad

Dec 25, 2003, 2:51:59 AM

Hi all,

I just wanted to tell everyone here, that none of the tips really
worked. The RTLstreamer class seemed so messed up, and printing
"\u200F" before my text didn't make any difference!! I can't believe
that after all this time, unicode and bidi support still isn't working
nicely :(

OTOH, I tried pyGtk, the text is automatically RTL, (nice) but still
the first character in the sentence isn't showing.

Any other tricks?

"Martin v. Loewis" <mar...@v.loewis.de> wrote in message news:<bs7kb5$9mv$01$1...@news.t-online.com>...

Alia Khouri

Dec 25, 2003, 8:41:54 AM

Ahmad:

> I just wanted to tell everyone here, that none of the tips really
> worked. The RTLstreamer class seemed so messed up, and printing
> "\u200F" before my text didn't make any difference!! I can't believe
> that after all this time, unicode and bidi support still isn't working
> nicely :(

It's not so much that the RTLstreamer class is 'so messed up', but
that, notwithstanding the generous efforts of its author, it remained
untested with RTL Arabic script.

Support for bidirectional text in python will be, as is the case with
all open-source software, a function of how many interested people
need the functionality or want to scratch a common 'itch'.

Alia

Martin v. Loewis

Dec 25, 2003, 2:44:02 PM

Ahmad wrote:
> I can't believe
> that after all this time, unicode and bidi support still isn't working
> nicely :(

I'm not surprised at all. To make this really work, you need support
from native speakers of an RTL language, ideally both from people
speaking Arabic and people speaking Hebrew. It appears that there
have been few contributions from Arabic speakers to BiDi aspects
of open source software, so none of this works out of
the box yet. Other languages' aspects (European diacritic marks, CJK
disambiguation and font selection, Japanese input methods, ...) are
much better supported because speakers of these languages did
contribute in the past.

I would encourage you to contribute to the packages that interest
you most. If I spoke Arabic (which I don't), I would look
into packages in this order:
- Tk: when this works, Tkinter will work as well
- Konsole: apparently the terminal program that we both use
- xterm: this should be contributed back into the X distribution,
when done - perhaps you need to first add support for BiDi into
the underlying X11 libraries
- gnome-terminal

There is probably more (Qt? PythonWin?), but those would have lower
priority for me.

Regards,
Martin

Serge Orlov

Dec 25, 2003, 3:11:11 PM

"Ahmad" <eng...@link.net> wrote in message news:3014031e.0312...@posting.google.com...

> Hi all,
>
> I just wanted to tell everyone here, that none of the tips really
> worked. The RTLstreamer class seemed so messed up, and printing
> "\u200F" before my text didn't make any difference!! I can't believe
> that after all this time, unicode and bidi support still isn't working
> nicely :(
>
> OTOH, I tried pyGtk, the text is automatically RTL, (nice) but still
> the first character in the sentence isn't showing.
>
> Any other tricks?

You can try to make a web application if it's possible. I bet major
browsers support bidi well. You can also look at Mozilla as your
cross platform display and input engine.
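A minimal Python 3 sketch of that route: emit the text in ordinary logical order inside an HTML page marked `dir="rtl"` and leave reordering and shaping to the browser. The function name `rtl_page` and the page layout are assumptions for illustration.

```python
import html


def rtl_page(text, lang="ar"):
    """Return a minimal HTML page; the browser handles bidi reordering and shaping."""
    body = html.escape(text)  # keep <, > and & from being read as markup
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}" dir="rtl">\n'
        '<head><meta charset="utf-8"><title>RTL</title></head>\n'
        f"<body><p>{body}</p></body>\n"
        "</html>\n"
    )
```

Writing the result to a file opened with `encoding="utf-8"` and loading it in a browser should show the text right-aligned and running right to left.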

-- Serge.

Bill Trenker

Dec 25, 2003, 9:54:55 AM

to pytho...@python.org

Ahmad wrote:

> Any other tricks?

I don't know if this is of any benefit but I was using the latest Opera web browser on my Linux system and happened to try out the Arabic introduction page on the unicode.org site. The url is:
http://www.unicode.org/standard/translations/arabic.html

Since I can't read Arabic I don't know if the writing on that page is correctly presented, but I thought you might want to try it out. I know that the Opera folks have put a lot of emphasis on RTL (bidi) support and internationalization. Maybe Opera is an example of at least one application that is making better progress in displaying Arabic.

Regards,
Bill

Skip Montanaro

Dec 26, 2003, 9:46:23 AM

to Bill Trenker, pytho...@python.org

Bill> I don't know if this is of any benefit but I was using the latest
Bill> Opera web browser on my Linux system and happened to try out the
Bill> Arabic introduction page on the unicode.org site. The url is:
Bill> http://www.unicode.org/standard/translations/arabic.html

Looks gorgeous in Safari. Again, I don't read Arabic, so I can't tell what
it's saying, just that it looks nice. (Arabic strikes me as a very pretty
language.)

Interestingly enough, the highlighting is backwards. If I press and drag
the mouse button from the right edge of a line, the background highlighting
starts from the left edge. It appears Apple doesn't have all the kinks
worked out.

Skip

Serge Orlov

Dec 26, 2003, 5:59:18 PM

"Skip Montanaro" <sk...@pobox.com> wrote in message news:mailman.118.1072449...@python.org...

It looks great in Firebird 0.7 and IE 6.0. But there are still minor selection
problems too. In Firebird, double-clicking an Arabic word sometimes selects extra
characters on the left. Double-clicking an English word among Arabic text goes
badly wrong. In IE, selecting English words together with Arabic highlights
the English words in the wrong direction.

Unicode _is_ rocket science <wink>

-- Serge.

Suchandra Thapa

Dec 27, 2003, 3:01:08 AM

On Fri, 26 Dec 2003 08:46:23 -0600, Skip Montanaro wrote:

> Interestingly enough, the highlighting is backwards. If I press and drag
> the mouse button from the right edge of a line, the background highlighting
> starts from the left edge. It appears Apple doesn't have all the kinks
> worked out.

The highlighting might be due to the fact that Arabic is read right to
left. Interestingly enough, I believe that numbers are written
left to right. I'm curious as to the layout of columns in a newspaper,
though.

"Martin v. Löwis"

Dec 27, 2003, 4:31:10 AM

Serge Orlov wrote:

> Unicode _is_ rocket science <wink>

And Web browsers are the category of software that deals
best with Unicode and i18n. It's a long way until terminal
emulators can do what web browsers can do today with respect to
rendering non-ASCII text.

Regards,
Martin

Peter Otten

Dec 29, 2003, 3:31:57 AM

Martin v. Loewis wrote:

[RTL Mini-Howto]

Yup. Your outline made it clear that the problem does not lend itself to a
quick hack that trades simplicity against proper cut-and-paste behaviour.

> Then, for AL runs, you need to replace European numerals with Arabic
> numerals (but keeping the LTR order).

I always thought of numbers as most significant digit first. But the above
suggests that they are least significant digit first, preserving the
original RTL directionality.

Peter

"Martin v. Löwis"

Dec 29, 2003, 5:43:28 AM

Peter Otten wrote:

>>Then, for AL runs, you need to replace European numerals with Arabic
>>numerals (but keeping the LTR order).
>
>
> I always thought of numbers as most significant digit first. But the above
> suggests that they are least significant digit first, preserving the
> original RTL directionality.

I'm actually uncertain: I recently learned that our (the Europeans')
number system was *not* copied from the Arabs, but instead, both
the Arabs and the Europeans copied the numbers from the Indians
in the same time frame.

So it may be that the Arabs have LTR for numbers as it is an
imported writing system.

As I said, I'm uncertain: It may also be that you are right,
and numbers "properly" have the least significant digit first,
and we copied the order from the Arabs.

Regards,
Martin

OKB (not okblacke)

Dec 29, 2003, 9:13:32 PM

Peter Otten wrote:

>> Then, for AL runs, you need to replace European numerals with
>> Arabic numerals (but keeping the LTR order).
>
> I always thought of numbers as most significant digit first. But
> the above suggests that they are least significant digit first,
> preserving the original RTL directionality.

In Arab countries which use the Indian numerals (not the "Arabic"
ones that we use in the west), the numbers are written with the most
significant digit at the left. This is indeed "last" with respect to
the normal right-to-left direction of Arabic.
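A quick Python 3 check consistent with this description: Arabic-Indic digits are stored in logical order with the most significant digit first, and their bidi class (AN, like EN for European digits) is a weak type, so a bidi-aware renderer keeps each digit sequence left to right even inside right-to-left text. The variable name `year` is illustrative.

```python
import unicodedata

year = "\u0662\u0660\u0660\u0663"  # ARABIC-INDIC DIGITs for 2003, in storage order

# Stored most significant digit first, exactly like European digits:
print(int(year))  # 2003
# Weak bidi classes: AN (Arabic Number) here, EN (European Number) for "2".
print([unicodedata.bidirectional(c) for c in year])  # ['AN', 'AN', 'AN', 'AN']
print(unicodedata.bidirectional("2"))  # EN
```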

--
--OKB (not okblacke)
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown