Problem with # and + characters in right-to-left context

hAmid reZa

Oct 7, 2010, 8:12:37 AM
to Persian Computing
Hi,
Try this: open Notepad, switch its direction to right-to-left, and
type or copy this paragraph:
این کتاب دربارهٔ زبانهای C# و C++ است.
(The sentence means "This book is about the languages C# and C++.")
The # and ++ are displayed on the wrong side of the C, that is, to its
left (as #C and ++C). If you use this sentence in a right-to-left HTML
page, browsers (I've checked with Firefox and IE on Windows) render it
the same way.
Is this a bug, or is it normal?
I usually fix this kind of problem in my HTML pages by defining a
CSS style with
{direction:ltr;unicode-bidi:bidi-override}
and enclosing the problematic elements (C#, C++) in a "span" element
styled with that predefined CSS style.
Do you know a better solution, preferably one that also works with
non-HTML documents (plain text)?

Hossein Noorikhah

Oct 7, 2010, 9:53:10 AM
to Persian Computing
Salaam,
A better solution is to use the appropriate Unicode control characters, such as LRE (left-to-right embedding, U+202A) and PDF (pop directional formatting, U+202C). As I remember, M$ Word does this automatically.
برنامه‌نویس با زبان C‪++‬ آسان است!
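
The sample sentence above reads roughly "Programming in the C++ language is easy!" and carries the invisible LRE and PDF characters around the left-to-right part. As a minimal sketch of how such text could be produced programmatically (the helper name embed_ltr and the short sample string are assumptions made for illustration, and the sketch wraps the whole token "C++"):

```python
# Minimal sketch: wrap a left-to-right fragment in LRE ... PDF so it keeps
# its internal order inside right-to-left text.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed_ltr(fragment: str) -> str:
    """Return the fragment enclosed in an LTR embedding (hypothetical helper)."""
    return f"{LRE}{fragment}{PDF}"


# Assumed sample: two Persian words around the embedded token.
sentence = "زبان " + embed_ltr("C++") + " است."

# Show the code points so the invisible characters become visible.
print(" ".join(f"U+{ord(ch):04X}" for ch in embed_ltr("C++")))
```

Every LRE has to be closed by a matching PDF, so generating the pair in one helper avoids leaving an embedding open.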

hAmid reZa

Oct 7, 2010, 11:33:30 AM
to Persian Computing
On Oct 7, 4:53 pm, Hossein Noorikhah <hossein...@gmail.com> wrote:
> The better solution is using appropriate Unicode control characters, like
> LTR embedding and PDF (pop directional formatting). As I remember, M$ Word
> does this automatically.
> برنامه‌نویس با زبان C‪++‬ آسان است!

Thanks, that is a solution that works for plain text. My question,
however, is whether this behavior (requiring control characters to
get correct rendering) is standard according to the Unicode Standard.
In both cases the + or # is attached to the C with no space between
the characters, so the whole thing should be treated as one word; if
it is a left-to-right phrase, all of its characters should be treated
as left-to-right (am I wrong?). Why does this happen?

Ehsan Akhgari

Oct 7, 2010, 11:37:03 AM
to hAmid reZa, Persian Computing
Characters such as # and + are weak according to the UBA (the Unicode Bidirectional Algorithm), which is why their placement is affected by the direction of the next strong character following them. The simplest fix is perhaps to put an LRM (left-to-right mark, U+200E) right after them, for example (in logical order):

C++<LRM>

LRM is a strong LTR character, which makes the entire phrase ("C++<LRM>") an LTR run and gives you the desired presentation.
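
The weak/strong distinction can be checked with Python's standard unicodedata module, which reports each character's bidirectional class. This is only an illustrative sketch (the helper name bidi_classes is an assumption); it inspects character classes and does not run the bidirectional algorithm itself:

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# 'L' is strong left-to-right; 'ES' (European separator) and
# 'ET' (European terminator) are weak classes.
for ch, cls in bidi_classes("C++" + LRM):
    print(f"U+{ord(ch):04X} {cls}")

for ch, cls in bidi_classes("C#"):
    print(f"U+{ord(ch):04X} {cls}")
```

Here C and LRM come out as L, + as ES and # as ET, so without the LRM the trailing weak characters have no strong LTR character after them.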

--
Ehsan
<http://ehsanakhgari.org/>


Milad khajavi

Oct 7, 2010, 9:16:07 AM
to hAmid reZa, Persian Computing
این کتاب دربارهٔ C++‎ است.
The sentence above ("This book is about C++.") is typed as follows:
این کتاب دربارهٔ C++ [character U+200E] است.
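
For existing plain text, the same idea could be automated. The sketch below is only an illustration under stated assumptions: the token list (C++ and C#) and the helper names add_lrm and reveal are hypothetical, and real text would need its own list of left-to-right tokens.

```python
import re
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Assumed token list: C++ or C# not already followed by an LRM.
_LTR_TOKEN = re.compile(r"(?:C\+\+|C#)(?!\u200e)")


def add_lrm(text: str) -> str:
    """Append an LRM after each matching token (hypothetical helper)."""
    return _LTR_TOKEN.sub(lambda m: m.group(0) + LRM, text)


def reveal(text: str) -> str:
    """Replace invisible format (Cf) characters with a readable tag."""
    return "".join(
        f"[U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}]"
        if unicodedata.category(ch) == "Cf"
        else ch
        for ch in text
    )


print(reveal(add_lrm("C++ and C#")))
```

The negative lookahead keeps add_lrm from inserting a second mark when it is run again on text that was already fixed, and reveal helps when checking whether a string already contains the mark.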

--
Milad Khajavi
http://lincafe.wordpress.com
I tried to change the world, but I couldn’t find the source code.

Javad

Dec 15, 2010, 4:31:18 AM
to Milad khajavi, hAmid reZa, Persian Computing
How can we type a left-to-right mark in text on Linux?
I could not find it on the default Persian keyboard layout in Fedora 14!

Behnam Esfahbod ZWNJ

Dec 15, 2010, 2:53:16 PM
to Javad, Persian Computing
Hi,


On Wed, Dec 15, 2010 at 1:01 PM, Javad <jav...@gmail.com> wrote:
> How can we add left to right mark in linux text?
> I could not find it on default persian keyboard in Fedora14!

LRM is on AltGr+9 (AltGr+Left-Parenthesis) and RLM is on AltGr+0 (AltGr+Right-Parenthesis).

See ISIRI 9147, “Layout of Persian Letters and Symbols on Computer Keyboards”, page 31.

-Behnam


--
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '     
  *  ..   http://behnam.esfahbod.info
 *  `  *  http://zwnj.org
  * o *   3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B

