Issue 10 in pseudolocalization-tool: fake bidi method can be improved by adding RLMs

pseudolocal...@googlecode.com

Aug 7, 2014, 4:53:39 AM

to pseudolocal...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 10 by aha...@google.com: fake bidi method can be improved by
adding RLMs
http://code.google.com/p/pseudolocalization-tool/issues/detail?id=10

The fake bidi method can produce output that even more closely resembles
real RTL text by adding an RLM (U+200F RIGHT-TO-LEFT MARK) before each RLO
(U+202E RIGHT-TO-LEFT OVERRIDE) and after each PDF (U+202C POP DIRECTIONAL
FORMATTING). For example, where it currently produces
"\u202Ehello\u202C \u202Eworld\u202C" for "hello world", it would now
produce "\u200F\u202Ehello\u202C\u200F \u200F\u202Eworld\u202C\u200F".
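
As a rough illustration of the proposed change, here is a minimal Python
sketch. It is not the tool's actual implementation: the function name
fake_bidi, the add_rlm flag, and the choice to wrap each run of
non-whitespace characters are assumptions inferred from the "hello world"
example above.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def fake_bidi(text, add_rlm=True):
    """Wrap each whitespace-delimited word in RLO...PDF.

    With add_rlm=True (the proposed behavior), an RLM is placed before
    each RLO and after each PDF. Hypothetical helper, for illustration.
    """
    prefix = (RLM if add_rlm else "") + RLO
    suffix = PDF + (RLM if add_rlm else "")
    # \S+ matches one word at a time, so whitespace stays outside the wrappers.
    return re.sub(r"\S+", lambda m: prefix + m.group(0) + suffix, text)
```

Calling fake_bidi("hello world", add_rlm=False) gives the current output
quoted above, and fake_bidi("hello world") gives the proposed one.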

While most of the time the visual output would be identical, adding the
RLMs has two advantages:

1. The first-strong directionality estimation method, as specified in the
Unicode Bidirectional Algorithm's rules P2 and P3
(http://www.unicode.org/reports/tr9/#P2), would then decide that fake bidi
text is RTL; currently it decides that it is LTR. As a result, fake bidi
text currently does not behave in the same way as real RTL text (e.g.
Hebrew or Arabic) in contexts like Android TextViews and HTML's dir="auto"
attribute, which use the first-strong algorithm. Adding the RLM would fix
this discrepancy.

2. When a message contains a placeholder followed by a localizable text
fragment that begins with a strong character (not a neutral character like
a space or punctuation), and the placeholder ends in a number, the visual
ordering that currently results for fake bidi localization is not
equivalent to that resulting for a real RTL translation: in an RTL context,
with fake bidi, the number appears to the left of the text fragment; with
real RTL text, the number appears to the right. For example, let's say that
the placeholder value is "12" and the localizable text fragment is "hello".
Then, when fake bidi changes the "hello" into "\u202Ehello\u202C", the
overall output is "12\u202Ehello\u202C". You can see the visual ordering
specified for that by the Unicode Bidi Algorithm in an RTL paragraph here:
http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%AEhello%E2%80%AC&p=RTL;
the number is on the left. However, if the text fragment were the Hebrew
character alef, "\u05D0", and thus the whole string were "12\u05D0", the
number would come out on the right:
http://unicode.org/cldr/utility/bidi.jsp?a=12%D7%90&p=RTL. This is fixed by
adding the RLMs to fake bidi: "12\u200F\u202Ehello\u202C\u200F" is
displayed with the number on the right, as with real RTL text
(http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%8F%E2%80%AEhello%E2%80%AC%E2%80%8F&p=RTL).
The same issue occurs when a placeholder follows a localizable text
fragment that ends in a strong character, which is why I suggest putting
an RLM not only before the RLO but also after the PDF.

It may seem strange for a placeholder to come immediately before or after
strong text rather than a neutral such as a space or punctuation; text
like "hello: 12" or "12: hello" is far more common than "hello12" or
"12hello". However, the same issue occurs (and is fixed by the RLMs) when
the placeholder and the localizable text fragment are separated by a
nonlocalizable text fragment containing markup that introduces visual
space between the two, e.g. "<span style='padding: 5px'>", and this is
unfortunately a fairly common practice in HTML.
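
Both points can be checked with a short Python sketch. The helper names
first_strong_direction and demo_query are hypothetical. The direction
check is a simplification of rules P2 and P3: it looks only at the first
character of bidi class L, R or AL and does not skip isolate sequences as
the full rule requires. The second helper only percent-encodes a string in
the form used by the a= parameter of the demo URLs above; it does not
compute visual ordering.

```python
import unicodedata
from urllib.parse import quote

RLM, RLO, PDF = "\u200F", "\u202E", "\u202C"


def first_strong_direction(text):
    """Return "LTR" or "RTL" from the first strong character, else None.

    Simplified first-strong estimate: isolates are not handled.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None


def demo_query(text):
    """Percent-encode text as UTF-8 for use in a URL query parameter."""
    return quote(text, safe="")


current = RLO + "hello" + PDF                # fake bidi today
proposed = RLM + RLO + "hello" + PDF + RLM   # fake bidi with RLMs
```

Here first_strong_direction(current) is "LTR" because RLO is not a strong
character and "h" is the first strong one, while
first_strong_direction(proposed) is "RTL" because RLM has bidi class R.
demo_query("12" + proposed) reproduces the query string of the last demo
link above.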


pseudolocal...@googlecode.com

Aug 10, 2014, 2:36:31 AM

to pseudolocal...@googlegroups.com

Updates:
Status: Verified

Comment #1 on issue 10 by aha...@google.com: fake bidi method can be
improved by adding RLMs
http://code.google.com/p/pseudolocalization-tool/issues/detail?id=10