Numerals display for the AraSVD translation - Part 2


Sami Abdel Malik

Oct 15, 2022, 1:06:56 PM
to STEP Bible Forum
The problem presented here concerns the order in which the chapter and verse are displayed. The expectation is that the book abbreviation is followed by the chapter number, then a colon, then the verse number. However, a search in Arabic mode using AraSVD displays the book abbreviation followed by the verse number, then a colon, then the chapter number, which is confusing.

Browsers are designed to handle bilingual display of languages with opposite text directions. If the main language is right to left, then once characters of a language of the opposite direction are encountered, they are inserted and pushed to the left.

Unfortunately, the Arabic block of Unicode does not have an Arabic colon. Therefore, the English colon (ASCII code 0x3A) is used. When the browser encounters the colon, it interprets it as a change of language; therefore, although the chapter:verse is coded in the correct order, it is displayed in reversed order because of the incorrect language switch.
To rectify this issue, we can use the Syriac colon (U+0703, UTF-8 0xDC 0x83) instead of the English colon.

Here is an example:
Gen 1:27
It is currently displayed as:
تك ١:٢٧
After replacing the English colon with the Syriac colon, it displays correctly as:
تك ١܃٢٧
The colon is a bit smaller, but that is ok.
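
The substitution described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `use_syriac_colon` is hypothetical and not part of STEP, and the replacement character is U+0703 (SYRIAC SUPRALINEAR COLON), which is the character that appears in the example line above.

```python
ASCII_COLON = "\u003A"   # ':'
SYRIAC_COLON = "\u0703"  # SYRIAC SUPRALINEAR COLON


def use_syriac_colon(reference: str) -> str:
    """Replace every ASCII colon with the Syriac colon (hypothetical helper)."""
    return reference.replace(ASCII_COLON, SYRIAC_COLON)


# "تك ١:٢٧" (Gen 1:27), written with escapes so the source stays unambiguous
reference = "\u062A\u0643 \u0661:\u0662\u0667"
fixed = use_syriac_colon(reference)

# Show the UTF-8 bytes of the result (bytes.hex with a separator needs Python 3.8+)
print(fixed.encode("utf-8").hex(" "))
```

A blanket replacement like this would also touch colons outside chapter:verse references, so a real implementation would probably need to restrict it to the reference markup.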


Peter von Kaehne

Oct 15, 2022, 2:31:15 PM
to sami.ab...@gmail.com, STEP Bible Forum
OK, this fixes the display, but to the best of my knowledge the colon character is not directional, so the flaw may lie elsewhere. I would suggest looking further into what causes the display to be wrong.

One thought: numbers are LtoR even though the text is RtoL, so the colon between two numbers is simply read as bidirectional. Being embedded between two LtoR blocks, it also becomes LtoR and then orders the overall set of three blocks (chapter, colon, verse) in LtoR fashion. So, instead of replacing this character with an extraneous one as a special-case solution, I would propose keeping it as it is and fixing the ordering of the three blocks in a general way.
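
One way to check how Unicode itself classifies these characters is Python's standard `unicodedata` module, which reports the bidirectional class of a character. The sketch below looks up the characters that occur in the reference; the dictionary labels are only illustrative names.

```python
import unicodedata

# Characters that occur in a reference such as "تك ١:٢٧"
samples = {
    "Arabic letter teh": "\u062A",
    "Arabic-Indic digit one": "\u0661",
    "ASCII digit one": "1",
    "ASCII colon": ":",
}

# Map each label to the bidirectional class Unicode assigns to the character
bidi_classes = {
    label: unicodedata.bidirectional(ch) for label, ch in samples.items()
}

for label, bidi_class in bidi_classes.items():
    print(f"{label}: {bidi_class}")
```

The colon is reported as `CS` (common number separator), a weak type, while the Arabic-Indic digit is `AN` (Arabic number) and the Arabic letter is `AL`. That is consistent with the idea that the colon has no fixed direction of its own and takes its behaviour from its neighbours.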

Peter 

Sent from my phone. Please forgive misspellings and weird “corrections”


--
To restrict emails to only important news:
. . go to https://groups.google.com/forum/#!forum/stepbibleforum
. . then click on the Personal Options (the head+cog icon),
. . and select "...Email setting" > Don't send email updates"
---
You received this message because you are subscribed to the Google Groups "STEP Bible Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to StepBibleForu...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/StepBibleForum/a3297efa-d292-465e-a568-066e10dadd31n%40googlegroups.com.

Sami Abdel Malik

Oct 15, 2022, 3:57:14 PM
to STEP Bible Forum
Thank you, Peter, for taking the time to respond.

The characters in themselves do not have a direction. The code of the character determines which family (or block of Unicode) it belongs to. The browser, based on the code, decides whether it belongs to a right-to-left or a left-to-right language. For example, code points U+0590 to U+05FF are Hebrew, so the browser treats them as right to left. The same holds for U+0600 to U+06FF (Arabic) and U+0700 to U+074F (Syriac). All ASCII displayable characters (including the colon) are left to right by default.

I'm including a simple HTML file that shows the effect of just replacing the ASCII colon with a Syriac colon. It displays as follows:
تك ١:٢٧ ASCII colon
تك ١܃٢٧ Syriac colon
ברא 1:27 Hebrew has no codes for numbers, it uses ASCII numbers

If you look at the encoding inside the file, you'll see that (digits and colon only):
for the first line: D9 A1 3A D9 A2 D9 A7
for the second line: D9 A1 DC 83 D9 A2 D9 A7
D9 A1 is ١
3A is :
D9 A2 is ٢
D9 A7 is ٧

DC 83 is the Syriac colon
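
The byte values listed above can be reproduced with a few lines of Python. This is a minimal sketch; `hex_dump` is a hypothetical helper name, and only the digits and the colon are encoded, as in the dump above.

```python
ascii_version = "\u0661:\u0662\u0667"        # ١:٢٧ with the ASCII colon
syriac_version = "\u0661\u0703\u0662\u0667"  # ١܃٢٧ with the Syriac colon


def hex_dump(text: str) -> str:
    """Return the UTF-8 bytes of text as upper-case hex pairs (Python 3.8+)."""
    return text.encode("utf-8").hex(" ").upper()


print(hex_dump(ascii_version))
print(hex_dump(syriac_version))
```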

The behavior of the browser is even more evident with Hebrew. I switched the STEP display language to Hebrew and the result is as shown above. Here is a copy from the STEP display:

ברא 1:27 וַיִּבְרָא אֱלֹהִים ׀ אֶת־הָֽאָדָם בְּצַלְמוֹ בְּצֶלֶם אֱלֹהִים בָּרָא אֹתוֹ זָכָר וּנְקֵבָה בָּרָא אֹתָֽם׃

This is for Gen 1:27. Because the "1:27" is all in ASCII, it is displayed in left-to-right mode, although the main language is right to left.

Maybe Hebrew speakers are OK with that, but it is not OK for Arabic speakers. All Arabic commentaries use the standard convention (from right to left): book chapter:verse.

(BTW: years back my job was to adapt my company's display products to Arabic, so I'm very familiar with the inner workings of this.)

All blessings
Sami
arabicColon.html

Sami Abdel Malik

Oct 15, 2022, 4:21:20 PM
to STEP Bible Forum
Would you please elaborate on your suggestion of switching the order? It may be a good solution.

If standardizing on a Syriac colon for Arabic is not feasible, a workaround is to add a space after the colon. The display will look as follows:
تك ١: ٢٧ ASCII colon followed by a space

تك ١܃٢٧ Syriac colon

Sami Abdel Malik

Oct 15, 2022, 4:51:24 PM
to STEP Bible Forum
Another option, which might be more complex than using the Syriac colon, is to follow the ASCII colon with a RIGHT-TO-LEFT MARK (U+200F) from the Unicode General Punctuation block. The attached file demonstrates how this is done.
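
As a rough sketch of this option in Python (the attached file itself is not reproduced here, and `add_rlm_after_colon` is a hypothetical helper name), the mark can be inserted only where the colon sits directly between two digits:

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def add_rlm_after_colon(reference: str) -> str:
    """Insert U+200F after any colon that sits directly between two digits."""
    # In Python 3 string patterns, \d also matches Arabic-Indic digits.
    return re.sub(r"(?<=\d):(?=\d)", ":" + RLM, reference)


reference = "\u062A\u0643 \u0661:\u0662\u0667"  # "تك ١:٢٧"
marked = add_rlm_after_colon(reference)

# Print with escapes so the invisible mark can be seen
print(marked.encode("unicode_escape").decode("ascii"))
```

Restricting the match to digit:digit keeps ordinary colons in running text untouched.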

arabicColon.html

Peter Von Kaehne

Oct 15, 2022, 5:03:27 PM
to STEP
Apologies, I accidentally sent the below to Sami directly instead of to the group.
>
> Hi Sami,
>  
> Just like you, I have worked with this for a long time, and a lot of the quirks have been plaguing me. Not professionally, but specifically with Bible software. Most of the CrossWire-related adaptations to RtoL scripts have been either mine or done with my contribution (CrossWire's Java library JSWORD underlies STEP).
>  
> The one take home message from that work is - we never ever want language specific hacks to fix text direction. Do it generically and do it right and you will be fine most of the time - and where you are not, you probably simply need to look harder.
>
> Here now - again, you misunderstand the role of the colon - it is not a LtoR character, but can be used in either direction and does not force anything to go wrong. The problem here is that it is surrounded by numbers (which in Arabic-derived scripts are ordered LtoR despite the overall flow of the text being RtoL). Pack the same colon into a text of letters, not numbers, and the flow of text will be fine. Please see the attached test HTML, which shows this. The text is Farsi and says "I said: Hello" ("man goftam:Salam") and then twice "123:456".
> What I have done in the second line is to pack the colon into a span and add a direction attribute - here hardcoded, but it can easily be set from a variable depending on the language/script, and it is much more generic than replacing the colon with an extraneous character for one language only.
>  
> Hope this helps.
>
> Peter
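
The span approach described in this message might look roughly like the following when the markup is generated in code. The exact markup in test.html is not shown in the thread, so the `dir` attribute on a `span` around the colon is an assumption, and `wrap_colon` is a hypothetical helper name.

```python
import html


def wrap_colon(chapter: str, verse: str, direction: str = "rtl") -> str:
    """Build chapter:verse markup with the colon inside a span carrying dir."""
    if direction not in ("rtl", "ltr"):
        raise ValueError("direction must be 'rtl' or 'ltr'")
    # Escape the text parts; the direction is validated above.
    return (
        f"{html.escape(chapter)}"
        f'<span dir="{direction}">:</span>'
        f"{html.escape(verse)}"
    )


# The direction would normally come from the language/script of the display
markup = wrap_colon("\u0661", "\u0662\u0667", direction="rtl")
print(markup.encode("unicode_escape").decode("ascii"))
```

Because the direction is a parameter, the same function serves LtoR and RtoL languages without any language-specific character substitution.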
test.html

Peter Von Kaehne

Oct 15, 2022, 5:04:49 PM
to STEP
Generally it is best to work with HTML and attributes rather than adding extraneous or wrong characters. At least I think so.
 
 

Sami Abdel Malik

Oct 15, 2022, 5:35:29 PM
to STEP Bible Forum
I do agree with your approach, and I looked at the file you sent.

But I respectfully disagree with you about the colon. It is an ASCII character, and therefore the browser treats it as a left-to-right character. However, as I mentioned in my latest post, there is a Unicode character that forces the preceding left-to-right character to be treated as a right-to-left character: the RIGHT-TO-LEFT MARK (U+200F). I'm sending your test file back with this solution, which I believe is the right thing to do, although it is less readable for the programmer.

I admire your knowledge of Arabic. Where did you learn it? What does گفتم stand for?

test.html

Peter Von Kaehne

Oct 15, 2022, 6:30:57 PM
to sami.ab...@gmail.com, STEP
Sami,

Honestly - the ASCII origin is totally irrelevant here and now. This is Unicode now, and a clean slate. The character is bidirectional and orders the surrounding text blocks.

Look at the file. It is the same character - but the surrounding text dictates how it orders the blocks left and right. If it is text, the character puts the separated blocks in right-to-left order. If it is numbers, it puts the blocks in left-to-right order, as numbers are written left to right in Arabic. Why - I do not know. But being surrounded by two blocks of numbers, things end up ordered left to right. Once one block is unambiguously right to left, the order is right to left in an overall right-to-left environment. See the amended example file.

What you need is to override the ordering behaviour of the character, and that is done by an attribute or, if you must, by a Unicode ordering character.
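
The behaviour described here appears to correspond to rule W4 of the Unicode Bidirectional Algorithm (UAX #9): a single common separator between two numbers of the same type takes on that number type. The sketch below applies only that part of the rule to the bidi classes of a string. It is a deliberately simplified illustration, not a full implementation of the algorithm, and `resolve_separators` is a hypothetical name.

```python
import unicodedata
from typing import List


def resolve_separators(text: str) -> List[str]:
    """Apply a simplified form of UAX #9 rule W4 to the bidi classes of text.

    A single common separator (CS) between two numbers of the same type
    (both EN or both AN) takes that number type. All other rules of the
    algorithm are ignored in this sketch.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    resolved = list(classes)
    for i in range(1, len(classes) - 1):
        before, after = classes[i - 1], classes[i + 1]
        if classes[i] == "CS" and before == after and before in ("EN", "AN"):
            resolved[i] = before
    return resolved


# Colon directly between Arabic-Indic digits: it joins the number run
plain = resolve_separators("\u0661:\u0662\u0667")
# Colon followed by RIGHT-TO-LEFT MARK: it is no longer between two numbers
with_mark = resolve_separators("\u0661:\u200F\u0662\u0667")

print(plain)
print(with_mark)
```

In the first case the colon becomes part of a single number sequence, which is laid out left to right as a unit. In the second case the colon keeps its `CS` class because a strong right-to-left character follows it, so the two numbers remain separate runs.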

Peter
 
 
test.html

sami.ab...@gmail.com

Oct 15, 2022, 7:05:15 PM
to Peter Von Kaehne, STEP

Thank you Peter,

Another option is to follow the colon with the HTML code for the RIGHT-TO-LEFT MARK, which is &#8207;

So now you have different options; please pick one.

All blessings
Sami

Sami Abdel Malik

Oct 15, 2022, 9:37:01 PM
to Peter Von Kaehne, STEP
Hi Peter,

Will the issue of the chapter / verse order be resolved?
Did you have a look at the other issue (Arabic vs Indian numbers)?

I want to thank you for your patience.

I also want to share with you my understanding:
Unicode characters are at least two bytes (16 bits)
ASCII on the other hand is only one byte (8 bits).
As I showed in a previous post (from a bin dump) the Arabic numerals are coded with 2 bytes each. They are Unicode.
The colon, on the other hand, is coded with only one byte (0x3A), which cannot be called Unicode or bidirectional. It is the code in the ASCII chart. It is treated as a left-to-right character; otherwise, why does Unicode include a special code to change punctuation characters to right to left? (I’m just wondering, not asking.)

I’m just sharing my understanding, I’m not trying to make you accept it.

The Lord bless you
Sami
<test.html>

David Instone-Brewer

Oct 16, 2022, 9:13:34 AM
to sami.ab...@gmail.com, Patrick Tang, Peter Von Kaehne, STEP
Dear Patrick

It does look as if this problem is fixable - though I don't know how easy it will be to implement. 

image.png
The refs SHOULD say Gen 1:1, 1:3, 1:5, 1:7  - when reading Right to Left. 
BUT, reading right to left we see the word "Genesis" (تكوين ) then the verse number then a colon and then the chapter number "1" ( ١ ). 
In the screenshot the last verse has been corrected by substituting the Syriac colon ( ܃ - U+0703) for the English one, as Sami suggested.
It looks good. It makes the verse number follow the chapter number.
To make this work, we need to tell the system to use U+0703 instead of U+003A (the colon) when a Right-to-Left language is displayed.

However, I suspect this is much harder to implement than to describe.

All the best 

David IB



Sami Abdel Malik

Oct 16, 2022, 11:30:57 AM
to STEP Bible Forum
I had a lengthy exchange of emails with Peter von Kaehne on this.

In my opinion, the best way of fixing this is to follow the colon with the Unicode RIGHT-TO-LEFT MARK (U+200F). In HTML, this is &#8207;
I believe the Unicode RIGHT-TO-LEFT MARK is intended for such a case.

The following options came up in the discussion with Peter:
1. Add &#8207; after the colon
2. Add [U+200F] after the colon
3. Add a space after the colon
4. Replace the colon with the Syriac colon [U+0703]
I'm attaching an HTML file that displays the above options as follows:

تك ١:٧ original
تك ١:‏٧ added &#8207
تك ١:‏٧ added [U+200F]
تك ١: ٧ added a space - looks different
تك ١܃٧ Using Syriac colon - the colon is smaller
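
For comparison, the options listed above can be generated side by side. This is a minimal sketch, not the content of the attached file; `reference_variants` and its dictionary keys are hypothetical names chosen for illustration.

```python
RLM = "\u200F"           # RIGHT-TO-LEFT MARK
SYRIAC_COLON = "\u0703"  # SYRIAC SUPRALINEAR COLON


def reference_variants(book: str, chapter: str, verse: str) -> dict:
    """Return the reference formatted according to each option discussed."""
    return {
        "original": f"{book} {chapter}:{verse}",
        # Option 1: HTML numeric character reference for U+200F (8207 decimal)
        "html_entity": f"{book} {chapter}:&#8207;{verse}",
        # Option 2: the U+200F character itself
        "rlm_character": f"{book} {chapter}:{RLM}{verse}",
        # Option 3: a plain space after the colon
        "space": f"{book} {chapter}: {verse}",
        # Option 4: Syriac colon instead of the ASCII colon
        "syriac_colon": f"{book} {chapter}{SYRIAC_COLON}{verse}",
    }


variants = reference_variants("\u062A\u0643", "\u0661", "\u0667")  # تك ١:٧
for name, text in variants.items():
    # Escaped output makes the invisible mark and the code points visible
    print(name, text.encode("unicode_escape").decode("ascii"))
```

Options 1 and 2 produce the same rendered text in a browser; they differ only in whether the mark is written as an entity in the HTML source or as the raw character.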
arabicColon.html