How to render English text left to right within Arabic text in PDF using DITA-OT?

Rahul H

May 9, 2017, 3:39:36 AM

to DITA-OT Users

Hi All,

I have an Arabic topic that contains some English text. I am able to render all of the Arabic text right to left (RTL). Is there any way to render the English text left to right (LTR)? I can get left-to-right rendering if I add the attribute translate="no" to the English text in the topic, but adding the attribute to each and every word would be difficult. I want to render the English text correctly without adding any attribute to it. Can someone help me resolve this?

Thanks in advance

Rahul

Toshihiko Makita

May 12, 2017, 3:53:55 AM

to DITA-OT Users

Hi Rahul,

What XSL-FO to PDF formatter are you using? If English text (a phrase or a word) appears within Arabic text, it should automatically be formatted left to right by the Unicode BIDI algorithm, because the Latin characters of English text carry a left-to-right directional property of their own. Refer to the XSL specification for details:

7.29.7 "writing-mode"

https://www.w3.org/TR/xsl11/#writing-mode

> rl-tb

>If any left-to-right reading characters or numbers are present in the text, the inline-progression-direction for glyph-areas may be further modified by the Unicode BIDI algorithm.
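
As a rough illustration of what the BIDI algorithm works from, every Unicode character carries a bidirectional class. The sketch below uses Python's standard `unicodedata` module to list those classes; the helper name `bidi_classes` and the sample string are made up for this example and do not come from DITA-OT or any formatter.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letters are strongly left-to-right ("L"), Arabic letters are "AL",
# Hebrew letters are "R", European digits are "EN", and punctuation such
# as ">" is neutral ("ON") and takes its direction from its surroundings.
sample = "fox \u0627\u0644 \u05d0 5 >"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Neutral characters are the ones most likely to end up in an unexpected place, since their direction is resolved from the neighbouring text rather than fixed.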

Regards,

--
/*--------------------------------------------------
Toshihiko Makita
Development Group. Antenna House, Inc. Ina Branch
Web site:
http://www.antenna.co.jp/
http://www.antennahouse.com/
--------------------------------------------------*/

Julio Vazquez

May 12, 2017, 8:43:04 AM

to DITA-OT Users

As long as the English text has the @xml:lang attribute specified (use <ph> around the text to set it), the text should render correctly.

Toshihiko Makita

May 13, 2017, 7:51:10 AM

to DITA-OT Users

Hi Rahul,

> I am able to render Left to Right if I add an attribute translate="no" for English text in topic,

> I want to render the English text without adding any attribute for English text.

The following simple DITA concept rendered correctly even with PDF2 and FOP via DITA-OT 2.4.6. All of the English text is rendered from left to right.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="concept_txq_pbq_wz" xml:lang="ar">
 <title>Arabic &amp; English text test</title>
 <conbody>
  <p>The quick brown fox jumps over the lazy dog
  </p>
  <p>الثعلب البني السريع يقفز فوق الكلب الكسول
  </p>
  <p>الثعل the lazy dog ب البنيfox jumps over السري عquick brown يقفز فوق The الكلب الكسول
  </p>
 </conbody>
</concept>

What unwanted result are you seeing in your PDF output?

Rahul H

May 16, 2017, 12:25:37 PM

to DITA-OT Users

Hi, thank you for the reply.

In some cases the English text renders correctly. Below is one sample of Hebrew text that does not render correctly with PDF2:

יציאתUSB 3.1 מדור 1 ( Type-C) (רק במחשבים שכוללים מעבדי AMD Ryzen 3/Ryzen 5/Ryzen 7) חבר ציוד היקפי כגון התקני אחסון חיצוניים, מדפסות וצגים חיצוניים. מספקת מהירויות העברת נתונים של עד ‎5 Gbps.

PDF rendering

The English text "USB 3.1" renders correctly, but "5 Gbps" does not. I use the Antenna House formatter.

Thanks in Advance

Rahul

Auto Generated Inline Image 1

Toshihiko Makita

May 16, 2017, 8:49:12 PM

to DITA-OT Users

Hi Rahul,

I copied your two paragraphs and generated a PDF via DITA-OT 2.4.6 and AH Formatter V6.4.

As you can see, "Gbps" and the other English text are rendered correctly. I guess that your Hebrew topic was translated from an English topic. A translator usually inserts Unicode BIDI control characters to get the right direction when the Unicode bidirectional algorithm alone is not sufficient. These codes include:

U+202A LEFT-TO-RIGHT EMBEDDING (LRE)

U+202B RIGHT-TO-LEFT EMBEDDING (RLE)

U+202C POP DIRECTIONAL FORMATTING (PDF)

U+202D LEFT-TO-RIGHT OVERRIDE (LRO)

U+202E RIGHT-TO-LEFT OVERRIDE (RLO)

These codes directly affect text direction, so if a translator fails to insert them where they are needed, the text direction comes out wrong. Could you check whether your Hebrew topic contains these characters?
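
One way to check a topic for these characters is a small script. The following is only a sketch: the function name `find_bidi_controls` is invented for this example, and the set also includes the marks U+200E (LRM) and U+200F (RLM), which are not in the list above but also influence direction.

```python
import unicodedata

# The five explicit formatting characters listed above, plus the
# implicit marks LRM (U+200E) and RLM (U+200F) as an extra assumption.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u200e", "\u200f",
}

def find_bidi_controls(text):
    """Return (offset, code point, Unicode name) for each control found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

# Short sample: two Hebrew letters, a space, an invisible mark, then "5 Gbps."
sample = "\u05e2\u05d3 \u200e5 Gbps."
for offset, code, name in find_bidi_controls(sample):
    print(offset, code, name)
```

To scan a real topic, read the file as UTF-8 and pass its contents to the function; the offsets are character positions within that string.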

Rahul H

May 22, 2017, 2:03:17 AM

to DITA-OT Users

Hi,

Thank you for the information. Before I ask the translation team, I need some more help. The XML is currently encoded in UTF-8. When I change the encoding to ANSI in Notepad++, I can see some strange characters in the Hebrew text. Below is the Hebrew text as displayed in ANSI encoding:

×©×ª×™ ×™×¦×™× ×•×ª USB 3.1 ×ž×“×•×¨ 1×—×‘×¨ ×¦×™×•×“ ×”×™×§×¤×™ ×›×’×•×Ÿ ×”×ª×§× ×™ × ×—×¡×•×Ÿ ×•×ž×“×¤×¡×•×ª. ×ž×¡×¤×§×ª ×ž×”×™×¨×•×™×•×ª ×”×¢×‘×¨×ª × ×ª×•× ×™× ×©×œ ×¢×“ â€Ž5 Gbps.

Is the special character just before the number 5 causing the direction issue?
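
The three characters "â€Ž" in front of the 5 look like the UTF-8 bytes of a single character read as a Windows code page. A minimal sketch for recovering the underlying code point, assuming that "ANSI" here means Windows-1252 (the helper name `recover_from_mojibake` is made up for this example):

```python
import unicodedata

def recover_from_mojibake(garbled, codepage="cp1252"):
    """Undo a UTF-8-read-as-ANSI mix-up: re-encode the garbled text with
    the ANSI code page, then decode the resulting bytes as UTF-8."""
    return garbled.encode(codepage).decode("utf-8")

# The three characters shown before the 5 in the ANSI view.
garbled = "\u00e2\u20ac\u017d"
original = recover_from_mojibake(garbled)
for ch in original:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

This only works for sequences that survived copying intact; bytes with no printable Windows-1252 character are lost in the pasted text, so the full Hebrew line above cannot be round-tripped this way.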

Thanks & regards

Rahul

Toshihiko Makita

May 22, 2017, 6:55:48 PM

to DITA-OT Users

Hi Rahul,

I'm not sure what the actual code is. But when I inserted U+202B (RLE) before "5", I got the same result as yours:

Hope this helps!

Toshihiko Makita

May 23, 2017, 12:51:43 AM

to DITA-OT Users

Hi Rahul,

Unicode BIDI control characters are invisible unless they are expressed as character references (&#xYYYY;).

So it would be better to use a more capable editor. The following is a screenshot from oXygen.
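
If such an editor is not at hand, the controls can also be made visible with a short script. This is a sketch only: `show_bidi_controls` is a hypothetical helper, and the character ranges it covers (marks, embeddings, overrides and isolates) are a choice made for this example.

```python
import re

# Directional marks, embeddings/overrides and isolates:
# U+200E-U+200F, U+202A-U+202E, U+2066-U+2069.
_BIDI_RE = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")

def show_bidi_controls(text):
    """Replace each invisible BIDI control with its XML character reference."""
    return _BIDI_RE.sub(lambda m: f"&#x{ord(m.group()):04X};", text)

# The invisible mark in front of "5" becomes a visible reference.
print(show_bidi_controls("\u05e2\u05d3 \u200e5 Gbps."))
```

The output is meant for inspection; all other characters pass through unchanged.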

Regards,

Rahul H

May 25, 2017, 4:17:09 AM

to DITA-OT Users

Hi Toshihiko,

Thank you for the detailed explanation. I can see the character before "5" in the character map; it is the LEFT-TO-RIGHT MARK (U+200E).

Based on your previous suggestions, which of these control characters should the translator add here?

U+202A LEFT-TO-RIGHT EMBEDDING (LRE)

U+202B RIGHT-TO-LEFT EMBEDDING (RLE)

U+202C POP DIRECTIONAL FORMATTING (PDF)

U+202D LEFT-TO-RIGHT OVERRIDE (LRO)

U+202E RIGHT-TO-LEFT OVERRIDE (RLO)

Below is a similar issue, but there is no control character in this codeblock, and the right angle bracket '>' is misplaced.

שורת הפקודה:

<codeblock cid="vVDZ1">network-manager.nmcli con add type <type> ifname <ifname> con-name <connection-name> apn <apn></codeblock>

The result should be "apn <apn>", but the last right angle bracket appears in front instead of at the end.

Auto Generated Inline Image 1

Auto Generated Inline Image 2

Toshihiko Makita

May 26, 2017, 7:22:29 AM

to DITA-OT Users

Hi Rahul,

I was able to reproduce this problem in my environment as well. You can correct it by inserting U+202A (LRE) before "<apn>" and U+202C (PDF) after it.
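
For anyone who wants to apply that fix with a script rather than by hand, here is a minimal sketch. The helper `embed_ltr` is invented for this example and simply wraps the last occurrence of a fragment in the two control characters.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def embed_ltr(text, fragment):
    """Wrap the last occurrence of fragment in text with LRE ... PDF."""
    start = text.rfind(fragment)
    if start == -1:
        return text  # fragment not present; leave the text untouched
    end = start + len(fragment)
    return text[:start] + LRE + fragment + PDF + text[end:]

line = "con-name <connection-name> apn <apn>"
fixed = embed_ltr(line, "<apn>")
# Print with escapes so the otherwise invisible controls can be seen.
print(fixed.encode("unicode_escape").decode("ascii"))
```

In the DITA source itself the same two characters can be written as the character references &#x202A; and &#x202C;, which keeps them visible in any editor.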

Hope this helps your publishing.

he-3-edit.dita

bidi-test-he-3.pdf

Toshihiko Makita

May 26, 2017, 7:52:37 AM

to DITA-OT Users

For your reference, this is not a bug in the formatter. You can confirm the same behavior with HTML in a browser.

test_hebrew-3.html

Rahul H

May 26, 2017, 1:34:02 PM

to DITA-OT Users

Hi,

Thank you for troubleshooting the issue. I guess these control characters should be taken care of during translation.