correctly displaying Arabic words

Sami

Oct 21, 2014, 4:38:19 AM
to psychop...@googlegroups.com


Dear all-

This is an issue that was previously raised on two occasions, one relating to Hebrew and the other to Arabic.
I saw both threads. 

The suggestion made for Hebrew seems to work, at least with single words, but the suggestion made for Arabic was not optimal.
In essence, the suggestion was to use images. I tried this solution, but it proved to be error-prone and labour-intensive, as I am running many experiments with an average of 300 words per experiment.

I am wondering if there is now another recommended approach, and whether PsychoPy has now been harnessed to handle the less common writing systems.
thanks
sami

Michael MacAskill

Oct 22, 2014, 6:38:07 PM
to psychop...@googlegroups.com, Ali Yoonessi
Dear Sami,

This issue recently came up at the PsychoPy workshop in Tehran, where it seems that the text boxes in Builder text components work correctly, accepting and displaying typed Arabic or Farsi text, but when rendered to the screen, that same text appears left-to-right and with the characters in isolated rather than linked form.

What might be useful to you is this online service by Abdullah Diab which "reshapes" Arabic text into a specific Unicode sequence which can get around this problem:
<http://pydj.igeex.biz/arabic-reshaper/>

Using it I produced the attached screenshot. In one text stimulus component I pasted in the raw text "Raw: اللغة العربية رائعة" and in a second text stimulus I used this "reshaped" text: "Reshaped: ﺔﻌﺋاﺭ ﺔﻴﺑﺮﻌﻟا ﺔﻐﻠﻟا". The raw stuff looks OK in the text box but doesn't render on-screen correctly, but the opposite was true for the reshaped text.

I would suggest that if you want to show 300 words, you paste the whole lot into his online reshaper, and then paste the output into a column in a conditions file which you can use for a PsychoPy loop. This means you wouldn't need to muck around with creating images. Could you try this and let us know how you get on?
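
A minimal sketch of building such a conditions file in Python 3, assuming the file is saved as UTF-8 and that a reshaping function is available. The column names word_raw and word_display and the helper names are hypothetical; the reshape argument defaults to leaving the text unchanged, so already-reshaped text pasted from the online service can be passed straight through.

```python
import csv


def write_conditions(path, words, reshape=lambda s: s):
    """Write one row per word: the raw text and a display-ready version."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word_raw", "word_display"])  # hypothetical column names
        for word in words:
            writer.writerow([word, reshape(word)])


def read_conditions(path):
    """Read the file back as a list of dicts, one per trial."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

The text component in the loop would then refer to the display column rather than the raw one.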

Of note is that Abdullah Diab's work is Python-based and is available to download from Github, so his functions could be used to reshape text on the fly if needed. See minimal code example here:
<http://mpcabd.igeex.biz/python-arabic-text-reshaper/>
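
A rough sketch of on-the-fly reshaping, assuming the third-party arabic_reshaper and python-bidi packages are installed. The function name prepare_for_display and the fallback behaviour are my own additions for illustration: if the packages are missing, the text is returned unchanged rather than raising an error.

```python
try:
    import arabic_reshaper                  # joins letters into contextual forms
    from bidi.algorithm import get_display  # reorders text for left-to-right drawing
    HAVE_RESHAPER = True
except ImportError:
    HAVE_RESHAPER = False


def prepare_for_display(text):
    """Return text reshaped and reordered for a renderer that draws left-to-right."""
    if not HAVE_RESHAPER:
        # Assumption: silently fall back to the raw text when the optional
        # packages are unavailable.
        return text
    reshaped = arabic_reshaper.reshape(text)
    return get_display(reshaped)
```

The result could be assigned to a text stimulus at the start of each trial instead of being stored in the conditions file.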

This work-around is Arabic-specific. It will probably do a reasonable job with Farsi but it would need to be tested to see how it deals with the extra characters there. But it is open-source code, so could be altered/extended if required?

Regards,

Michael
Screen Shot 2014-10-23 at 10.52.08 a.m..PNG

Jonathan Peirce

Oct 27, 2014, 8:20:03 AM
to psychop...@googlegroups.com
I'm afraid I still don't understand the issue.

PsychoPy merely puts the text on the screen. It doesn't care whether the person then reads it right-to-left or left-to-right. What is it that you're expecting to be done differently by PsychoPy when reading direction is reversed?

Jon
--
You received this message because you are subscribed to the Google Groups "psychopy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to psychopy-user...@googlegroups.com.
To post to this group, send email to psychop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/psychopy-users/f403889f-92e1-46cd-baa7-9299035e62b5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
Jon Peirce
http://www.peirce.org.uk

ShoinExp

Oct 27, 2014, 8:32:48 AM
to psychop...@googlegroups.com
Hi,
In Arabic, the way that the characters link up is different depending on surrounding characters - so linking left to right looks very different from linking right to left. PsychoPy does the linking in the wrong direction, so even if you get it to display the characters in the correct order, they aren't connected in the right way. I get my (Arabic) students to use PsychoPy in the classroom and they usually just take screenshots of the words written in a text editor and use them as image stimuli in PsychoPy. For Sami's needs, though, that would not be a useful solution. Mike's suggestion sounds like the way to go.
     Mark

Michael MacAskill

Oct 27, 2014, 4:43:06 PM
to psychop...@googlegroups.com
On 28/10/2014, at 1:19 a.m., Jonathan Peirce <jon.p...@gmail.com> wrote:
> PsychoPy merely puts the text on the screen. It doesn't care whether the person then reads it right-to-left or left-to-right. What is it that you're expecting to be done differently by PsychoPy when reading direction is reversed?

Hi Jon,

It took me a while to understand this issue; I didn't get it from the previous e-mails either until seeing it actually happen. Following on from Mark, the issue is that PsychoPy *doesn't* merely put the text on the screen. It (or more likely, some underlying libraries…) actually modifies what is entered in the dialog box in a text component, so that what is shown on screen does not match what was entered. Perhaps it isn't right to characterise this as an active modification; I think it is just what all systems that are not fully Unicode-compliant do when faced with right-to-left scripts.

In the case of Arabic/Farsi, there are two issues:

(1) The order of characters is reversed, so the first character (which should be right-most) becomes left-most. The equivalent situation in English would be that you entered this in the dialog box for a text component:

HELLO

but this was displayed on-screen:

OLLEH

(2) In Arabic, the letters are like cursive English script (but much more so), in that the shape of each character alters depending on its neighbours, as they sort of link and flow into each other, and I think characters can have different forms depending on their position in a word. What PsychoPy is currently doing is breaking up the characters so that they are all isolated, which gives them different shapes than intended. This is hard to represent in ASCII English, but would be like entering this:

Hello

and actually displaying this:

h ℇ L L 0

i.e. the character shapes have changed due to being isolated rather than taking their neighbours into account.
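
To make the second point concrete, here is a small sketch using only Python 3's standard unicodedata module. It looks up the separate "presentation form" code points that Unicode defines for one base letter. The function name contextual_forms is hypothetical, and the sketch only covers the Arabic Presentation Forms-B block, so it ignores ligatures and letters whose forms live elsewhere.

```python
import unicodedata


def contextual_forms(letter):
    """Map form name ('isolated', 'final', ...) to the presentation-form character."""
    forms = {}
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for cp in range(0xFE70, 0xFF00):
        ch = chr(cp)
        parts = unicodedata.decomposition(ch).split()
        # A single-letter form decomposes as e.g. ['<initial>', '0644'];
        # ligatures and unassigned code points give other lengths and are skipped.
        if len(parts) == 2 and int(parts[1], 16) == ord(letter):
            forms[parts[0].strip("<>")] = ch
    return forms


lam = "\u0644"  # ARABIC LETTER LAM
for form, ch in contextual_forms(lam).items():
    print(form, "U+%04X" % ord(ch), unicodedata.name(ch))
```

A renderer that does no shaping draws only the isolated glyph for every letter, whereas a reshaper substitutes the initial, medial or final code point as appropriate.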

An example is in the attached screenshot. What is wanted to be displayed is the text in white. It reads right-to-left and the characters flow into each other. This looks exactly like what appears in the text component dialog box: when using an Arabic/Farsi text input setting, the characters appear in the field right-to-left when typed, and previous characters dynamically change shape as successive characters are typed.

But when actually displayed on screen, we get the output in red: e.g. the first four red characters that look like "IJJℇ" are the isolated forms of the four cursively-linked characters reading from the right in the white text (and yes, this is the same font).

A good place to start to get a handle on this is the blog post by Abdullah Diab on his "Python Arabic Text Reshaper", <http://mpcabd.igeex.biz/python-arabic-text-reshaper/>, which is what I've been recommending people use as a workaround.

We could incorporate his reshaper library into PsychoPy, but I'd be reluctant to do that, as it would be a single language-specific workaround that wouldn't solve the general problem (for Hebrew and so on, and possibly Japanese and other East Asian languages?)

I'm guessing that the problem lies in underlying libraries like Pyglet that aren't fully Unicode compliant (whereas whatever handles the text input fields in Builder components works fine). i.e. Perhaps once Pyglet gets that issue updated, our problems disappear, and we automatically get a WYSIWYG correspondence between the Builder input and the drawn output?

The example text I used from that blog post is this (saying "Arabic is wonderful"):
اللغة العربية رائعة

It can simply be pasted into a Builder text component dialog. This was used to produce the attached screenshot. The "raw" red text is what PsychoPy will produce. The white text is what happens if instead you paste in that text after it has been "reshaped" by his algorithm (he has a simple online service to do that although it can also be done in Python). This "re-shaped" text looks just like what was entered in the dialog box.

For what it's worth, this problem seems widespread in other Python libraries, e.g. I wrote this quick script to try the same thing using PIL, without any PsychoPy libraries, and it gives the same result:

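#!/usr/bin/env python2
# -*- coding: utf-8 -*-

# Draw raw and reshaped Arabic text with PIL alone (no PsychoPy involved).
from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (320, 320))
draw = ImageDraw.Draw(image)

# Same strings as in the Builder test: raw text and pre-reshaped text.
a = u'Raw: اللغة العربية رائعة'
b = u'Reshaped: ﺔﻌﺋاﺭ ﺔﻴﺑﺮﻌﻟا ﺔﻐﻠﻟا'

# Mac font path; on other systems, point this at any font with Arabic glyphs.
font = ImageFont.truetype("/Library/Fonts/Arial Unicode.ttf", 14)
draw.text((50, 50), a, font=font)
draw.text((50, 150), b, font=font)

image.save("a.png")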


Cheers,

Mike



Screen Shot 2014-10-23 at 10.52.08 a.m..PNG

Jonathan Peirce

Oct 28, 2014, 6:18:44 AM
to psychop...@googlegroups.com
Thanks Mark and Mike. That makes more sense now.

Actually Mike, I think the solution really is for us to make this change ourselves. We'd be waiting too long for pyglet to change the way it renders text, and that wouldn't solve the problem for Sol's new TextBox (faster rendering but so far only supporting monospace fonts).

Ultimately a similar solution may have to be found for other languages too, but we can add those one at a time as needed. I guess it's our choice whether we make that a part of pyglet (upstream) or whether we do it within PsychoPy, but I don't think anyone else will do it for us!

Within PsychoPy it would work something like this:
    - we add a method to TextStim called reshapeFunc() that provides a reshaped copy
    - by default reshapeFunc() simply returns a copy of the original string
    - it can be substituted (at runtime) for any custom reshape function
    - we can provide a function _reshapeArabic() that does this for Arabic
    - in __init__ we could also try to detect that the Arabic reshaper should be used

best wishes
Jon

Sami

Oct 29, 2014, 1:16:47 AM
to psychop...@googlegroups.com
Thank you ever so much Mark, Mike and Jon for your thoughts on this. I will be looking forward to future updates on this.
best
sami


-----------------------------------------------------------

Sol Simpson

Oct 29, 2014, 9:55:43 AM
to psychop...@googlegroups.com
For interest's sake, attached is what TextBox draws for the test Arabic text being used. So TextBox makes the same mistakes as TextStim, plus since it is monospace the letters do not 'flow'.

Using one of the TextBox stim demos, I set the font_name kwarg of the TextBox stim to 'Simplified Arabic Fixed' so that the appropriate Unicode code points were actually available in the TTF.

textbox = visual.TextBox(window=window,
                         font_name='Simplified Arabic Fixed',
                         text=sometext,
                         .......


Note that TextBox only uses pyglet for OpenGL API access, nothing more. So changing the TextBox rendering code to work correctly would not require pyglet code changes and is /maybe/ more 'accessible'. Easier said than done though. I only mention it because if someone was going to spend significant time on the issue, TextBox would be the place to start, not TextStim, since TextBox is much, much faster at rendering the text (even if the text changes every frame), and has no dependency on pyglet's Label class.

Jon's suggestion is an excellent short term solution, perhaps it should be added to TextBox stim as well?

Thank you.
textbox_arabic_test.png

Sol Simpson

Oct 29, 2014, 10:03:09 AM
to psychop...@googlegroups.com
Also, here is the textbox output without the textgrid, using black chars and white background....
bw_textbox_arabic_test.png