Question on Ligatures

Knigaman

Dec 9, 2009, 3:50:48 AM
to Persian Computing
I was just re-reading Mr. Esfahbod's article, and noticed that many
fonts automatically display ligatures. Not knowing Arabic, I don't yet
know which letters tend to combine as a ligature.

Can someone share with me the "unwanted" list of ligatures that the
Windows fonts automatically render?

As for Lam-Alef -- when the rendering takes place, is it a simple
matter of one letter in the font "disappearing" and the other taking
the shape of the ligature? Or do both characters change shape and butt
up against each other to make it appear as if there is one character?

What I am (clumsily) trying to ask is this: does the underlying text
still retain the "la" combination (for searching, analyzing, etc.)?

I am a software engineer, but I have yet to explore the depths of the
internals of fonts!

John Hudson

Dec 9, 2009, 3:47:47 PM
to Persian Computing
Knigaman wrote:

> I was just re-reading Mr. Esfahbod's article, and noticed that many
> fonts automatically display ligatures. Not knowing Arabic, I don't yet
> know which letters tend to combine as a ligature.

A ligature is a particular technical solution to the display of multiple
characters as a single glyph, whether that glyph is a little piece of
metal, a cell in a phototype matrix, or a GID in an OpenType font. As
such, it should not be confused -- as it too often is -- with an
analysis of the Arabic writing system and its various forms and
adaptations to other languages. It is perfectly possible -- as Tom
Milo's ACE/Tasmeem technology and some other fonts demonstrate -- to
correctly display Arabic text without using any ligatures at all.

Arabic letters have joining behaviours. These behaviours are of two
kinds: letter joining behaviours, and shape joining behaviours. The
letter joining behaviours -- left+right joining or right joining --
determine which letters in a word will form connected letter groups. The
shape joining behaviours determine what that letter group may look
like. The letter joining behaviours are (largely) standardised features
of the writing system. The shape joining behaviours are particular to
individual styles of writing or typography, and in the case of the name
Allah, to individual words. Some fonts handle shape joining behaviour
using ligatures; some do not. Generally speaking, those fonts that do
not use ligatures are capable of much more flexible and stylistically
correct display of Arabic text.
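The letter-joining rule can be sketched in a few lines of Python. This is a minimal illustration, not production shaping code. The joining-type table is a hypothetical, deliberately tiny subset; the full data is in the Unicode Character Database file ArabicShaping.txt. Vowel marks are assumed absent, and any character not in the table is treated as non-joining. Right-joining letters such as alef, dal, reh and waw never connect to the following letter, so each one closes a connected group.

```python
# Hypothetical, deliberately tiny joining-type table, for illustration only.
# "D" = dual-joining (joins on both sides), "R" = right-joining only.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062A": "D",  # TEH
    "\u0633": "D",  # SEEN
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}


def letter_groups(text: str) -> list[str]:
    """Split text (in logical order) into connected letter groups."""
    groups: list[str] = []
    current = ""
    for ch in text:
        jt = JOINING_TYPE.get(ch)
        if jt is None:
            # Not in the table (space, punctuation, ...): treated as non-joining,
            # so it closes the current group and belongs to no group itself.
            if current:
                groups.append(current)
                current = ""
            continue
        current += ch
        if jt == "R":
            # A right-joining letter connects to the letter before it but
            # never to the one after it, so the group ends here.
            groups.append(current)
            current = ""
    if current:
        groups.append(current)
    return groups


# "salaam" (seen, lam, alef, meem) splits into two groups: seen+lam+alef, then meem.
print(letter_groups("\u0633\u0644\u0627\u0645"))
```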


John Hudson

--

Tiro Typeworks www.tiro.com
Gulf Islands, BC ti...@tiro.com

For chant, far more than the combination of a text
and a melody, is above all an act in which
sound becomes the expression of a memory, the memory
of a body immersed in the movement of an ancestral
gesture. - Marcel Pérès

Knigaman

Dec 9, 2009, 5:58:04 PM
to Persian Computing
Thank you for the detailed response.

> A ligature is a particular technical solution to the display of multiple
> characters as a single glyph,

So, my next question is, in the case of lam-alef, let's say I'm using
a font that automatically "visually" replaces both glyphs with what
appears to be a single glyph. I assume that underlyingly there are
still 2 code points in the text. Does one code point hide itself in
order to facilitate the correct glyph to appear?

As a linguist (even though I'm learning Persian primarily because I
want to, not for any linguistic purpose), what I care about most is
that the correct code points are present in the text. For instance, if
presentation forms were present, they would radically complicate
searching and analysis.

And I would really like to know the answer to one of my original
questions: which letters tend to morph into ligatures that
are undesirable when rendering Persian text?

John Hudson

Dec 9, 2009, 8:04:22 PM
to Persian Computing
Knigaman wrote:

> So, my next question is, in the case of lam-alef, let's say I'm using
> a font that automatically "visually" replaces both glyphs with what
> appears to be a single glyph. I assume that underlyingly there are
> still 2 code points in the text. Does one code point hide itself in
> order to facilitate the correct glyph to appear?

There are several different mechanisms that have been used over the
years to handle the display of Arabic script on computers, including
mechanisms that have performed character level substitutions of
'presentation forms', including ligatures. The common mechanism today,
though, makes a distinction between character processing and glyph
processing, and glyph operations such as ligature substitution happen at
a level above text encoding.

Taking the lam+alif as an example, and presuming an OpenType format font
and layout model using ligatures:

These two characters are encoded in text using the Unicode character
codes U+0644 and U+0627. These character codes are stored in the
'backing string' of text and are not changed during layout and display.

The application queries the cmap table of the font and finds entries
that map these two Unicode characters to glyph IDs in the font, e.g. GID
45 and GID 12 (these can be any number, since there is no externally
defined ordering for glyphs in a font). The GIDs are then passed to the
layout engine for display processing.
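The cmap step can be modelled as a dictionary lookup. In this sketch the table is hypothetical and reuses the example GIDs 45 and 12 from the message. With a real font, a library such as fontTools exposes the equivalent mapping through TTFont.getBestCmap(), which maps code points to glyph names rather than numbers. Unmapped characters fall back to GID 0, the .notdef glyph. The backing string is only read, never modified.

```python
# Hypothetical cmap subset, reusing the example GIDs from the message.
CMAP = {
    0x0644: 45,  # ARABIC LETTER LAM
    0x0627: 12,  # ARABIC LETTER ALEF
}


def map_to_glyphs(backing_string: str) -> list[int]:
    """Nominal glyph ID for each character; unmapped characters get GID 0 (.notdef)."""
    return [CMAP.get(ord(ch), 0) for ch in backing_string]


text = "\u0644\u0627"  # lam + alif: two code points in the backing string
print([f"U+{ord(c):04X}" for c in text], map_to_glyphs(text))
# ['U+0644', 'U+0627'] [45, 12]
```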

The layout engine recognises that these two Unicode characters are
Arabic letters, so applies standard Arabic text shaping to them. Text
shaping is based on what I referred to earlier as the letter joining
behaviours of the characters. Presuming this sequence of lam+alif is
occurring in isolation, i.e. not preceded by other left-joining letters,
the layout engine identifies the lam as being in an initial
(left-joining) position and the alif as being in a final
(right-joining) position. The layout engine applies the <init> and
<fina> OpenType Layout features accordingly, and these map the GIDs 45
and 12 to new GIDs for the appropriately shaped glyphs, e.g. GIDs 46 and 13.
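The positional-form step can be sketched the same way. The joining sets and the <init>/<fina> substitution tables below are hypothetical and reuse the example GIDs (45 to 46, 12 to 13). A real engine reads its lookups from the font's GSUB table and uses the full Unicode joining data. Each character gets init, medi, fina or isol, depending on whether it connects to its neighbours in logical order.

```python
# Hypothetical joining sets and single-substitution tables (example GIDs only).
DUAL_JOINING = {"\u0644"}   # lam (tiny subset, for illustration)
RIGHT_JOINING = {"\u0627"}  # alif
INIT = {45: 46}             # hypothetical <init> lookup
MEDI: dict[int, int] = {}   # hypothetical <medi> lookup (empty in this toy font)
FINA = {12: 13}             # hypothetical <fina> lookup


def joining_forms(text: str) -> list[str]:
    """Pick isol/init/medi/fina for each character from its neighbours."""
    joining = DUAL_JOINING | RIGHT_JOINING
    # joins_next[i]: character i connects to character i + 1 (its left neighbour on screen).
    joins_next = [
        ch in DUAL_JOINING and i + 1 < len(text) and text[i + 1] in joining
        for i, ch in enumerate(text)
    ]
    forms = []
    for i in range(len(text)):
        prev = i > 0 and joins_next[i - 1]
        nxt = joins_next[i]
        forms.append("medi" if prev and nxt else "init" if nxt else "fina" if prev else "isol")
    return forms


def apply_positional_features(gids: list[int], forms: list[str]) -> list[int]:
    """Swap each GID for its positional variant, leaving it unchanged if none exists."""
    tables = {"init": INIT, "medi": MEDI, "fina": FINA}
    return [tables.get(form, {}).get(gid, gid) for gid, form in zip(gids, forms)]


forms = joining_forms("\u0644\u0627")
print(forms, apply_positional_features([45, 12], forms))  # ['init', 'fina'] [46, 13]
```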

The layout engine now applies secondary shaping features to the new
glyph string, including the <rlig> 'Required Ligatures' feature. [Of
course, as I explained in my previous message, no ligatures per se are
actually required for Arabic script, only certain shape joining
behaviour, which may or may not be handled using actual ligatures.] In
this example, there is a ligature glyph in the font for lam+alif, and a
ligature lookup that maps the two GIDs 46+13 to a single GID, e.g. 78 or
whatever. This glyph is displayed.
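A ligature lookup replaces a glyph sequence with a single glyph. The engine still needs to know which characters each glyph came from, for selection, cursor movement and search highlighting. One common way to track this is a per-glyph cluster index into the backing string, similar in spirit to the cluster values HarfBuzz reports. The <rlig> table here is hypothetical and reuses GIDs 46 + 13 to 78 from the message.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    gid: int
    cluster: int  # index of the first backing-string character this glyph represents


# Hypothetical <rlig> lookup, reusing the example GIDs: 46 + 13 -> 78.
RLIG = {(46, 13): 78}


def apply_ligatures(glyphs: list[Glyph]) -> list[Glyph]:
    """Merge matching glyph pairs into one ligature glyph that keeps the first cluster."""
    out: list[Glyph] = []
    i = 0
    while i < len(glyphs):
        if i + 1 < len(glyphs):
            lig = RLIG.get((glyphs[i].gid, glyphs[i + 1].gid))
            if lig is not None:
                out.append(Glyph(lig, glyphs[i].cluster))
                i += 2
                continue
        out.append(glyphs[i])
        i += 1
    return out


def chars_for(run: list[Glyph], i: int, text: str) -> str:
    """Characters behind glyph i: from its cluster up to the next glyph's cluster."""
    end = run[i + 1].cluster if i + 1 < len(run) else len(text)
    return text[run[i].cluster:end]


backing_string = "\u0644\u0627"  # lam + alif, never modified
run = apply_ligatures([Glyph(46, 0), Glyph(13, 1)])
print(run)  # [Glyph(gid=78, cluster=0)]
print(chars_for(run, 0, backing_string) == backing_string)  # True
```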

> As a linguist (even though I'm learning Persian primarily because I
> want to, not for any linguistic purpose), what I care about most is
> that the correct code points are present in the text. For instance, if
> presentation forms were present, they would radically complicate
> searching and analysis.

Yes. Presentation form codepoints should be avoided. They are not
necessary, and cause more problems than they ever solved.
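Legacy text may still contain presentation-form code points. One mitigation for searching and analysis is to compare text under Unicode NFKC normalization. NFKC maps the Arabic presentation forms back to their ordinary letters through their compatibility decompositions. The sketch below uses Python's standard unicodedata module. NFKC also folds many unrelated compatibility characters, so it is better suited to building search keys than to rewriting stored text.

```python
import unicodedata


def search_key(text: str) -> str:
    """NFKC-normalize so presentation forms compare equal to ordinary letters."""
    return unicodedata.normalize("NFKC", text)


plain = "\u0644\u0627"  # LAM + ALEF, as text should be encoded
legacy = "\uFEFB"       # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(plain == legacy)                          # False: a naive comparison misses it
print(search_key(plain) == search_key(legacy))  # True
```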

> And I would really like to know the answer to one of my original
> questions: which letters tend to morph into ligatures that
> are undesirable when rendering Persian text?

My guess is that the 'Allah' word form ligature is something that one
wouldn't want to occur automatically in most languages that use the
Arabic script. In most of the Arabic OT fonts I have worked on, this is
treated as a discretionary ligature, one that needs to be actively
turned on by the user (presuming he or she is using an application that
enables this at all), and not something that happens automatically.
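The difference between on-by-default and opt-in ligatures can be modelled as lookups gated by feature tags. Everything below is hypothetical: the glyph names, the lookup contents, and the choice to put the Allah form under <dlig>. That last choice follows the practice described above, not any particular font. Lookups run in table order, and at each position the first matching sequence is replaced.

```python
# Hypothetical ligature lookups, keyed by OpenType feature tag, over glyph names.
LOOKUPS: dict[str, dict[tuple[str, ...], str]] = {
    "rlig": {("lam.init", "alef.fina"): "lam_alef"},
    "liga": {},  # no standard ligatures in this toy font
    "dlig": {("alef", "lam.init", "lam.medi", "heh.fina"): "allah"},
}
# Defaults as described in the thread: required and standard ligatures on,
# discretionary ligatures off until the user asks for them.
DEFAULT_FEATURES = {"rlig": True, "liga": True, "dlig": False}


def apply_lookup(glyphs: list[str], table: dict[tuple[str, ...], str]) -> list[str]:
    """Replace each matching glyph sequence with its ligature glyph, left to right."""
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        for seq, lig in table.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:  # no sequence matched at position i
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs: list[str], **overrides: bool) -> list[str]:
    """Run every enabled ligature feature; keyword overrides switch features on or off."""
    features = {**DEFAULT_FEATURES, **overrides}
    for tag, table in LOOKUPS.items():
        if features.get(tag, False):
            glyphs = apply_lookup(glyphs, table)
    return glyphs


word = ["alef", "lam.init", "lam.medi", "heh.fina"]  # hypothetical glyphs for Allah
print(shape(word))             # unchanged: <dlig> is off by default
print(shape(word, dlig=True))  # ['allah']
```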

JH

Behnam

Dec 9, 2009, 8:19:55 PM
to Knigaman, Persian Computing
Loren,
I'm neither a linguist nor a programmer, but I think you have misunderstood
the concept of ligatures in Arabic script. Actually, 'Arabic ligatures' is
a misconception. They are a poor substitute for the natural behavior of the
script, a behavior that cannot be reproduced with current text
rendering technology. Among these poor substitutes, the only one that
Arabic text usually cannot do without is the special formation
of the combination 'lam+alef'.

> So, my next question is, in the case of lam-alef, let's say I'm using
> a font that automatically "visually" replaces both glyphs with what
> appears to be a single glyph. I assume that underlyingly there are
> still 2 code points in the text. Does one code point hide itself in
> order to facilitate the correct glyph to appear?
The encoded text always remains with two individual codes, one for
Lam and another one for Alef. The visual appearance of this
combination has no bearing on what is encoded in the text. Different
fonts use different solutions to reach the final visual result. In
fact, the so-called ligature of 'Lam+Alef' in the fonts that I made
(X Series 2) remains a composition of two separate glyphs, one for
each letter, but 'visually' they produce the same shaping that is
generally called 'ligature'.
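One way to produce the lam+alef shape while keeping two glyphs is a contextual substitution (GSUB lookup type 5 or 6 in OpenType). It swaps each glyph for an alternate drawn to interlock with the other. This is an illustrative sketch with hypothetical glyph names. It is not necessarily how the X Series 2 fonts implement it.

```python
# Hypothetical contextual rule: when lam.init is immediately followed by
# alef.fina, swap BOTH glyphs for interlocking alternates instead of
# merging them into a single ligature glyph.
CONTEXTUAL = {("lam.init", "alef.fina"): ("lam.init.la", "alef.fina.la")}


def apply_contextual(glyphs: list[str]) -> list[str]:
    """Replace matching pairs one-for-one, so the glyph count never changes."""
    out = list(glyphs)
    i = 0
    while i + 1 < len(out):
        pair = (out[i], out[i + 1])
        if pair in CONTEXTUAL:
            out[i], out[i + 1] = CONTEXTUAL[pair]
            i += 2
        else:
            i += 1
    return out


print(apply_contextual(["lam.init", "alef.fina"]))  # ['lam.init.la', 'alef.fina.la']
```

Either way the backing string keeps its two code points. Only the choice of glyphs differs.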
>
> As a linguist (even though I'm learning Persian primarily because I
> want to, not for any linguistic purpose), what I care about most is
> that the correct code points are present in the text. For instance, if
> presentation forms were present, they would radically complicate
> searching and analysis.
Presentation forms encoded in Unicode are almost never used
in encoding Persian or Arabic text. Those forms are now produced
by the font itself, not by directly encoding the presentation
forms. So you should ignore the Unicode Arabic Presentation
Forms-A and -B blocks altogether. There are some exceptions, but they
are not related to your concern about ligatures in the text. For your
concern, ignore the presentation forms entirely.
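To check whether a text contains such characters, a simple range test against the two blocks is enough. Arabic Presentation Forms-A is U+FB50..U+FDFF, and Forms-B is U+FE70..U+FEFF. The sketch below stops at U+FEFC so that U+FEFF is not flagged; that code point is the byte order mark (zero width no-break space), which also sits in the Forms-B block. The test flags every character in those ranges, including any of the exceptions mentioned above, so hits are worth reviewing rather than rejecting outright.

```python
def presentation_form_positions(text: str) -> list[int]:
    """Indices of characters from the Arabic Presentation Forms-A/-B blocks."""
    return [
        i
        for i, ch in enumerate(text)
        # Forms-A: U+FB50..U+FDFF; Forms-B: U+FE70..U+FEFC (U+FEFF, the BOM, excluded)
        if 0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFC
    ]


print(presentation_form_positions("\u0644\u0627"))  # []: ordinary lam + alef
print(presentation_form_positions("a\uFEFBb"))      # [1]: lam-alef ligature code point
```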
>
> And I would really like to know the answer to one of my original
> questions: which letters tend to morph into ligatures that
> are undesirable when rendering Persian text?
I think the above explanation answers this question as well. It is
essentially a non-issue.

Cheers,
Behnam

Knigaman

Dec 9, 2009, 8:21:30 PM
to Persian Computing
Once again, thank you for the wealth of information.

Loren sZendre

Connie Bobroff

Dec 9, 2009, 8:27:29 PM
to John Hudson, Persian Computing
On Wed, Dec 9, 2009 at 7:04 PM, John Hudson <jo...@tiro.ca> wrote:

> In most of the Arabic OT fonts I have worked on, this is
> treated as a discretionary ligature, one that needs to be actively
> turned on by the user (presuming he or she is using an application that
> enables this at all), and not something that happens automatically.
I never figured out how to turn on or off ligatures in MS Word. What is the secret?
-Connie

Behnam

Dec 9, 2009, 9:41:42 PM
to Connie Bobroff, Persian Computing
On 9-Dec-09, at 8:27 PM, Connie Bobroff wrote:

> I never figured out how to turn on or off ligatures in MS Word. What is the secret?

From what I understand, you can't. Word is part of the MS Office suite, which was conceived for office use, with Roman script in mind.
OpenType features that are basic for Persian and Arabic are considered sophisticated typographic features in Roman script, not suited to office needs. So those typographic features are not supported in MS Word.
Roman fonts with those features are mainly used for publishing, in programs designed for that purpose, like InDesign. So if you want access to and control over such features, you should look at those programs. In Word, you won't have them.
-b

Knigaman

Dec 9, 2009, 10:23:40 PM
to Persian Computing
I just downloaded about a dozen different fonts, and it appears that
they all create the ligature for lam-alef. There were 2 fonts that
appeared to make more of a "U" shape than the others. So is the desired
Persian appearance like a "U"?

> I never figured out how to turn on or off ligatures in MS Word. What is the
> secret?
> -Connie

Then my next question: If even the fonts that are Persian-centric
create the lam-alef ligature -- then what hope do we have of achieving
the correct effect on the computer?

I guess this naturally leads (at least for me) to the conclusion
that lam and alef (or at least one of them) need Persian-specific code
points in the Unicode standard. Any time I hear "just set your locale
properly and it will work," it harks back to 8-bit character sets,
where the encoding information is critical. The "Unicode" way
guarantees that if you use the correct code point, you never have to
worry about locales.

Loren sZendre

John Hudson

Dec 9, 2009, 11:22:27 PM
to Persian Computing
Connie Bobroff wrote:

> I never figured out how to turn on or off ligatures in MS Word. What is
> the secret?

It isn't possible in Word. The 'Required Ligatures' and 'Standard
Ligatures' are on by default and in Word cannot be turned off;
'Discretionary Ligatures' is off by default and in Word cannot be turned on.

In InDesign ME, one has control over the Standard and Discretionary
Ligature features.

JH

John Hudson

Dec 9, 2009, 11:24:10 PM
to Persian Computing
Knigaman wrote:

> I just downloaded about a dozen different fonts, and it appears that
> they all create the ligature for lam-alef. There were 2 fonts that
> appeared to make more of a "U" shape than the others. So is desired
> Persian appearance like a "U"?

No. The typical lam+alef ligated form is normal for all languages using
the Arabic script. The U-like connection is incorrect.

J.

John Hudson

Dec 9, 2009, 11:28:32 PM
to Persian Computing
Behnam wrote:

> From what I understand, you can't. Word is part of MS OFFICE suite,
> which is conceived for office use, with Roman script in mind.
> In Roman script, the OT features that are basic for Persian and Arabic
> are considered sophisticated typographic features for Roman script, not
> suitable for a concept of office needs. So those typographic features
> are not supported in MS Word.

Latin script ligatures are, indeed, not supported in Word, but Arabic
ligatures are.

Arabic text is handled via the Uniscribe text engine. Word is a client
of Uniscribe, so uses whatever aspects of Arabic text shaping are
default in Uniscribe, including the Required and Standard Ligature
layout features.

Knigaman

Dec 10, 2009, 12:48:23 AM
to Persian Computing
OK, I've figured it out. Please disregard my last question. I was
under the mistaken impression that all ligatures were undesirable --
but now I realize that the lam-alef is normal.

It's taking me a while to appreciate the subtle nuances of the script.
I had Russian figured out and was comfortable with it in about 2 days. This is
taking longer. But it is an absolutely beautiful script.

Loren sZendre

Mohsen Saboorian

Dec 10, 2009, 1:38:09 AM
to John Hudson, Persian Computing
FYI, Microsoft added ligature support to MS Word 2010.

Check this out:

Mohsen

