Kashida Justification


Ebrahim Byagowi

Nov 16, 2012, 3:02:35 PM
to Persian Computing
Hi.

Both WebKit (Chrome, Safari, ...) and Mozilla (Firefox) are unable to do justification using Kashida <http://www.w3.org/TR/css3-text/#fig-text-justify-kashida>. I found these bugs on the WebKit tracker (1 <https://bugs.webkit.org/show_bug.cgi?id=6203>, 2 <https://bugs.webkit.org/show_bug.cgi?id=99945>) and this one <https://bugzilla.mozilla.org/show_bug.cgi?id=276079> (intentionally without an implementation of kashida justification) on the Mozilla bug tracker. This feature has been supported in IE for a long time...

I am simply requesting Persian-Computing's attention to this :)

I think HarfBuzz currently supports Kashida justification (I think it is mentioned in this header of HarfBuzz <http://skia.googlecode.com/svn/trunk/third_party/harfbuzz/src/harfbuzz-shaper.h>), so implementing this feature would not be a hard task, would it? I think some implementation advice from Persian-Computing experts would be very useful for the developers of both browser engines (and for us).

Thanks!

--
Ebrahim Byagowi

Behdad Esfahbod

Nov 16, 2012, 3:26:28 PM
to Ebrahim Byagowi, Persian Computing
On 12-11-16 12:02 PM, Ebrahim Byagowi wrote:
> Hi.
>
> Both of Webkit(Chrome, Safari, ...) and Mozilla(Firefox) are unable to doing
> justification using Kashida
> <http://www.w3.org/TR/css3-text/#fig-text-justify-kashida>. I found these bugs
> on WebKit (1 <https://bugs.webkit.org/show_bug.cgi?id=6203>, 2
> <https://bugs.webkit.org/show_bug.cgi?id=99945>) and this
> <https://bugzilla.mozilla.org/show_bug.cgi?id=276079> (intentionally without
> implementation of kashida justification) on Mozilla bug tracker. This feature
> is supported in IE for a long time...
>
> I simply requesting Persian-Computing attention to this :)
>
> I think HarfBuzz currently is supporting Kashida justification (I think it is

Not really. The old HarfBuzz did, and Qt/KDE is the only system that used
that feature. The new HarfBuzz that I work on these days, and which is now
used in GNOME, Firefox, and Chrome Linux, doesn't do Kashida justification,
because we consider it a hack. We do want to implement justification
eventually, it just has not been a priority so far.

behdad


> pointed on this header
> <http://skia.googlecode.com/svn/trunk/third_party/harfbuzz/src/harfbuzz-shaper.h>
> of HarfBuzz) so implementing this feature would not be a hard task, would be?
> I think some implementation advice from Persian-Computing experts will be so
> useful for developers of both browsers engines (and for us).
>
> Thanks!
>
> --
> Ebrahim Byagowi
>
> --
> http://persian-computing.org/
> http://groups.google.com/group/persian-computing/

--
behdad
http://behdad.org/

Behnam Rassi

Nov 16, 2012, 5:31:22 PM
to Ebrahim Byagowi, Persian Computing
In my opinion, Kashida should be considered a code, not a character: a code that generates an alternative version of the previous character, if one is available in the font. The font can use that code as a character if it chooses to do so. But the current understanding of Kashida as a character is typographically very, very limiting.
The other issue, however, is that in the current implementation of justification there is no text run after the kashidas are inserted, which makes it useless for a typographic implementation of Kashida in justification.
Behnam

Behdad Esfahbod

Nov 17, 2012, 5:23:44 PM
to Behnam Rassi, Ebrahim Byagowi, Persian Computing
On 12-11-16 02:31 PM, Behnam Rassi wrote:
> In my opinion, Kashida should be considered a code. Not a character. A code
> that generates an alternative version of previous character, if available to
> the font.

That's fully possible right now.

> The font can use that code as a character if it chooses to do so.
> But the current understanding of Kashida as a character is typographically
> very very limiting.

This makes little sense. All formatting "codes" need to be encoded in Unicode
too, and they just happen to be called character. What you call something
can't be technically limiting.

> The other issue however, is that in current implementation of justification
> there is no text run after inserting kashidas, which is useless for
> typographic implementation of Kashida in justification.

In *what* system? I don't know any two systems that work the same way.

b


--
behdad
http://behdad.org/

Behnam Rassi

Nov 17, 2012, 6:09:05 PM
to Behdad Esfahbod, Ebrahim Byagowi, Persian Computing
In any system. This is what our writing system requires and this is what should be supplied.
Kashida is one of the basic features of our writing system and it's not a character. It's a code. I won't argue about Unicode's often lousy definitions. It can be used as a code as it stands now. The difference is exactly your initial question: in what system? Any system that supports our writing system. And if Kashida were understood properly, we wouldn't have to struggle to make it work properly in our writing system. Kashida is *not* a character. At least someone who speaks Persian should know that!
behnam

Behnam Esfahbod

Nov 17, 2012, 6:29:23 PM
to Behnam Rassi, Behdad Esfahbod, Ebrahim Byagowi, Persian Computing
Behnam,

Please note that the definition of "character" is a little loose itself, and to some extent it's not that different from "code". For example, we have many "Control Characters" in Unicode (actually in all character sets/code pages) which by your definition are "codes". If I understand correctly, what you mean is that Kashida (U+0640) should not be treated as a standalone glyph, but as a *modifier* for the *letter* characters before/after it. Well, that's indeed how it's been categorized in Unicode: Modifier_Letter (a modifier letter).
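For anyone who wants to check the categorization themselves, here is a small sketch using Python's standard `unicodedata` module. It only reads what the Unicode Character Database records; the comparison with ZERO WIDTH NON-JOINER is my own choice of a formatting "code" for illustration.

```python
import unicodedata

TATWEEL = "\u0640"
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, chosen here only as a comparison

# Name and General_Category as recorded in the Unicode Character Database.
tatweel_name = unicodedata.name(TATWEEL)
tatweel_category = unicodedata.category(TATWEEL)  # "Lm" = Letter, modifier
zwnj_category = unicodedata.category(ZWNJ)        # "Cf" = Other, format

print(tatweel_name, tatweel_category, zwnj_category)
```

So U+0640 is a letter of the "modifier" kind, while the invisible joining controls sit in a separate format category.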

I believe the main problem here rests in how this Unicode character is treated in the fonts. Looks like you and Behdad both agree that instead of a simple glyph for TATWEEL (U+0640), there should be additional glyphs in fonts for other letters in those cases that they appear before/after a Kashida. And I cannot agree more.

There always remain, though, the cases of fixed-width fonts (for console-like usages, like IDEs) and standalone appearances of the character. I think we don't have better options in these cases, and frankly, they are not that bad.

-Behnam Esfahbod






--
Behnam Esfahbod | بهنام اسفهبد
http://behnam.es/
GPG Fingerprint: 3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B


Behnam Rassi

Nov 17, 2012, 7:07:42 PM
to Behnam Esfahbod, Behdad Esfahbod, Ebrahim Byagowi, Persian Computing
Got your point Behnam

But a 'modifier letter' is still a letter, which kashida is not. Don't get me wrong. On the font side, this is not a big deal. But once it is defined as a 'letter' it can confuse a lot of people who are not familiar with the language. There is no text run after inserting kashida in the justification scheme because kashida is defined as a letter: the job is considered done after the insertion, and it is understood that there is no need for a text run afterward. If they knew Kashida was just a modifier code, they would have thought about a text run after justification. You see what I'm getting at?
I believe there is no need for any change or modification to the definition and behaviour of Unicode U+0640, as long as there is an efficient way to convey the message that kashida is *not* a character and not a modifier letter. It's just a modifier code, and there is an absolute need for a text run after the justification scheme to let the modifications take place.
This is crucial for our writing system, in order to have a chance in typography in the digital age.

(the other) Behnam!

Hooman Mehr

Nov 18, 2012, 6:55:40 PM
to Behnam Rassi, Behnam Esfahbod, Behdad Esfahbod, Ebrahim Byagowi, Persian Computing
Hi Behnams (both of you) and other Guys,

It has been a long time since I participated in Persian Computing discussions. I am very busy and a lot is going on in my personal life right now, but this particular issue is very sensitive to me, so I had to speak up.

U+0640 Tatweel or Kashida is a special legacy character and is not directly related to proper full justification of the languages written in Arabic script. It is a relic from typewriter days. Some people still use it to manually stretch some words. Some others use it as ZWJ (U+200D). It is not the proper glyph to use as an extension bar for full justification of text in Arabic script in general (not just Persian). Full justification extension bars are not encoded in Unicode, and the rules that govern how much and where they are applied are usually hardcoded into the text layout engine, with support for some tuning in the fonts (e.g. the JSTF table in OpenType). In theory, advanced control over justification is possible (advanced modes of JSTF and Apple Advanced Typography 'just' tables, or ductile glyphs in QuickDraw GX), but I have not seen a single font that puts them to good use for Persian or other languages that use Arabic script.

Full justification of text in the Roman alphabet is usually achieved mostly by adjusting the spacing of words and rarely involves changing the spacing of the characters within the words. On the other hand, increasing the amount of space between words is not desirable for Arabic script, and does not work at all if it becomes excessive. The traditional manual calligraphic solution involves changing the writing style (shape) or width of the glyphs themselves, with more stretch possible for some glyphs than for others, and there are a lot of other fine rules to keep the beauty and uniformity of the text. These rules are language and locale dependent, which means there are different schools of calligraphy with different preferences. Also, the alternative of compressing the words and reducing the width of the text to fit the line is very common in some schools of calligraphy, as is curving up the baseline towards the end of the line, with words being stacked on top of each other at the end of the line.
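To make the Roman-alphabet case concrete, here is a minimal sketch of word-space justification. The function name, the abstract width units, and the even distribution of slack are my assumptions for illustration; real engines weigh many more factors.

```python
def distribute_word_space(word_widths, space_width, line_width):
    """Return the width of each inter-word gap for a fully justified line.

    All of the slack is pushed into the gaps between words, which is the
    behaviour described above as undesirable for Arabic script.
    """
    gaps = len(word_widths) - 1
    if gaps <= 0:
        return []
    natural = sum(word_widths) + gaps * space_width
    extra = max(0, line_width - natural)
    return [space_width + extra / gaps] * gaps


# Three words of width 10 with a normal space of 5, set on a line of width 50.
gap_widths = distribute_word_space([10, 10, 10], 5, 50)
print(gap_widths)
```

In this toy line each gap doubles from 5 to 10 units; on a short line with few words the gaps grow much faster, which is where the approach breaks down for connected script.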

In the old days of letterpress typesetting, we used to have multiple versions of some characters, such as kaf (which can be a big space absorber, and even had an arbitrarily stretchable version, especially in Arabic, less so in Persian). Then things moved towards simplification as we approached typewriter days: all those stretched glyphs disappeared, and practically only the Kashida or Tatweel bars remained and carried over to digital typography. Letterpress typography also used either a very long extension bar that would be cut to size, or multiple discrete tatweel blocks of different widths. A typesetter (the person) would combine tatweels of different sizes to justify the text properly and nicely, keeping an eye on the adjacent lines, sometimes moving boundary words up or down from a well-stretchable line into the adjacent, less stretchable lines to keep everything nice.

What I am trying to say is that proper full justification of text in Arabic script is very complex and sometimes involves more art than algorithm and science, and is affected by personal or regional taste. It is not properly handled by any of the normal text engines in widespread use today. Even the knowledge of what is missing and what is wrong with Arabic script full justification seems to be diminishing, which is very sad. Having the very basic and dumb full justification of distributing a number of extension bar glyphs (call them kashida, tatweel or whatever) based on simple weighted joint priorities is not the ideal solution, but is far better than increasing word spacing or having flush-right text.
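A rough sketch of that basic scheme, for discussion only. The list of letters that do not join forward is simplified and incomplete, the priority weights are invented for illustration, and the round-robin distribution is just one possible policy; none of this is taken from an actual layout engine.

```python
import unicodedata

TATWEEL = "\u0640"
# Letters that do not join to the following letter (simplified, incomplete).
NO_FORWARD_JOIN = set("اآأإدذرزژوؤةء")
# Hypothetical weights: a higher value marks a preferred place to stretch.
PRIORITY = {"س": 3, "ش": 3, "ص": 3, "ض": 3, "ک": 2, "ك": 2}
DEFAULT_PRIORITY = 1


def is_arabic_letter(ch):
    # Category "Lo" excludes tatweel itself (Lm) and the combining marks (Mn).
    return "\u0621" <= ch <= "\u06d3" and unicodedata.category(ch) == "Lo"


def join_points(text):
    """Return (index, priority) pairs where a bar could follow text[index]."""
    points = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if not (is_arabic_letter(a) and is_arabic_letter(b)):
            continue
        if a in NO_FORWARD_JOIN or b == "ء":
            continue
        if a == "ل" and b in "اآأإ":  # leave the lam-alef ligature intact
            continue
        points.append((i, PRIORITY.get(a, DEFAULT_PRIORITY)))
    return points


def insert_tatweels(text, count):
    """Insert `count` bars, cycling through join points by priority."""
    points = sorted(join_points(text), key=lambda p: -p[1])
    if not points or count <= 0:
        return text
    extra = {}
    for n in range(count):
        idx = points[n % len(points)][0]
        extra[idx] = extra.get(idx, 0) + 1
    return "".join(ch + TATWEEL * extra.get(i, 0) for i, ch in enumerate(text))


# seen + lam + alef + mim: the only usable join is after the seen.
stretched = insert_tatweels("\u0633\u0644\u0627\u0645", 2)
```

A caller would derive `count` from the leftover line width divided by the bar's advance. Note that this works purely on characters and knows nothing about what the font does with them.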

I don't know how to raise awareness of the importance of the issue of proper Arabic script full justification with the players that matter (Say Google, Microsoft, Apple, IBM and others). Properly full justified text is the expected normal way to set a paragraph of Arabic text, not flush right or word-spacing-based full justification. Somebody with influence in these companies please help!

Hooman Mehr

John Hudson

Nov 19, 2012, 6:45:46 PM
to Persian Computing
On 18/11/12 3:55 PM, Hooman Mehr wrote:

> What I am trying to say is that proper full justification of text in
> Arabic script is very complex and sometimes involves more art than
> algorithm and science, and is affected by personal or regional taste.

And by the nature of the text. The use of kashida in poetry, for
example, differs in manuscripts from its use in prose. [It needs to be
said, however, that in general the use of kashida for justification in
manuscripts is *much* less common than most typography would lead one to
think: in typography it has tended, especially on computers, to be
treated as the primary means of text justification, whereas a scribe has
a whole range of variations at his disposal that render the use of
kashida relatively infrequent.]

It should also be noted that the rules for kashida insertion are to an
extent style-specific. While there are some rules that are common to all
styles, there are some differences between e.g. naskh and nastaliq, and
these are poorly documented (the rules for less common styles even more so).

There are also contextual implications of kashida for the shape of some
following letters, e.g. mim, and this -- along with other aspects of
kashida shaping such as elongated variant glyphs, curved kashidas with
<curs> attachment positioning -- require at least some OpenType or other
smart font layout to be applied *after* kashida justification has taken
place. At present, to my knowledge, no layout engine does this, which
means that unless a font has a flat baseline stroke with a simple flat
kashida glyph, such justification will always break the text.
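To illustrate the ordering problem, here is a toy sketch in which shaping is re-run after every insertion. The "shaper" is a stand-in I made up: the glyph names, the advances, and the rule that mim takes a wider alternate after a kashida are all hypothetical and do not describe any real font or engine.

```python
TATWEEL = "\u0640"
MIM = "\u0645"


def toy_shape(text):
    """Stand-in for a real shaper: returns a list of (glyph_name, advance)."""
    glyphs = []
    for i, ch in enumerate(text):
        if ch == TATWEEL:
            glyphs.append(("kashida", 4))
        elif ch == MIM and i > 0 and text[i - 1] == TATWEEL:
            # Contextual alternate that only applies once a kashida precedes.
            glyphs.append(("mim.after_kashida", 12))
        else:
            glyphs.append((f"u{ord(ch):04X}", 10))
    return glyphs


def width(glyphs):
    return sum(advance for _, advance in glyphs)


def justify(text, target, insert_at, shape=toy_shape, max_insertions=100):
    """Insert kashidas at `insert_at`, reshaping after every insertion."""
    glyphs = shape(text)
    for _ in range(max_insertions):
        candidate = text[:insert_at] + TATWEEL + text[insert_at:]
        candidate_glyphs = shape(candidate)  # the pass that is usually skipped
        if width(candidate_glyphs) > target:
            break
        text, glyphs = candidate, candidate_glyphs
    return text, glyphs


# seen + mim, stretched to a target width of 30 units.
justified_text, justified_glyphs = justify("\u0633\u0645", 30, 1)
```

In this toy the mim grows from 10 to 12 units once a kashida precedes it, so an engine that measured only before insertion would miscount the line width as well as draw the wrong glyph.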

JH


--

Tiro Typeworks www.tiro.com
Gulf Islands, BC ti...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
- Sidney Harring, _Policing a Class Society_

Behnam Esfahbod

Nov 21, 2012, 4:34:09 PM
to John Hudson, Persian Computing
Okay, good points have been brought up here about the eligibility of Kashida as a character (U+0640 ARABIC TATWEEL) and/or for justification in Perso-Arabic script. Here's my view of the situation.

1. Shall we treat U+0640 as a deprecated character?
No! There is a good amount of text in modern Persian (especially contemporary poetry) in which Kashida is used *semantically*, and not to justify the text. Removing this character from those texts would damage the content.

2. Can Kashida be used in Nasta'liq style?
No! Nasta'liq has its own techniques for elongation/justification.

3. Can Kashida be used in Naskh style?
Maybe. Some fonts/platforms handle the Kashida character in this style, and some are even able to use Kashida glyphs for justification. Example: https://www.tug.org/TUGboat/tb27-2/tb87benatia.pdf

4. Can Kashida be used in Typewriter style for justification?
Yes. It has always been used and is the only way to do justification in this style. And many platforms have supported this feature since the 1990s.

This (4) is the most-requested feature when people ask for justification in applications like web browsers, and it's no surprise, as (almost) all fonts for such applications are in Typewriter style, not true Naskh or Nasta'liq.

It's unfortunate, but equally interesting, that in contemporary Persian literature we have content that has no obvious presentation in Nasta'liq (based on points 1 and 2). But it shows us how important it is to preserve all these styles!

Also, unfortunately again, some platforms/applications confuse the Naskh style with Typewriter style (Example: https://www.adobe.com/content/dam/Adobe/en/devnet/indesign/cs5_docs/indesign_server/ids_me_scriptingguide.pdf). It's true that Naskh is the origin of the Arabic Typewriter style, but they are fairly different now.

-Behnam Esfahbod



John Hudson

Nov 21, 2012, 5:04:06 PM
to Persian Computing
On 21/11/12 1:34 PM, Behnam Esfahbod wrote:

> 1. Shall we treat U+0640 as a deprecated character?
> No! There is a good amount of text in modern Persian (specially
> contemporary poetry) in which Kashida is used *semantically*, and not to
> justify the text. Removing this character from those text will damage
> the content.

In any case, unless Unicode formally deprecates a character, there is no
point in trying to treat it as deprecated, since it may legitimately
occur in any text. Also, in the absence of better algorithms for
automated justification, some users may find this character to be the
only way to effect manual kashida justification in some software.

> 2. Can Kashida be used in Nasta'liq style?
> No! Nasta'liq has it's own techniques for elongation/justification.

I think the answer to this and related questions depends on how you
define 'kashida'. I think it makes sense to consider kashida, in a
technical text processing context, as a request for elongation that is
independent of the method of elongation. So in this sense it is agnostic
as to script style, and only becomes relevant at the point of
implementation, at which point the right questions are a) how is
elongation handled in this style? b) what are the rules for where
elongation may be applied in this style? and c) does the current font
and layout technology support this elongation?

If, on the other hand, you define kashida as a piece of type -- whether
metal or digital -- that you stick between other pieces of type to
elongate their connection, then you are limited in the ways in which you
can ask any questions, and your answers have less to do with script
styles than with technologies and their limitations. In my approach, the
technologies and their limitations are the last barrier, not the first.

J.

richar...@gmail.com

May 27, 2014, 10:59:29 AM
to persian-...@googlegroups.com
Fwiw, I just published a blog post (http://rishida.net/blog/?p=1059) that tries at a very high level to clarify for the layman some of the difficulties and questions associated with the use of baseline extensions for *justification* of Web based text.  Here are a couple of points that I think may help the conversation.

1. I find it useful to say 'tatweel' only when referring to manual extensions using U+0640 or its equivalent in type, and use the word 'kashida' to refer to elongation without any additional characters.

2. I think it's important to be clear, when talking about elongation, what font style one is talking about - the rules and context are different for naskh vs nastaliq (and very different for ruqah, which has no elongation). In particular, I think that the likelihood of adjustment to space is quite different from Nastaliq to Naskh.

3. Many of the techniques I've come across for justification so far have related to a world of fixed line lengths and, for the Koran, page lengths, and the techniques used to address justification in that world are somewhat different from the Web world where a user can change the length of lines by changing the window size, or viewing on a different device, or the user may even change the font for accessibility reasons or because they don't have many fonts on their mobile device, etc. In this latter world, tatweel character usage is problematic.

4. Justification of running text in paragraphs is just one context in which word elongation occurs. I've seen a lot of justification on signs, for example, that attempts to make the Arabic text the same length as the English translation below it. I assume that automatic line wrapping is out of the question here, and that therefore a tatweel-based approach may be more acceptable. Other times, elongation is used for visual effect, and again I suspect that this calls for a manual, tatweel-based approach or for a kashida font elongation approach where automatic layout rules are not used.
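On point 3, a small sketch of why manual tatweels sit awkwardly with reflowing text: they are ordinary characters stored in the string, so they stay put when the line width changes. The helper names are mine, and stripping is shown only as an illustration; it is lossy wherever U+0640 was typed deliberately for emphasis or effect.

```python
import re

TATWEEL = "\u0640"


def tatweel_runs(text):
    """Return (start_index, run_length) for each run of U+0640 in the text."""
    return [(m.start(), len(m.group())) for m in re.finditer(TATWEEL + "+", text)]


def without_tatweel(text):
    """Drop manual extensions before re-layout (lossy for deliberate use)."""
    return text.replace(TATWEEL, "")


# seen + two tatweels + lam + alef + mim, as someone might type it by hand.
manual = "\u0633\u0640\u0640\u0644\u0627\u0645"
runs = tatweel_runs(manual)
plain = without_tatweel(manual)
```

A layout engine doing its own kashida elongation would have to decide whether to keep, ignore, or reinterpret such runs each time the line is re-wrapped.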

Saleh Souzanchi

May 27, 2014, 3:37:05 PM
to persian-...@googlegroups.com, richar...@gmail.com