CSS3 Lists : list-style: persian

hAmid reZa

Aug 1, 2010, 12:43:00 PM
to Persian Computing
Hi,
As you may know, we can control the appearance of ordered lists (the ol
tag) in HTML using the list-style property. The problem with using this
property on Persian web sites is that, with CSS2, you cannot style a
list to show Persian numbers instead of Latin ones. There are some
JavaScript hacks that might do the job, but these methods are neither
standard nor reliable. Take a look at the one I used earlier to
create a fully localized ordered list:
http://www.gozir.com/1388/10/15/persian-ol-tags/
As I mentioned in that blog post, CSS3 is about to provide a
Persian numeric list type that can solve this problem. Search for
"Persian" on this page:
http://www.w3.org/TR/css3-lists/
I checked Firefox 3.6, IE8 and Opera 10.6, and none of these
browsers supports this value yet.
I know there are people in this group who are connected in some way
to the Firefox development team, so I think it would be useful to know
when we can have Persian ordered lists using CSS3 in Firefox. In
addition, I suspect the CSS3 specification still does not cover
everything a fully localized Persian web site needs for ordered lists.
How about an ordered list with the Persian alphabet as list markers? How
about a list with the Persian alphabet in ABJAD order (which, as you may
know, combines letters to display numbers of two or more digits; for
example, ی is 10, یا is 11, یب is 12, and so on)? How can we request the
addition of these list types to the CSS specification?
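
To make the ABJAD request concrete, here is a minimal Python sketch of how such a list counter could be generated. It assumes the common Eastern letter values, the Persian forms of yeh and kaf, and plain highest-value-first composition; the name to_abjad and the 1..1999 range are illustrative choices, not anything taken from the CSS3 draft.

```python
# Letter values follow the common Eastern abjad order (an assumption).
# Units 1..9
UNITS = "ابجدهوزحط"
# Tens 10..90; the first two are Persian yeh (U+06CC) and kaf (U+06A9)
TENS = "\u06cc\u06a9" + "لمنسعفص"
# Hundreds 100..900
HUNDREDS = "قرشتثخذضظ"
# One thousand
THOUSAND = "غ"


def to_abjad(number: int) -> str:
    """Return the abjad marker for 1..1999, highest-valued letter first."""
    if not 1 <= number <= 1999:
        raise ValueError("this sketch only covers 1..1999")
    parts = []
    thousands, rest = divmod(number, 1000)
    if thousands:
        parts.append(THOUSAND)
    hundreds, rest = divmod(rest, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds - 1])
    tens, units = divmod(rest, 10)
    if tens:
        parts.append(TENS[tens - 1])
    if units:
        parts.append(UNITS[units - 1])
    return "".join(parts)


# Markers for the first twelve list items
for item in range(1, 13):
    print(item, to_abjad(item))
```

With these tables, items 10, 11 and 12 come out as the same letter combinations given in the examples above.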

Hamed Nemati

Aug 2, 2010, 3:54:53 AM
to Persian Computing
As far as I know, Firefox supports it. Take a look at:

https://developer.mozilla.org/en/css/list-style-type

and search for -moz-persian.

hAmid reZa

Aug 2, 2010, 8:43:22 AM
to Persian Computing
Thanks for the link.
It is good to know Firefox is on its way to supporting it. (As far as I
know, a -moz-[property value] name indicates a Firefox-specific property
value that the browser does not yet support as part of the CSS3
standard. Am I right about this?)

One of the group members mentioned that the ابجد ordering/numbering
system is not Persian but Arabic. I don't know; it was only a
suggestion, and I thought having more options would not hurt
anybody, even when these options were not created specifically by or
for Persians. When we need nested ordered lists, I think having
more localized numbering/ordering systems becomes necessary. Besides, I
have seen this ordering system (ابجد) in old printed books, and I think
some people may prefer to use it when they move similar content to
the web.

Hamed Nemati

Aug 2, 2010, 9:35:28 AM
to Persian Computing
"Some of these properties have been proposed for inclusion in the CSS
specification, though the standard property may be different from the
-moz- implementation." - MDC - https://developer.mozilla.org/en/CSS_Reference/Mozilla_Extensions

Hossein Noorikhah

Aug 2, 2010, 6:06:24 AM
to Persian Computing
Hi,
But changing numerals according to context does not seem to be implemented.
Numbered lists with custom characters are only a fraction of the whole problem.

Hossein Noorikhah

Aug 2, 2010, 2:30:20 AM
to Persian Computing
Hi,
I agree with you that this is a problem with the browser.
A good solution would be for Firefox to use Hindi or Arabic numerals according to the context of the text. This is what IE and MS Word have been doing for a long time, and OpenOffice.org has lately implemented this requested feature.
Maybe filing a bug report could help. We did the same thing for OpenOffice.org and they fixed it.

hAmid reZa

Aug 2, 2010, 10:09:35 PM
to Persian Computing
Hi,
I think the problem you are considering is somewhat different from
the one I explained here. Mine is only about styling ordered lists
in HTML using CSS, for which a predefined standard already exists.
My concern here is to push for faster implementation across
browsers.

Showing Persian numerals according to context, which might solve lots
of problems for us, is not stated explicitly in any current or future
web standard, as far as I can tell. So what do you think? Should this
feature be part of the standard behavior of a browser? In my experience,
using Persian digit characters (which I personally use in all my web
content publishing) causes lots of problems for search, find in page,
exporting tables to Excel and so on. So I think there must be a
different, automatic rendering solution for this problem, global to the
operating system and independent of the application (browser, desktop
publishing).

On Aug 2, 10:30 am, Hossein Noorikhah <hossein...@gmail.com> wrote:
> Hi,
> I agree with you that this is the problem with the browser.
> A good solution to this problem is that Firefox uses Hindi or Arabic
> numerals according to the context of the text. This is what IE, MS Word have
> been doing for a long time, and lately OpenOffice.org has implemented this
> requested feature.
> Maybe filling a bug report could help. We did the same thing in
> OpenOffice.org and they fixed it.
>

Roozbeh Pournader

Aug 3, 2010, 12:48:30 AM
to hAmid reZa, Persian Computing
On Mon, Aug 2, 2010 at 7:09 PM, hAmid reZa <mohamm...@gmail.com> wrote:
> using
> Persian digits glyphs -which I personally use in my all web content
> publishing- produces lots of problems for search, find in page,
> exporting tables to Excel and ..., so, I think there must be a
> different/automatic rendering solution global to operating systems
> regardless of functionality of applications (browser, desktop
> publishing) for this problem.

Not really. The actual solution is better support for non-European
numerals in searches and in Microsoft products like Excel. We need to
insist on using Persian digits as much as possible and ask the
software vendors to support them properly. The underlying
standards like Unicode and CLDR have all the necessary information.
The vendors just need to support the standards.

Roozbeh

Hossein Noorikhah

Aug 3, 2010, 4:23:39 AM
to Persian Computing
Hi,
 
> Showing Persian numerals according to context -which might solve lots
> of problems for us- is not stated explicitly in any current/future web
> standard as I guess. So what do you think?

I think directly using Hindi numerals (۰۱۲۳۴۵۶۷۸۹) is a non-standard hack and not a good solution, because those "Hindi numerals" are just like "presentation forms" used by programs to show numerals in the right manner.
As you may know, we can use the presentation forms of characters to write Persian text where the shaping algorithm is not supported. Instead of writing علی we can write:
ﻋ  ﻠ  ﻰ
which consists of the presentation forms of the characters.
But this causes a lot of problems in programs that are already able to process Unicode text. We can push for programs that can process this kind of text, but I think this contradicts the philosophy of Unicode, which tries to separate the meaning of the text from its presentation.

As I remember, even a simple calculator in GNOME can't process these characters and work with them as numbers. So this approach requires every single program to be changed to fix the problem.

Instead, we can change the widget libraries, and after that only programs that draw their own graphical widgets have to implement this. Not everything that Microsoft does is wrong. I think Microsoft has been doing a great job of supporting Arabic and Persian.

On Tue, Aug 3, 2010 at 9:18 AM, Roozbeh Pournader <roo...@gmail.com> wrote:

hAmid reZa

Aug 3, 2010, 10:47:31 AM
to Persian Computing
If, according to the Unicode standard, applications should treat
Persian digits just like the Latin ones (is it really stated this way
in the Unicode documentation?), then before end-user applications
(Excel, ...) can do so, core support must be available in development
tools. I just tested with C# and tried this line of code:

int a = Convert.ToInt32("۱");

It raises an exception because the input string is not in the proper
format. I guess the same thing might happen if you pass Persian
numerals to atoi and similar functions in the standard C/C++ libraries,
and there might not be a localized formatting option (something like
%d, I mean) for sprintf, format and similar functions that can create
strings with localized numbers.

If this behavior (treating Persian digits just like Latin ones in
mathematical operations) is standard, I think our first step
towards making it widely adopted is to make it part of open source
development tools such as gcc.
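
For comparison with the C# test above, a short Python sketch: Python's built-in int() accepts any Unicode decimal digits, so Persian digits parse directly. The helper to_ascii_digits is a hypothetical name added here for the case where a protocol or an older tool needs European digits.

```python
# Persian (Extended Arabic-Indic) digits U+06F0..U+06F9, built from code points.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
_TO_ASCII = str.maketrans(PERSIAN_DIGITS, "0123456789")


def to_ascii_digits(text: str) -> str:
    """Replace Persian digits with European digits; other text is untouched."""
    return text.translate(_TO_ASCII)


# The same input as the C# line above (Persian digit one); no exception here.
a = int("\u06f1")
# Arithmetic works once the string has been parsed.
total = int("\u06f1\u06f2") + 1
print(a, total, to_ascii_digits("\u06f5\u06f8\u06f0\u06f0"))
```

This says nothing about what C# or C should do; it only shows that a standard library can parse these digits without changes to the compiler.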


On Aug 3, 8:48 am, Roozbeh Pournader <rooz...@gmail.com> wrote:

Roozbeh Pournader

Aug 3, 2010, 3:46:08 PM
to Hossein Noorikhah, Persian Computing
On Tue, Aug 3, 2010 at 1:23 AM, Hossein Noorikhah <hosse...@gmail.com> wrote:
> I think directly using Hindi numerals (۰۱۲۳۴۵۶۷۸۹) is a non-standard hack,
> and is not a good solution, because those "Hindi Numerals" are just like
> "presentation form" used by programs to show numerals in a right manner.

It's not a hack. It's what the Unicode standards recommend. Using the
European numbers and displaying them as (Extended) Arabic-Indic
numbers is a hack, which Microsoft needed to create to make sure they
could interoperate with their older documents and systems (CP1256, which
was what the early Arabic versions of Windows supported, did not have
enough space for an extra set of digits).

> I think this contradicts with the philosophy of the
> Unicode which tries to seperate meaning of the text from the presentation of
> it.

Not at all. The philosophies of Unicode are much more complicated.

> As I remember, even a simple calculator in GNOME can't process these
> characters and work with them as numbers.

It can. It just needs a patch :)

> So, using this approach needs
> every single program to be changed to fix the problem.

Not really. Every library for parsing numbers needs to do that. And
with the convergence of i18n libraries towards ICU and other
CLDR-based information, more and more libraries are doing that.

Roozbeh

Roozbeh Pournader

Aug 3, 2010, 3:54:19 PM
to hAmid reZa, Persian Computing
On Tue, Aug 3, 2010 at 7:47 AM, hAmid reZa <mohamm...@gmail.com> wrote:
> I think if according to Unicode standard applications should treat
> Persian digits just like the Latin ones (is it really stated as this
> in Unicode documentations?)

Yes. Please spend some time to try to find it (it's a rewarding
experience, the Unicode Standard is a tome of wisdom and information).
Ping me if you tried and couldn't.

Still, this doesn't necessarily mean that programming languages need
to support parsing Persian digits in source code or standard functions
need to do that. For example, existing code may depend on "%d" to
always generate European numbers, as these may need to get passed to
protocols. That's the reason that in glibc, one needs to say "%Id" to
get localized numbers.

All this means is that libraries need to provide functionality to
parse and generate localized numbers for user-facing parts of
applications.

> If this behavior (treat Persian digits just like Latin ones in
> mathematical operations) is a standard behavior, I think our first
> step to make it widely adapted is to make them part of open source
> development tools such as gcc.

glibc does support parsing and generating Persian digits, and so does
ICU. I haven't been following Microsoft tools much. At the moment I'm
trying to make sure the JavaScript library I use for my work would
support them.

Roozbeh

Ehsan Akhgari

Aug 3, 2010, 6:42:54 PM
to Hamed Nemati, Persian Computing
I filed this bug to rename the -moz-persian value to persian:

https://bugzilla.mozilla.org/show_bug.cgi?id=584222

--
Ehsan
<http://ehsanakhgari.org/>


Ehsan Akhgari

Aug 3, 2010, 6:48:08 PM
to Hossein Noorikhah, Persian Computing
Well, Firefox has had support for this type of context-specific numeral switching through the bidi.numeral pref.  The problem is that there is no good algorithm to decide where you actually want the numerals to be switched.  Currently, what we do in Firefox is look at the characters preceding the numeral (skipping things such as whitespace) and see whether the previous character is a Persian/Arabic one.  This is the most basic type of context detection possible, and it fails for things like:

ساعت 12:34 است.

which gets rendered as:

ساعت ۱۲:34 است.

We could probably improve our heuristics to skip over punctuation characters as well, or look at the page language, etc., but we can't come up with a perfect algorithm really.
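
The failure mode is easy to reproduce with a toy version of such a heuristic. The Python sketch below is not Gecko's code; the function names, the "Arabic block" test and the regular expression are all assumptions made for illustration. It converts a run of ASCII digits only when the nearest preceding non-whitespace character is in the Arabic block.

```python
import re

PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
_TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)


def is_arabic_script(ch: str) -> bool:
    """Rough test (an assumption): anything in the main Arabic block counts."""
    return "\u0600" <= ch <= "\u06ff"


def contextual_digits(text: str) -> str:
    """Switch ASCII digit runs to Persian digits based on the preceding character."""

    def replace(match: re.Match) -> str:
        i = match.start() - 1
        # Skip whitespace before the digit run, as described above.
        while i >= 0 and text[i].isspace():
            i -= 1
        if i >= 0 and is_arabic_script(text[i]):
            return match.group().translate(_TO_PERSIAN)
        return match.group()

    return re.sub(r"[0-9]+", replace, text)


# The example from this message: only "12" is converted, because "34"
# is preceded by a colon rather than by a Persian letter.
print(contextual_digits("\u0633\u0627\u0639\u062a 12:34 \u0627\u0633\u062a."))
```

Skipping punctuation as well would fix this particular case, but it would then also convert digits that follow a Persian word and a colon in contexts where European digits were intended, which is the point about there being no perfect algorithm.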

I've been wary of advertising this capability though, since I also think that we should encourage more people to use real Persian numerals instead of trying to hack around them by replacing the numerals inside browsers.

--
Ehsan
<http://ehsanakhgari.org/>


Roozbeh Pournader

Aug 4, 2010, 2:45:26 AM
to Ehsan Akhgari, Hossein Noorikhah, Persian Computing
On Tue, Aug 3, 2010 at 3:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> I've been wary of advertizing this capability though, since I also think that we should
> encourage more people to use real Persian numerals instead of trying to hack around
> them by replacing the numerals inside browsers.

+1 Insightful

Roozbeh

Hossein Noorikhah

Aug 4, 2010, 8:13:04 AM
to Persian Computing
Salaam,


> Yes. Please spend some time to try to find it (it's a rewarding
> experience, the Unicode Standard is a tome of wisdom and information).
> Ping me if you tried and couldn't.

So, let me say that it was just a guess by you. Let's cite sources. ;-)
 
> Using the European numbers and displaying them as (Extended) Arabic-Indic
> numbers is a hack...

This is just using other presentation forms of numerals and not a hack. Those characters (Extended Arabic-Indic numbers) are within the Arabic presentation block.

According to definition:

presentation form

In the presentation of some scripts, a form of a graphic symbol representing a character that depends on the position of the character relative to other characters.

http://www.opengroup.org/onlinepubs/9638399/glossary.htm


>> As I remember, even a simple calculator in GNOME can't process these
>> characters and work with them as numbers.
>
> It can. It just needs a patch :)

I think it's better to patch and fix the current keyboard layout, instead of patching every single GNOME application.

Using context option and selecting appropriate form for presenting a digit is not only the choice of Microsoft, but this is also implemented in projects like OpenOffice and Java.

SUN Java 1.4
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4210199

(OpenOffice 3.1) New "Context" mode for numerals setting
http://sw.openoffice.org/servlets/ReadMsg?list=features&msgNo=297


> I've been wary of advertizing this capability though, since I also think that we should
> encourage more people to use real Persian numerals instead of trying to hack around
> them by replacing the numerals inside browsers.

Sure! Let's ask the browser to take the responsibility.

hAmid reZa

Aug 4, 2010, 9:17:07 AM
to Persian Computing
> I filed this bug to rename the -moz-persian value to persian:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=584222
>
> --
> Ehsan
> <http://ehsanakhgari.org/>


Great! How about suggesting some more local list styles? Alphabetic
list styles such as normal Persian alphabet ordered list or ABJAD
ordered list?

hAmid reZa

Aug 4, 2010, 9:41:03 AM
to Persian Computing
Actually, I think I've found something here:
http://unicode.org/versions/Unicode5.2.0/ch05.pdf
Take a look at section "5.5 Handling Numbers", it says:

> There are many sets of characters that represent decimal digits in
> different scripts. Systems that interpret those characters numerically
> should provide the correct numerical values. For example, the sequence
> <U+0968 devanagari digit two, U+0966 devanagari digit zero> when
> numerically interpreted has the value twenty.
...

Its main concern is the mathematical behavior of local numerals. So,
according to this paragraph, any Unicode calculator application should
treat the numerals of all scripts the same.
But what about situations where localized numbers are treated as string
literals? For example, when you search for "نوکیا ۵۸۰۰" in a database,
should a Unicode database application also list any occurrence of
"نوکیا 5800"? Does the Unicode standard have any suggestion for this?
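
The numeric interpretation described in the quoted passage from section 5.5 can be sketched with the character properties that ship with Python's unicodedata module. The function name numeric_value is hypothetical; the per-character decimal values come from the Unicode Character Database.

```python
import unicodedata


def numeric_value(digits: str) -> int:
    """Interpret a string of decimal digits from any script as an integer."""
    value = 0
    for ch in digits:
        # unicodedata.decimal raises ValueError for characters that are
        # not decimal digits, so letters and punctuation are rejected.
        value = value * 10 + unicodedata.decimal(ch)
    return value


# The sequence from the quoted passage: DEVANAGARI DIGIT TWO, DIGIT ZERO
print(numeric_value("\u0968\u0966"))
# Persian digits five, eight, zero, zero
print(numeric_value("\u06f5\u06f8\u06f0\u06f0"))
```

This covers only the numeric side of the question; whether a text search should treat the two spellings as equal is a separate matter.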

Hossein Noorikhah

Aug 4, 2010, 9:58:00 AM
to Persian Computing
Salaam,
Hamid reza, thank you for pointing this:

> Systems that interpret those characters numerically should provide the correct numerical values.
> ...
> Its main concern is mathematical behavior of local numerals. So
> according to this paragraph any Unicode calculator application should
> treat all numerals for all scripts the same.

This is about the input of numerical data. If we don't put those characters on our keyboard and instead use a presentation form chosen according to context, we don't even need to think about it in desktop apps.

Roozbeh Pournader

Aug 4, 2010, 4:17:54 PM
to hAmid reZa, Persian Computing
On Wed, Aug 4, 2010 at 6:41 AM, hAmid reZa <mohamm...@gmail.com> wrote:
> But what about situations when localized numbers are treated as string
> literals? For example when you search "نوکیا ۵۸۰۰" in a database,
> should a Unicode database application also list any occurrence of
> "نوکیا 5800"?  Has Unicode standard any suggestion for this?

There is this idea of loose searching that various vendors (including
major efforts by Google and Apple) are getting into these days. The
main idea is that when you search, you may search for مسئول and still
want to find مسؤول too. Or you may search for کرج and want to find
کَرَجْ too.

The Unicode CLDR committee is searching for good solutions, but their
present recommendation is using collation data. (If two things sort
the same primarily, they're probably very similar.) For Persian, I
wrote a specification based on a more complex model for FarsiWeb a
while ago, available here:

http://fa.farsiweb.ir/mediawiki-fa/images/a/ab/Collation.pdf
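
As a much simpler stand-in for the collation-based matching described above, a search index can fold both the query and the stored text before comparing them. The Python sketch below is plain character folding, not CLDR collation and not the FarsiWeb specification; the function name fold_for_search is hypothetical. It maps every decimal digit to its ASCII equivalent and drops non-spacing marks such as fatha and sukun. It does not unify different hamza carriers, so the مسئول / مسؤول case still needs real collation data.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Fold digits to ASCII and drop non-spacing marks (a rough approximation)."""
    folded = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            # Non-spacing marks: short vowels, sukun, and similar diacritics
            continue
        if category == "Nd":
            # Any script's decimal digit becomes the matching ASCII digit
            folded.append(str(unicodedata.decimal(ch)))
        else:
            folded.append(ch)
    return "".join(folded)


persian_query = "\u0646\u0648\u06a9\u06cc\u0627 \u06f5\u06f8\u06f0\u06f0"
latin_digits = "\u0646\u0648\u06a9\u06cc\u0627 5800"
print(fold_for_search(persian_query) == fold_for_search(latin_digits))
```

Both sides of the comparison have to be folded the same way, typically once at indexing time and once per query.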

Roozbeh

Ehsan Akhgari

Aug 4, 2010, 4:51:33 PM
to Hossein Noorikhah, Persian Computing
On Wed, Aug 4, 2010 at 8:13 AM, Hossein Noorikhah <hossein.ir@gmail.com> wrote:
>> I've been wary of advertising this capability though, since I also think that we should
>> encourage more people to use real Persian numerals instead of trying to hack around
>> them by replacing the numerals inside browsers.
>
> Sure! Let's ask the browser to take the responsibility.

You seem to be agreeing with my comments while contradicting them!  ;-)

I was arguing that it's better for the page author to use the correct numerals instead of relying on the browser to do the correct conversion.  I'm not saying this because I'm lazy (I'm the person who implemented bidi.numeral for Arabic in Gecko's new text engine, and also who added support for Persian numerals to bidi.numeral), but because I believe that the browser is not the right place to fix this type of problem.


--
Ehsan
<http://ehsanakhgari.org/>

Ehsan Akhgari

Aug 4, 2010, 4:52:25 PM
to hAmid reZa, Persian Computing
You should probably suggest those on the W3C CSS mailing list:

http://lists.w3.org/Archives/Public/www-style/

--
Ehsan
<http://ehsanakhgari.org/>


Behdad Esfahbod

Aug 4, 2010, 8:00:35 PM
to Hossein Noorikhah, Persian Computing
On 08/04/10 08:13, Hossein Noorikhah wrote:
> Salaam,
>
> Yes. Please spend some time to try to find it (it's a rewarding
> experience, the Unicode Standard is a tome of wisdom and information).
> Ping me if you tried and couldn't.
>
> So, let me say that it was just a guess by you. Let's cite sources. ;-)
>
> Using the European numbers and displaying them as (Extended)
> Arabic-Indic
> numbers is a hack...
>
> This is just using other presentation forms of numerals and not a hack.
> *Those characters (*Extended Arabic-Indic numbers*) are within the
> Arabic presentation block*.
>
> According to definition:
>
>
> /presentation form/
>
> /In the presentation of some scripts, _a form of a graphic symbol
> representing a character that *depends on the position of the
> character relative to other characters*_. /
>
>
> http://www.opengroup.org/onlinepubs/9638399/glossary.htm


If citing sources we are, please cite your source when you claim that Extended
Arabic-Indic numbers are Presentation Forms. No, just being located in the
"Arabic Presentation Forms" block does not count. Block names are
informational and do not apply to all the characters encoded in the block.
For example, if your reasoning was to hold, why are the ARABIC-INDIC DIGITS
(U+0660..U+0669) not Presentation Forms but the Persian ones are?
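
One way to check this mechanically, as a sketch: in the Unicode Character Database, the contextual forms in the Arabic Presentation Forms blocks carry compatibility decompositions tagged <initial>, <medial>, <final> or <isolated>, while the digits in the main Arabic block have no decomposition at all. The helper name is_presentation_form is hypothetical, and the test is a rough one that only looks at those four tags.

```python
import unicodedata

_FORM_TAGS = ("<initial>", "<medial>", "<final>", "<isolated>")


def is_presentation_form(ch: str) -> bool:
    """True if the character decomposes to a base letter with a positional tag."""
    return unicodedata.decomposition(ch).startswith(_FORM_TAGS)


# U+FECB ARABIC LETTER AIN INITIAL FORM is a presentation form.
print(is_presentation_form("\ufecb"))
# Neither the Arabic-Indic nor the Extended Arabic-Indic digits are.
print(any(is_presentation_form(chr(cp)) for cp in range(0x0660, 0x066A)))
print(any(is_presentation_form(chr(cp)) for cp in range(0x06F0, 0x06FA)))
# NFKC folds presentation forms back to ordinary letters and leaves the
# Persian digits untouched.
print(unicodedata.normalize("NFKC", "\ufecb\ufee0\ufef0 \u06f1\u06f2"))
```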


> > As I remember, even a simple calculator in GNOME can't process these
> > characters and work with them as numbers.
>
> It can. It just needs a patch :)
>
>
> I think it's better to patch and fix the current keyboard layout,
> instead of patching every single GNOME application.

No. Neither is right. The correct solution is to fix the platform. That's
something Roozbeh and I started approaching years ago but didn't have time to
finish :(.


> Using context option and selecting appropriate form for presenting a
> digit is not only the choice of Microsoft, but this is also implemented
> in projects like OpenOffice and Java.

It's not "the choice of Microsoft", it's how Microsoft did it. I'm sure there
are many in Microsoft that regret that now. Same about Java. And
OpenOffice.org copies what MS Word does. Basing your reasoning on those is
not very helpful. For example, Microsoft and Java got a lot of other things
wrong too. Like basing their platform on UTF16/UCS2. But one has to look at
those in historical context. They couldn't do much better back then.


behdad

Hossein Noorikhah

Aug 7, 2010, 2:40:56 AM
to Persian Computing
Hi,

> Those characters (Extended Arabic-Indic numbers) are within the Arabic
> presentation block.

> No. Extended Arabic-Indic numbers are in the range U+06F0..06F9. That
> is the main Arabic block (U+0600..06FF). The two Arabic presentation
> forms blocks are at U+FB50..FDFF and U+FE70..FEFF.
>
> So the rest of your reasoning is moot, as these are not presentation forms.

> No, just being located in the "Arabic Presentation Forms" block does not count.

OK, they're not in presentation blocks A and B, but does this imply whether they're presentation forms or not? As Behdad says, this does not imply anything. So we had better look at the actual standard to see what to do. I'll be happy if you find a recommendation in the Unicode standard that matches what you've said.

Anyway, Unicode standard says:
 
> There are many sets of characters that represent decimal digits in different scripts. Systems that interpret those characters numerically should provide the correct numerical values.
> For example, the sequence <U+0968 devanagari digit two, U+0966 devanagari digit zero> when numerically interpreted has the value twenty.

So the Unicode standard recommends that applications which process numerals treat Hindi numerals as numeric values; it's all about processing input data. This is really useful in applications like PDF readers, which work with the presentation forms of characters. But is there anything against shaping digits according to context (which you call a 'hack')? I couldn't find anything.

Another thing, about the Persian keyboard layout in GNOME: if your reasoning is correct, why are there both Hindi and Arabic numerals on the keyboard? Also, let's see what Arabs did. Do they use Hindi numerals on their keyboards, or are we the only ones who go this way?

I think your approach causes a lot of problems. For example, suppose you have to send a number via SMS to activate a service. With your approach, the user has to know whether the SMS server has implemented this feature or not! I don't think that is a reasonable assumption.

Using the context option does not conflict with processing Persian numerals. It does not need such processing in the first place, because it normally does not generate those characters, except for display purposes.

So, I should say that I'm not against patching applications and libraries to give them the ability to work with Persian numerals. I think adding a context option could be a parallel solution that has several benefits and seems to be compatible with the Unicode standard.

Hooman Mehr

Aug 7, 2010, 4:06:22 AM
to Hossein Noorikhah, Persian Computing
Hi, may I jump in?

Let me try to clarify the issue by asking a series of questions which (I hope) have obvious answers:

Consider a function that takes a number and returns a formatted and localized string representation of that number. 

1. Is it reasonable to expect such a function to be present in an application that claims to be internationalized and localizable?

2. Is it reasonable to assume that the above function should return Persian numerals when asked for localized string representation of a number for fa_IR (Persian/Iran) locale? 

3. If answer to 2 is yes, is it also reasonable to expect there to be an inverse function that takes the result of the above function and return the correct number? 

4. Is it reasonable to assume an application claiming to be internationalized and localizable has a bug if its number to/from strings conversion functions do not behave as above? 

5. Is it reasonable to assume an application claiming to be internationalized and localizable has a bug if it does not use the above functions when dealing with number to/from strings conversion?

So I think any application claiming to be internationalized and localizable should work correctly with Persian numerals when it is run with the fa_IR locale present and/or active. I thought this was so obvious that I actually left it out of my Persian GUI guidelines. I will have to go back and correct this.
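
The pair of functions in questions 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the names format_fa and parse_fa are hypothetical, the grouping separator U+066C is assumed here for fa_IR, and a real application would take both digits and separators from its i18n library's locale data rather than hard-coding them.

```python
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
# ARABIC THOUSANDS SEPARATOR, assumed as the fa_IR grouping mark
GROUP_SEP = "\u066c"
_TO_PERSIAN = str.maketrans("0123456789,", PERSIAN_DIGITS + GROUP_SEP)


def format_fa(number: int) -> str:
    """Number to localized string: Persian digits with grouping."""
    return f"{number:,}".translate(_TO_PERSIAN)


def parse_fa(text: str) -> int:
    """Inverse of format_fa: localized string back to a number."""
    # int() accepts Unicode decimal digits once the separators are removed.
    return int(text.replace(GROUP_SEP, ""))


for n in (0, 5800, 1234567):
    localized = format_fa(n)
    print(localized, parse_fa(localized) == n)
```

Questions 4 and 5 then amount to saying that user-facing code should always go through such a pair instead of formatting or parsing numbers ad hoc.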

Converting ASCII numerals to locale-specific numerals by the operating system based on context is a useful convenience function that might let a user use some legacy, non-internationalized software with better fidelity under non-Roman locales.

The fact that an application (not the operating system) such as MS Office and others support such context-based numeral switching is a historical relic of pre-Unicode days and their desire to be backward compatible (or in case of OpenOffice, being MS Office compatible). 

Also important to note is that older keyboard layouts (and even the newest ones) offer at least the option of ASCII numerals, which is again a convenience function to let users enter numbers into legacy, non-internationalized software, or in places where existing standards require them, such as when creating a specially formatted XML document.

This backward compatibility and legacy software support should not be relied on by modern software. They should behave correctly without a need for such legacy support functions as contextual numeral switching. 

Whether they should support contextual numeral switching, depends on whether they face a legacy of existing documents, etc. that they want to keep working. For a completely new feature in a new technology that is not facing the burden of existing legacy documents (such as CSS3) it is totally wrong to support contextual numeral switching to open the door for creating a fresh wave of documents that are not the way they should be.

On the other hand, the lack of contextual numeral switching should encourage users (such as web page authors) to use a modern keyboard layout with correct Persian numerals to produce Persian documents, and not to taint them with foreign characters. The issue of Persian text files full of ASCII characters such as numerals and double quotes (frequently used instead of tanween! and almost universally used instead of the Persian guillemet, گیومه) won't ever be solved until application software and keyboard layouts make some bold moves to pressure users into correcting their bad typing habits by making such mistakes a bit harder to make (not easier, by hiding them behind contextual display switching).

Another important issue with contextual numeral substitution is the lack of a standard definition for the context in which such switching occurs. It is difficult to come up with a perfect algorithm for this purpose, and it is even more difficult to standardize it and make software developers implement it.

- Hooman Mehr



Roozbeh Pournader

Aug 7, 2010, 4:06:40 AM
to Hossein Noorikhah, Persian Computing, Behdad Esfahbod
On Fri, Aug 6, 2010 at 11:40 PM, Hossein Noorikhah <hosse...@gmail.com> wrote:
> Ok, they're not in presentation block A and B, but does this implies they're
> presentation forms or not? As Behdad says, this does not imply anything.

You are misreading Behdad.

There are no presentation forms in the main Arabic block.

> So,
> we better look at the actual standard to see what to do. I'll be happy if
> you find a recommendation from the Unicode standard that matches to what
> you've said.

Look at the CLDR data for example. I think the field is named nativeZero.

> Another thing, about keyboard layout for Persian in GNOME. If your reasoning
> was correct, why there are both Hindi and Arabic numerals on the keyboard?

That's based on the preference of the creator of the keyboard layout
file, Mr Behnam Esfahbod. You should ask him. It's not in ISIRI 9147.

Roozbeh

Roozbeh Pournader

Aug 7, 2010, 4:10:44 AM
to Hooman Mehr, Hossein Noorikhah, Persian Computing
On Sat, Aug 7, 2010 at 1:06 AM, Hooman Mehr <hooma...@gmail.com> wrote:
> Another important issue with contextual numeral substitution is the lack of
> a standard definition for the context in which such switching occurs. It is
> difficult to come up with a perfect algorithm for this purpose and and it is
> even more difficult to standardize it and make the software developers
> implement it.

Thanks a lot.

That's the most important reason to use actual Persian digits
(U+06F0..06F9), I believe. The author of the document knows which
digit forms he wants much better than the final renderer. And think of
all the cases the algorithm is implemented slightly differently on the
author's machine and the reader's. That's a huge source of
frustration.

Roozbeh

Hossein Noorikhah

Aug 7, 2010, 5:31:23 AM
to Persian Computing
Hi,
 
Consider a function that takes a number and returns a formatted and localized string representation of that number. 

1. Is it reasonable to expect such a function to be present in an application that claims to be internationalized and localizable?

2. Is it reasonable to assume that the above function should return Persian numerals when asked for localized string representation of a number for fa_IR (Persian/Iran) locale? 

3. If answer to 2 is yes, is it also reasonable to expect there to be an inverse function that takes the result of the above function and return the correct number? 

4. Is it reasonable to assume an application claiming to be internationalized and localizable has a bug if its number to/from strings conversion functions do not behave as above? 

5. Is it reasonable to assume an application claiming to be internationalized and localizable has a bug if it does not use the above functions when dealing with number to/from strings conversion?

So, I think for any application claiming to be internationalized and localizable, should work correctly with Persian numerals when it is run with fa_IR locale present and/or active. I thought this is so obvious that I actually left it out of my Persian GUI guidelines. I will have to go back and correct this.

Converting ASCII numerals to locale-specific numerals by the operating system based on context is a useful convenience function that might let a user use some legacy, non-internationalized software with better fidelity under non-Roman locales.

The fact that applications (not the operating system) such as MS Office and others support such context-based numeral switching is a historical relic of pre-Unicode days and of their desire to be backward compatible (or, in the case of OpenOffice, to be MS Office compatible).

I agree with you that an application that is fully compliant with this approach will have no problem, but we're talking about the presentation phase, as there is no recommendation about it in the Unicode standard. I also agree that applications should have that "inverse function" and should be able to process Persian numerals. But I'm talking about the way we generate and show those numerals, not the way they're processed.

And thank you for clarifying things. Generating correct numerals should be implemented either in OS libraries or in the application itself, so it's just a choice of where to implement this. If an application uses standard widgets, they will be rendered correctly. But if an application needs to draw its own custom widgets (like OpenOffice and MS Word do), then it should be implemented in the application itself. This choice has nothing to do with the Unicode standard.


> The author of the document knows which digit forms he wants much better than the final renderer.

So, the problem is who should render the actual numerals, and not the Unicode standard.


> And think of all the cases the algorithm is implemented slightly differently on the
> author's machine and the reader's. That's a huge source of frustration.

and changing every single application is not any source of frustration? I don't think so.

Right now, we're facing a lot of problems generating Persian numerals in Wikipedia. Several bots have been implemented to generate correct numerals in a tedious process, and we're still having problems there. Yes, changing every single application in the world may help, and it will be rewarding for companies that are active in the area of localization, but let's think about the usability of the approach and the users who want to use open source software.

Hossein

Roozbeh Pournader

Aug 7, 2010, 6:12:28 AM
to Hossein Noorikhah, Persian Computing
On Sat, Aug 7, 2010 at 2:31 AM, Hossein Noorikhah <hosse...@gmail.com> wrote:
>> And think of all the cases the algorithm is implemented slightly
>> differently on the
>> author's machine and the reader's. That's a huge source of frustration.
>
> and changing every single application is not any source of frustration? I
> don't think so.

There are many more document authors than the applications they use.
Even more readers than authors. And we don't need to change all
applications. Just the first hundred most popular ones would do. The
rest will follow. :)

Hooman Mehr

Aug 7, 2010, 6:40:36 AM
to Hossein Noorikhah, Persian Computing
Hi,

Let me also re-emphasize the following:

The features that affect the rendering of numerals in operating systems and applications are usually optional, and may be turned on and off. It is also important to note that since such features are not designed as an integral part of the display rendering system and are system/application dependent (non-standard) convenience features, these settings are not stored in documents.

So, if you have turned contextual numerals on for your MS Office installation and have typed some numbers using ASCII digits in the middle of Persian text (such as section numbers, dates, etc.), you may see them correctly rendered in Persian. But when you send such a document to a user who has different settings in their installation of MS Office, they may not see any Persian numerals.

Now my question is: Why should any Persian user who is typing Persian text want to (or have to) type (Persian) numbers using Latin ASCII digits? Is it for any reason other than: 

a) Their keyboard layout does not easily allow it (like old Windows Persian keyboards) 
b) Their application does not properly work with Persian numerals?

The solution to the first problem is easy: They need to update the keyboard layout installed on their system. Suitable keyboard layouts are freely available for all reasonably current operating systems.

The solution to the second problem is trickier: the user needs to use properly internationalized software applications that handle Persian numerals correctly.

Let me go a bit further into the detail of the particular problem of automatically numbered lists:

An application wants to generate a numbered list. The concept of a numbered list is clear: the first item gets the smallest number (usually one), and each following item gets the number of the previous item plus (usually) one. This is simple arithmetic. Then the application needs to insert these numbers into the text (a string of characters). Now the question is: should it insert ASCII digits, or should it call the function I mentioned in my previous post to generate a properly localized string representation of that number?

I think the correct answer is the latter. So, in this context what we mean by "rendering" is converting the number (a mathematical concept independent of language) into its correct string representation (which is a text expressed in a specific human language). Such a string representation could be copied and pasted elsewhere and hence is potentially persistent and not a screen-view-only artifact. In the particular example of CSS list styles, the correct locale to use to render the numbers as text is best inferred from the language of the current cascade. Specifically mentioning a Persian list style not only would imply Persian locale for generating correct text numerals, but also other stylistic choices such as correct nesting (which part of the number comes first and what is the correct delimiter between the numbers) and what is the correct delimiter separating the item number from its text.

As you see, there is no need for contextual override of numerals on the fly (for final screen display) if the applications behave correctly.
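
The list-numbering case above can be sketched in Python as follows. This is only an illustration under stated assumptions: `marker` and `number_items` are hypothetical names, and the default delimiter and suffix (both ".") are placeholders, since the correct Persian nesting order and delimiters are exactly the stylistic choices a Persian list style would have to define.

```python
# Hypothetical sketch: the application computes item numbers arithmetically,
# then converts each number to localized text before inserting it.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
_TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)


def marker(path, delimiter=".", suffix="."):
    """Build marker text for a list item.

    `path` holds the item's number at each nesting level, outermost first,
    e.g. (2, 3) for the third sub-item of the second item.
    `delimiter` and `suffix` are placeholder defaults, not Persian conventions.
    """
    parts = [str(n).translate(_TO_PERSIAN) for n in path]
    return delimiter.join(parts) + suffix


def number_items(items, start=1):
    """Prefix each item with its localized number, counting up from `start`."""
    return [f"{marker((n,))} {text}" for n, text in enumerate(items, start)]


if __name__ == "__main__":
    for line in number_items(["first item", "second item", "third item"]):
        print(line)
    print(marker((2, 10)))
```

Because the markers are ordinary text containing U+06F0..U+06F9, they survive copy and paste unchanged, which is the persistence argument made above.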

- Hooman Mehr

Behnam Esfahbod ZWNJ

Aug 7, 2010, 10:09:53 AM
to Hossein Noorikhah, Persian Computing, Behdad Esfahbod, Roozbeh Pournader
Hi there,

On Sat, Aug 7, 2010 at 12:36 PM, Roozbeh Pournader <roo...@gmail.com> wrote:
> On Fri, Aug 6, 2010 at 11:40 PM, Hossein Noorikhah <hossein.ir@gmail.com> wrote:
>
>> Another thing, about the keyboard layout for Persian in GNOME. If your reasoning
>> was correct, why are there both Hindi and Arabic numerals on the keyboard?
>
> That's based on the preference of the creator of the keyboard layout
> file, Mr Behnam Esfahbod. You should ask him. It's not in ISIRI 9147.

As Roozbeh said, I have made a few additions to the standard in the X11/GNOME Iranian layout.  The "extension" part is in a separate section in the file named "ir(pes_part_ext)".

The main reason for this extension is to allow inputting numerical values in applications that cannot handle Persian numerals *yet*. The extension was optional until a few months ago, but the number of layout options (4 different layouts) was too many for normal users.

Another reason was the lack of a keypad area on laptop keyboards.  The standard (ISIRI 9147) leaves the behaviour of the keypad area to the platforms, so they can configure it depending on the supported features.  But on laptops, users don't usually switch to the keypad mode to enter numbers, and requiring the user to switch the keyboard layout for a numerical field doesn't seem right.  Having ASCII digits on level 4 resolves this problem for now; the problem itself should be considered a *temporary* thing, and this solution is not the ideal one.

Also, please note that such an extension does not break the standard.  It doesn't change any character mapping and doesn't introduce any significantly different behaviour.  Without the extension, typing "<AltGr><Shift>1" would give you nothing, and with the extension it gives an ASCII DIGIT ONE.  If you look at the standard, you will see there is no rule against extending the layout.  And as the standard "does not include any rule on ASCII digits" [page 1, PDF page 11] and the platform requires an Iranian/Persian user to enter these characters, I think it was and is a good decision to have this extension.

About the Arabic/Persian digits on Keypad, please look at https://bugs.freedesktop.org/show_bug.cgi?id=24020 .

-Behnam


--
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '     
  *  ..   http://behnam.esfahbod.info
 *  `  * 
  * o *   http://zwnj.org

Roozbeh Pournader

Aug 7, 2010, 3:11:46 PM
to Behnam Esfahbod ZWNJ, Hossein Noorikhah, Persian Computing, Behdad Esfahbod
On Sat, Aug 7, 2010 at 7:09 AM, Behnam Esfahbod ZWNJ <beh...@zwnj.org> wrote:
> The standard (ISIRI 9147) leaves the behaviour of Keypad area to the platforms, so they can configure it depending on the supported features.

Here's what the standard says about the keypad area:

در صورتی که صفحه‌کلید ناحیهٔ عددی نیز داشته باشد، کاربردها بهتر است در صورت فشرده‌شدن آنها در حالت صفحه‌کلید فارسی، به جای ارقام اروپایی ارقام فارسی (با کد U+06F0 تا U+06F9) تولید کنند.

[In English: "If the keyboard also has a numeric keypad area, it is better for applications, when those keys are pressed in Persian keyboard mode, to generate Persian digits (code points U+06F0 to U+06F9) instead of European digits."]

So this is a SHOULD, but clearly not a MUST. When we wrote the standard, we did this to take care of platforms like MS Windows where the keypad area couldn't be easily remapped. But I believe this means that if the platform can do it, it should. Still, I agree that if the application has a very good reason to keep the keypad in European numerals, it can, and it will still be ISIRI 9147 compatible.

Roozbeh

Behnam Esfahbod ZWNJ

Aug 14, 2010, 6:34:17 PM
to Roozbeh Pournader, Hossein Noorikhah, Persian Computing, Behdad Esfahbod

Right.

But what I meant was that we left the decision about the other keys on the keypad (plus, minus, multiply, divide, etc.) to the platform.  And now, GNOME provides an option to select which set of characters is mapped to these keys: either the legacy map that uses ASCII-only characters (+, -, *, /), or the modern map that uses Unicode's mathematical equivalents (U+002B PLUS SIGN, U+2212 MINUS SIGN, U+00D7 MULTIPLICATION SIGN, U+00F7 DIVISION SIGN).
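
To make the difference between the two maps concrete, here is a small Python sketch that prints the code point and Unicode name of each character. The dictionary keys ("add", "subtract", and so on) and the names `LEGACY_KEYPAD` and `MODERN_KEYPAD` are invented for this illustration; they are not the actual XKB or GNOME option names.

```python
import unicodedata

# Hypothetical key names; only the characters themselves come from the text above.
LEGACY_KEYPAD = {"add": "+", "subtract": "-", "multiply": "*", "divide": "/"}
MODERN_KEYPAD = {
    "add": "\u002B",       # PLUS SIGN (same as ASCII)
    "subtract": "\u2212",  # MINUS SIGN
    "multiply": "\u00D7",  # MULTIPLICATION SIGN
    "divide": "\u00F7",    # DIVISION SIGN
}


def describe(keypad):
    """Map each key to a 'U+XXXX NAME' description of its character."""
    return {
        key: f"U+{ord(char):04X} {unicodedata.name(char)}"
        for key, char in keypad.items()
    }


if __name__ == "__main__":
    for label, keypad in (("legacy", LEGACY_KEYPAD), ("modern", MODERN_KEYPAD)):
        print(label)
        for key, description in describe(keypad).items():
            print(f"  {key}: {description}")
```

Only the plus key is identical in both maps; the other three keys produce different code points, so an application that expects ASCII operators has to be prepared for the modern ones as well.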

And the bug I mentioned before is about adding some maps with Arabic and Persian/Urdu digits to the keypad options.

Behnam Esfahbod ZWNJ

Aug 14, 2010, 6:43:44 PM
to Hossein Noorikhah, Persian Computing
Here is a simple example to show that the "context" method should not be used on the rendering side.

How can I write "رقم چهار در اکثر نقاط دنیا به صورت 4 نوشته می‌شود.‏" ("digit four is written as 4 in most parts of the world") if the context method is used on the rendering side?

Note that I don't reject the option of a "context" feature on the data-entry side.  For example, MS Word already has a good understanding of how to handle parentheses on behalf of the user: if you close a parenthesis in an RTL paragraph while you are still using an LTR keyboard layout, it knows to put an LRM mark after the parenthesis...  So why not do the same with the digits?  That way, the user will have the option to turn the feature off to input something complex (like the example above), or just leave it alone and type in their native language using either ASCII digits or the native ones.
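
The problem with the example sentence can be reproduced with a deliberately naive Python sketch of render-side contextual substitution. The rule used here (an ASCII digit takes the Persian form if the nearest preceding strong character is an Arabic-script letter) is an assumption chosen for illustration. It is not the algorithm of MS Office or of any particular platform, and `contextual_digits` is a hypothetical name.

```python
import unicodedata

PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
_TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)


def contextual_digits(text):
    """Naive contextual substitution (illustrative assumption, see lead-in).

    ASCII digits are replaced by Persian digits whenever the most recent
    strong character was an Arabic-script letter (bidi class AL).
    A left-to-right letter (bidi class L) switches the context back.
    """
    result = []
    in_arabic_context = False
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "AL":
            in_arabic_context = True
        elif bidi_class == "L":
            in_arabic_context = False
        if in_arabic_context and char in "0123456789":
            char = char.translate(_TO_PERSIAN)
        result.append(char)
    return "".join(result)


if __name__ == "__main__":
    sentence = "رقم چهار در اکثر نقاط دنیا به صورت 4 نوشته می‌شود."
    print(contextual_digits(sentence))
    print(contextual_digits("Section 4"))
```

Under this rule the "4" in the Persian sentence is surrounded by Persian text, so it is shown as a Persian four and the sentence no longer says what its author meant. The author has no way to express the intent in the stored text, which is the argument against doing this at rendering time.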

-Behnam



Hossein Noorikhah

Aug 26, 2010, 4:14:35 PM
to Persian Computing
Thanks Behnam,

Yes, but what you've mentioned is an exception. Using foreign "characters" for demonstration purposes is usually done with the help of a character map, so I think it's not within the scope of our topic. As you've mentioned, MS Word inserts the appropriate RLM and LRM marks in real-world usage.

But using alternative solutions and abandoning the context option causes a lot of problems, which are not mere exceptions. It's not backward compatible. Also, it cannot be implemented gradually, and implementing it everywhere causes headaches for everyone during the transition period.

There's nothing against it in the Unicode standard, because it's a choice of where to put a piece of functionality (OS, libraries, application, etc.), and has nothing to do with the standard. So, I think it's reasonable to see which choice is better for the developers, users, etc., and then decide whether it needs to be used or not. As I've stated before, the context option has various benefits that are not limited to the
