Not really. The actual solution is better support for non-European
numerals in searches and in Microsoft products like Excel. We need to
insist on using Persian digits as much as possible and ask
software vendors to support them properly. The underlying
standards like Unicode and CLDR have all the necessary information.
The vendors just need to support the standards.
Roozbeh
Showing Persian numerals according to context (which might solve lots
of problems for us) is not, I guess, stated explicitly in any current
or future web standard. So what do you think?
It's not a hack. It's what the Unicode standards recommend. Using the
European numbers and displaying them as (Extended) Arabic-Indic
numbers is a hack, which Microsoft needed to create to make sure they
could interoperate with their older documents and systems (CP1256, which
was what the early Arabic versions of Windows supported, did not have
enough space for an extra set of digits).
> I think this contradicts the philosophy of
> Unicode, which tries to separate the meaning of the text from the presentation of
> it.
Not at all. The philosophies of Unicode are much more complicated.
> As I remember, even a simple calculator in GNOME can't process these
> characters and work with them as numbers.
It can. It just needs a patch :)
> So, using this approach needs
> every single program to be changed to fix the problem.
Not really. Every library for parsing numbers needs to do that. And
with the convergence of i18n libraries towards ICU and other
CLDR-based information, more and more libraries are doing that.
Roozbeh
Yes. Please spend some time to try to find it (it's a rewarding
experience, the Unicode Standard is a tome of wisdom and information).
Ping me if you tried and couldn't.
Still, this doesn't necessarily mean that programming languages need
to support parsing Persian digits in source code or standard functions
need to do that. For example, existing code may depend on "%d" to
always generate European numbers, as these may need to get passed to
protocols. That's the reason that in glibc, one needs to say "%Id" to
get localized numbers.
All this means is that libraries need to provide functionality to
parse and generate localized numbers for user-facing parts of
applications.
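A minimal sketch of that split between protocol-facing and user-facing output, in Python rather than C. The helper names (protocol_repr, user_repr, parse_user_input) are made up for illustration and are not part of glibc, ICU, or any other library; the digit mapping is a plain translation table, not a real locale-aware formatter.

```python
# Translation table from European (ASCII) digits to Persian digits
# (EXTENDED ARABIC-INDIC DIGITs, U+06F0..U+06F9).
ASCII_DIGITS = "0123456789"
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
TO_PERSIAN = str.maketrans(ASCII_DIGITS, PERSIAN_DIGITS)


def protocol_repr(n: int) -> str:
    """Machine-facing output: always European digits, like plain "%d"."""
    return "%d" % n


def user_repr(n: int) -> str:
    """User-facing output for a Persian UI (hypothetical helper)."""
    return protocol_repr(n).translate(TO_PERSIAN)


def parse_user_input(text: str) -> int:
    """Python's int() accepts any Unicode decimal digits, Persian included."""
    return int(text)
```

Existing code that relies on protocol_repr keeps producing ASCII, while only the user-facing helpers deal with Persian digits.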
> If this behavior (treat Persian digits just like Latin ones in
> mathematical operations) is a standard behavior, I think our first
> step to make it widely adopted is to make them part of open source
> development tools such as gcc.
glibc does support parsing and generating Persian digits, and so does
ICU. I haven't been following Microsoft tools much. At the moment I'm
trying to make sure the JavaScript library I use for my work would
support them.
Roozbeh
+1 Insightful
Roozbeh
Yes. Please spend some time to try to find it (it's a rewarding
experience, the Unicode Standard is a tome of wisdom and information).
Ping me if you tried and couldn't.
Using the European numbers and displaying them as (Extended) Arabic-Indic
numbers is a hack...
presentation form
In the presentation of some scripts, a form of a graphic symbol representing a character that depends on the position of the character relative to other characters.
> As I remember, even a simple calculator in GNOME can't process these
> characters and work with them as numbers.
It can. It just needs a patch :)
> I've been wary of advertizing this capability though, since I also think that we should
> encourage more people to use real Persian numerals instead of trying to hack around
> them by replacing the numerals inside browsers.
Systems that interpret those characters numerically should provide the correct numerical values.
...
Its main concern is mathematical behavior of local numerals. So
according to this paragraph any Unicode calculator application should
treat all numerals for all scripts the same.
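As a rough illustration of what "treat all numerals the same" could look like, here is a toy sketch using Python's standard unicodedata module. The function names are hypothetical, and a real calculator would also need to handle signs, separators and mixed-script input.

```python
import unicodedata


def numeric_value(text: str) -> int:
    """Interpret a run of decimal digits from any script as an integer."""
    value = 0
    for ch in text:
        # unicodedata.decimal() raises ValueError for non-digit characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


def add(a: str, b: str) -> int:
    """A toy 'calculator' operation that is indifferent to the digit script."""
    return numeric_value(a) + numeric_value(b)
```

With this, Persian, Arabic-Indic, Devanagari and European digit strings all go through the same arithmetic path.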
There is this idea of loose searching that various vendors (including
major efforts by Google and Apple) are getting into these days. The
main idea is that when you search, you may search for مسئول and still
want to find مسؤول too. Or you may search for کرج and want to find
کَرَجْ too.
The Unicode CLDR committee is searching for good solutions, but their
present recommendation is using collation data. (If two things sort
the same primarily, they're probably very similar.). For Persian, I
wrote a specification based on a more complex model for FarsiWeb a
while ago, available here:
http://fa.farsiweb.ir/mediawiki-fa/images/a/ab/Collation.pdf
Roozbeh
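The diacritics half of the loose-searching idea (searching for the unvocalized word and still finding the vocalized one) can be sketched with Python's standard library alone. This is only an assumption-laden illustration, not the CLDR recommendation and not the FarsiWeb specification: it strips combining marks after canonical decomposition, and it deliberately does not unify hamza-seat variants, which need collation data or a richer model.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Drop combining marks (e.g. Arabic-script vowel signs) for loose matching."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def loose_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring search that ignores combining marks."""
    return fold_marks(needle) in fold_marks(haystack)


# The city name with and without vowel marks, written with escapes so the
# code points are unambiguous.
KARAJ_PLAIN = "\u06a9\u0631\u062c"
KARAJ_VOCALIZED = "\u06a9\u064e\u0631\u064e\u062c\u0652"
```

Here loose_contains(KARAJ_VOCALIZED, KARAJ_PLAIN) matches, but the two spellings of the hamza word stay distinct under this naive folding.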
> I've been wary of advertizing this capability though, since I also think that we should
> encourage more people to use real Persian numerals instead of trying to hack around
> them by replacing the numerals inside browsers.
Sure! Let's ask the browser to take the responsibility.
If citing sources we are, please cite your source when you claim that Extended
Arabic-Indic numbers are Presentation Forms. No, just being located in the
"Arabic Presentation Forms" block does not count. Block names are
informational and do not apply to all the characters encoded in the block.
For example, if your reasoning was to hold, why are the ARABIC-INDIC DIGITS
(U+0660..U+0669) not Presentation Forms but the Persian ones are?
> > As I remember, even a simple calculator in GNOME can't process these
> > characters and work with them as numbers.
>
> It can. It just needs a patch :)
>
>
> I think it's better to patch and fix the current keyboard layout,
> instead of patching every single GNOME application.
No. Neither is right. The correct solution is to fix the platform. That's
something Roozbeh and I started approaching years ago but didn't have time to
finish :(.
> Using context option and selecting appropriate form for presenting a
> digit is not only the choice of Microsoft, but this is also implemented
> in projects like OpenOffice and Java.
It's not "the choice of Microsoft", it's how Microsoft did it. I'm sure there
are many in Microsoft that regret that now. Same about Java. And
OpenOffice.org copies what MS Word does. Basing your reasoning on those is
not very helpful. For example, Microsoft and Java got a lot of other things
wrong too, like basing their platforms on UTF-16/UCS-2. But one has to look at
those in historical context. They couldn't do much better back then.
behdad
> Those characters (Extended Arabic-Indic numbers) are within the Arabic
> presentation block.
No. Extended Arabic-Indic numbers are in the range U+06F0..06F9. That
is the main Arabic block (U+0600..06FF). The two Arabic presentation
forms blocks are at U+FB50..FDFF and U+FE70..FEFF.
So the rest of your reasoning is moot, as these are not presentation forms.
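This is easy to check against the Unicode Character Database, for example with Python's unicodedata module. The sketch below compares a Persian digit with a genuine presentation form; the block ranges are typed in by hand and the helper names are made up for illustration.

```python
import unicodedata

PERSIAN_ZERO = "\u06f0"   # EXTENDED ARABIC-INDIC DIGIT ZERO
ALEF_ISOLATED = "\ufe8d"  # ARABIC LETTER ALEF ISOLATED FORM

# Arabic Presentation Forms-A and Presentation Forms-B.
PRESENTATION_BLOCKS = (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00))


def in_presentation_block(ch: str) -> bool:
    """True if the code point lies in one of the two presentation-forms blocks."""
    return any(ord(ch) in block for block in PRESENTATION_BLOCKS)


def has_compat_decomposition(ch: str) -> bool:
    """Presentation forms carry a tagged mapping such as '<isolated> 0627'."""
    return unicodedata.decomposition(ch).startswith("<")
```

The Persian zero is an ordinary decimal digit (category Nd) with no decomposition, while the isolated alef maps back to the regular letter.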
No, just being located in the "Arabic Presentation Forms" block does not count.
There are many sets of characters that represent decimal digits in different scripts. Systems that interpret those characters numerically should provide the correct
numerical values.
For example, the sequence <U+0968 devanagari digit two, U+0966
devanagari digit zero> when numerically interpreted has the value twenty.
You are misreading Behdad.
There are no presentation forms in the main Arabic block.
> So,
> we better look at the actual standard to see what to do. I'll be happy if
> you find a recommendation from the Unicode standard that matches to what
> you've said.
Look at the CLDR data for example. I think the field is named nativeZero.
> Another thing, about keyboard layout for Persian in GNOME. If your reasoning
> was correct, why there are both Hindi and Arabic numerals on the keyboard?
That's based on the preference of the creator of the keyboard layout
file, Mr Behnam Esfahbod. You should ask him. It's not in ISIRI 9147.
Roozbeh
Thanks a lot.
That's the most important reason to use actual Persian digits
(U+06F0..06F9), I believe. The author of the document knows which
digit forms he wants much better than the final renderer. And think of
all the cases the algorithm is implemented slightly differently on the
author's machine and the reader's. That's a huge source of
frustration.
Roozbeh
Consider a function that takes a number and returns a formatted and localized string representation of that number.
1. Is it reasonable to expect such a function to be present in an application that claims to be internationalized and localizable?
2. Is it reasonable to assume that the above function should return Persian numerals when asked for a localized string representation of a number for the fa_IR (Persian/Iran) locale?
3. If the answer to 2 is yes, is it also reasonable to expect there to be an inverse function that takes the result of the above function and returns the correct number?
4. Is it reasonable to assume that an application claiming to be internationalized and localizable has a bug if its number to/from string conversion functions do not behave as above?
5. Is it reasonable to assume that an application claiming to be internationalized and localizable has a bug if it does not use the above functions when dealing with number to/from string conversion?
So, I think any application claiming to be internationalized and localizable should work correctly with Persian numerals when it is run with the fa_IR locale present and/or active. I thought this was so obvious that I actually left it out of my Persian GUI guidelines. I will have to go back and correct this.
Converting ASCII numerals to locale-specific numerals by the operating system based on context is a useful convenience function that might let a user use some legacy and non-internationalized software with better fidelity under non-Roman locales.
The fact that applications (not the operating system) such as MS Office and others support such context-based numeral switching is a historical relic of pre-Unicode days and their desire to be backward compatible (or, in the case of OpenOffice, MS Office compatible).
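The formatting function and its inverse from questions 1 to 3 could look roughly like the sketch below. The names format_fa and parse_fa are hypothetical, and the use of U+066C as the grouping separator is an assumption made for the example; a real application would take both the digits and the separators from locale data such as CLDR.

```python
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)
THOUSANDS_SEP = "\u066c"  # ARABIC THOUSANDS SEPARATOR (assumed for fa_IR)


def format_fa(n: int) -> str:
    """Number -> localized string: Persian digits with grouping separators."""
    grouped = format(n, ",")  # e.g. "1,234,567"
    return grouped.translate(TO_PERSIAN).replace(",", THOUSANDS_SEP)


def parse_fa(text: str) -> int:
    """Inverse of format_fa: localized string -> number."""
    # int() already understands Persian digits; only the separator is removed.
    return int(text.replace(THOUSANDS_SEP, ""))
```

The property that matters for question 3 is the round trip: parse_fa(format_fa(n)) should give back n.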
The author of the document knows which digit forms he wants much better than the final renderer.
And think of all the cases the algorithm is implemented slightly differently on the
author's machine and the reader's. That's a huge source of frustration.
There are many more document authors than the applications they use.
Even more readers than authors. And we don't need to change all
applications. Just the first hundred most popular ones would do. The
rest will follow. :)
Hi,
I agree with you that an application that is fully compliant with this approach will have no problem, but we're talking about the presentation phase, as there is no recommendation about it in the Unicode standard. I also agree that applications should have that "inverse function" and should be able to process Persian numerals. But I'm talking about the way we generate and show those numerals, not the way they're processed.
And thank you for clarifying things. Generating correct numerals should be implemented in OS libraries or in the application itself, so it's just a matter of choosing where to implement this. If an application uses standard widgets, they will be rendered correctly. But if an application needs to draw its own custom widgets (like OpenOffice and MS Word do), then it should be implemented in the application itself. This choice has nothing to do with the Unicode standard.
> The author of the document knows which digit forms he wants much better than the final renderer.
So, the problem is who should render the actual numerals, and not the Unicode standard.
> And think of all the cases the algorithm is implemented slightly differently on the
> author's machine and the reader's. That's a huge source of frustration.
And changing every single application is not a source of frustration? I don't think so.
On Fri, Aug 6, 2010 at 11:40 PM, Hossein Noorikhah <hossein.ir@gmail.com> wrote:
> So,
> Another thing, about keyboard layout for Persian in GNOME. If your reasoning
> was correct, why there are both Hindi and Arabic numerals on the keyboard?
That's based on the preference of the creator of the keyboard layout
file, Mr Behnam Esfahbod. You should ask him. It's not in ISIRI 9147.
Hi,