Rendering of ellipsis for different scripts

Axel Hecht

Oct 27, 2007, 10:27:41 AM

Hi,

there is some discussion going on in
https://bugzilla.mozilla.org/show_bug.cgi?id=400237 on how to lay out
the unichar ellipsis for different scripts, and for different localized
versions of OSes. As that bug grows hard to grok, I'm asking some
questions here.

On a regular XP, the ellipsis is rendered similarly to ... (just more
closely spaced). On a Japanese Windows, it's apparently rendered with MS
UI Gothic, which places the dots in the middle of the line vertically.
Now that obviously looks wrong for English text, but I wonder if it's
right in other scripts, in particular, for Japanese. Independent of
other apps using '.''.''.' and thus being on the baseline, the font
might be right for Japanese script. (The bug has a testcase for the
ellipsis in both fonts.)

Are there similar problems for other scripts?

Could this happen to other glyphs?

Axel

Hendrik Maryns

Oct 27, 2007, 11:59:36 AM

Axel Hecht wrote the following on 10/27/2007 04:27 PM:

There is something in the Unicode standard for this. If I remember and
understand correctly, it is the same character, but the glyph can differ
per language. That is, you use U+2026, and each locale will make sure
it has the right glyph. So if you have Japanese running from top to
bottom, it will (and should) be three dots below each other. For Latin
script, it will always be three dots next to each other, etc. So
basically, as long as U+2026 is used, everything should be fine.

But please correct me if I’m wrong.

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
www.lieverleven.be
http://aouw.org
http://catb.org/~esr/faqs/smart-questions.html

Jean-Marc Desperrier

Oct 27, 2007, 12:21:07 PM

Axel Hecht wrote:
> On a regular XP, the ellipsis is rendered similarly to ... (just more
> closely spaced). On a Japanese Windows, it's apparently rendered with MS
> UI Gothic, which places the dots in the middle of the line vertically.
> Now that obviously looks wrong for English text, but I wonder if it's
> right in other scripts, in particular, for Japanese. Independent of
> other apps using '.''.''.' and thus being on the baseline, the font
> might be right for Japanese script. (The bug has a testcase for the
> ellipsis in both fonts.)

> Are there similar problems for other scripts?
> Could this happen to other glyphs?

First, I think this is truly an i18n problem, so I'll restrict
follow-ups to that group.

Yes, you would have a similar problem running a Japanese Firefox under a
Chinese OS: all the kanji for menus and buttons would be rendered using
the Chinese versions of the glyphs, which look ugly.

That's the real problem I believe. Using the font the OS tells you to
use to display your application's menus and buttons means the result
will be bad as soon as the localization of the OS and of the application
don't match.

Now this is not as bad as it looks, because doing otherwise means you
will lose appearance consistency between applications.

Rimas Kudelis

Oct 27, 2007, 9:01:32 PM

Jean-Marc Desperrier wrote:

I think this is the expected behaviour...

RQ

Rimas Kudelis

Oct 27, 2007, 9:18:20 PM

Hendrik Maryns wrote:

I don't think you're right.

First, I can't see anything like that mentioned in the latest version of
the Unicode code chart: http://www.unicode.org/charts/PDF/U2000.pdf

Second, I believe that this would defeat the purpose of Unicode (which
is to be consistent, no matter what the context or language is).

I'd rather think it's stated somewhere else (perhaps even in Unicode
standard) that the ellipsis itself can be expressed by using different
characters, depending on the language used. But those different
characters have their own codepoints, I guess.

RQ

Hendrik Maryns

Oct 28, 2007, 9:45:13 AM

Rimas Kudelis wrote the following on 10/28/2007 02:18 AM:

Ah, that sounds reasonable, I think I remembered it incorrectly. Then
it has to do with fonts, probably.

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
www.lieverleven.be
http://aouw.org
http://catb.org/~esr/faqs/smart-questions.html

Jean-Marc Desperrier

Oct 28, 2007, 2:00:33 PM

Rimas Kudelis wrote:
> Second, I believe that this would defeat the purpose of Unicode (which
> is to be consistent, no matter what the context or language is).

(I'm setting the follow-up on i18n again).

No, the purpose of Unicode is to encode abstract characters that are
recognised as representing the same element but can have widely varying
graphical representations (glyphs) depending on both context and language.

http://unicode.org/reports/tr17/#CharactersVsGlyphs
"The elements of the character repertoire are abstract characters.
Characters are different from glyphs, which are the particular images
representing a character or part of a character. Glyphs for the same
character may have very different shapes"
"[...] the connection between glyphs and characters is at times even
less direct. Glyphs may be required to change their shape, position and
width depending on the surrounding glyphs"

The unification of Chinese characters is the reference case where the
appearance depends heavily on the language:
http://unicode.org/faq/han_cjk.html#3
"Q: If the character shapes are different in different parts of East
Asia, why were the characters unified?

A: The Unicode standard is designed to encode characters, not glyphs.
Even where there are substantial variations in the standard way of
writing a character from locale to locale, if the fundamental identity
of the character is not in question, then a single character is encoded
in Unicode."

> I'd rather think it's stated somewhere else (perhaps even in Unicode
> standard) that the ellipsis itself can be expressed by using different
> characters, depending on the language used. But those different
> characters have their own codepoints, I guess.

As seen above, definitely not. There is one and only one Unicode
character to represent the ellipsis, U+2026, and the preferred glyph to
represent it depends on language and cultural preferences.

If you want a nice display, you need to carry that language and cultural
preference information into your display engine.
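
To make the character/glyph distinction concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper name describe is made up for this illustration. It shows that typing three dots and using U+2026 produce different character sequences, whatever glyph a font later draws for them.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Three separate full stops versus the single ellipsis character.
print(describe("..."))
print(describe("\u2026"))

# U+2026 has a compatibility decomposition to three full stops, so NFKC
# normalization folds it back into "...". This changes the characters,
# not merely the glyph a font would pick.
folded = unicodedata.normalize("NFKC", "\u2026")
print(describe(folded))
```

Nothing in this output says how the dots are positioned vertically; that decision is left to the font and the rendering engine.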

Rimas Kudelis

Oct 29, 2007, 3:14:49 AM

Jean-Marc Desperrier wrote:

Hm, you might be right.

Is it possible to tell the OS what locale the program interface is in?
And if yes, then does Firefox do that ATM?

RQ

Jean-Marc Desperrier

Oct 30, 2007, 1:11:22 PM

Rimas Kudelis wrote:
> Is it possible to tell the OS what locale the program interface is in?
> And if yes, then does Firefox do that ATM?

This is clearly OS dependent, but I think there's very little support
for it.

Under Windows, you can change the locale under which each application
runs (using AppLocale,
http://www.microsoft.com/globaldev/tools/apploc.mspx ), but the UI fonts
do not depend on the locale.

Under Linux, the X11 settings that apply to each application can be
customized, but I wonder if this applies to the menus:
http://mit.edu/answers/xwindows/xwindows_fonts.html
"most programs will obey a "font" resource in your .Xresources file; for
example, you could put the line "xterm*font: 8x13" to make 8x13 your
default xterm font."

It seems likely that the menus/buttons depend either on the desktop
environment (Gnome/KDE/etc.) or on the widget toolkit the application
uses (Qt/GTK/Motif). I don't know precisely what support they have for
per-locale/per-app customization.

L. David Baron

Oct 30, 2007, 1:25:23 PM

to dev-...@lists.mozilla.org

On Monday 2007-10-29 09:14 +0200, Rimas Kudelis wrote:
> Is it possible to tell the OS what locale the program interface is in?
> And if yes, then does Firefox do that ATM?

If you're talking about issues with Web page display, you generally
want to do things based on the language of the Web page, not the
language of the interface. Switching based on the language of the
interface introduces yet another variable that requires Web authors
to do more testing.

(Although when the language of the page is not specified, we may default
to assuming the page is in the same language as the interface, which may
be reasonable. That said, many authors of English pages probably don't
specify language at all, but should...)

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Rimas Kudelis

Oct 30, 2007, 4:36:37 PM

L. David Baron wrote:

> On Monday 2007-10-29 09:14 +0200, Rimas Kudelis wrote:
>> Is it possible to tell the OS what locale the program interface is in?
>> And if yes, then does Firefox do that ATM?
>
> If you're talking about issues with Web page display, you generally
> want to do things based on the language of the Web page, not the
> language of the interface. Switching based on the language of the
> interface introduces yet another variable that requires Web authors
> to do more testing.

No, I'm talking about the ellipsis issue we're discussing here. To be
more precise, about the possibility of getting a "western" ellipsis instead
of a Japanese one in the English interface of a program.

Some, but not much, information on this topic can be found here:
http://www.microsoft.com/globaldev/handson/dev/AppCompatInMUI.mspx#ERB .

However, even in theory it's quite interesting. I wonder if it's
possible to tell screen readers, for example, that an application
running on, e.g., German Windows has an interface in English. Is it
possible at all?

RQ

Rimas Kudelis

Oct 30, 2007, 4:59:55 PM

Jean-Marc Desperrier wrote:

> Rimas Kudelis wrote:
>> Is it possible to tell the OS what locale the program interface is in?
>> And if yes, then does Firefox do that ATM?
>
> This is clearly OS dependent, but I think there's very little support
> for it.
>
> Under Windows, you can change the locale under which each application
> runs ( using applocale
> http://www.microsoft.com/globaldev/tools/apploc.mspx ), but the UI fonts
> do not depend on the locale.

That's not what I meant...

Back to the initial problem, I'll soon reply to your older post (or mine).

> Under Linux, the X11 settings that apply to each application can be
> customized, but I wonder if this applies to the menus:
> http://mit.edu/answers/xwindows/xwindows_fonts.html
> "most programs will obey a "font" resource in your .Xresources file; for
> example, you could put the line "xterm*font: 8x13" to make 8x13 your
> default xterm font."
>
> It seems likely the menus/button depend either on the desktop
> environment (Gnome/KDE/etc.) or on the widget toolkits the application
> uses (Qt/GTK/Motif). I don't know precisely what support they have for
> per locale/per app customization.

It depends on the widget set, definitely.

RQ

Rimas Kudelis

Oct 31, 2007, 3:57:30 AM

Jean-Marc Desperrier wrote:

Hmm....

Actually, I think you're wrong about this particular case.

Check http://www.unicode.org/charts/PDF/U3000.pdf, which depicts a few
CJK punctuation symbols, like IDEOGRAPHIC COMMA and IDEOGRAPHIC FULL
STOP, for example.

I tend to think that if there actually exists a tradition of using an
ellipsis character in Japanese, then perhaps there should be something
like an IDEOGRAPHIC ELLIPSIS character in Unicode (similarly to the
above cases).

Remember that our actual problem is that the user runs an English
version of Firefox (or any other Latin or even Cyrillic script-based
language version) on a Japanese version of Windows. I can't say for
sure, and I guess we should consult someone with a good knowledge of
Japanese here, but perhaps the problem we're dealing with now is
actually nothing but a bug in MS UI Gothic?

Oh, and by the way, a similar question was already raised by Nokia
five years ago(!) on a W3C mailing list. Here's the thread:
http://lists.w3.org/Archives/Public/www-international/2002OctDec/0102.html
http://lists.w3.org/Archives/Public/www-international/2002OctDec/0110.html
http://lists.w3.org/Archives/Public/www-international/2002OctDec/0115.html

RQ

Jean-Marc Desperrier

Oct 31, 2007, 7:40:50 AM

Rimas Kudelis wrote:
> [...]

> Some, but not much, information on this topic can be found here:
> http://www.microsoft.com/globaldev/handson/dev/AppCompatInMUI.mspx#ERB .

"[...] To get maximum application compatibility, you can set all the
language related settings to match the application’s language. This
includes:
[...]
• Shell UI Font (this setting can only be set from XP MUI Setup and only
applies to Japanese applications)."

Here we are. So there is a bit of support for it even if it's currently
very restricted.

> However, even in theory it's quite interesting. I wonder if it's
> possible to tell screen readers, for example, that the application
> running on e.g. German windows has an interface in English. Is it
> possible at all?

Yes, another case where support for this would be needed.

Jean-Marc Desperrier

Oct 31, 2007, 8:55:59 AM

Rimas Kudelis wrote:
> [...]

> Actually, I think you're wrong about this particular case.
>
> Check http://www.unicode.org/charts/PDF/U3000.pdf, which depicts a few
> CJK punctuation symbols, like IDEOGRAPHIC COMMA and IDEOGRAPHIC FULL
> STOP, for example.

I'll tell you the dirty little secret of Unicode :-)

Unicode is not perfect: sometimes the rules were not applied in a really
coherent manner, and a few of the characters encoded in Unicode are
definitely errors. Punctuation and spaces are probably
the two most inconsistent areas.

> I tend to think, that if there actually exists a tradition to use an
> ellipsis character in Japanese, then perhaps there should be something
> like an IDEOGRAPHIC ELLIPSIS character in Unicode (similarly to the
> above cases).

In the ELLIPSIS case, Unicode correctly applies its rule of character
unification.

It's for the characters you cite that it doesn't.
Note that there probably is a very good reason for most of those
characters.
In addition to unification, Unicode also has a rule of supporting
round-tripping of pre-Unicode encodings.

I think each of those you cite already existed both in the JIS tables
and in ASCII. So to support round-tripping of ASCII + JIS text, they had to
have separate code points in Unicode.

Ellipsis was already in JIS, but not in any of the basic western
encodings, so there was no compatibility need for a separate encoding.

> Remember, that our actual problem is that the user runs an English
> version of Firefox (or any other Latin or even Cyrillic script-based
> language version anyway) on Japanese version of Windows. I can't say for
> sure, and I guess we should consult someone with a good knowledge in
> Japanese here, but perhaps the problem we're dealing with now is
> actually nothing but a bug in MS UI Gothic?

No, it's not. MS UI Gothic is displaying U+2026 HORIZONTAL ELLIPSIS with
the preferred glyph for use with Japanese text, given that
U+2026 is defined as the Unicode code point corresponding to JIS X 0208
1-36 HORIZONTAL ELLIPSIS and the official glyph for JIS 1-36 has the
dots at mid-height. Which means that it's not the preferred glyph for
use with Latin text.

The irony is that there is a Unicode character with three middle dots,
U+22EF, but it is a mathematical symbol, not an ellipsis, so converting
'JIS X 0208 1-36 HORIZONTAL ELLIPSIS' to it is not allowed.
Some old versions of Mac OS did it though:
http://hp.vector.co.jp/authors/VA010341/unicode/
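
The JIS correspondence can be checked with a short Python sketch, assuming Python's built-in euc_jp codec follows the usual JIS X 0208 mapping tables. The helper name kuten is hypothetical. In EUC-JP a JIS X 0208 character is stored as two bytes, each being the row or cell number plus 0xA0, so the row-cell pair can be read straight off the encoded bytes.

```python
import unicodedata


def kuten(ch):
    """Return the JIS X 0208 (row, cell) pair of ch, read from its EUC-JP bytes."""
    data = ch.encode("euc_jp")
    # JIS X 0208 characters are exactly two bytes, both in 0xA1..0xFE.
    if len(data) != 2 or data[0] < 0xA1:
        raise ValueError(f"U+{ord(ch):04X} is not a JIS X 0208 character")
    return data[0] - 0xA0, data[1] - 0xA0


# U+2026 sits at row 1, cell 36, and decoding those bytes gives it back.
print(kuten("\u2026"))
print(bytes([0xA0 + 1, 0xA0 + 36]).decode("euc_jp") == "\u2026")

# The separately encoded CJK punctuation mentioned earlier in the thread.
print(kuten("\u3001"), unicodedata.name("\u3001"))
print(kuten("\u3002"), unicodedata.name("\u3002"))

# The look-alike at U+22EF is a different character with its own name.
print(unicodedata.name("\u22ef"))
```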

Damjan Georgievski

Nov 1, 2007, 3:26:54 PM

> No, the purpose of Unicode is to encode abstract characters that are
> recognised as representing the same element but can have widely varying
> graphical representations (glyphs) depending on both context and language.

...

> If you want a nice display, you need to carry that language and cultural
> preference information into your display engine.

Does Firefox do this today? - on Linux?
How about Firefox 3?

I ask this because we have a similar problem: different glyphs for Cyrillic
italic letters in Macedonian/Serbian vs Russian/Bulgarian (I don't know
about other languages).

Also, does anyone know of a tool I could use to inspect if the font has
language specific glyphs?

--
damjan

Jean-Marc Desperrier

Nov 2, 2007, 9:01:02 PM

Damjan Georgievski wrote:
>> No, the purpose of Unicode is to encode abstract characters that are
>> recognised as representing the same element but can have widely varying
>> graphical representations (glyphs) depending on both context and language.
> ...
>> If you want a nice display, you need to carry that language and cultural
>> preference information into your display engine.
>
> Does Firefox do this today? - on Linux?
> How about Firefox 3?

It does it for the web content, but not for the interface, where it uses
what it's told to use by the "OS" (more precisely, by components in the
GUI layer that in some cases are not fully part of the OS).

> I ask this because we have a similar problem: different glyphs for Cyrillic
> italic letters in Macedonian/Serbian vs Russian/Bulgarian (I don't know
> about other languages).

It sounds worth investigating, but I'm not an expert on this.
If I understand correctly, Macedonian and Serbian are written in
Cyrillic, so what encoding is used for them?
If it's the same encoding as for Russian/Bulgarian, that makes things hard.
It could still be handled by adding a LANG attribute with the proper
code (mk, sr), and splitting the current Cyrillic setting in the options
into "Cyrillic (Russia/Bulgaria)" and "Cyrillic (Macedonia/Serbia)" to
allow setting different fonts for the two (Traditional Chinese is already
similarly split in two).

> Also, does anyone know of a tool I could use to inspect if the font has
> language specific glyphs?

It's supposed to work by selecting the font according to the language,
not by selecting language specific glyphs inside the font.
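
A minimal Python sketch of that per-language font selection follows. Everything in it is an assumption made for illustration: the function name pick_font, the preference table, and the "Example ..." font names are placeholders, not Firefox's actual settings or code. Untagged content falls back to the interface language, as suggested earlier in the thread, and longer tags are shortened one subtag at a time until a preference matches.

```python
def pick_font(page_lang, ui_lang, font_prefs, default_font):
    """Choose a font for content tagged page_lang, with step-by-step fallback."""
    # Untagged content: assume it is in the interface language.
    tag = (page_lang or ui_lang).lower()
    while tag:
        if tag in font_prefs:
            return font_prefs[tag]
        # Drop the last subtag: "sr-cyrl-rs" -> "sr-cyrl" -> "sr" -> "".
        tag = tag.rpartition("-")[0]
    return default_font


# Hypothetical preference table; only MS UI Gothic is a font named in the thread.
FONT_PREFS = {
    "mk": "Example Cyrillic South",
    "sr": "Example Cyrillic South",
    "ru": "Example Cyrillic East",
    "bg": "Example Cyrillic East",
    "ja": "MS UI Gothic",
}

print(pick_font("sr-Cyrl-RS", "en", FONT_PREFS, "Example Default"))
print(pick_font("", "ja", FONT_PREFS, "Example Default"))
print(pick_font("en", "ja", FONT_PREFS, "Example Default"))
```

The point of keying on the language tag rather than the script is that Serbian and Russian text can then get different fonts even though both use the same Cyrillic code points.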

Jean-Marc Desperrier

Nov 5, 2007, 1:18:46 PM

Jean-Marc Desperrier wrote:
> Damjan Georgievski wrote:
>[...]

>> I ask this because we have a similar problem: different glyphs for
>> Cyrillic
>> italic letters in Macedonian/Serbian vs Russian/Bulgarian (I don't know
>> about other languages).
>
> It sounds worth investigating, but I'm not an expert on this.
> If I understand correctly, Macedonian and Serbian are written in
> Cyrillic, so what encoding is used for them?
> If it's the same encoding as for Russian/Bulgarian, that makes things hard.
> It could still be handled by adding a LANG attribute with the proper
> code (mk, sr), and splitting the current Cyrillic setting in the options
> into "Cyrillic (Russia/Bulgaria)" and "Cyrillic (Macedonia/Serbia)" to
> allow setting different fonts for the two (Traditional Chinese is already
> similarly split in two).

You know, the problem you raise here seems very interesting.

It seems it will require some changes in Firefox to handle it correctly,
but it first has to be made known so that those changes can come.

Honestly, it's now too late for Firefox 3, but the problem must be
properly reported and described so that the changes can come as soon as
possible. I could help you report it in a bug and justify to the i18n
developers what needs to be changed.

Justin Wood (Callek)

Nov 10, 2007, 8:19:45 PM

But the _application_ should be able to hint to the OS which language it
wants the glyphs chosen for (I would hope), and if that language isn't
available, it should fall back to the current language.

But somehow, I doubt current OSes and font designs allow for this
distinction. :/