Cocoa emacs renders Unicode combining diacritics improperly

Dan Maftei

Jul 16, 2012, 1:09:38 PM
to help-gn...@gnu.org
This is regarding Emacs.app, built with --with-ns from GNU emacs 24.1.1 source.

Pre-composed characters (e.g. 0xf1, ñ) render perfectly, but combining diacritics (e.g. n\x303) render the character oddly. Namely, the diacritic is minuscule and is not placed directly above the main character (though it is not entirely to the left or right of it either). However, a single glyph IS created, since C-f and C-b skip over the entire rendered character.
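
The two spellings of ñ described above are canonically equivalent in Unicode, which is why a renderer is expected to draw them the same way. A minimal Python sketch of that relationship follows (Python is used only for illustration, and the variable names are hypothetical):

```python
import unicodedata

precomposed = "\u00f1"   # ñ as a single code point (0xf1)
decomposed = "n\u0303"   # n followed by COMBINING TILDE (0x303)

# The two strings differ code point by code point...
same_code_points = precomposed == decomposed

# ...but normalization maps each form onto the other.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)

print(same_code_points)               # False
print(composed_again == precomposed)  # True
print(len(decomposed_again))          # 2
```

Emacs keeps whichever form was typed in the buffer, so the decomposed form has to be composed at display time rather than being replaced by 0xf1.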

Oddly, dired correctly displays both pre-composed and de-composed characters, but other modes do not (org, shell, text modes, GNU's python.el inferior shell).

Of course, either method works fine in other applications (Chrome, Terminal.app, LibreOffice, native OS X apps). Further, both methods work when running in non-windowed mode in Terminal.app.

Note that this is not precisely the problem described here (http://www.emacswiki.org/emacs/CarbonEmacsPackage#toc23). In particular, dired works fine, non-windowed mode works fine, and in other modes the main character and the diacritic ARE being composed; it's just that the final glyph is rendered oddly.

Dan

Peter Dyballa

Jul 16, 2012, 4:16:48 PM
to Dan Maftei, help-gn...@gnu.org

On 16.07.2012 at 19:09, Dan Maftei wrote:

> Pre-composed characters (e.g. 0xf1, ñ) render perfectly, but combining
> diacritics (e.g. n\x303) render the character oddly. Namely, the diacritic
> is minuscule, and is not placed directly above the main character (but
> not entirely to the left or right either). However, a single glyph IS
> created, since C-f and C-b skip over the entire rendered character.

Try again with a font that has the COMBINING accents! You probably used an inadequate font, so GNU Emacs had to use two different fonts. You can check this yourself by putting the text cursor on the base character or on the accent and typing C-u C-x = each time.

For tests you could try to use Lucida Grande, a quite rich font. A good mono-spaced font is DejaVu Sans Mono. Lucida Sans Typewriter from Java is also a pretty good candidate. (Create a library in Font Book and populate it with the TT fonts!)

--
Greetings

Pete

Engineer: a mechanism for converting caffeine into designs


Dan Maftei

Jul 16, 2012, 6:15:18 PM
to Peter Dyballa, help-gn...@gnu.org
On Mon, Jul 16, 2012 at 9:16 PM, Peter Dyballa <Peter_...@web.de> wrote:

> On 16.07.2012 at 19:09, Dan Maftei wrote:
>
>> Pre-composed characters (e.g. 0xf1, ñ) render perfectly, but combining
>> diacritics (e.g. n\x303) render the character oddly. Namely, the diacritic
>> is minuscule, and is not placed directly above the main character (but
>> not entirely to the left or right either). However, a single glyph IS
>> created, since C-f and C-b skip over the entire rendered character.
>
> Try again with a font that has the COMBINING accents! You probably used an inadequate font, so GNU Emacs had to use two different fonts. You can check this yourself by putting the text cursor on the base character or on the accent and typing C-u C-x = each time.

I was not clear enough.

It is not possible to "put the text cursor on the basic character or on the accent" because the two are combined.

My font is apple-monaco for both pre-composed and compositional characters.

I have uploaded images of both compositional (http://i.imgur.com/yWPmv.png) and pre-composed (http://i.imgur.com/5Kfy2.png) characters. I've included describe-char output as well.

Notice: the combining diacritic is rendered extremely small, and off-center. It doesn't even look like a tilde in the final glyph.
 

> For tests you could try to use Lucida Grande, a quite rich font. A good mono-spaced font is DejaVu Sans Mono. Lucida Sans Typewriter from Java is also a pretty good candidate. (Create a library in Font Book and populate it with the TT fonts!)

The default Cocoa emacs font (apple-monaco) properly renders combining characters in TextEdit. Nevertheless, I tried Lucida Grande, and it did not fix the issue.

Cheers,
Dan

Peter Dyballa

Jul 16, 2012, 7:09:48 PM
to Dan Maftei, help-gn...@gnu.org

On 17.07.2012 at 00:15, Dan Maftei wrote:

> Notice: the combining diacritic is rendered extremely small, and
> off-center. It doesn't even look like a tilde in the final glyph.

This "combined diacritic" is actually LATIN SMALL LETTER N. That's what the *Help* buffer says. You can also see this when reading the "preferred charset" value.

How is it when you launch GNU Emacs without customisation? This can be achieved from the command line as "<path to>/Emacs.app/Contents/MacOS/Emacs -Q &".

With Cmd-T the Mac OS X font chooser comes up and you can select a font and also its size. C-h H produces the *HELLO* buffer with a few interesting scripts…

--
Greetings

Pete

Eternity is a terrible thought. I mean, where's it going to end?
- Tom Stoppard


Dan Maftei

Jul 16, 2012, 7:23:12 PM
to Peter Dyballa, help-gn...@gnu.org
On Tue, Jul 17, 2012 at 12:09 AM, Peter Dyballa <Peter_...@web.de> wrote:

> On 17.07.2012 at 00:15, Dan Maftei wrote:
>
>> Notice: the combining diacritic is rendered extremely small, and
>> off-center. It doesn't even look like a tilde in the final glyph.
>
> This "combined diacritic" is actually LATIN SMALL LETTER N. That's what the *Help* buffer says. You can also see this when reading the "preferred charset" value.

Yeah, I wasn't sure how to interpret that. It is indeed ASCII 0x6e but what to make of 'Composed with the following character(s) "~"'?

Since combining characters work in non-windowed mode, I tried to look at describe-char output running emacs -nw -Q but describe-char on combining characters causes a fatal error. >.< I sent a bug report.


> How is it when you launch GNU Emacs without customisation? This can be achieved from the command line as "<path to>/Emacs.app/Contents/MacOS/Emacs -Q &".

I always test such things without loading .emacs before posting to mailing lists. :-)  So, same results.
 

> With Cmd-T the Mac OS X font chooser comes up and you can select a font and also its size. C-h H produces the *HELLO* buffer with a few interesting scripts…


As I mentioned, other fonts produce the same results, including the ones you mentioned (e.g. Lucida Grande). Further, the default font, Monaco, has support for combining characters (I tested in TextEdit).

Not sure what I'm to do with the *HELLO* buffer. Everything is rendered properly. I imagine most diacritics were displayed on pre-composed characters (I only checked a few).

Cheers,
Dan

Peter Dyballa

Jul 17, 2012, 5:51:44 AM
to Dan Maftei, help-gn...@gnu.org

On 17.07.2012 at 01:23, Dan Maftei wrote:

> Yeah, I wasn't sure how to interpret that. It is indeed ASCII 0x6e but what
> to make of 'Composed with the following character(s) "~"'?

It obviously reports that you tried to compose ñ, which failed. So you have the combining accent character and the n character side by side. How did you try to compose?

>
> Since combining characters work in non-windowed mode, I tried to look at
> describe-char output running emacs -nw -Q but describe-char on combining
> characters causes a fatal error. >.< I sent a bug report.

In Terminal you are using Terminal's ability to display Unicode characters. GNU Emacs is just a guest there. It's different when it uses its own windows.

The NS variant relies on a lot of Emacs's own code to, for example, render text. There is a set of patches and some extra source files available at /ftp:anon...@ftp.math.s.chiba-u.ac.jp:/ (in TRAMP notation). This set, emacs-24.1-mac-3.0.tar.gz, together with the released source code for GNU Emacs 24.1, builds, when configured --with-mac, the "AppKit Emacs", which is much more integrated into Mac OS X and uses much more of Mac OS X than GNU Emacs does. Try it! (You need to compile and install it yourself.)

--
Greetings

Pete

It isn't pollution that's harming the environment. It's the impurities in our air and water that are doing it.


Dan Maftei

Jul 17, 2012, 8:52:49 AM
to Peter Dyballa, help-gn...@gnu.org
On Tue, Jul 17, 2012 at 10:51 AM, Peter Dyballa <Peter_...@web.de> wrote:

> On 17.07.2012 at 01:23, Dan Maftei wrote:
>
>> Yeah, I wasn't sure how to interpret that. It is indeed ASCII 0x6e but what
>> to make of 'Composed with the following character(s) "~"'?
>
> It obviously reports that you tried to compose ñ, which failed. So you have the combining accent character and the n character side by side. How did you try to compose?

I'm not convinced mine has failed. For one, the diacritic and root character are not side-by-side. Perhaps I'm wrong, but if they were truly side-by-side, wouldn't I be able to put point on either the accent or the root character? 

Here's how to make ñ compositionally:

n C-x 8 <RET> 0303 <RET>

Could you run describe-char on a compositional character and post the results? I want to see how it differs from my output. (Presuming, of course, that your emacs renders them correctly :-)

>> Since combining characters work in non-windowed mode, I tried to look at
>> describe-char output running emacs -nw -Q but describe-char on combining
>> characters causes a fatal error. >.< I sent a bug report.
>
> In Terminal you are using Terminal's ability to display Unicode characters. GNU Emacs is just a guest there. It's different when it uses its own windows.
>
> The NS variant relies on a lot of Emacs's own code to, for example, render text. There is a set of patches and some extra source files available at /ftp:anon...@ftp.math.s.chiba-u.ac.jp:/ (in TRAMP notation). This set, emacs-24.1-mac-3.0.tar.gz, together with the released source code for GNU Emacs 24.1, builds, when configured --with-mac, the "AppKit Emacs", which is much more integrated into Mac OS X and uses much more of Mac OS X than GNU Emacs does. Try it! (You need to compile and install it yourself.)

Thanks for the patches. I've applied them to the 24.1.1 source but make segfaults when compiling profile.c. I don't have the time to fix this unfortunately.

I presume you use emacs on OS X? Did you build it using this patch? Do compositional characters work? Further, if you have the time, could you build the regular source --with-ns and see if they work there? Perhaps the issue is with my OS.

Cheers,
Dan

Peter Dyballa

Jul 17, 2012, 5:15:20 PM
to Dan Maftei, help-gn...@gnu.org

On 17.07.2012 at 14:52, Dan Maftei wrote:

>
> Here's how to make ñ compositionally:
>
> n C-x 8 <RET> 0303 <RET>

I do this much more simply: ~n. On my German keyboard, ~ is a combining (dead) key. The same is true for ´, `, ^, and ¨.

>
> Could you run describe-char on a compositional character and post the
> results? I want to see how it differs from my output. (Presuming, of
> course, that your emacs renders them correctly :-)

This is from the NS variant of GNU Emacs 23.4:

character: ñ (241, #o361, #xf1)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
code point: 0xF1
syntax: w which means: word
category: .:Base, j:Japanese, l:Latin
buffer code: #xC3 #xB1
file code: #xC3 #xB1 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
nil:-apple-Lucida_Sans_Typewriter-medium-normal-normal-*-9-*-*-*-m-0-iso10646-1 (#x78)

Character code properties: customize what to show
name: LATIN SMALL LETTER N WITH TILDE
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
decomposition: (110 771) ('n' '̃')

There are text properties here:
fontified t

and this is from the NS variant of GNU Emacs 24.1:

character: ñ (displayed as ñ) (codepoint 241, #o361, #xf1)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
code point in charset: 0xF1
syntax: w which means: word
category: .:Base, L:Left-to-right (strong), j:Japanese, l:Latin
buffer code: #xC3 #xB1
file code: #xC3 #xB1 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
nil:-apple-Menlo-medium-normal-normal-*-9-*-*-*-m-0-iso10646-1 (#xB3)

Character code properties: customize what to show
name: LATIN SMALL LETTER N WITH TILDE
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
decomposition: (110 771) ('n' '̃')

There are text properties here:
fontified t

You can see the different "character:" lines and font (type) descriptions.


This comes from the "AppKit Emacs":

character: ñ (displayed as ñ) (codepoint 241, #o361, #xf1)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
code point in charset: 0xF1
syntax: w which means: word
category: .:Base, L:Left-to-right (strong), j:Japanese, l:Latin
buffer code: #xC3 #xB1
file code: #x6E #xCC #x83 (encoded by coding system utf-8-hfs-unix)
display: by this font (glyph code)
mac-ct:-*-Monaco-normal-normal-normal-*-10-*-*-*-m-0-iso10646-1 (#x78)

Character code properties: customize what to show
name: LATIN SMALL LETTER N WITH TILDE
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
decomposition: (110 771) ('n' '̃')

There are text properties here:
fontified t

You can see that the two 24.1 versions use different coding systems.
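
The differing "file code" lines are consistent with utf-8-hfs writing a decomposed form to disk. As a rough illustration, assuming plain NFD as a stand-in for the HFS-style decomposition (which is not identical to NFD in every detail), the two byte sequences can be reproduced with a short Python sketch; the variable names are hypothetical:

```python
import unicodedata

composed = "\u00f1"                                 # ñ, the character described above
decomposed = unicodedata.normalize("NFD", composed)  # n + U+0303

# UTF-8 bytes of the composed form, as reported for utf-8-unix.
composed_hex = composed.encode("utf-8").hex()

# UTF-8 bytes of the decomposed form, as reported for utf-8-hfs-unix.
decomposed_hex = decomposed.encode("utf-8").hex()

print(composed_hex)    # c3b1
print(decomposed_hex)  # 6ecc83
```

The buffer contents are the same in both builds (#xC3 #xB1); only the representation chosen when saving differs.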


>
> Thanks for the patches. I've applied them to the 24.1.1 source but make
> segfaults when compiling profile.c. I don't have the time to fix this
> unfortunately.

I wrote "GNU Emacs 24.1" and YAMAMOTO Mitsuharu mentions in NEWS-mac at its top:

* emacs-24.1-mac-3.0 (2012-06-10)
Based on Emacs 24.1.

So using the sources for GNU Emacs 24.1.1 is not correct. Use the sources from the official GNU Emacs 24.1 release!

>
> I presume you use emacs on OS X? Did you build it using this patch? Do
> compositional characters work?

Three times: yes.

> Further, if you have the time, could you build the regular source --with-ns and see if they work there? Perhaps the issue is with my OS.

It works. Your mistake is that you are trying to use an Emacs input method, which is not necessary. Just use your keyboard and its own dead (combining) accents! If I try to use your input method I get:

character: n (displayed as n) (codepoint 110, #o156, #x6e)
preferred charset: ascii (ASCII (ISO646 IRV))
code point in charset: 0x6E
syntax: w which means: word
category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman
buffer code: #x6E
file code: #x6E (encoded by coding system utf-8-unix)
display: composed to form "ñ" (see below)

Composed with the following character(s) "̃" using this font:
nil:-apple-Menlo-medium-normal-normal-*-9-*-*-*-m-0-iso10646-1
by these glyphs:
[0 1 110 81 5 0 4 5 0 nil]
[0 1 771 648 5 0 3 1 0 [-4 0 0]]

Character code properties: customize what to show
name: LATIN SMALL LETTER N
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
decomposition: (110) ('n')

There are text properties here:
fontified t

The combined character looks quite good with Menlo on Snow Leopard but as awful as your screenshot with Monaco (differently awful with Lucida Sans Typewriter). In the "AppKit Emacs" with Monaco the accented character looks exactly like the ~n composed character and is described as:

character: n (displayed as n) (codepoint 110, #o156, #x6e)
preferred charset: ascii (ASCII (ISO646 IRV))
code point in charset: 0x6E
syntax: w which means: word
category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman
buffer code: #x6E
file code: #x6E (encoded by coding system utf-8-hfs-unix)
display: composed to form "ñ" (see below)

Composed with the following character(s) "̃" using this font:
mac-ct:-*-Monaco-normal-normal-normal-*-10-*-*-*-m-0-iso10646-1
by these glyphs:
[0 1 110 120 6 0 6 8 0 nil]

Character code properties: customize what to show
name: LATIN SMALL LETTER N
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
decomposition: (110) ('n')

There are text properties here:
fontified t
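
The bracketed glyph lines in the two describe-char outputs above can be compared mechanically. The sketch below assumes the field layout that the Emacs documentation gives for glyph strings (see `composition-get-gstring`): [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT], where ADJUSTMENT is nil or [X-OFF Y-OFF WADJUST]. The `Glyph` type and the `parse_glyph` helper are hypothetical names, written in Python purely for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Glyph(NamedTuple):
    from_idx: int
    to_idx: int
    char: int          # character code point
    code: int          # glyph code in the font
    width: int
    lbearing: int
    rbearing: int
    ascent: int
    descent: int
    adjustment: Optional[Tuple[int, ...]]  # (x_off, y_off, wadjust) or None


def parse_glyph(line: str) -> Glyph:
    """Parse one glyph line as printed by describe-char."""
    body = line.strip()[1:-1]          # drop the outer brackets
    adjustment = None
    if body.endswith("]"):             # trailing [X-OFF Y-OFF WADJUST]
        body, adj = body.rsplit("[", 1)
        adjustment = tuple(int(n) for n in adj.rstrip("]").split())
    else:                              # trailing nil
        body = body.rsplit("nil", 1)[0]
    fields = [int(n) for n in body.split()]
    return Glyph(*fields, adjustment)


ns_base = parse_glyph("[0 1 110 81 5 0 4 5 0 nil]")
ns_tilde = parse_glyph("[0 1 771 648 5 0 3 1 0 [-4 0 0]]")
appkit = parse_glyph("[0 1 110 120 6 0 6 8 0 nil]")

print(ns_tilde.char == 0x303, ns_tilde.adjustment)
print(hex(appkit.code))
```

Under that reading, the NS build draws the tilde as a separate glyph (code 648) shifted 4 units to the left, while the AppKit build reports a single glyph whose code, 120 (#x78), matches the code shown for the pre-composed ñ in Monaco earlier in this message.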


--
Greetings

Pete

The best way to accelerate a PC is 9.8 m/s²


Dan Maftei

Jul 17, 2012, 6:16:08 PM
to Peter Dyballa, help-gn...@gnu.org
Aha, let me clarify something: some sounds in the world's languages cannot be written in any way other than with combining characters. For example, the voiced alveolar lateral affricate (d͡ɮ 0x64 0x361 0x26E). In principle, I need to be able to place any of the combining characters in the Unicode Combining Diacritical Marks block (0x300 through 0x36F) onto any other character. Therefore, as a linguist, I quite literally need the combining method to work. :-)
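
That this sequence has no pre-composed equivalent can be checked with a short Python sketch (illustrative only; the variable names are hypothetical). NFC normalization leaves the three code points untouched, and 0x361 is reported as a nonspacing combining mark:

```python
import unicodedata

affricate = "\u0064\u0361\u026e"  # d + 0x361 + ɮ

# List each code point with its general category and Unicode name.
for ch in affricate:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# NFC composes wherever a pre-composed character exists; here nothing changes.
composed = unicodedata.normalize("NFC", affricate)
has_precomposed_form = composed != affricate
print(has_precomposed_form)  # False
```

So for text like this the renderer has no single pre-composed glyph to fall back on and must position the mark itself.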

Thanks for the describe-char output. Mine is working as expected. (un-related: can you reproduce this bug? http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg00615.html)

So, I'm only a little better off than I started. Thanks for the Menlo font tip: it does render ñ better. It's still off on, e.g., the afore-mentioned voiced alveolar lateral affricate. I will look through the other fonts.

I guess there's nothing to do but try and integrate Yamamoto's patches into the emacs trunk. This is clearly a bug with the way NS emacs interacts with OS X.

Speaking of the patch, I didn't download 24.1.1 on purpose: the newest official version as obtained from the FTP site (http://ftp.gnu.org/gnu/emacs/) is listed as 24.1, which is what I used. I'm not sure if a version 24.1.0 even exists... but I haven't dug through the source repository much.

Cheers,
Dan

Peter Dyballa

Jul 17, 2012, 7:10:16 PM
to Dan Maftei, help-gn...@gnu.org

On 18.07.2012 at 00:16, Dan Maftei wrote:

> Thanks for the describe-char output. Mine is working as expected.
> (un-related: can you reproduce this bug?
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg00615.html)

Not exactly. I get, in tcsh:

Fatal error (11)Abort (core dumped)
Exit 134


> Speaking of the patch, I didn't download 24.1.1 on purpose: the newest
> official version as obtained from the FTP site (
> http://ftp.gnu.org/gnu/emacs/) is listed as 24.1, which is what I used. I'm
> not sure if a version 24.1.0 even exists... but I haven't dug through the
> source repository much.

The version number 24.1.1 has in the first two fields the major and the minor version number of GNU Emacs. GNU Emacs releases are identified by the major and minor version number. The last or right-most field is the number of your build, in this case your first try.

So you used the right sources. Are you used to working on the command line? And to patching source files? Did you save the output from running configure? (In that case you could send me the output privately.)

By putting this into your customisation:

'(read-quoted-char-radix 16)

you could insert any Unicode character by typing C-q <the hex number><something non-hex (cursor movement, SPACE, RET, ESC)>. This worked to produce d͡.

--
With peaceful greetings

Pete

War springs from unseen and generally insignificant causes.
– Anonymous


Dan Maftei

Jul 17, 2012, 8:03:51 PM
to Peter Dyballa, help-gn...@gnu.org
On Wed, Jul 18, 2012 at 12:10 AM, Peter Dyballa <Peter_...@web.de> wrote:

> On 18.07.2012 at 00:16, Dan Maftei wrote:
>
>> Thanks for the describe-char output. Mine is working as expected.
>> (un-related: can you reproduce this bug?
>> http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg00615.html)
>
> Not exactly. I get, in tcsh:
>
>         Fatal error (11)Abort (core dumped)
>         Exit 134
>
>> Speaking of the patch, I didn't download 24.1.1 on purpose: the newest
>> official version as obtained from the FTP site (
>> http://ftp.gnu.org/gnu/emacs/) is listed as 24.1, which is what I used. I'm
>> not sure if a version 24.1.0 even exists... but I haven't dug through the
>> source repository much.
>
> The version number 24.1.1 has in the first two fields the major and the minor version number of GNU Emacs. GNU Emacs releases are identified by the major and minor version number. The last or right-most field is the number of your build, in this case your first try.
>
> So you used the right sources. Are you used to working on the command line? And to patching source files? Did you save the output from running configure? (In that case you could send me the output privately.)

Yes, I am. I'll reply privately.
 

> By putting this into your customisation:
>
>         '(read-quoted-char-radix 16)
>
> you could insert any Unicode character by typing C-q <the hex number><something non-hex (cursor movement, SPACE, RET, ESC)>. This worked to produce d͡.

C-x 8 <RET> <4-digit hexadecimal code point> seems to have the exact same results. (Is it still a "digit" if it's hexadecimal? :)

However, as expected, there is still the rendering problem with combining characters, whichever input method I use.

As I said, I think there's nothing else to do but find a nice font as a temporary workaround (or use the mac patch you linked). I will hold off a little before filing a bug, but after this discussion, I can't see how it's anything but.

Cheers,
Dan

Peter Dyballa

Jul 18, 2012, 4:48:38 AM
to Dan Maftei, help-gn...@gnu.org

On 18.07.2012 at 02:03, Dan Maftei wrote:

> C-x 8 <RET> <4-digit hexadecimal code point> seems to have the exact same
> results. (Is it still a "digit" if it's hexadecimal? :)

Yes, it has to be! By definition. They're hexadecimal *numbers*, and numbers consist of *digits*.

--
Greetings

Pete

They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
-Benjamin Franklin, Historical Review of Pennsylvania

