
Some but not all Chinese chars rendered incorrectly


John Chambers

Apr 15, 2007, 10:33:00 AM
I first noticed problems with the page "http://cojak.org/" while
working with seamonkey and firefox on my Mac, but just now noticed
that Camino is an even better illustration. All the mozilla-suite
browsers (mozilla, FF, SM, NS and Camino) render some of the Chinese
characters incorrectly, while both safari and opera do them correctly.
The same page seems to work fine in the mozilla-suite browsers on our
linux and Windows boxes; the problem is only on my Mac. I've checked
with the View -> Text Encoding menu, and all the browsers say that
the page is UTF-8, including camino.

The page has a rather simple table of Chinese "radicals", the chars
that are the pieces used to build most other Chinese characters. So
it should be fairly comprehensible to people who don't know much about
East Asian writing systems. There are two kinds of characters in the
table. Most are dark on a light background, and are Chinese characters.
A few are light on a dark background, and are circled numbers in the
Enclosed Alphanumerics block of characters starting at U+2460, and
are the stroke counts of the characters (which is useful if you're
looking things up in a dictionary but otherwise not too interesting).

These latter characters are all rendered incorrectly by all the mozilla
browsers on my Mac. The circled-1 char in the first cell, U+2460, is
drawn as "!". Now, "!" has hex code 21, which doesn't seem to have any
obvious derivation from hex 2460 or from the UTF-8 encoding E2 91 A0.
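For reference, the byte values quoted above can be reproduced with a short Python sketch. The helper name `utf8_hex` is made up for illustration; all it does is confirm that 0x21 appears nowhere in the UTF-8 form of U+2460.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    # The two code points whose encodings are quoted in the post.
    for cp in (0x2460, 0x4E28):
        print(f"U+{cp:04X} -> {utf8_hex(cp)}")
    # "!" is the single byte 0x21, which is not among the bytes printed above.
    print(f"'!' -> {utf8_hex(ord('!'))}")
```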

The next 4 chars are rendered correctly by all these browsers. All but
camino render the next char, 4E28, correctly as a vertical line. Camino
renders it as two glyphs: y-umlaut and not-equal. There's nothing special
about this char; U+4E28 (UTF-8 E4 B8 A8) is "just a character" in the CJK
Unified Ideographs block. Why would it be displayed as two glyphs?

The 7th char in the table is drawn correctly by all my browsers. But the
8th char, U+2461 (circled-2) is drawn as a right-double-quote by all the
mozilla browsers, including Camino. The rest of the first row is drawn
correctly by all the browsers. The second char in the second row is the
next that has a problem. It is U+8BA0 (simplified Speech, aka radical 149).
Only Camino draws it wrong, as a slash and a raised-dot. Then a string of
correctly-rendered characters, until the circled-3 appears as "#".
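Putting the three circled-number observations side by side suggests a pattern, though this is only a reading of the symptoms and not a diagnosis. If the "right-double-quote" is the ASCII `"` (an assumption), then circled-N is being drawn as the character at 0x20 + N. The Python sketch below just tabulates that; the `observed` mapping is transcribed from the descriptions above, and the function names are hypothetical.

```python
# Observed misrenderings of the circled digits, transcribed from the report.
# Assumption: the "right-double-quote" seen for circled-2 is ASCII '"' (0x22).
observed = {0x2460: "!", 0x2461: '"', 0x2462: "#"}


def offset_from_circled_one(code_point: int) -> int:
    """Distance of a circled digit from U+2460 (circled-1)."""
    return code_point - 0x2460


def pattern_holds(mapping: dict) -> bool:
    """True if each glyph sits as far from '!' as its code point sits from U+2460."""
    return all(
        ord(glyph) - ord("!") == offset_from_circled_one(cp)
        for cp, glyph in mapping.items()
    )


if __name__ == "__main__":
    for cp, glyph in observed.items():
        print(f"U+{cp:04X} drawn as {glyph!r} (0x{ord(glyph):02X})")
    print("consistent offset:", pattern_holds(observed))
```

A consistent offset would hint at a glyph-lookup or font-mapping problem rather than a decoding problem, but that remains a guess.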

It's possible that I have something configured wrong, though this seems
unlikely. I haven't configured camino at all; it's just as it came "out
of the box". It says that it's Version 2005122909 (1.0b2). The other
mozilla browsers are all up to date, I think; at least they've all been
upgraded recently (except mozilla of course). And no amount of tweaking
of any of the mozilla browsers seems to change how they render the chars
in this table.

Also, a piece of interesting evidence is that I can copy any of these
characters from any of the browsers to other Mac apps, and they display
correctly. I just copied the first 8 chars of the first row to a Terminal
window, and they are all rendered correctly there. So camino has the
correct UTF-8 codes behind those incorrect glyphs. They're just drawn
with the wrong pixels.
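One way to double-check what actually lands on the clipboard is to paste the copied text into a small script that lists each character's code point. This is a minimal Python sketch; `describe` is a hypothetical helper name, and the sample string is built from code points mentioned above rather than from the real clipboard contents.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    sample = "\u2460\u4e28\u2461"  # stand-in for text pasted from the browser
    for cp, name in describe(sample):
        print(cp, name)
```

If the code points listed match the ones in the page source, the underlying text is intact and only the glyph selection is going wrong.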

Does anyone have any good ideas for diagnosing and fixing this problem?

I've seen similar problems in lots of other documents, but this simple
HTML table, with one hex-encoded char per cell, seems to be a good test
case that should be fairly easy to diagnose. The HTML (XHTML) isn't at
all tricky, and the CSS is rather minimal.

Anyone have clues?


John Chambers

Apr 16, 2007, 11:40:29 AM
Since my copy of camino wasn't up to date, I downloaded
the latest version and installed it. It identifies itself
as "Version 2007022813 (1.0.4Int)". It does the same sort
of incorrect renderings of the cojak.org page as the older
version did.

What's especially weird is that it gets most of the chars
in the table right. There's no obvious pattern to the Han
characters that it shows as mojibake. But it does mess up
all the "circled numbers", the chars that are white on a
dark background.

Smokey Ardisson

Apr 16, 2007, 12:03:33 PM
This was probably bug 212745
<https://bugzilla.mozilla.org/show_bug.cgi?id=212745>; at any rate,
the Enclosed Alphanumerics render correctly on the trunk (i.e., code
for Camino 2 and Firefox 3)
and the radicals don't look terribly crazy at a quick glance. You can
try the work-around mentioned in the bug and see if it fixes things in
the current versions.

John Chambers

Apr 17, 2007, 1:00:38 PM
Hmmm ... I read that, but it's not obvious that it's the same bug.
Part of the problem, of course, is that they only use the card-suit
symbols in their examples. But the display comes out as single
glyphs, all straight lines in some orientation. What I see with
the CJK characters is that the ones that are misrendered appear
as two glyphs, mostly an accented Latin letter and a punctuation
mark (some as two punctuations). The circled-number chars do show
as one glyph, but they all seem to be ASCII symbols.
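The two-glyph symptom is what one would expect if a two-byte legacy CJK code were being drawn one byte at a time with a Western font. As a purely speculative check (nothing in this thread or in the bug report confirms it), the Python sketch below takes a character's GB2312 bytes and reads them back as MacRoman. The choice of GB2312 and of MacRoman are both assumptions made for illustration, and `as_single_byte_glyphs` is a hypothetical name.

```python
def as_single_byte_glyphs(ch: str, cjk: str = "gb2312", western: str = "mac_roman") -> str:
    """Encode ch in a two-byte CJK encoding, then reinterpret each byte
    in a single-byte Western encoding. Both encodings are guesses."""
    return ch.encode(cjk).decode(western)


if __name__ == "__main__":
    # The two Han characters reported as misrendered in Camino.
    for cp in (0x4E28, 0x8BA0):
        ch = chr(cp)
        raw = " ".join(f"{b:02X}" for b in ch.encode("gb2312"))
        print(f"U+{cp:04X} GB2312 {raw} -> {as_single_byte_glyphs(ch)}")
```

For U+4E28 this produces y-umlaut followed by not-equal, the same pair described for Camino. That is suggestive at best; whether it is the actual mechanism is untested.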

Anyway, I downloaded the daily build and ran it. It does the same
misrendering as the "Stable 1.0.4" version. It also identifies itself
as "Version 2007041622 (1.1b+)", FWIW. I don't seem to see any mention
of a "Camino 2". What does it refer to? Where do you get it?

Maybe I should register with bugzilla. But first, I suppose I should
try to learn how to use it, and I see that bug 212745 doesn't seem
to contain any of the keywords that I'd have used. I don't think I
would have ever found it, and it's at least partially relevant. Is
there any sort of advice for locating a given bug when you don't know
what words might have been used? I'd be a bit wary of barging in and
wasting people's time by merely repeating earlier bug reports in my
own different words.
