Replacement of charsetalias.properties file.

Jay

Nov 14, 2010, 9:29:59 AM

Dear Developers,

Could you tell me how the charset alias works now?

Let me give you some background on why I am asking. Some time ago I
wrote a small utility for Mac OS X that tweaks the
charsetalias.properties file to handle the many "big5-HKSCS" encoded
web sites that incorrectly declare themselves as "big5". It simply
changes the target encoding of big5 to big5-HKSCS. As big5-HKSCS is a
superset of big5, correctly labelled big5 sites still display properly,
and so do the mislabelled big5-HKSCS sites. I have just downloaded
Firefox 4.0 beta 7 and discovered that the charsetalias.properties file
is no longer there. Could you tell me how the charset handling works in
Gecko 2.0?
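
For readers unfamiliar with the trick, here is a minimal sketch of the
kind of alias override described above. It is written in Python rather
than against Gecko, and the table contents and the names
DEFAULT_ALIASES, USER_OVERRIDES, resolve_charset and decode_page are
hypothetical, made up for illustration; only the codec names "big5" and
"big5hkscs" are real Python codecs. It is not how Firefox implements
alias lookup.

```python
# Hypothetical alias table: declared label -> Python codec name.
# (Stands in for the role charsetalias.properties used to play.)
DEFAULT_ALIASES = {
    "big5": "big5",
    "big5-hkscs": "big5hkscs",
}

# Hypothetical user override: treat anything labelled big5 as Big5-HKSCS.
USER_OVERRIDES = {"big5": "big5hkscs"}


def resolve_charset(label, overrides=None):
    """Map a declared charset label to a codec name; overrides win."""
    key = label.strip().lower()
    if overrides and key in overrides:
        return overrides[key]
    # Unknown labels are passed through unchanged.
    return DEFAULT_ALIASES.get(key, key)


def decode_page(raw, declared, overrides=None):
    """Decode page bytes with the (possibly overridden) codec."""
    return raw.decode(resolve_charset(declared, overrides))
```

With the override in place, resolve_charset("Big5", USER_OVERRIDES)
yields "big5hkscs", and plain Big5 bytes still decode to the same text
because the superset codec covers them.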

My little utility can be downloaded from here:
http://www.macupdate.com/info.php/id/19216/i-speak-cantonese

Axel Hecht

Nov 14, 2010, 12:45:41 PM

That moved into the compiled code,
https://bugzilla.mozilla.org/show_bug.cgi?id=563536.

No idea if there's anything left that allows you to tweak it.

That said, is there a bug filed on what you're trying to fix? Add-ons
like yours sound like something we shouldn't need.

Axel

Jean-Marc Desperrier

Nov 19, 2010, 9:07:34 AM

Axel Hecht wrote:
> That moved into the compiled code,
> https://bugzilla.mozilla.org/show_bug.cgi?id=563536.
>
> No idea if there's anything left that allows you to tweak it.
>
> That said, is there a bug filed on what you're trying to fix? Add-ons
> like yours sound like something we shouldn't need.

Hard-coding those identifiers is not a great idea :-(

It's not just a matter of something Mozilla got wrong that simply
needs to be fixed. There will be some identifiers that almost nobody
uses and that make no sense to include by default, and there will be a
few cases where things have been done wrong, so it's useful to be able
to override the default even when the default is correct.

big5 is a case that can't be solved satisfactorily with a hard-coded
solution. http://en.wikipedia.org/wiki/Big5 lists at least 10 different
extensions to big5, any of which could actually have been used when a
page is tagged as big5.

Of those, HKSCS is probably the only one that's a real standard and is
currently widely used. But what if someone wants a warning when some
of the characters are not truly big5 and use the HKSCS extension? (A
Taiwanese user, for example, for whom HKSCS is not a standard.) And what
if he is handling some old content that's truly incompatible with HKSCS,
where he'd love to have big5 interpreted as something other than HKSCS?

Having the identifiers defined in a resource file that can be
overwritten to change them makes the situation significantly easier.
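
To make the "warning" idea concrete, here is a rough sketch, assuming
Python's built-in "big5" and "big5hkscs" codecs as stand-ins for the
browser's decoders. The function names classify_big5 and
decode_with_warning are hypothetical. The check is only as strict as
CPython's codec tables: a byte pair that happens to be valid in both
tables is reported as plain big5, so this is an approximation, not a
definitive detector.

```python
def classify_big5(raw):
    """Return 'big5', 'big5-hkscs' or 'invalid' for bytes labelled big5."""
    try:
        raw.decode("big5")  # strict mode raises on unmapped byte pairs
        return "big5"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("big5hkscs")
        return "big5-hkscs"
    except UnicodeDecodeError:
        return "invalid"


def decode_with_warning(raw, warn=print):
    """Decode bytes labelled big5, warning when HKSCS is needed."""
    kind = classify_big5(raw)
    if kind == "invalid":
        raise ValueError("not decodable as Big5 or Big5-HKSCS")
    if kind == "big5-hkscs":
        warn("page labelled big5 uses HKSCS extension characters")
        return raw.decode("big5hkscs")
    return raw.decode("big5")
```

A user who never wants the HKSCS interpretation could turn the warning
into a hard error, which is the sort of per-user choice a hard-coded
alias cannot express.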

Jean-Marc Desperrier

Dec 29, 2010, 7:23:22 PM

On 19/11/2010 15:07, Jean-Marc Desperrier wrote:
> Of those HKSCS is probably the only one that's a real standard and has
> currently a large usage. But what if someone wants a warning when some
> of the characters are not truly big5 and use the hkscs extension ? [...]
>
> Having the identifiers defined in a resource file that can be
> overwritten to change them makes the situation significantly easier.

I've found another case that's a lot more pertinent: proprietary
extensions to the Japanese S-JIS encoding for emoji characters.

There currently appear to be three such extensions: DoCoMo, KDDI and
SoftBank. Basically every Japanese cell network has its own proprietary
way of encoding emoji, documented in
http://www.unicode.org/Public/UNIDATA/EmojiSources.txt

As those emoji characters are now encoded in the Unicode standard,
those extensions can't simply be ignored.