
Replacement of charsetalias.properties file.


Jay

Nov 14, 2010, 9:29:59 AM
Dear Developers,

Could you tell me how the charset alias works now?

Let me give you some background on why I want to know. I previously
wrote a small utility for Mac OS X that tinkers with the
charsetalias.properties file to handle the many "big5-HKSCS" encoded
web sites that incorrectly declare themselves as "big5". It simply
changes the target encoding of big5 to big5-HKSCS. As big5-HKSCS is a
superset of big5, correctly labelled big5 sites still display properly,
and so do the mislabelled big5-HKSCS sites. I have just downloaded
Firefox 4.0 beta 7 and discovered that the charsetalias.properties file
is no longer there. Could you tell me how charset handling works in
Gecko 2.0?
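
To illustrate the kind of edit described above, here is a minimal sketch in
Python. It assumes the file consists of simple alias=charset lines; the
sample excerpt and the function name retarget_alias are hypothetical and
are not taken from the actual utility or from the Gecko source.

```python
def retarget_alias(properties_text: str, alias: str, new_target: str) -> str:
    """Return properties_text with `alias` pointing at `new_target`.

    Assumes one `alias=charset` pair per line; comments, blank lines and
    unrelated aliases are passed through unchanged.
    """
    out = []
    for line in properties_text.splitlines():
        key, sep, _old_target = line.partition("=")
        is_comment = line.lstrip().startswith("#")
        if sep and not is_comment and key.strip().lower() == alias.lower():
            line = f"{key.strip()}={new_target}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt, for illustration only.
sample = "# hypothetical excerpt\nbig5=Big5\nbig5-hkscs=Big5-HKSCS\n"
print(retarget_alias(sample, "big5", "Big5-HKSCS"))
```

Only the line whose key is exactly big5 is rewritten, so other aliases that
already point at Big5-HKSCS are left alone.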

My little utility can be downloaded from here:
http://www.macupdate.com/info.php/id/19216/i-speak-cantonese

Axel Hecht

Nov 14, 2010, 12:45:41 PM

That moved into the compiled code,
https://bugzilla.mozilla.org/show_bug.cgi?id=563536.

No idea if there's anything left that allows you to tweak it.

That said, is there a bug filed on what you're trying to fix? Add-ons
like yours sound like something we shouldn't need.

Axel

Jean-Marc Desperrier

Nov 19, 2010, 9:07:34 AM
Axel Hecht wrote:
> That moved into the compiled code,
> https://bugzilla.mozilla.org/show_bug.cgi?id=563536.
>
> No idea if there's anything left that allows you to tweak it.
>
> That said, is there a bug filed on what you're trying to fix? Add-ons
> like yours sound like something we shouldn't need.

It's not a great idea to have those identifiers hard-coded :-(

It's not just a matter of something Mozilla got wrong that just needs
to be fixed. There will be some identifiers that almost nobody uses,
which it makes no sense to include by default, and there will be a few
cases where things have been done wrong, so it's useful to be able to
override the default even if the default is correct.

big5 is a case that can't quite be solved satisfactorily with a
hard-coded solution. http://en.wikipedia.org/wiki/Big5 lists at least
10 different extensions to big5, any of which could actually have been
in use when a page is tagged as big5.

Of those, HKSCS is probably the only one that is a real standard and is
currently widely used. But what if someone wants a warning when some of
the characters are not truly big5 and use the HKSCS extension (a
Taiwanese user, for example, for whom HKSCS is not a standard)? And what
if they are handling some old content that is truly incompatible with
HKSCS, where they'd love to have big5 interpreted as something other
than HKSCS?
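
The "warning" idea can be sketched roughly as follows, using Python's
built-in big5 and big5hkscs codecs as stand-ins for the browser's decoders
(an assumption; Gecko's tables may differ). The function name
decode_big5_with_warning is hypothetical. A failed strict big5 decode is
only a heuristic: the bytes might belong to some other big5 extension, in
which case the fallback may fail too or give the wrong characters.

```python
from typing import Tuple


def decode_big5_with_warning(data: bytes) -> Tuple[str, bool]:
    """Decode `data` labelled as big5, returning (text, used_hkscs).

    used_hkscs is True when strict big5 decoding failed and the HKSCS
    superset had to be used instead, which a caller could surface as a
    warning. Raises UnicodeDecodeError if neither codec accepts the data.
    """
    try:
        return data.decode("big5"), False
    except UnicodeDecodeError:
        # Not plain big5 according to Python's table; try HKSCS.
        return data.decode("big5hkscs"), True


text, used_hkscs = decode_big5_with_warning("中文".encode("big5"))
if used_hkscs:
    print("warning: page labelled big5 uses HKSCS extensions")
print(text)
```

With an overridable alias table, the fallback codec in the except branch is
the part a user would want to be able to change.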

Having the identifiers defined in a resource file that can be
overwritten to change them makes the situation significantly easier.

Jean-Marc Desperrier

Dec 29, 2010, 7:23:22 PM
On 19/11/2010 15:07, Jean-Marc Desperrier wrote:
> Of those, HKSCS is probably the only one that is a real standard and is
> currently widely used. But what if someone wants a warning when some of
> the characters are not truly big5 and use the HKSCS extension? [...]
>
> Having the identifiers defined in a resource file that can be
> overwritten to change them makes the situation significantly easier.

I've found another case that's a lot more pertinent: proprietary
extensions to the Japanese S-JIS encoding for emoji characters.

There currently appear to be three such extensions: DoCoMo, KDDI and
SoftBank. Basically, every Japanese cell network has its own proprietary
way of encoding emoji, documented in
http://www.unicode.org/Public/UNIDATA/EmojiSources.txt

As those emoji characters are now encoded in the Unicode standard,
those extensions can't simply be ignored.
