>> If you need a specific locale (as seems from "mksh", not
>> sure if it is a bug in that program), you need to set it.
>>
>> You can only set a locale on a glibc-based system if it's
>> installed beforehand, which root needs to do.
This is of course a horrid bug. I'm fighting it right now.
I install a zam.mo file, nothing else, and I damn well expect
that file to get used for messages! Obviously, it's UTF-8.
Obviously, I expect towupper() to follow Unicode defaults.
> You can build-depend on the locales package and generate the locales
> you want locally, using LOCPATH to reference them. There's no need
> for Debian to guarantee the presence of a particular locale ahead of
> time - particularly one that isn't actually useful to end users,
> as C.UTF-8 would be.
Unless plain "C" goes UTF-8, that's exactly the locale I need.
The stupid broken en_US.UTF-8 fucks up the sort order.
Granted, fixing en_US.UTF-8 would be sweet, but it may be far too late.
We really need a do-nothing locale that follows the Unicode spec
using the UTF-8 encoding. We could also use a do-nothing locale
that follows the Unicode spec using the Latin-1 encoding.
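
To make the sort-order point concrete, here is a minimal sketch of what a "do-nothing" UTF-8 locale would be expected to do: compare strings byte by byte, as strcmp() does. Because UTF-8 preserves code point order, that is the same as sorting by code point. The word list is made up for illustration, and Python's built-in comparison stands in for the C library here; no real locale is loaded.

```python
# Illustrative only: the word list is invented for this sketch.
words = ["zebra", "Apple", "\u00e9clair", "apple", "Zoo", "\U0001D11E clef"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Comparing the raw UTF-8 bytes (what strcmp() would see) gives the
# same order, because UTF-8 preserves code point order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

assert by_code_point == by_utf8_bytes
print(by_code_point)
```

Uppercase ASCII sorts before lowercase, and everything non-ASCII sorts after that, with no language-specific tailoring. A locale such as en_US.UTF-8 applies its own collation rules instead, which is the behaviour complained about above.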
>Unless plain "C" goes UTF-8
Not going to happen, it’s not binary-safe. (I fought that in
MirBSD with the OPTU-8/16 encoding scheme.)
>The stupid broken en_US.UTF-8 fucks up the sort order.
So true… (and paper size!)
>We really need a do-nothing locale that follows the Unicode spec
>using the UTF-8 encoding.
Yes, my proposal exactly.
>We could also use a do-nothing locale
>that follows the Unicode spec using the Latin-1 encoding.
No, for two reasons:
① legacy encodings must die
② then you need one for EVERY legacy encoding (why special-case one?)
bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"
Why not? Note that usual functions work on bytes, not on characters, and
in POSIX utilities the old/classical options work on bytes by default.
POSIX introduced new options for characters. E.g. the -c in 'wc' really
means bytes, not characters (which are counted by -m). Not so logical, but
compatible with the expected old behaviour.
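
A small sketch of the bytes-versus-characters distinction, using Python rather than wc itself. The sample string is made up, and the comparison with 'wc -c' and 'wc -m' assumes a UTF-8 locale and input without a trailing newline.

```python
# Illustrative string: "naïve €5" (8 characters, two of them non-ASCII).
text = "na\u00efve \u20ac5"

n_chars = len(text)                  # roughly what `wc -m` reports
n_bytes = len(text.encode("utf-8"))  # roughly what `wc -c` reports

print(n_chars, n_bytes)
```

The two counts differ because ï takes two bytes and € takes three in UTF-8; for pure ASCII input they would be equal.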
POSIX was discussing whether it is "legal" to have a UTF-8 POSIX/C locale.
IIRC the doubts were about the language in the standard, not about real
problems. OTOH they acknowledged that real bugs could appear.
OTOH I use a UTF-8 locale by default, because I don't expect that
Debian will corrupt my data. And I think system utilities will do
the right thing with the locale.
I am starting to think that moving C to UTF-8 would be the simplest and
fastest way to *hide* most of the encoding bugs.
ciao
cate
>> Not going to happen, it’s not binary-safe. (I fought that in
>> MirBSD with the OPTU-8/16 encoding scheme.)
>
> Why not? Note that usual functions work on bytes
Not really.
Running 'tr u x' on a binary file can trash it, depending on the
implementation of tr. If tr handles 'tr ¥ €' correctly in a UTF-8
locale, it must use mbsrtowcs, which POSIX requires to fail (EILSEQ)
on input that is not a valid multibyte string in that locale.
In MirBSD, we have solved that by clever use of the PUA.
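
A rough sketch of the problem and of one escape-style workaround, in Python. This is not the MirBSD OPTU-8/16 scheme: Python's "surrogateescape" error handler maps undecodable bytes to lone surrogates rather than to the PUA, and is used here only as an analogous technique. The sample bytes are made up.

```python
# Arbitrary bytes that are not valid UTF-8 (invented for this sketch).
data = b"\x89PNG\xff\xfe u-turn"

# Byte-wise 'tr u x': always safe, each byte maps to exactly one byte.
bytewise = data.replace(b"u", b"x")

# Strict character-wise processing: decoding fails, much as
# mbsrtowcs() fails with EILSEQ on an invalid multibyte sequence.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Escape-based round trip: undecodable bytes become reserved code
# points on input and are turned back into the same bytes on output.
text = data.decode("utf-8", errors="surrogateescape")
roundtrip = text.replace("u", "x").encode("utf-8", errors="surrogateescape")

assert roundtrip == bytewise
```

The strict path refuses the input, while the escape-based path processes characters and still leaves the invalid bytes untouched.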
//mirabilos