
Bug#432563: locales: norwegian locale has started treating aa as å in regex


Håvard Moen

Jul 10, 2007, 11:30:15 AM
Package: locales
Version: 2.5-9
Severity: normal
Tags: l10n

When upgrading exim4 to 4.67-5, I saw strange regex behavior in sed.
It seems that aa is now treated as å in character class matching:
$ locale
LANG=nb_NO.UTF-8
LANGUAGE=en_US:en_GB:en
LC_CTYPE="nb_NO.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME="nb_NO.UTF-8"
LC_COLLATE="nb_NO.UTF-8"
LC_MONETARY="nb_NO.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="nb_NO.UTF-8"
LC_NAME="nb_NO.UTF-8"
LC_ADDRESS="nb_NO.UTF-8"
LC_TELEPHONE="nb_NO.UTF-8"
LC_MEASUREMENT="nb_NO.UTF-8"
LC_IDENTIFICATION="nb_NO.UTF-8"
LC_ALL=
$ echo "petrus.haavard.name" | sed 's/[^-0-9a-zA-Z\/\.!*@_~:;< ]/_/g'
petrus.h_vard.name
$ export LC_ALL=C
$ echo "petrus.haavard.name" | sed 's/[^-0-9a-zA-Z\/\.!*@_~:;< ]/_/g'
petrus.haavard.name

Aa should be treated as å when sorting, but this behavior seems wrong.
See also bug 430391.

-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'stable'), (200, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21-2-k7 (SMP w/1 CPU core)
Locale: LANG=nb_NO.UTF-8, LC_CTYPE=nb_NO.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages locales depends on:
ii debconf [debconf-2.0] 1.5.13 Debian configuration management sy
ii libc6 [glibc-2.5-1] 2.5-9+b1 GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
* locales/default_environment_locale: nb_NO.UTF-8
* locales/locales_to_be_generated: en_US ISO-8859-1, en_US.UTF-8 UTF-8, nb_NO ISO-8859-1, nb_NO.UTF-8 UTF-8, no_NO ISO-8859-1, no_NO.UTF-8 UTF-8

Oleg Verych

Jul 10, 2007, 3:20:11 PM
* Haavard Moen (Tue, 10 Jul 2007 17:22:44 +0200)

> $ echo "petrus.haavard.name" | sed 's/[^-0-9a-zA-Z\/\.!*@_~:;< ]/_/g'
> petrus.h_vard.name
> $ export LC_ALL=C
> $ echo "petrus.haavard.name" | sed 's/[^-0-9a-zA-Z\/\.!*@_~:;< ]/_/g'
> petrus.haavard.name

IMHO this is a matter for the `sed` package and its maintainer(s); with
their experience and documentation, they may know better.

> Aa should be treated as å when sorting, but this behavior seems wrong.
> See also bug 430391.

As Marc noted there:

| This looks like a sed or libc issue for me:

Thus, this bug.

But let's look closer at `update-exim4.conf'. That *ASCII* checking routine
doesn't set LANG=C and LC_ALL=C before its processing, and that is the bug
instead.
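
To illustrate, here is a minimal sketch, assuming the check is a sed call
like the one quoted above. The function name `sanitize_ascii` is
hypothetical and is not taken from the exim4 scripts; the only change is the
`LC_ALL=C` prefix, which overrides the locale for that one sed process:

```shell
#!/bin/sh
# Hypothetical helper (not from update-exim4.conf): replace every character
# outside the allowed ASCII set with "_", using the expression from the report.
sanitize_ascii() {
    # LC_ALL=C applies to this sed invocation only, so "aa" is matched as
    # two plain letters rather than as a locale-specific collating element.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^-0-9a-zA-Z\/\.!*@_~:;< ]/_/g'
}

sanitize_ascii "petrus.haavard.name"
```

Prefixing the single command keeps the override out of the rest of the
script. LC_ALL takes precedence over LANG and the other LC_* variables, so
setting it alone should be enough for this call.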
____
