Bug#82819: marked as done (strcoll sorts wrong with locales)

Debian Bug Tracking System

May 21, 2001, 10:07:53 AM

Your message dated Mon, 21 May 2001 09:13:52 -0400
with message-id <2001052109...@visi.net>
and subject line Bug#74611: sort order
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere. Please contact me immediately.)

Darren Benham
(administrator, Debian Bugs database)

--------------------------------------
Received: (at submit) by bugs.debian.org; 19 Jan 2001 07:33:28 +0000
From to...@junk.nocrew.org Fri Jan 19 01:33:28 2001
Return-path: <to...@junk.nocrew.org>
Received: from junk.nocrew.org [::ffff:212.73.17.42]
by master.debian.org with esmtp (Exim 3.12 1 (Debian))
id 14JW35-0004a4-00; Fri, 19 Jan 2001 01:33:27 -0600
Received: from tomas by junk.nocrew.org with local (Exim 3.12 #1 (Debian))
for sub...@bugs.debian.org
id 14JW33-0004xN-00; Fri, 19 Jan 2001 08:33:25 +0100
To: sub...@bugs.debian.org
Subject: strcoll sorts wrong with locales
From: Tomas Berndtsson <to...@nocrew.org>
Date: 19 Jan 2001 08:33:25 +0100
Message-ID: <80hf2wo...@junk.nocrew.org>
Lines: 65
User-Agent: Gnus/5.0803 (Gnus v5.8.3) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: Tomas Berndtsson <to...@junk.nocrew.org>
Delivered-To: sub...@bugs.debian.org

Package: locales
Version: 2.2.1-1

I have this small test program, which compares two strings and prints
out the value of the comparison:

------
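#include <stdio.h>
#include <string.h>
#include <locale.h>

int main(int argc, char *argv[])
{
    int i;
    char *locale_set;

    /* Both strings are required; argv[1]/argv[2] would be NULL otherwise. */
    if (argc < 3) {
        fprintf(stderr, "usage: %s STRING1 STRING2\n", argv[0]);
        return 1;
    }

    /* Take LC_COLLATE from the environment (LC_ALL, LC_COLLATE, LANG). */
    locale_set = setlocale(LC_COLLATE, "");
    printf("locale set: %s\n", locale_set ? locale_set : "(failed)");

    /* Negative, zero or positive, like strcmp(), but locale-aware. */
    i = strcoll(argv[1], argv[2]);
    printf("%d\n", i);

    return 0;
}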
------


Now, study this:

tomas@penne:~/src$ ./localetest "ab, c" "a, bc"
locale set: C
54
tomas@penne:~/src$ ./localetest "ab, c" "a, c"
locale set: C
54
tomas@penne:~/src$ LANG=en_US ./localetest "ab, c" "a, bc"
locale set: en_US
1
tomas@penne:~/src$ LANG=en_US ./localetest "ab, c" "a, c"
locale set: en_US
-1

What happens here is that, when using a locale other than C (I've
tried this with en_US, en_GB and sv_SE), the strcoll() call skips over
the comma and the space when comparing the two strings. This means
that "a, bc" is sorted before "ab, c", but "a, c" is sorted after
"ab, c". The three strings would be sorted like this:

a, bc
ab, c
a, c

I have never seen a list of strings sorted in this
manner. PostgreSQL seems to use strcoll() when ordering the results of
a SELECT, and it therefore gives this weird sort order.
I cannot use C as the locale for PostgreSQL, because I need the Swedish sort
order of åäö, which is sorted as äåö in plain iso-8859-1.

This does not happen with libc6/locales 2.1.3, which seems to handle
locales differently.


Greetings,

Tomas

---------------------------------------
Received: (at 82819-done) by bugs.debian.org; 21 May 2001 13:14:11 +0000
From b...@visi.net Mon May 21 08:14:11 2001
Return-path: <b...@visi.net>
Received: from ppp33.ts3-2.newportnews.visi.net (blimpo.internal.net) [209.8.198.161]
by master.debian.org with esmtp (Exim 3.12 1 (Debian))
id 151pVh-0003O3-00; Mon, 21 May 2001 08:14:10 -0500
Received: from bmc by blimpo.internal.net with local (Exim 3.22 #1 (Debian))
id 151pVQ-0007qW-00; Mon, 21 May 2001 09:13:52 -0400
Date: Mon, 21 May 2001 09:13:52 -0400
From: Ben Collins <bcol...@debian.org>
To: GOTO Masanori <go...@debian.or.jp>, 74...@bugs.debian.org
Cc: 80315...@bugs.debian.org, 82819...@bugs.debian.org,
83739...@bugs.debian.org, go...@debian.org,
Edmund GRIMLEY EVANS <edm...@rano.org>
Subject: Re: Bug#74611: sort order
Message-ID: <2001052109...@visi.net>
References: <wtwsnii...@fe.dis.titech.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.15i
In-Reply-To: <wtwsnii...@fe.dis.titech.ac.jp>; from go...@debian.or.jp on Sun, May 06, 2001 at 07:46:14PM +0900
Sender: Ben Collins <b...@visi.net>
Delivered-To: 82819...@bugs.debian.org

On Sun, May 06, 2001 at 07:46:14PM +0900, GOTO Masanori wrote:
> > I think these bugs (74611, 80315, 82819, 83739) should be merged.
>
> I reported this problem to the libc upstream maintainer, Ulrich Drepper.
> He said that this behavior is not a problem, because LC_COLLATE is
> supposed to follow `dictionary order'.
>
> GNU libc uses ISO/IEC 14651 Table 1 (iso14651_t1) as its `dictionary
> order'. The standard is called `International string ordering'. You can
> see it at http://anubis.dkuug.dk/JTC1/SC22/WG20/docs/projects#14651 .
>
> This standard has a big advantage: it is ready for ordering Unicode
> (ISO/IEC 10646) strings. The standard has been approved and is being
> published, but, as Ulrich Drepper put it, `this sorting is done
> for centuries, and it's nothing which is invented for this standard.'
>
> IMHO, these bugs can all be closed. If you want `ls' with the
> traditional behavior, setting LC_COLLATE=C is the simplest answer for
> users who mostly use ASCII. However, I don't know how to fix this
> issue for non-ASCII users. Is it an important problem?
> Also, I'm not a native ISO-8859-* user, so please tell me if you
> have any complaints.
>
> BenC, please close these bugs if no one has objected by your
> next libc .deb release.

Closing. Thanks.

Ben

--
-----------=======-=-======-=========-----------=====------------=-=------
/ Ben Collins -- ...on that fantastic voyage... -- Debian GNU/Linux \
` bcol...@debian.org -- bcol...@openldap.org -- bcol...@linux.com '
`---=========------=======-------------=-=-----=-===-======-------=--=---'

