Message from discussion
(Re: The strings design document)
Newsgroups: perl.perl6.internals
To: perl6-intern...@perl.org, Jeff Clites <jcli...@mac.com>
Message-ID: <408E90F9.7070600@iki.fi>
Date: Tue, 27 Apr 2004 19:57:29 +0300
User-Agent: Mozilla Thunderbird 0.5 (Macintosh/20040208)
X-Accept-Language: en-us, en
MIME-Version: 1.0
CC: Dan Sugalski <d...@sidhe.org>, perl6-intern...@perl.org
Subject: Re: [Q1] (Re: The strings design document)
References: <a06100505bcaf3db4412b@[172.24.18.98]> <8DE3A940-9869-11D8-88D8-000393A6B9DA@mac.com>
In-Reply-To: <8DE3A940-9869-11D8-88D8-000393A6B9DA@mac.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
From: j...@iki.fi (Jarkko Hietaniemi)
> 1) ISO-8859-1 is used to represent text in several different languages,
> including German and Swedish. German and Swedish differ in their sort
> order, even for things they have in common. (For example, ö
> (o-with-diaeresis) is considered a separate letter in Swedish, but is
> just an accented "o" in German.) So (assuming my strings aren't
> explicitly language-tagged, or are tagged with "Dunno"), what sort
> order does ISO-8859-1 define? I'm not sure whether the national
> standards themselves actually define a sort order, so are we going to
National standards yes, ISO 8859 (and the like) no. In other words,
sorting standards exist, but they have (quite rightly) nothing to do
with character set standards. Real-life sorting is messy (multiple
passes, some parts may be ignored in some passes, acronyms, etc.) and
worlds apart from "let's compare the bytes one by one", or even from
"let's compare code points", or even from "let's compare grapheme
(clusters)".
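
A minimal sketch of the point above, in Python, contrasting a plain
code point comparison with two language-specific orderings. The key
functions (german_key, swedish_key), the helper base_letters, and the
word list are hypothetical and heavily simplified; they only model the
German/Swedish difference from the quoted example, not any actual
national sorting standard.

```python
import unicodedata

# Assumption for illustration: Swedish sorts these three letters after "z",
# in this order. Real collation rules cover far more than this.
SWEDISH_TAIL = "åäö"


def base_letters(s):
    """Drop combining marks, so that 'ö' compares as 'o'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def german_key(s):
    """Two passes: base letters first, then the raw string as a tie-breaker."""
    s = unicodedata.normalize("NFC", s).lower()
    return (base_letters(s), s)


def swedish_key(s):
    """Treat å, ä, ö as separate letters that sort after z."""
    key = []
    for ch in unicodedata.normalize("NFC", s).lower():
        if ch in SWEDISH_TAIL:
            key.append((1, SWEDISH_TAIL.index(ch), ""))
        else:
            key.append((0, 0, base_letters(ch)))
    return key


words = ["zon", "öl", "ost", "äta", "ål"]

print(sorted(words))                   # code points: ost zon äta ål öl
print(sorted(words, key=german_key))   # German-like:  ål äta öl ost zon
print(sorted(words, key=swedish_key))  # Swedish-like: ost zon ål äta öl
```

The same five strings come out in three different orders. None of the
orders follows from ISO 8859-1 itself; the code point order happens to
put "ä" before "å", which matches neither language.
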
> define one for every "character set"? In addition, many languages can
> be represented in several different "character sets", so that seems to
> mean that the sort order for "öut" v. "out" will vary, depending on the
> "character set" used for those strings?
FWIW, I think binding language to strings is a Mistake. But I have
decided to give up trying to argue anymore about it since Dan seems
to be convinced that it will solve some problems.