Message from discussion The .bytes/.codepoints/.graphemes methods

Newsgroups: perl.perl6.language
Path: g2news1.google.com!news1.google.com!newsfeed.stanford.edu!nntp.perl.org
Return-Path: <d...@sidhe.org>
Mailing-List: contact perl6-language-h...@perl.org; run by ezmlm
Delivered-To: mailing list perl6-langu...@perl.org
Received: (qmail 84517 invoked from network); 28 Jun 2004 18:41:41 -0000
Received: from x1.develooper.com (63.251.223.170)
  by onion.develooper.com with SMTP; 28 Jun 2004 18:41:41 -0000
Received: (qmail 4757 invoked by uid 225); 28 Jun 2004 18:41:41 -0000
Delivered-To: perl6-langu...@perl.org
Received: (qmail 4752 invoked by alias); 28 Jun 2004 18:41:40 -0000
X-Spam-Status: No, hits=-4.9 required=8.0
	tests=BAYES_00
X-Spam-Check-By: la.mx.develooper.com
Received: from 178.94.252.64.snet.net (HELO sprite.sidhe.org) (64.252.94.178)
  by la.mx.develooper.com (qpsmtpd/0.27.1) with SMTP; Mon, 28 Jun 2004 11:41:37 -0700
Received: (qmail 21476 invoked from network); 28 Jun 2004 18:36:37 -0000
X-Scanned-By: AMaViS-ng at sidhe.org
Received: from unknown (HELO localhost) (127.0.0.1)
  by localhost with SMTP; 28 Jun 2004 18:36:24 -0000
Date: Mon, 28 Jun 2004 14:36:24 -0400 (EDT)
To: Austin Hastings <austin_hasti...@yahoo.com>
cc: perl6-langu...@perl.org
Subject: Re: The .bytes/.codepoints/.graphemes methods
In-Reply-To: <20040628182734.98379.qmail@web12308.mail.yahoo.com>
Message-ID: <Pine.LNX.4.58.0406281425560.16872@sprite.sidhe.org>
References: <20040628182734.98379.qmail@web12308.mail.yahoo.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Spam-Rating: onion.develooper.com 1.6.2 0/1000/N
Approved: n...@nntp.perl.org
From: d...@sidhe.org (Dan Sugalski)

On Mon, 28 Jun 2004, Austin Hastings wrote:

> --- Dan Sugalski <d...@sidhe.org> wrote:
> > On Mon, 28 Jun 2004, Juerd wrote:
> >
> > > Dave Whipp skribis 2004-06-28  9:55 (-0700):
> > > > > substr($string, 2 bytes, 4 bytes) = $substitute;
> > > > substr($string, 2, 4 :bytes)
> > >
> > > substr($string, 2 but graphemes, 4 but bytes);
> > >
> > > I think "but" even makes sense, if substr defaults to something.
> >
> > I think mixing strings, bytes, graphemes, and code points together
> > is a phenomenally bad idea, likely to lead to many tears, much
> > gnashing of teeth, and quite a few rampages with sharp objects,
> > not to mention a lot of code guaranteed to fail at the edge cases.
>
> Hmm. Suppose that I have a system that is friendly to 80 byte records.
> I want to output "meaningful" strings, so I want to partition a buffer
> into 80-ish byte substrings, but preserve any graphemes (i.e., store
> the data in a legible format).
>
> How would I do that?

You don't. Or if you do, you do it with a lot of pain, sweat, and annoying
hard work. 80 bytes gets you somewhere between three graphemes (and this
may be a *high* estimate; there may be circumstances where 80 bytes is
insufficient for *one* grapheme) and 80 graphemes.

This isn't something that can be made generically easy.
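
To give a feel for the hard work involved, here is a rough sketch in Python (not Perl 6, and not anything proposed in this thread) of packing text into records of at most 80 bytes without cutting a grapheme in half. It rests on several assumptions: the records are UTF-8, and a "grapheme" is approximated as a base character plus the combining marks that follow it, which is much cruder than real Unicode segmentation. The names `clusters` and `split_records` are made up for illustration.

```python
import unicodedata


def clusters(text):
    """Yield approximate grapheme clusters.

    ASSUMPTION: a cluster is a base character followed by zero or more
    characters with a non-zero canonical combining class. This is only
    an approximation, not full Unicode text segmentation.
    """
    current = ""
    for ch in text:
        if current and unicodedata.combining(ch) == 0:
            # A new base character starts a new cluster.
            yield current
            current = ch
        else:
            current += ch
    if current:
        yield current


def split_records(text, limit=80, encoding="utf-8"):
    """Greedily pack whole clusters into records of at most `limit` bytes."""
    records = []
    current = b""
    for cluster in clusters(text):
        encoded = cluster.encode(encoding)
        if len(encoded) > limit:
            # The edge case from the message above: one grapheme may
            # simply not fit in a single record.
            raise ValueError("a single cluster exceeds the record size")
        if len(current) + len(encoded) > limit:
            records.append(current)
            current = encoded
        else:
            current += encoded
    if current:
        records.append(current)
    return records
```

Even this toy version has to choose an encoding, walk the whole string, and give up when one cluster is larger than the record. A real implementation would also have to handle things this sketch ignores, such as Hangul jamo sequences and emoji joined with zero-width joiners.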

					Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk