Message from discussion The .bytes/.codepoints/.graphemes methods
Larry Wall  
Newsgroups: perl.perl6.language
From: la...@wall.org (Larry Wall)
Date: Wed, 7 Jul 2004 20:15:30 -0700
Local: Wed, Jul 7 2004 11:15 pm
Subject: Re: The .bytes/.codepoints/.graphemes methods
On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote:

: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: : > This has no direct bearing on p6l, since performance is a p6i issue.
: : > But perhaps in the interests of performance as well as hackery we
: : > should explicitly provide some sort of variant regex behavior:
: : >
: : >     /a./ :bytes
: : >     /a./ :graphemes
: : >
: : > where the first would recognize 0x61 followed by any single byte, while
: : > the second would recognize 'a' followed by any number of bytes
: : > composing a single grapheme.
: :
: : Isn't that what :u0, :u1, :u2, and :u3 are for?
: :
: :         :u0         # use bytes       (. is byte)
: :         :u1         # level 1 support (. is codepoint)
: :         :u2         # level 2 support (. is grapheme)
: :         :u3         # level 3 support (. is language dependent)
:
: These modifiers might get renamed to match whatever b/c/g/w convention
: we come up with for pragmas.  The levels aren't all that intuitive, though
: there is a kind of progression of semantic complexity that would get
: lost with ordinary names.

On the flip side, a good reason to get rid of the numeric values is
that in all likelihood people will continually make the mistake of
thinking :u1 means "one byte at a time" and :u2 means "two bytes at
a time".  And then they'll wonder why :u4 doesn't give them UTF-32...

Larry
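
To make the point about the levels concrete, here is a rough sketch in
Python (not Perl 6) of what "." would consume at :u0, :u1 and :u2. The
function name units() and the sample string are made up for illustration,
and the grapheme rule here (a base character plus any following combining
marks) is a deliberate simplification of real grapheme segmentation.

```python
import unicodedata

def units(text, level):
    """Split text into the units '.' would match at a given level.

    Hypothetical helper: level 0 is bytes (assuming UTF-8 storage),
    level 1 is codepoints, level 2 is approximate graphemes.
    """
    if level == 0:
        # One unit per encoded byte.
        return [bytes([b]) for b in text.encode("utf-8")]
    if level == 1:
        # One unit per codepoint.
        return list(text)
    if level == 2:
        # Simplification: attach combining marks to the preceding base.
        clusters = []
        for ch in text:
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters
    raise ValueError("only levels 0-2 are sketched here")

# 'e' + COMBINING ACUTE ACCENT, then 'a'
sample = "e\u0301a"
counts = [len(units(sample, level)) for level in (0, 1, 2)]
print(counts)  # [4, 3, 2]
```

The same three codepoints come out as 4 bytes, 3 codepoints, or 2
graphemes, so the number in :u1 or :u2 names a level of abstraction and
says nothing about how many bytes "." consumes.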


 