Newsgroups: perl.perl6.language
From: jona...@bright.net (Jonadab The Unsightly One)
Date: Mon, 28 Jun 2004 11:26:32 -0400
Subject: Re: The .bytes/.codepoints/.graphemes methods

Larry Wall <la...@wall.org> writes:

> That all has to be looked at anyway. What does "5" mean when you
> pass it to substr, anyway?

I was just going to ask about substrings, and then didn't because I
figured that had been hashed out already and I'd missed it...

> (I've been trying to make it assume some implicit unit based on the
> current lexical scope's Unicode level, but issues remain.) We have
> magical string positions that have different numeric values
> depending on what units you view them as, but at what point does a
> number like "5" get translated to such a magical string position?

It would be possible to have right-associative operators (that bind at
least more tightly than comma and possibly very tightly) and convert a
number to one of these objects, so that we can do stuff like this:

    substr($string, 2 bytes, 4 bytes) = $substitute;

Then if you pass a plain number to substr it could either assume [...]

The word "bytes" is clearly much too long, though, much less [...]

    substr($string, 2b, 4b) = $substitute;

With presumably g and c for graphemes and codepoints, but I rather [...]

And I can't think of another abbreviation that would be remotely [...]

There's also the possibility of bsubstr and so on, but that leads us [...]

> I dunno--it reads pretty well. Maybe these'll be heavily enough
> used that we should Huffmanize them down a bit:
> $str.bytes

codes and graphs is better than codepoints and graphemes, at least.

> Though "letters" is a bit inadequate to describe language-dependent
> graphemes, since it also divides any non-letters...I suppose we
> could go with .characters if we don't mind forcing a heavily
> overloaded word in one particular direction, culturally speaking.
> Except, I'd kinda like to keep them starting with different letters.
> (And maybe .chars should be reserved to mean whatever the default
> unit is in the current lexical scope, as with substr() above.)

You could coin the abbreviation ligs, for Language Independent
Graphemes.

Then some ingenious rascal can create a pragma or whatever that allows $str.b, $str.c, $str.g, and $str.l for fans of terseness.
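
To make the unit-tagged substr idea concrete, here is a rough sketch in Python rather than Perl 6, since none of the syntax above is settled. Everything in it is an assumption made up for illustration: the names Offset, grapheme_starts, to_codepoint_index and substr are hypothetical, the bytes are taken to be UTF-8, and a "grapheme" is crudely approximated as a base codepoint plus any combining marks that follow it, which is not full Unicode grapheme segmentation. It only reads a substring; there is no equivalent here of assigning to substr.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Offset:
    """A number tagged with the unit it counts in (hypothetical type)."""
    count: int
    unit: str  # "bytes", "codepoints", or "graphemes"


def grapheme_starts(s):
    """Codepoint indices at which an approximate grapheme begins.

    Approximation: every codepoint that is not a combining mark
    starts a new grapheme.
    """
    return [i for i, ch in enumerate(s)
            if i == 0 or unicodedata.combining(ch) == 0]


def to_codepoint_index(s, base, off):
    """Advance from codepoint index `base` by `off`; return a codepoint index.

    `base` is assumed to sit on a boundary of the unit being counted.
    """
    if off.unit == "codepoints":
        return base + off.count
    if off.unit == "bytes":
        target = len(s[:base].encode("utf-8")) + off.count
        # decode() raises UnicodeDecodeError if the byte offset
        # would split a multi-byte codepoint.
        return len(s.encode("utf-8")[:target].decode("utf-8"))
    if off.unit == "graphemes":
        starts = [i for i in grapheme_starts(s) if i >= base] + [len(s)]
        return starts[min(off.count, len(starts) - 1)]
    raise ValueError(f"unknown unit: {off.unit!r}")


def substr(s, start, length):
    """Read-only substr taking unit-tagged start and length."""
    begin = to_codepoint_index(s, 0, start)
    end = to_codepoint_index(s, begin, length)
    return s[begin:end]


# "e" + COMBINING ACUTE ACCENT + "tude":
# 7 UTF-8 bytes, 6 codepoints, 5 approximate graphemes.
word = "e\u0301tude"
two_graphs = substr(word, Offset(0, "graphemes"), Offset(2, "graphemes"))
two_codes = substr(word, Offset(0, "codepoints"), Offset(2, "codepoints"))
two_bytes = substr(word, Offset(3, "bytes"), Offset(2, "bytes"))
```

With that sample word, the same "2" yields three different substrings: two graphemes from the start cover the accented e plus "t", two codepoints cover only the accented e, and two bytes starting at byte 3 give "tu". A plain untagged number would still need some default unit, which is the open question in the thread.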