
Strings internals


Dan Sugalski

Jun 16, 2004, 11:59:38 AM
to perl6-i...@perl.org
Okay, now that we've got the bytecode-visible stuff specified, I want
to spec the internals a bit and start migrating things over to the
new design. (This should also let us make ICU optional, for folks
who only want ASCII/Latin-x/EBCDIC enabled.)

Once again, we're going with vtables, like the strings originally
had. Each string has two vtable pointers, one for encoding and one
for charset. Encodings and charsets will be loadable libraries, living in the
encodings/ and charset/ directories.
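A minimal C sketch of that layout, assuming C as the implementation language. The names `pstring`, `encoding_vtable`, `charset_vtable`, `register_encoding`, `find_encoding` and `pstring_init`, and the fixed-size registry, are all hypothetical; the registry only stands in for whatever mechanism actually loads libraries from encodings/ and charset/, and the vtable operations themselves are elided here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vtable types; operations elided, only the lookup name shown. */
typedef struct encoding_vtable { const char *name; } encoding_vtable;
typedef struct charset_vtable  { const char *name; } charset_vtable;

/* Every string carries its raw bytes plus one pointer per vtable. */
typedef struct pstring {
    unsigned char         *buf;
    size_t                 bytelen;
    const encoding_vtable *encoding;
    const charset_vtable  *charset;
} pstring;

/* Stand-in for loadable libraries: each library in encodings/ or charset/
   would call the matching register function when it is loaded. */
#define MAX_LOADED 16
static const encoding_vtable *loaded_encodings[MAX_LOADED];
static size_t n_encodings;
static const charset_vtable *loaded_charsets[MAX_LOADED];
static size_t n_charsets;

static int register_encoding(const encoding_vtable *e) {
    assert(e != NULL && e->name != NULL);
    if (n_encodings == MAX_LOADED) return -1;
    loaded_encodings[n_encodings++] = e;
    return 0;
}

static int register_charset(const charset_vtable *c) {
    assert(c != NULL && c->name != NULL);
    if (n_charsets == MAX_LOADED) return -1;
    loaded_charsets[n_charsets++] = c;
    return 0;
}

static const encoding_vtable *find_encoding(const char *name) {
    for (size_t i = 0; i < n_encodings; i++)
        if (strcmp(loaded_encodings[i]->name, name) == 0) return loaded_encodings[i];
    return NULL;
}

static const charset_vtable *find_charset(const char *name) {
    for (size_t i = 0; i < n_charsets; i++)
        if (strcmp(loaded_charsets[i]->name, name) == 0) return loaded_charsets[i];
    return NULL;
}

/* Tags a buffer with the named encoding and charset; fails (-1) if either
   library has not been loaded. */
static int pstring_init(pstring *s, unsigned char *buf, size_t len,
                        const char *enc_name, const char *cs_name) {
    const encoding_vtable *e = find_encoding(enc_name);
    const charset_vtable *c = find_charset(cs_name);
    if (e == NULL || c == NULL) return -1;
    s->buf = buf;
    s->bytelen = len;
    s->encoding = e;
    s->charset = c;
    return 0;
}
```

In this sketch a string whose encoding or charset has not been loaded is rejected when it is built rather than at first use; that is one possible choice, not a settled one.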

The encoding vtable needs to handle get/set codepoint, get/set byte,
and transform to another encoding. I don't think there's anything
else, but I could be wrong there.
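One way those encoding operations could look as a C vtable, as a sketch only. The names (`encoding_vtable`, `estring`, `fixed8_encoding`, `ucs4_encoding`, `transform`) and the `width` field are assumptions. Both example encodings are fixed-width because `set_codepoint` on a variable-width encoding such as UTF-8 can change the string's byte length and would need buffer reallocation. `transform` is written once, generically, by decoding through the source vtable and re-encoding through the target one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct estring estring;

/* Hypothetical encoding vtable: get/set codepoint and get/set byte.
   `width` (bytes per codepoint) only makes sense for fixed-width encodings. */
typedef struct encoding_vtable {
    const char *name;
    size_t width;
    uint32_t (*get_codepoint)(const estring *s, size_t i);
    int      (*set_codepoint)(estring *s, size_t i, uint32_t cp); /* 0 on success */
    uint8_t  (*get_byte)(const estring *s, size_t i);
    void     (*set_byte)(estring *s, size_t i, uint8_t b);
} encoding_vtable;

struct estring {
    uint8_t *buf;
    size_t bytelen;
    const encoding_vtable *enc;
};

/* Byte access is the same for every encoding: index the raw buffer. */
static uint8_t raw_get_byte(const estring *s, size_t i) { assert(i < s->bytelen); return s->buf[i]; }
static void raw_set_byte(estring *s, size_t i, uint8_t b) { assert(i < s->bytelen); s->buf[i] = b; }

/* fixed_8: one byte per codepoint, so only U+0000..U+00FF fit. */
static uint32_t f8_get(const estring *s, size_t i) { assert(i < s->bytelen); return s->buf[i]; }
static int f8_set(estring *s, size_t i, uint32_t cp) {
    assert(i < s->bytelen);
    if (cp > 0xFF) return -1;
    s->buf[i] = (uint8_t)cp;
    return 0;
}

/* ucs4: four bytes per codepoint, native byte order. */
static uint32_t u4_get(const estring *s, size_t i) {
    uint32_t cp;
    assert(i * 4 + 4 <= s->bytelen);
    memcpy(&cp, s->buf + i * 4, sizeof cp);
    return cp;
}
static int u4_set(estring *s, size_t i, uint32_t cp) {
    assert(i * 4 + 4 <= s->bytelen);
    if (cp > 0x10FFFF) return -1;
    memcpy(s->buf + i * 4, &cp, sizeof cp);
    return 0;
}

static const encoding_vtable fixed8_encoding = { "fixed_8", 1, f8_get, f8_set, raw_get_byte, raw_set_byte };
static const encoding_vtable ucs4_encoding   = { "ucs4",    4, u4_get, u4_set, raw_get_byte, raw_set_byte };

/* Generic transform: read each codepoint via the source vtable, write it via
   the target. On success dst owns a malloc'd buffer; on failure dst->buf is NULL. */
static int transform(const estring *src, const encoding_vtable *to, estring *dst) {
    size_t n = src->bytelen / src->enc->width;
    dst->enc = to;
    dst->bytelen = n * to->width;
    dst->buf = malloc(dst->bytelen ? dst->bytelen : 1);
    if (dst->buf == NULL) return -1;
    for (size_t i = 0; i < n; i++) {
        if (to->set_codepoint(dst, i, src->enc->get_codepoint(src, i)) != 0) {
            free(dst->buf);
            dst->buf = NULL;
            return -1;
        }
    }
    return 0;
}
```

Transforming down to a narrower encoding can fail when a codepoint does not fit, so the sketch reports that instead of silently truncating.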

The charset vtable needs to handle get/set grapheme, get/set
substring, up/down/titlecase, and (possibly) comparison. Charsets
also have a separate grapheme classification requirement (for
regexes), but we'll put that off for now.
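A matching sketch of the charset side for plain ASCII, where each grapheme is exactly one byte, so grapheme access and byte access coincide. Everything here (`charset_vtable`, `cstring`, `ascii_charset`) is hypothetical, `set_substring` is left out for brevity, and `compare` is plain codepoint order; a charset with real collation rules would behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct charset_vtable;

/* Hypothetical string for this sketch: ASCII text, one byte per grapheme. */
typedef struct cstring {
    char *buf;
    size_t len;
    const struct charset_vtable *cs;
} cstring;

/* Hypothetical charset vtable for the operations listed above. */
typedef struct charset_vtable {
    const char *name;
    char   (*get_grapheme)(const cstring *s, size_t i);
    void   (*set_grapheme)(cstring *s, size_t i, char g);
    size_t (*get_substring)(const cstring *s, size_t start, size_t n, char *out);
    void   (*upcase)(cstring *s);
    void   (*downcase)(cstring *s);
    void   (*titlecase)(cstring *s);
    int    (*compare)(const cstring *a, const cstring *b);
} charset_vtable;

/* Locale-independent ASCII case mapping. */
static char ascii_upper(char c) { return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c; }
static char ascii_lower(char c) { return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c; }
static int  ascii_alpha(char c) { return ascii_upper(c) != ascii_lower(c); }

static char a_get(const cstring *s, size_t i) { assert(i < s->len); return s->buf[i]; }
static void a_set(cstring *s, size_t i, char g) { assert(i < s->len); s->buf[i] = g; }

/* Copies at most n graphemes starting at start into out; returns the count copied. */
static size_t a_substr(const cstring *s, size_t start, size_t n, char *out) {
    if (start >= s->len) return 0;
    if (n > s->len - start) n = s->len - start;
    memcpy(out, s->buf + start, n);
    return n;
}

static void a_up(cstring *s)   { for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_upper(s->buf[i]); }
static void a_down(cstring *s) { for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_lower(s->buf[i]); }

/* Titlecase: uppercase the first letter of each run of letters, lowercase the rest. */
static void a_title(cstring *s) {
    int in_word = 0;
    for (size_t i = 0; i < s->len; i++) {
        char c = s->buf[i];
        if (ascii_alpha(c)) {
            s->buf[i] = in_word ? ascii_lower(c) : ascii_upper(c);
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
}

/* Codepoint-order comparison returning -1, 0 or 1; shorter prefix sorts first. */
static int a_cmp(const cstring *a, const cstring *b) {
    size_t n = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->buf, b->buf, n);
    if (r != 0) return r < 0 ? -1 : 1;
    return (a->len > b->len) - (a->len < b->len);
}

static const charset_vtable ascii_charset = {
    "ascii", a_get, a_set, a_substr, a_up, a_down, a_title, a_cmp
};
```

For example, titlecasing "hello wORLD" through `ascii_charset` gives "Hello World".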

I think that's it, but before we nail things down I'd like to have
folks squint at this a bit so we can make sure it's right. Once it is,
we can define the API directly and start implementing it.
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

Mark A Biggar

Jun 16, 2004, 12:30:55 PM
to Dan Sugalski, perl6-i...@perl.org
Do we want a normalization function here as well? If you have that, you can use a binary compare (at least for eq/ne).

--
Mark Biggar
mark.a...@comcast.net

Mark A Biggar

Jun 16, 2004, 1:07:51 PM
to Dan Sugalski, perl6-i...@perl.org
Yeah, but I believe that Unicode, at least, suggests one of its four forms
for non-locale-specific comparisons (canonical decomposition form).
So pick that one for the core and provide the others (if necessary) as library
functions.
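A toy C sketch of the normalize-then-binary-compare idea using canonical decomposition. It assumes strings are arrays of `uint32_t` codepoints and uses a three-entry decomposition table; real canonical decomposition also decomposes recursively and reorders combining marks by combining class, which this omits. `decompose` and `normalized_eq` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy canonical-decomposition table; a real one comes from the Unicode data files. */
static const struct { uint32_t precomposed, base, mark; } decomp[] = {
    { 0x00E9, 0x0065, 0x0301 },  /* e with acute -> e + COMBINING ACUTE ACCENT */
    { 0x00C5, 0x0041, 0x030A },  /* A with ring  -> A + COMBINING RING ABOVE */
    { 0x00F1, 0x006E, 0x0303 },  /* n with tilde -> n + COMBINING TILDE */
};
#define NDECOMP (sizeof decomp / sizeof decomp[0])

/* Writes the decomposed form of in[0..n) to out, which must have room for
   2*n codepoints, and returns the output length. */
static size_t decompose(const uint32_t *in, size_t n, uint32_t *out) {
    size_t k = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < NDECOMP && decomp[j].precomposed != in[i]) j++;
        if (j < NDECOMP) {
            out[k++] = decomp[j].base;
            out[k++] = decomp[j].mark;
        } else {
            out[k++] = in[i];
        }
    }
    return k;
}

/* eq after normalization is a plain binary compare of the decomposed forms.
   Returns 1 if equal, 0 if not (or if allocation fails). */
static int normalized_eq(const uint32_t *a, size_t na, const uint32_t *b, size_t nb) {
    uint32_t *da = malloc((2 * na + 1) * sizeof *da);
    uint32_t *db = malloc((2 * nb + 1) * sizeof *db);
    int eq = 0;
    if (da != NULL && db != NULL) {
        size_t la = decompose(a, na, da);
        size_t lb = decompose(b, nb, db);
        eq = la == lb && memcmp(da, db, la * sizeof *da) == 0;
    }
    free(da);
    free(db);
    return eq;
}
```

With this, precomposed "café" (ending in U+00E9) and "cafe" followed by U+0301 differ byte for byte but compare equal once both are decomposed.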

--
Mark Biggar
ma...@biggar.org
mark.a...@comcast.net


> At 4:30 PM +0000 6/16/04, mark.a...@comcast.net wrote:
> >Do we want a Normalization function here as well. If you have that
> >you can use a binary compare (at least for eq/ne).
>

> Yeah, we probably do. The question is always "Which normalization"
> since there are at least four for Unicode and two for ISO-2022. (Or
> something like that--I don't think I remembered the ISO number right)
>
> >> The charset vtable needs to handle get/set grapheme, get/set
> >> substring, up/down/titlecase, and (possibly) comparison. Charsets
> >> also have a separate grapheme classification requirement (for
> >> regexes) but we'll put that off for now.
