Once again, we're going with vtables, like the strings originally
had. Each string has two vtable pointers, one for encoding and one
for charset. Encodings and charsets'll be loadable libraries, in the
encodings/ and charset/ directories.
The encoding vtable needs to handle get/set codepoint, get/set byte,
and transform to another encoding. I don't think there's anything
else, but I could be wrong there.
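
To make the encoding side concrete, here's a rough C sketch of what such a
vtable might look like. Every name in it (sketch_string, sketch_encoding,
the fixed8 encoding, the 0/-1 return convention) is hypothetical and made up
for illustration; none of it is the actual API, which hasn't been defined
yet. It assumes a trivial one-byte-per-codepoint encoding so the slots stay
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_encoding;

/* Hypothetical string body: raw bytes plus a pointer to its encoding vtable. */
typedef struct sketch_string {
    uint8_t *buf;
    size_t   len;                          /* length in bytes */
    const struct sketch_encoding *encoding;
} sketch_string;

/* Hypothetical encoding vtable: the five operations listed above.
 * Every slot returns 0 on success and -1 on failure (an assumption). */
typedef struct sketch_encoding {
    const char *name;
    int (*get_codepoint)(const sketch_string *s, size_t idx, uint32_t *out);
    int (*set_codepoint)(sketch_string *s, size_t idx, uint32_t cp);
    int (*get_byte)(const sketch_string *s, size_t off, uint8_t *out);
    int (*set_byte)(sketch_string *s, size_t off, uint8_t b);
    int (*to_encoding)(const sketch_string *src,
                       const struct sketch_encoding *target,
                       sketch_string *dst);
} sketch_encoding;

/* Fixed-width 8-bit encoding: codepoint index == byte offset. */
static int fixed8_get_codepoint(const sketch_string *s, size_t idx, uint32_t *out) {
    if (idx >= s->len) return -1;
    *out = s->buf[idx];
    return 0;
}

static int fixed8_set_codepoint(sketch_string *s, size_t idx, uint32_t cp) {
    if (idx >= s->len || cp > 0xFF) return -1;   /* not representable */
    s->buf[idx] = (uint8_t)cp;
    return 0;
}

static int fixed8_get_byte(const sketch_string *s, size_t off, uint8_t *out) {
    if (off >= s->len) return -1;
    *out = s->buf[off];
    return 0;
}

static int fixed8_set_byte(sketch_string *s, size_t off, uint8_t b) {
    if (off >= s->len) return -1;
    s->buf[off] = b;
    return 0;
}

/* Transform by writing each source codepoint through the target's
 * set_codepoint. dst must already own a buffer that is large enough. */
static int fixed8_to_encoding(const sketch_string *src,
                              const sketch_encoding *target,
                              sketch_string *dst) {
    dst->encoding = target;
    for (size_t i = 0; i < src->len; i++) {
        if (target->set_codepoint(dst, i, src->buf[i]) != 0) return -1;
    }
    return 0;
}

const sketch_encoding sketch_fixed8 = {
    "fixed8",
    fixed8_get_codepoint, fixed8_set_codepoint,
    fixed8_get_byte, fixed8_set_byte,
    fixed8_to_encoding
};
```

Callers would go through the pointer, e.g. s.encoding->get_codepoint(&s, 1, &cp),
so a variable-width encoding could be swapped in without touching them.
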
The charset vtable needs to handle get/set grapheme, get/set
substring, up/down/titlecase, and (possibly) comparison. Charsets
also have a separate grapheme classification requirement (for
regexes) but we'll put that off for now.
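
Same idea for the charset side, again as a hedged C sketch rather than a
proposal for the real API. The names (cs_text, sketch_charset, sketch_ascii)
are hypothetical, and it assumes plain ASCII, where a grapheme is a single
char. Real charsets can have multi-codepoint graphemes, so the signatures
would need to change. Titlecase here just means "first character upper, rest
lower", which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical text body, kept independent of any encoding layer. */
typedef struct cs_text { char *buf; size_t len; } cs_text;

/* Hypothetical charset vtable: grapheme and substring access, case
 * mapping, and comparison. Fallible slots return 0 on success, -1 on error. */
typedef struct sketch_charset {
    const char *name;
    int  (*get_grapheme)(const cs_text *s, size_t idx, char *out);
    int  (*set_grapheme)(cs_text *s, size_t idx, char g);
    int  (*get_substr)(const cs_text *s, size_t start, size_t count, char *out);
    int  (*set_substr)(cs_text *s, size_t start, const char *src, size_t count);
    void (*upcase)(cs_text *s);
    void (*downcase)(cs_text *s);
    void (*titlecase)(cs_text *s);
    int  (*compare)(const cs_text *a, const cs_text *b);
} sketch_charset;

static char ascii_up(char c)   { return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c; }
static char ascii_down(char c) { return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c; }

static int ascii_get_grapheme(const cs_text *s, size_t idx, char *out) {
    if (idx >= s->len) return -1;
    *out = s->buf[idx];
    return 0;
}

static int ascii_set_grapheme(cs_text *s, size_t idx, char g) {
    if (idx >= s->len || (unsigned char)g > 0x7F) return -1;
    s->buf[idx] = g;
    return 0;
}

static int ascii_get_substr(const cs_text *s, size_t start, size_t count, char *out) {
    if (start > s->len || count > s->len - start) return -1;
    memcpy(out, s->buf + start, count);
    return 0;
}

/* Overwrites in place; this sketch never resizes the string. */
static int ascii_set_substr(cs_text *s, size_t start, const char *src, size_t count) {
    if (start > s->len || count > s->len - start) return -1;
    memcpy(s->buf + start, src, count);
    return 0;
}

static void ascii_upcase(cs_text *s) {
    for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_up(s->buf[i]);
}

static void ascii_downcase(cs_text *s) {
    for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_down(s->buf[i]);
}

static void ascii_titlecase(cs_text *s) {
    for (size_t i = 0; i < s->len; i++)
        s->buf[i] = (i == 0) ? ascii_up(s->buf[i]) : ascii_down(s->buf[i]);
}

/* Binary comparison: negative, zero, or positive, like memcmp. */
static int ascii_compare(const cs_text *a, const cs_text *b) {
    size_t n = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->buf, b->buf, n);
    if (r != 0) return r;
    return (a->len > b->len) - (a->len < b->len);
}

const sketch_charset sketch_ascii = {
    "ascii",
    ascii_get_grapheme, ascii_set_grapheme,
    ascii_get_substr, ascii_set_substr,
    ascii_upcase, ascii_downcase, ascii_titlecase,
    ascii_compare
};
```

For ASCII a byte-wise compare is enough. Whether comparison belongs in the
vtable at all for other charsets is exactly the open "(possibly)" above.
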
I think that's everything, but before we nail it down I'd like to have
folks squint at this a bit so we can make sure it's right. Once it is,
we can define the API directly and start implementing it.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk
--
Mark Biggar
ma...@biggar.org
mark.a...@comcast.net
> At 4:30 PM +0000 6/16/04, mark.a...@comcast.net wrote:
> >Do we want a normalization function here as well? If you have that,
> >you can use a binary compare (at least for eq/ne).
>
> Yeah, we probably do. The question is always "Which normalization"
> since there are at least four for Unicode and two for ISO-2022. (Or
> something like that--I don't think I remembered the ISO number right)
>
> >
> >> The charset vtable needs to handle get/set grapheme, get/set
> >> substring, up/down/titlecase, and (possibly) comparison. Charsets
> >> also have a separate grapheme classification requirement (for
> >> regexes) but we'll put that off for now.