1) Add CHARTYPE* as a parameter to the <chartype>_is_digit/_get_digit functions
2) Create a new struct chartype_digit_map_t to contain mappings from code values
to digit values
3) Add a pointer to the above struct to the CHARTYPE structure
Any comments on the above before I go ahead?
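A rough sketch of what points 1 to 3 might look like, for illustration only. This is not actual Parrot code: the field names, the chartype_code_t typedef, the function names, and the assumption that digits occupy one contiguous run of code values are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical code-value type; the real type is not given in this thread. */
typedef unsigned int chartype_code_t;

/* Hypothetical layout: digits occupy one contiguous run of code values. */
typedef struct chartype_digit_map_t {
    chartype_code_t first_code;  /* code value of the first digit */
    chartype_code_t last_code;   /* code value of the last digit */
    int first_value;             /* digit value that first_code maps to */
} chartype_digit_map_t;

/* Only the members relevant to this sketch are shown. */
typedef struct CHARTYPE {
    const char *name;
    const chartype_digit_map_t *digit_map;  /* NULL if no simple mapping */
} CHARTYPE;

/* Generic is_digit: takes the CHARTYPE so one function serves many chartypes. */
int chartype_is_digit_map(const CHARTYPE *type, chartype_code_t c)
{
    assert(type != NULL);
    const chartype_digit_map_t *map = type->digit_map;
    return map != NULL && c >= map->first_code && c <= map->last_code;
}

/* Generic get_digit: returns the digit value, or -1 if c is not a digit. */
int chartype_get_digit_map(const CHARTYPE *type, chartype_code_t c)
{
    if (!chartype_is_digit_map(type, c))
        return -1;
    return type->digit_map->first_value + (int)(c - type->digit_map->first_code);
}
```

With a map of { 0x30, 0x39, 0 }, the code value for '7' would come back as 7, and anything outside the range as -1.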
--
Peter Gibbs
EmKel Systems
> I am working on adding support for additional chartypes to Parrot, plus
> the capability for run-time registration of same. To this end, I would like to:
>
> 1) Add CHARTYPE* as a parameter to the <chartype>_is_digit/_get_digit functions
> 2) Create a new struct chartype_digit_map_t to contain mappings from code values
> to digit values
> 3) Add a pointer to the above struct to the CHARTYPE structure
Go for it. There is the possibility that identifying a digit is more
involved than we might otherwise want for some chartypes, however--the
first thing that pops to mind is the fun that ensues in the
Chinese-derived writing systems where the characters for the various
numbers may have non-numeric meanings in some circumstances. Those cases
(and some other functions, such as "what is a word character" and
"what is a word boundary") require something more complex than a plain
lookup table.
Dan
> Go for it. There is the possibility that identifying a digit is more
> involved than we might otherwise want for some chartypes, however--the
Yeah - I am hoping to handle the simpler cases generically, so that we only
need to write specific code for the less simple ones.
The current methods for both digit handling and transcoding are context-free,
which I suspect may become a problem later; if so, some form of iterator with
context information will be required.
Which should be just *so* much fun... :)
Since you're modifying the struct, make sure there are entries for
*functions* that do all the things you're putting in pointers to data
members for. We can NULL them out for now, but it'll mean that when
someone throws the Shift-JIS chartype code in we won't have to change the
struct and rebuild the world.
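One way this could be laid out, as a sketch only. The member names, the dispatch functions, and the demo hook are hypothetical and not taken from the Parrot source. Each data pointer gets a matching function entry; a NULL entry means "use the table".

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int chartype_code_t;  /* hypothetical code-value type */

struct CHARTYPE;
typedef int (*chartype_digit_fn)(const struct CHARTYPE *type, chartype_code_t c);

/* Hypothetical table layout: one contiguous run of digit code values. */
typedef struct chartype_digit_map_t {
    chartype_code_t first_code;
    chartype_code_t last_code;
    int first_value;
} chartype_digit_map_t;

typedef struct CHARTYPE {
    const char *name;
    const chartype_digit_map_t *digit_map;  /* data: simple lookup */
    chartype_digit_fn is_digit;             /* function: NULL means use digit_map */
    chartype_digit_fn get_digit;            /* function: NULL means use digit_map */
} CHARTYPE;

/* Table lookup shared by the fallbacks: digit value, or -1 if not a digit. */
static int map_lookup(const CHARTYPE *type, chartype_code_t c)
{
    const chartype_digit_map_t *map = type->digit_map;
    if (map == NULL || c < map->first_code || c > map->last_code)
        return -1;
    return map->first_value + (int)(c - map->first_code);
}

/* Dispatch: prefer the chartype's own function, else fall back to the table. */
int chartype_is_digit(const CHARTYPE *type, chartype_code_t c)
{
    assert(type != NULL);
    if (type->is_digit != NULL)
        return type->is_digit(type, c);
    return map_lookup(type, c) >= 0;
}

int chartype_get_digit(const CHARTYPE *type, chartype_code_t c)
{
    assert(type != NULL);
    if (type->get_digit != NULL)
        return type->get_digit(type, c);
    return map_lookup(type, c);
}

/* Demo hook for a "less simple" chartype: one extra code value (0x3007,
 * picked purely for illustration) counts as digit 0 on top of the table. */
int demo_get_digit(const CHARTYPE *type, chartype_code_t c)
{
    if (c == 0x3007)
        return 0;
    return map_lookup(type, c);
}

int demo_is_digit(const CHARTYPE *type, chartype_code_t c)
{
    return demo_get_digit(type, c) >= 0;
}
```

A chartype that only needs the table leaves both function entries NULL; a more involved one fills them in later without any change to the struct layout.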
Dan