Several ops have also been added to simplify testing:
set_encoding S0, I0 - set encoding to specified index value
set_chartype S0, I0 - set chartype to specified index value
(these two can be removed once these fields can be set via IO, etc.)
transcode S0, S1, I0, I1 - transcode to specified encoding/chartype
Dynamic loading of a chartype is automatic: if a request is made
for a chartype that has not been loaded yet, it is loaded on demand, e.g.
find_chartype I0, "8859-1" - this will load parrot/runtime/chartypes/8859-1.TXT
set_chartype S0, I0
The search path and extension are hard-coded for now.
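For illustration only, here is a minimal C sketch of how a hard-coded search path and extension could turn a chartype name into a file name. The macros CHARTYPE_DIR and CHARTYPE_EXT and the function chartype_path are hypothetical, not the actual Parrot code. Only the directory and the .TXT extension come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hard-coded search path and extension. These macro and function names
   are hypothetical, not taken from the Parrot source. */
#define CHARTYPE_DIR "parrot/runtime/chartypes/"
#define CHARTYPE_EXT ".TXT"

/* Write the mapping-file path for chartype `name` into buf.
   Returns 0 on success, -1 for an empty name or a too-small buffer. */
static int chartype_path(const char *name, char *buf, size_t size) {
    assert(name != NULL && buf != NULL);
    if (strlen(name) == 0)
        return -1;
    int n = snprintf(buf, size, "%s%s%s", CHARTYPE_DIR, name, CHARTYPE_EXT);
    /* snprintf returns the length it wanted to write; >= size means truncation */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this sketch, chartype_path("8859-1", buf, sizeof buf) fills buf with "parrot/runtime/chartypes/8859-1.TXT". A loader could then pass that path to fopen.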
The mapping file format is the one used by the Unicode Consortium, and
8859-1.TXT was downloaded directly from their web site; there are many more
mapping files there that we can use (Dan - can you confirm that the license
is okay?)
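Here is a rough C sketch of reading that format. It assumes each data line is a hex byte, whitespace, a hex Unicode code point, and an optional '#' comment; for example: 0x41, tab, 0x0041, tab, # LATIN CAPITAL LETTER A. The names parse_mapping_line, load_mapping and UNMAPPED are made up for this example. Treating a line that has no Unicode column as an unmapped byte is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define UNMAPPED 0xFFFFu  /* hypothetical marker: byte has no Unicode mapping */

/* Parse one mapping-file line into table (256 entries, byte -> code point).
   Comment lines ('#') and blank lines are skipped. A line with a byte but
   no Unicode column is treated as unmapped (assumption).
   Returns 1 if an entry was stored, 0 if skipped, -1 if malformed. */
static int parse_mapping_line(const char *line, uint32_t table[256]) {
    unsigned int byte, cp;
    assert(line != NULL && table != NULL);
    while (*line == ' ' || *line == '\t' || *line == '\r')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;
    /* %x accepts an optional 0x prefix, as used in the Unicode files */
    int n = sscanf(line, "%x %x", &byte, &cp);
    if (n < 1 || byte > 0xFF)      /* single-byte chartypes only */
        return -1;
    table[byte] = (n == 2) ? (uint32_t)cp : UNMAPPED;
    return 1;
}

/* Fill table from an open mapping file; bytes not listed stay UNMAPPED.
   Returns 0 on success, -1 at the first malformed line. */
static int load_mapping(FILE *fp, uint32_t table[256]) {
    char line[1024];
    assert(fp != NULL);
    for (int i = 0; i < 256; i++)
        table[i] = UNMAPPED;
    while (fgets(line, sizeof line, fp) != NULL)
        if (parse_mapping_line(line, table) < 0)
            return -1;
    return 0;
}
```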
There are a number of limitations in the mechanism so far, including:
only single-byte encodings are supported
digit mapping is assumed to be standard ASCII '0' to '9'
mapping from Unicode uses a full table scan
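The next sketch makes the single-byte and full-table-scan limitations concrete. It is a hypothetical C version of a transcode between two single-byte chartypes that goes through Unicode. The source byte is looked up directly in a 256-entry table, but the reverse step has to scan all 256 entries of the target table. chartype_sketch, from_unicode and transcode_sketch are illustrative names, not Parrot's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPED 0xFFFFu  /* hypothetical marker: byte has no Unicode mapping */

/* Hypothetical single-byte chartype: each of the 256 byte values maps
   to one Unicode code point, or UNMAPPED. */
typedef struct {
    const char *name;
    uint32_t to_unicode[256];
} chartype_sketch;

/* Unicode -> byte by scanning all 256 entries (the "full table scan").
   Returns the byte value, or -1 if no byte maps to cp. */
static int from_unicode(const chartype_sketch *ct, uint32_t cp) {
    assert(ct != NULL);
    for (int b = 0; b < 256; b++)
        if (ct->to_unicode[b] == cp)
            return b;
    return -1;
}

/* Transcode n single-byte characters from one chartype to another via
   Unicode. Returns 0 on success, or -1 if a character has no mapping
   in the source table or no equivalent in the target table. */
static int transcode_sketch(const chartype_sketch *from,
                            const chartype_sketch *to,
                            const unsigned char *in, unsigned char *out,
                            size_t n) {
    assert(from != NULL && to != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = from->to_unicode[in[i]];   /* direct lookup */
        if (cp == UNMAPPED)
            return -1;
        int b = from_unicode(to, cp);            /* linear scan */
        if (b < 0)
            return -1;
        out[i] = (unsigned char)b;
    }
    return 0;
}
```

For example, take an ISO 8859-1 table (each byte maps to the code point with the same value) and an ASCII-only table (bytes 128 to 255 unmapped). Transcoding "cafe" from the first to the second succeeds. A string containing byte 0xE9 (e with acute accent) returns -1.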
However, it should allow us to get a start on testing support for multiple
character sets in the rest of Parrot, and I wanted to get something in
for comments while I continue with further development.
All feedback welcome
--
Peter Gibbs
EmKel Systems
> All feedback welcome
How much does this overlap with icu/*? Will we use that code, or only its
tables? What's the general plan?
leo