Tcl, unicode, transcoding strings ... almost.

William Coleda

Mar 31, 2005, 11:04:53 PM
to Perl 6 Internals
(whoops)

I just added octal and hex escapes to my local copy of the Tcl parser. I was working on unicode when I noticed that not all of the transcodes are done yet.

This works:

$S0 = unicode:""
$S1 = chr 0x30b3
$S0 .= $S1
print $S0
print "\n"

This does not:

$S0 = ascii:""
$S1 = chr 0x30b3
$S0 .= $S1

It fails with:

Cross-type string appending (fixed_8/ascii) (utf8/unicode) unsupported

Similarly, the default iso encoding doesn't allow conversion to unicode either.

FYI, the charset:"string constant" documentation in imcc doesn't mention "unicode", though unicode appears to be a valid choice.

Now, to work around the transcoding issue, I thought I'd just keep all my strings as unicode... but I can't do this, as part of my parsing involves adding single characters based on their value. e.g.: if I have something like

puts \11WHEE

Then I have to generate a string like this:

$S1 = chr $I1

... and that string defaults to iso, which can't be transcoded to unicode.

So, anyone want to implement iso to unicode translation? (it looks like there *is* an implementation for ASCII, but as I noted above, it generates an exception).


Regards.

Leopold Toetsch

Apr 1, 2005, 1:56:34 AM
to William Coleda, perl6-i...@perl.org
William Coleda <wi...@coleda.com> wrote:
> (whoops)

> I just added octal and hex escapes to my local copy of the Tcl parser.
> I was working on unicode when I noticed that not all of the transcodes
> are done yet.

Yes, that's true. Much more work is needed still.

> This does not:

> $S0 = ascii:""
> $S1 = chr 0x30b3
> $S0 .= $S1

> It fails with:

> Cross-type string appending (fixed_8/ascii) (utf8/unicode) unsupported

Yep.

> FYI, the charset:"string constant" documentation in imcc doesn't
> mention "unicode", though unicode appears to be a valid choice.

Yep. As charsets and encodings are extensible even at runtime, the docs
can't mention all valid choices. But they should have an entry for
"unicode". And we'll need an interface to get at valid charsets/encodings.

> Now, to work around the transcoding issue, I thought I'd just keep all
> my strings as unicode... but I can't do this, as part of my parsing
> involves adding single characters based on their value. e.g.: if I
> have something like

> puts \11WHEE

> Then I have to generate a string like this:

> $S1 = chr $I1

> ... and that string defaults to iso, which can't be transcoded to unicode.

It can:

$ cat t.imc
.sub main
$S0 = unicode:"abcd"
$S1 = chr 65
find_charset $I0, "unicode"
$S2 = trans_charset $S1, $I0
$S2 .= $S0
print $S2
print "\n"
.end

$ parrot t.imc
Aabcd
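
The same pattern should carry over to the `puts \11WHEE` case from the original message. What follows is only a minimal sketch under stated assumptions: the value 9 (octal 11) and the literal "WHEE" are placeholders for whatever the Tcl parser has decoded, and the ops are used exactly as in t.imc above.

```pir
.sub main
    # Hypothetical input: $I1 stands in for the code point the parser
    # decoded from the escape (octal \11 is 9).
    $I1 = 9
    $S0 = unicode:"WHEE"

    # chr yields a string in the default charset, so convert it to
    # unicode before appending the unicode text.
    $S1 = chr $I1
    find_charset $I0, "unicode"
    $S2 = trans_charset $S1, $I0

    $S2 .= $S0
    print $S2
    print "\n"
.end
```

Converting the single-character string first keeps both operands of the append in the same charset, which avoids the cross-type append that fails above.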

> So, anyone want to implement iso to unicode translation? (it looks
> like there *is* an implementation for ASCII, but as I noted above, it
> generates an exception).

t/op/string_cs.t shows the implemented conversions.

> Regards.

leo
