There's *some* rudimentary support for non-BMP characters (via surrogate
pairs) in 8.7, where TCL_UTF_MAX is still definitely 4, though you need
to use "\U" rather than "\u" (which takes at most four hex digits) to
write a code point that needs six hex digits:
% info patchlevel
8.7a2
% puts \U01f600
😀
% string length 😀
2
% scan 😀 %c%c a b
1
% set a
128512
% format %c 128512
😀
As you can see, [string length] is wrong (it counts the two halves of
the surrogate pair as separate characters), but [scan] and [format] are
right, as is output to the console, at least on OSX. If all you're
doing is passing text through from user input or a file to user output
or a file, that's probably enough. (The above test was with the
current tip of the core-8-branch.)
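The length of 2 comes from how a non-BMP character such as 😀 (U+1F600,
i.e. 128512) is split into a UTF-16 surrogate pair. A minimal sketch of
that standard split is below; the proc name surrogatePair is
hypothetical (it is not a Tcl builtin), and because it uses only integer
arithmetic it should behave the same in any Tcl version:

```tcl
# Hypothetical helper (not a Tcl builtin): split a non-BMP code point
# into its UTF-16 high and low surrogates using the standard algorithm.
proc surrogatePair {codepoint} {
    if {$codepoint < 0x10000 || $codepoint > 0x10FFFF} {
        error "expected a code point in U+10000..U+10FFFF, got $codepoint"
    }
    # Subtract 0x10000 to leave a 20-bit value.
    set v [expr {$codepoint - 0x10000}]
    # The top 10 bits go into the high surrogate, the bottom 10 bits
    # into the low surrogate.
    set hi [expr {0xD800 + ($v >> 10)}]
    set lo [expr {0xDC00 + ($v & 0x3FF)}]
    return [list [format %04X $hi] [format %04X $lo]]
}

puts [surrogatePair 128512]   ;# prints: D83D DE00
```

Two code units for one character is why [string length] reports 2 in
the session above, while [scan] and [format] appear to treat the pair
as the single code point 128512.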
You particularly need TIP 388
(https://core.tcl.tk/tips/doc/trunk/tip/388.md) and TIP 389
(https://core.tcl.tk/tips/doc/trunk/tip/389.md) in order to make
progress. The latter is targeted at 8.7. (Fixing [string length]
requires changing TCL_UTF_MAX, so that's probably something we'll do in
9.0 rather than 8.7.)
Donal.
--
Donal Fellows — Tcl user, Tcl maintainer, TIP editor.