Hi Adriaan,
Thanks for the question. There are two answers to it.
Fully transparent bridging between a legacy String/Character and Unicode String/Grapheme/Unicode Scalar was simply too much to accomplish in a single release cycle. This is a multi-release endeavor, and the focus of VAST 2022 was implementing Unicode support and safely bringing it into the core of the platform. As we move to VAST 2023, we will focus more on integration with specific parts of the product, and changing code in String/Character would be considered part of this integration. We didn't want to rush how we changed other parts of the system, especially in highly optimized areas where even a small increase in code size (primarily VM code) can have a detrimental impact on performance.
"How one in 'real life' would go about this (in an efficient way) in cases where you don't know what kind of strings you're dealing with"
I'm not exactly sure how to answer this question. Some of it depends on your architecture. Some of it depends on what you expect the product to reasonably do on your behalf versus what you will be expected to do yourself if you opt in to Unicode support. I imagine that as we make other libraries in VAST Unicode-enabled, we will be better positioned to answer these questions. But it will require examples and use cases.
One approach is the "Unicode Sandwich" model, and String is just a special case of decoding "on the edge" into a UnicodeString, except we're decoding code-pages instead of UTF data. And that will apply to many scenarios "in real life", but maybe not your situation. Perhaps you could elaborate with an example?
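To make the "Unicode Sandwich" idea concrete, here is a minimal, language-neutral sketch in Python rather than VAST code. It assumes the legacy data is Windows-1252 (cp1252); the function names are hypothetical and only illustrate the shape: decode code-page bytes at the input edge, work purely with Unicode text in the middle, and encode again at the output edge.

```python
# Unicode Sandwich sketch: bytes at the edges, Unicode text in the middle.
# Assumption: the legacy input uses the Windows-1252 code page.

def read_legacy_text(raw: bytes, code_page: str = "cp1252") -> str:
    """Input edge: decode code-page bytes into Unicode text."""
    return raw.decode(code_page)

def write_utf8(text: str) -> bytes:
    """Output edge: encode Unicode text as UTF-8 bytes."""
    return text.encode("utf-8")

legacy_bytes = b"caf\xe9"               # "café" encoded in cp1252
text = read_legacy_text(legacy_bytes)   # decode once, on the way in
shouted = text.upper()                  # all processing happens on Unicode text
utf8_bytes = write_utf8(shouted)        # encode once, on the way out
```

In VAST terms, the decode step would correspond to converting the code-page String at the boundary (for example with #asUnicodeString), so the inner code only ever sees Unicode strings.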
Maybe I already answered your question with "we're looking at it for VAST 2023"? As I've done in a few places, you can always just call #asUnicodeString on the string-like object in question. If the string has any characters outside 7-bit ASCII, you're going to need to do a conversion anyway.
Again, depending on your needs, another set of APIs to look at is "Views". We actually wrote bridging code so that all the "Views" can be used on the String class as well. So if you need to convert a code-page <String> to graphemes, Unicode scalars, UTF-8, UTF-16, or UTF-32, you can do so, and this allows your code to stay polymorphic in those areas. All of these are provided as extensions on String defined in the UnicodeSupport library.
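As a rough illustration of what the different views mean (again in Python, not the VAST Views API), the sketch below looks at one piece of text as Unicode scalars and as UTF-8, UTF-16, and UTF-32 code units. The sample text is made up for the example. Grapheme segmentation is left out because Python's standard library does not provide it; the sample would be two graphemes.

```python
# One text, several "views". Illustrative only; not the VAST Views API.
text = "e\u0301\U0001F600"   # "e" + combining acute accent, then an emoji

# Unicode scalar view: one entry per code point.
scalars = [ord(ch) for ch in text]

# Encoded views: the same text as UTF-8 / UTF-16 / UTF-32 code units.
utf8 = text.encode("utf-8")
utf16_units = len(text.encode("utf-16-le")) // 2   # 2 bytes per UTF-16 unit
utf32_units = len(text.encode("utf-32-le")) // 4   # 4 bytes per UTF-32 unit

print([hex(s) for s in scalars])
print(len(utf8), utf16_units, utf32_units)
```

The counts differ per view (3 scalars, 7 UTF-8 bytes, 4 UTF-16 units, 3 UTF-32 units), which is why code that stays polymorphic across String and UnicodeString should state which view it is iterating over.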
Thanks for the question.
- Seth