The problem isn't C++, it's Unicode itself. And the
fact that Unicode isn't the only game in town. So what
people will do is open a file in Word or some other
editor, get its default encoding, think they're writing
"Unicode" because in their head that means "fancy
characters", save it as some Windows-1252 variant, then
descend into encoding hell as it renders as random
characters when they run it through Stan. Or it won't
render in the Windows shell.
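As an illustration (Python here just for the demo), this is the classic mojibake in both directions: UTF-8 bytes misread as Windows-1252, and Windows-1252 bytes misread as UTF-8:

```python
text = "ö"

# UTF-8 bytes misread as Windows-1252: one character becomes two.
utf8_bytes = text.encode("utf-8")          # b'\xc3\xb6'
print(utf8_bytes.decode("cp1252"))         # prints 'Ã¶'

# Windows-1252 bytes misread as UTF-8: not even valid UTF-8,
# so the decoder has to substitute a replacement character.
cp1252_bytes = text.encode("cp1252")       # b'\xf6'
print(cp1252_bytes.decode("utf-8", errors="replace"))  # prints '�'
```

Same character, four different byte sequences depending on who wrote it and who read it, which is exactly the hell described above.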
Then they'll have to figure out how to make LaTeX take
Unicode input (not that hard, but a flag you need to set).
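For pdfLaTeX, the flag in question is (I believe) the `inputenc` package option; a minimal sketch, noting that UTF-8 input has been pdfLaTeX's default since 2018, and engines like XeLaTeX and LuaLaTeX read UTF-8 natively with no flag at all:

```latex
% pdfLaTeX: declare that the source file is UTF-8 encoded
\usepackage[utf8]{inputenc}
% and pick a font encoding that can actually print accented glyphs
\usepackage[T1]{fontenc}
```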
What Jeffrey Arnold's bringing up is that there are
standard normalization forms. If you have an o with an
umlaut, it can be encoded either as a regular o followed
by a combining umlaut character, or as the single
precomposed o-umlaut character. Then there's a whole
fullwidth copy of ASCII in the halfwidth and fullwidth
forms block used alongside CJK text. And then a whole
bunch of rare CJK characters, the kind that show up in
names, that go beyond 16 bits, which limits the number
of systems that deal with them properly. Then there's
the fact that Unicode's evolving, so we could pin a
version like Unicode A.B, but they keep adding more
characters.
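All three of those issues are easy to poke at in Python's stdlib `unicodedata` module (again, just a demo, not a proposal for the implementation language):

```python
import unicodedata

# Two valid encodings of "o with umlaut":
composed = "\u00F6"      # single precomposed character ö
combining = "o\u0308"    # 'o' followed by combining diaeresis

# They render identically but compare unequal as raw strings;
# normalization maps them to a single canonical form.
print(composed == combining)                                 # False
print(unicodedata.normalize("NFC", combining) == composed)   # True
print(unicodedata.normalize("NFD", composed) == combining)   # True

# The fullwidth copy of ASCII: compatibility normalization
# folds it back onto plain ASCII.
fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
print(unicodedata.normalize("NFKC", fullwidth_a))            # prints 'A'

# A CJK ideograph beyond U+FFFF: one character, but two
# UTF-16 code units (a surrogate pair), which is what trips
# up systems that assume 16-bit characters.
rare_cjk = "\U00020000"  # first character of CJK Extension B
print(len(rare_cjk.encode("utf-16-le")) // 2)                # prints 2
```

So any identifier comparison or I/O we do would have to decide on a normalization form up front, or visually identical programs won't parse the same way.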
But like I said, as long as the I/O with proper encodings
can be shown to work on all platforms, we can try it.
- Bob