> What each module does is basically input, processing and
> output. Input and output are often interactive, so together
> they are called the "interface".
> What a module can use for text in its interface depends on how
> the interface is specified.
> Various text-based interfaces (XML, JSON) use UTF-8.
> Microsoft's language-neutral COM interfaces use UTF-16 BSTR. It
> is painful to use UTF-8 there instead.
If you're interfacing to some external functions, then you
obviously have to use the encoding format which they require.
> As for text processing inside the module, use one Unicode
> encoding consistently. Converting it into some other Unicode
> encoding is straightforward. UTF-8 is the most natural choice.
> Maybe that is what you meant when you said you never use
> anything but char? As for illusions about UTF-16 ... just don't
> have any, and everything works.
Just a nit (because I think we really agree here), but there is
only one Unicode encoding. What you mean is the encoding form:
how the encoding is represented. Depending on what you are
doing, you may choose different encoding forms. For anything
external, you should use UTF-8. For interfacing with a third
party library, you must use the form that the library expects.
Internally, depending on what you are doing, it may be more
convenient to convert the UTF-8 to UTF-32. Or not: there's a
lot you can effectively do in UTF-8.
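To make the last point concrete, here is a minimal C++17 sketch, assuming the text is held as UTF-8 in a std::string. The function names (utf8_to_utf32, split_on_ascii, count_code_points) are hypothetical, not from any library. The first converts to UTF-32 for when per-code-point access is convenient; the other two work directly on the UTF-8 bytes, because every byte of a multi-byte UTF-8 sequence is >= 0x80 and so can never be mistaken for an ASCII character.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 into UTF-32. Returns std::nullopt on
// malformed input: invalid lead or continuation bytes, truncated sequences,
// overlong forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
std::optional<std::u32string> utf8_to_utf32(std::string_view in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80) {                   // 0xxxxxxx: ASCII
            cp = b0; len = 1;
        } else if ((b0 & 0xE0) == 0xC0) {  // 110xxxxx: 2-byte sequence
            cp = b0 & 0x1F; len = 2;
        } else if ((b0 & 0xF0) == 0xE0) {  // 1110xxxx: 3-byte sequence
            cp = b0 & 0x0F; len = 3;
        } else if ((b0 & 0xF8) == 0xF0) {  // 11110xxx: 4-byte sequence
            cp = b0 & 0x07; len = 4;
        } else {
            return std::nullopt;           // stray continuation or invalid lead
        }
        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        // Smallest code point that legitimately needs each sequence length.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len]                 // overlong encoding
            || (cp >= 0xD800 && cp <= 0xDFFF)     // surrogate
            || cp > 0x10FFFF)                     // outside Unicode range
            return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: split UTF-8 text on an ASCII delimiter without
// decoding. Safe because an ASCII byte never occurs inside a multi-byte
// UTF-8 sequence.
std::vector<std::string_view> split_on_ascii(std::string_view text, char delim)
{
    assert(static_cast<unsigned char>(delim) < 0x80 && "delimiter must be ASCII");
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            fields.push_back(text.substr(start));
            return fields;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Hypothetical helper: count code points in valid UTF-8. Each code point
// has exactly one byte that is not a continuation byte (10xxxxxx).
std::size_t count_code_points(std::string_view text)
{
    std::size_t n = 0;
    for (char c : text)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}
```

For example, split_on_ascii("caf\xC3\xA9,na\xC3\xAFve", ',') yields two fields without any decoding, and count_code_points on the same 12-byte string gives 10. Note that even in UTF-32 one element is one code point, not necessarily one user-visible character, since combining sequences still span several code points.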
--
James