On 17.08.2019 13:10, Szyk Cech wrote:
>
> I want to write string conversion functions:
> std::wstring <--> unsigned char
> where first is UTF-32, but second can be with any encoding.
As I read it, you want to convert between UTF-32 and any byte-oriented
encoding.
That's a noble goal.
You could look into the functionality used by, e.g., the Scintilla
component, but I remember from making a Notepad++ extension for UTF-8 as
default that it's rather messy and generally ungood.
> I want to have functions like this:
>
> std::wstring gRawToUnicode(std::vector<unsigned char> aString,
> std::wstring aEncoding);
> std::vector<unsigned char> gUnicodeToRaw(std::wstring aString,
> std::wstring aEncoding);
Surely not as the bottom foundation.
Stringly typed stuff belongs up near the user, invoking strongly typed
stuff below.
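To illustrate the split, here is a minimal sketch of a strongly typed bottom layer for one encoding, Latin-1 (ISO-8859-1), chosen only because its mapping is trivial. The function names are hypothetical, not from any library. It uses std::u32string rather than std::wstring, assuming the code should also work where wchar_t is 16 bits (as on Windows), and it takes arguments by reference to const instead of by value.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical strongly typed bottom layer: one function pair per encoding,
// no encoding-name strings anywhere. std::u32string holds UTF-32 on every
// platform; std::wstring does so only where wchar_t is 32 bits.

// ISO-8859-1 (Latin-1): byte value N is code point U+00NN, so decoding never fails.
std::u32string latin1_to_utf32(const std::vector<unsigned char>& bytes)
{
    std::u32string text;
    text.reserve(bytes.size());
    for (unsigned char b : bytes) {
        text.push_back(b);  // widen each byte to the same code point value
    }
    return text;
}

// Returns std::nullopt if some code point above U+00FF has no Latin-1 byte.
std::optional<std::vector<unsigned char>> utf32_to_latin1(const std::u32string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size());
    for (char32_t cp : text) {
        if (cp > 0xFF) { return std::nullopt; }  // not representable in Latin-1
        bytes.push_back(static_cast<unsigned char>(cp));
    }
    return bytes;
}
```

A stringly typed function like gRawToUnicode() could then sit on top and dispatch to functions like these.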
A reasonable approach to bridging the gap between a stringly typed user
interface (e.g. where an editor has a textual command interface somewhere)
and strongly typed internals can be to use an encoding id string as a
key into a repository of converters, which then either hands you a
converter for that encoding or fails to find one.
I'm sure there's a pattern name for that.
Probably "registry", or some silly name like that.
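A minimal sketch of such a repository, assuming a converter is just a pair of decode/encode functions to and from UTF-32. Converter, ConverterRegistry and make_ascii_converter are hypothetical names for illustration only, and only plain ASCII is registered here.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical converter type: decode to / encode from UTF-32.
// Both directions return std::nullopt when the data can't be converted.
struct Converter {
    std::function<std::optional<std::u32string>(const std::vector<unsigned char>&)> decode;
    std::function<std::optional<std::vector<unsigned char>>(const std::u32string&)> encode;
};

// Hypothetical repository mapping an encoding id string to a converter.
class ConverterRegistry {
public:
    void add(const std::string& id, Converter converter)
    {
        converters_[id] = std::move(converter);
    }

    // Returns the converter for `id`, or nullptr if none is registered.
    const Converter* find(const std::string& id) const
    {
        const auto it = converters_.find(id);
        return it == converters_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Converter> converters_;  // ids compared case-sensitively here
};

// Example converter for 7-bit US-ASCII.
Converter make_ascii_converter()
{
    Converter c;
    c.decode = [](const std::vector<unsigned char>& bytes) -> std::optional<std::u32string> {
        std::u32string text;
        for (unsigned char b : bytes) {
            if (b > 0x7F) { return std::nullopt; }  // byte outside ASCII
            text.push_back(b);
        }
        return text;
    };
    c.encode = [](const std::u32string& text) -> std::optional<std::vector<unsigned char>> {
        std::vector<unsigned char> bytes;
        for (char32_t cp : text) {
            if (cp > 0x7F) { return std::nullopt; }  // code point not in ASCII
            bytes.push_back(static_cast<unsigned char>(cp));
        }
        return bytes;
    };
    return c;
}
```

A real registry would probably normalize ids and aliases (e.g. "latin1" vs "ISO-8859-1"), since registered charset names are case-insensitive and most encodings have several of them.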
> Important to me is ability to handle any input encoding (defined as
> wstring) because I want to use this functions in future versions of my
> text editor.
>
> I have two questions:
> 1. Is this possible to make it in pure C++ (stl based)?!?
Yes, but then you have to implement most of it yourself.
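Implementing an encoding yourself is mostly bit fiddling plus careful validation. A minimal sketch of a UTF-8 <-> UTF-32 pair, assuming malformed input should be rejected outright (returning std::nullopt) rather than replaced with U+FFFD; the function names are hypothetical. Every other byte encoding needs its own such pair or table.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Encodes UTF-32 as UTF-8. Returns std::nullopt for surrogates or values
// above U+10FFFF, which are not valid Unicode scalar values.
std::optional<std::vector<unsigned char>> utf32_to_utf8(const std::u32string& text)
{
    std::vector<unsigned char> out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) { return std::nullopt; }
        if (cp < 0x80) {
            out.push_back(static_cast<unsigned char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Decodes UTF-8 to UTF-32. Returns std::nullopt on malformed input: bad lead
// byte, truncated sequence, overlong form, surrogate, or value above U+10FFFF.
std::optional<std::u32string> utf8_to_utf32(const std::vector<unsigned char>& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = bytes[i];
        std::size_t len;
        char32_t cp;
        char32_t min_cp;  // smallest value allowed for this length (rejects overlongs)
        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else { return std::nullopt; }  // stray continuation byte or invalid lead byte
        if (bytes.size() - i < len) { return std::nullopt; }  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = bytes[i + k];
            if ((b & 0xC0) != 0x80) { return std::nullopt; }  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```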
The standard library supports only two general encoding conversions:
between wide text and the locale's multibyte strings, and between the
various UTF encodings.
The latter set of conversions (std::wstring_convert and the <codecvt>
facets) has been deprecated since C++17, and in any case they don't make
for very portable code when used directly, even though as of C++17
they're still “standard”. E.g. g++ and MSVC differ in (1) where they
stop on detecting an input error, and (2) the endianness (!) of the
result.
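For the first kind of standard conversion, between wide text and the locale's multibyte strings, a minimal sketch using std::mbsrtowcs and std::wcsrtombs from <cwchar>. The narrow side is in whatever encoding the current C locale says (a program starts in the "C" locale; std::setlocale(LC_ALL, "") switches to the user's environment), so this cannot select an arbitrary encoding by name. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helpers around the C library conversions. The narrow side uses
// the encoding of the current C locale (LC_CTYPE). Conversion stops at the
// first null character.

std::optional<std::wstring> locale_to_wide(const std::string& mb)
{
    std::mbstate_t state{};
    const char* src = mb.c_str();
    // With a null destination, mbsrtowcs only counts the wide characters needed.
    const std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) { return std::nullopt; }  // invalid sequence
    std::vector<wchar_t> buf(n + 1);  // + 1 for the terminating null
    state = std::mbstate_t{};
    src = mb.c_str();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), n);
}

std::optional<std::string> wide_to_locale(const std::wstring& wide)
{
    std::mbstate_t state{};
    const wchar_t* src = wide.c_str();
    // Same counting trick in the other direction.
    const std::size_t n = std::wcsrtombs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) { return std::nullopt; }  // not representable
    std::vector<char> buf(n + 1);
    state = std::mbstate_t{};
    src = wide.c_str();
    std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    return std::string(buf.data(), n);
}
```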
> 2. Do I have to use ICU library for this?!?
Yes, in practice.
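With ICU the usual entry points are ucnv_open() with an encoding name, then ucnv_toUChars() / ucnv_fromUChars() (ICU's internal form is UTF-16), and ucnv_countAvailable() / ucnv_getAvailableName() to list the supported names. As a swapped-in alternative that needs no extra library on glibc-based Linux, here is a minimal sketch with POSIX iconv, which likewise takes both encoding names as run-time strings. The convert() function is hypothetical; which names are accepted is implementation-defined, and POSIX iconv has no portable way to list them (glibc's `iconv -l` command prints its list).

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`,
// both given as run-time strings. Returns std::nullopt if the pair is
// unsupported or the input is invalid or truncated.
std::optional<std::vector<unsigned char>> convert(const std::vector<unsigned char>& input,
                                                  const std::string& from,
                                                  const std::string& to)
{
    iconv_t cd = iconv_open(to.c_str(), from.c_str());  // note: target name first
    if (cd == (iconv_t)(-1)) { return std::nullopt; }   // unknown encoding name(s)

    // POSIX iconv() declares the input as char**, hence the const_cast;
    // it only reads through that pointer.
    char* in_ptr = const_cast<char*>(reinterpret_cast<const char*>(input.data()));
    std::size_t in_left = input.size();

    std::vector<char> out(input.size() * 4 + 16);  // first guess, doubled on E2BIG
    std::size_t used = 0;
    for (;;) {
        char* out_ptr = out.data() + used;
        std::size_t out_left = out.size() - used;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        used = out.size() - out_left;
        if (rc != static_cast<std::size_t>(-1)) { break; }             // all input consumed
        if (errno == E2BIG) { out.resize(out.size() * 2); continue; }  // output buffer full
        iconv_close(cd);
        return std::nullopt;  // EILSEQ: invalid sequence, EINVAL: truncated input
    }

    // Flush shift state; only stateful encodings (e.g. ISO-2022-JP) emit bytes here.
    out.resize(used + 64);
    char* out_ptr = out.data() + used;
    std::size_t out_left = out.size() - used;
    iconv(cd, nullptr, nullptr, &out_ptr, &out_left);
    used = out.size() - out_left;
    iconv_close(cd);

    return std::vector<unsigned char>(out.begin(), out.begin() + static_cast<std::ptrdiff_t>(used));
}
```

For UTF-32 output it is safer to name the byte order explicitly, e.g. "UTF-32LE", since with plain "UTF-32" an implementation may prepend a byte order mark.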
> ad1. If so, please give me examples:
> + How to get list of all supported encodings?
> + How to convert strings in pure C++ when we know only input/output
> format?!? So I don't want example with hardcoded input/output encoding -
> I want to handle any input format and any output format (according to my
> functions).
I know what I would do: I would just start doing it.
But since I haven't done it, I can't help you beyond noting that diving
into stuff like that is in general both (1) much easier than you
thought, and (2) much more labor-intensive, like orders of magnitude
more work, than you thought.
You have hereby been motivated and warned.
Cheers!,
- Alf