Which value of wxMBConv should I use?

Chris Stankevitz

Feb 7, 2013, 5:25:06 PM
to wx-...@googlegroups.com
Hello,

Please consider the following steps:

1. Purchase a DELL M6600 Laptop

2. Install 64-bit Ubuntu 12.04, configure for "english"

3. Install wxWidgets-2.8 and GCC 4.6

4. Type this code into test.cpp:
#include <wx/string.h>
#include <string>

void f()
{
std::string StlString = "Hello";

wxString WxString(StlString.c_str(), XXXXX);
}

5. Compile with "g++ -c `wx-config --cppflags` test.cpp"

Question: What should be the value for XXXXX:
a) wxConvLibc
b) wxConvLocal
c) wxConvUI
d) wxConvISO8859_1
e) wxConvUTF8
f) wxConvFile
g) none of the above

I believe the answer is (e) but I do not understand and I pray that I
do not need to understand (or that I can find a good source who can
explain it to me without using words like "I think", "probably",
"should be", etc). PS: I love how wx2.9 deals with this.
Unfortunately I have to use wx2.8 at the moment.

Thank you,

Chris

Vadim Zeitlin

Feb 7, 2013, 5:40:39 PM
to wx-...@googlegroups.com
On Thu, 7 Feb 2013 14:25:06 -0800 Chris Stankevitz wrote:

CS> void f()
CS> {
CS> std::string StlString = "Hello";
CS>
CS> wxString WxString(StlString.c_str(), XXXXX);
CS> }
CS>
CS> 5. Compile with "g++ -c `wx-config --cppflags` test.cpp"

Let's get some trivialities out of the way first:

- Please use "--cxxflags", these are C++ compiler flags while "--cppflags"
are C preprocessor flags. They usually are the same -- but not always!

- I assume that you built wxWidgets in Unicode mode, otherwise the question
is not very interesting as no conversion is done.

CS> Question: What should be the value for XXXXX:

This depends entirely on the encoding of the string "StlString". In your
example above it consists of 7 bit ASCII characters only so the answer
is...

CS> a) wxConvLibc
CS> b) wxConvLocal
CS> c) wxConvUI
CS> d) wxConvISO8859_1
CS> e) wxConvUTF8
CS> f) wxConvFile
CS> g) none of the above

... any of the above except (g) which won't compile.

Because it absolutely doesn't matter which conversion you use for ASCII
characters. As soon as you have anything else, you do need to choose the
conversion carefully. If you're in luck, you're working in UTF-8 locale,
with "modern" -- and so UTF-8-friendly -- libraries in which you'd just use
(e) everywhere without thinking. But under Windows this will never be the
case because the locale is never UTF-8, so you'd use something else.
Namely:

(a) wxConvLibc if your string was returned by one of the standard C
("libc") functions or, symmetrically, was meant to be passed to one.

(b) wxConvLocal to use the encoding of the current locale which is almost
always the same as libc encoding.

(c) wxConvUI to use the encoding used by the UI which is the same as
wxConvLocal by default but can be changed if you're feeling (very)
adventurous.

(d) wxConvISO8859_1 if, for some reason, you want to hard-code the use
of this encoding (aka Latin-1) in your program. This is almost never
a good choice.

(e) wxConvUTF8 if you're working with UTF-8 strings. This will usually
be the case when doing IO (either file or network) and, more generally,
exchanging strings with other modules. But it doesn't have to be the
case, of course, i.e. you could use UTF-16 or UTF-32 or even UTF-7 if
you want to feel really special for your network communications. It
just wouldn't be a good idea, normally.

(f) wxConvFile if the string represents a file name.

(g) That would be some other conversion, e.g. wxCSConv("encoding").
Unsurprisingly, this should be used when you know the encoding of the
narrow string you have or would like to create and it's not UTF-8 or
Latin-1 (as in this case you'd use wxConvUTF8 or wxConvISO8859_1).


To summarize, use (a) when working with standard C functions, (e) when
working with other libraries using UTF-8 and (g) in most of the other
situations.
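
To illustrate why the choice is irrelevant for ASCII yet matters for everything else, here is a minimal sketch in plain standard C++. It does not use wxWidgets at all: DecodeLatin1, DecodeUtf8 and DemoAsciiVersusNonAscii are hypothetical helpers written for this illustration, and the UTF-8 decoder deliberately handles only 1- and 2-byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: treat every byte as one ISO 8859-1 (Latin-1) character.
std::u32string DecodeLatin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder handling only 1- and 2-byte
// sequences, which is enough for this illustration. Returns false on any
// input it cannot decode.
bool DecodeUtf8(const std::string& bytes, std::u32string& out)
{
    out.clear();
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back(static_cast<char32_t>(b));  // ASCII: one byte, same value
        } else if ((b & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + 1]);
            if ((next & 0xC0) != 0x80)
                return false;  // malformed continuation byte
            out.push_back(static_cast<char32_t>(((b & 0x1F) << 6) | (next & 0x3F)));
            ++i;
        } else {
            return false;  // truncated, invalid, or longer than this sketch handles
        }
    }
    return true;
}

// Both decoders agree on ASCII and disagree on anything else.
void DemoAsciiVersusNonAscii()
{
    std::u32string viaUtf8;

    // Pure ASCII: the choice of conversion makes no difference.
    assert(DecodeUtf8("Hello", viaUtf8));
    assert(viaUtf8 == DecodeLatin1("Hello"));

    // "é" encoded as UTF-8 (bytes C3 A9): the two decoders now disagree.
    const std::string eAcuteUtf8 = "\xC3\xA9";
    assert(DecodeUtf8(eAcuteUtf8, viaUtf8));
    assert(viaUtf8.size() == 1 && viaUtf8[0] == 0xE9);  // U+00E9
    assert(DecodeLatin1(eAcuteUtf8).size() == 2);        // two wrong characters

    // "é" encoded as Latin-1 (single byte E9) is not valid UTF-8 at all.
    assert(!DecodeUtf8("\xE9", viaUtf8));
}
```

The same reasoning applies to the converter passed to the wxString constructor: for "Hello" every candidate yields the same result, while for non-ASCII bytes a mismatched converter either produces the wrong characters or fails.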

Regards,
VZ

Chris Stankevitz

Feb 7, 2013, 7:00:36 PM
to wx-...@googlegroups.com
Vadim,

Thank you for your reply.

On Thu, Feb 7, 2013 at 2:40 PM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
> - Please use "--cxxflags", these are C++ compiler flags

Yes, thank you for the correction.

> - I assume that you built wxWidgets in Unicode mode

I fell short in my attempt to be explicit. For my step (3) I said:
> > Install wxWidgets-2.8 and GCC 4.6
But I should have said:
sudo apt-get install build-essential libwxgtk2.8-dev

This does indeed install a unicode version.

> This depends entirely on the encoding of the string "StlString".

I do not know how to select an encoding when using std::string. Even
if I knew how I would not know which encoding I would even want to
select! Lucky for me I could describe my process from "purchase
computer" to "compile wx-dependent code" and you were able to deduce
this for me (thank you!):

> In your
> example above it consists of 7 bit ASCII characters

Okay good to know! :)

> CS> a) wxConvLibc
> CS> b) wxConvLocal
> CS> c) wxConvUI
> CS> d) wxConvISO8859_1
> CS> e) wxConvUTF8
> CS> f) wxConvFile
> CS> g) none of the above
>
> ... any of the above except (g) which won't compile.

Wow... what a surprise. I definitely was not expecting "any of the
above". This whole string thing is simultaneously complicated and
simple!

> Because it absolutely doesn't matter which conversion you use for ASCII
> characters. As soon as you have anything else, you do need to choose the
> conversion carefully.

Vadim:

I will summarize the first half of your post like this: "Chris, it
looks like you are using std::string. Therefore you are using ASCII.
Therefore you do not need to worry about any of this. Select any
wxMBConv randomly (because you are using ASCII) and you will be fine.
When you are not using ASCII you need to be careful".

Not only did I not know that I was using "ASCII encoding"... I did not
know I could even select a different kind of encoding. I have no idea
how, when or why I would ever switch from ASCII (which I did not even
know I was using) to something else.

I've used std::string (ASCII) to hold filenames and never needed to
know/understand/contemplate "filename encoding".

I've passed strings to other 3rd party libraries without ever asking
them which encoding they use.

I've even written windows programs, where the locale is apparently not
UTF-8 (which I never knew about until just now) and not been aware of
any of these issues.

So I guess I'm kind of curious how it is I have gone through c++ life
for 10 years, calling third party libraries, making system calls, and
reading/writing files without being aware of what my encoding is. It
was sort of "shoved in my face" when I use wx28 that I need to be
aware of my encoding... and comically apparently the answer is "don't
care" due to my use of ASCII-based std::string.

Anyhow, thank you very much for helping me. And especially thank you
for your work on wx29 where this mysterious string handling once again
goes below my radar!

Chris

Manolo

Feb 7, 2013, 7:33:55 PM
to wx-...@googlegroups.com

> I do not know how to select an encoding when using std::string.
You can't.
std::string knows nothing about encodings. std::basic_string only knows
about character types: char, wchar_t, and a few more.
See http://en.cppreference.com/w/cpp/string

Vadim, perhaps some words about this in docs (unicode/string overviews)
may avoid misunderstanding on wxString conversions.

> So I guess I'm kind of curious how it is I have gone through c++ life
> for 10 years, calling third party libraries, making system calls, and
> reading/writing files without being aware of what my encoding is. It
> was sort of "shoved in my face" when I use wx28 that I need to be
> aware of my encoding... and comically apparently the answer is "don't
> care" due to my use of ASCII-based std::string.
... and maybe also because you didn't need internationalization (aka 'i18n').

Regards,
Manolo

Vadim Zeitlin

Feb 7, 2013, 7:34:39 PM
to wx-...@googlegroups.com
On Thu, 7 Feb 2013 16:00:36 -0800 Chris Stankevitz wrote:

CS> > This depends entirely on the encoding of the string "StlString".
CS>
CS> I do not know how to select an encoding when using std::string. Even
CS> if I knew how I would not know which encoding I would even want to
CS> select!

Usually the encoding is selected for you by the string producer, you'll
rarely have literal strings like this in your program. Of course, if you
have them, then the producer is yourself and you do know whether it's ASCII
or (the only other reasonable choice nowadays, really) UTF-8. The trouble
is that more often than not you get this string from elsewhere and then you
need to find its encoding from the documentation of the library giving them
to you.

CS> I will summarize the first half of your post like this: "Chris, it
CS> looks like you are using std::string. Therefore you are using ASCII.

No, this is a wrong conclusion. std::string can contain a string in any
encoding. In this particular case it contains an ASCII string. But it
could contain anything at all encoded in some way, std::string is
completely encoding-agnostic.
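
As a small illustration of this point, the sketch below stores the same visible text in two std::string objects using different encodings. It is standard C++ only; the constant names and the IsPureAscii helper are hypothetical and exist just for this example.

```cpp
#include <cassert>
#include <string>

// The same visible text, "café", stored in two different encodings.
// std::string just holds the bytes; it records nothing about which is which.
const std::string kCafeLatin1 = "caf\xE9";      // é is one byte in ISO 8859-1
const std::string kCafeUtf8   = "caf\xC3\xA9";  // é is two bytes in UTF-8

// Hypothetical helper: true if every byte is 7-bit ASCII, i.e. the string
// reads the same under any ASCII-compatible encoding.
bool IsPureAscii(const std::string& s)
{
    for (unsigned char b : s)
        if (b >= 0x80)
            return false;
    return true;
}
```

Here kCafeLatin1.size() is 4 and kCafeUtf8.size() is 5, and the two strings compare unequal even though they represent the same text. Only a string for which IsPureAscii returns true, such as "Hello", can be handed to any of the converters without caring which one is used.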

CS> I've used std::string (ASCII) to hold filenames and never needed to
CS> know/understand/contemplate "filename encoding".

Again, as long as your filenames use ASCII names it doesn't matter. But if
you wanted to open a file called "français-русский.txt", things would get
more interesting.

CS> So I guess I'm kind of curious how it is I have gone through c++ life
CS> for 10 years, calling third party libraries, making system calls, and
CS> reading/writing files without being aware of what my encoding is.

You probably only worked with ASCII strings all this time. It's still
quite common in the US, although rather rare in the rest of the world.

CS> Anyhow, thank you very much for helping me. And especially thank you
CS> for your work on wx29 where this mysterious string handling once again
CS> goes below my radar!

I'm not sure if you're being totally serious, but if you are, this is a
dangerous position. wx29 simply uses wxConvLibc everywhere but you need to
be aware of it and in particular understand when you need to explicitly use
something else (usually wxConvUTF8).
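
To make "libc conversion" a bit more concrete, here is a rough sketch of a locale-dependent narrow-to-wide conversion using the standard function std::mbstowcs. It uses no wxWidgets code, and it is an assumption of this sketch that a libc-based converter behaves comparably; LibcToWide is a hypothetical helper name.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert a narrow string to wide characters using the
// C library, i.e. according to the current LC_CTYPE locale. Returns false if
// the bytes are not valid in that locale's encoding.
bool LibcToWide(const std::string& narrow, std::wstring& wide)
{
    // A multibyte string never decodes to more wide characters than bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;  // invalid sequence for the current locale
    wide.assign(buf.data(), n);
    return true;
}
```

A program would typically call std::setlocale(LC_ALL, "") once at startup so that the user's locale is in effect. ASCII input such as "Hello" converts identically in every locale, whereas the result for bytes like "\xC3\xA9" depends on whether the current locale is UTF-8, which is why UTF-8 data coming from a file or the network generally needs an explicit UTF-8 conversion instead.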

Regards,
VZ