wxString multiple problems (c_str, FromUTF8, FromAscii, array element enumeration)

Martin Papik

Jun 27, 2012, 9:11:40 PM
to wx-...@googlegroups.com
wx: 2.8.12, 2.9.3
OS: linux, ubuntu

If there's anything I can do to help let me know. Below are my findings so far.

Martin

Common to all code:
  #define wxUSE_UNICODE 1
  char * str; str=strdup("test");
  wxString s;

### 1
  s.FromUTF8(str);
  s.FromUTF8(str,strlen(str));
  s.FromAscii(str);
In all cases s will remain unchanged. Whatever string was there will remain untouched.
I would expect it to contain the unicode representation of test. Behaviour exists both on wx-2.8.12 and wx-2.9.3.
Is this a bug or am I severely misreading the documentation?

For now my workaround is --> for (i=0;str[i];i++) s+=str[i];

### 2
  s = wxT("test");
  printf ("%s\n",s.c_str());
on wx-2.8.12 gcc says: warning: format ‘%s’ expects argument of type ‘char*’, but argument 2 has type ‘const wxChar* {aka const wchar_t*}’ [-Wformat]
on wx-2.9.3 gcc says: warning: format ‘%s’ expects argument of type ‘char*’, but argument 2 has type ‘const wxChar* {aka const wchar_t*}’ [-Wformat]
     +++    warning: format ‘%s’ expects argument of type ‘char*’, but argument 2 has type ‘const wxChar* {aka const wchar_t*}’ [-Wformat]

### 3
  s = wxT("test");
  printf ("%s\n",(char*)s.c_str());
on wx-2.8.12 works, but I would argue type casting shouldn't be needed.
on wx-2.9.3 gcc says: error: invalid cast from type ‘wxCStrData’ to type ‘char*’
same if I cast it to (const char *)

I didn't find any details about wxCStrData in the documentation for wx-2.9.3.

### 4
  s.Printf(wxT("%s"),str);
on wx-2.8.12 the text is garbled -- as far as I can determine, the text is interpreted as utf32, there is a character with a unicode value corresponding to the concatenation of the individual ascii codes, 0x74736574
on wx-2.9.3 the text is copied correctly
  s.Printf(wxT("%S"),str);
on wx-2.8.12 the text is copied correctly
on wx-2.9.3 I get an assertion --> src/common/strvararg.cpp(646): assert "n <= parser.nargs" failed in DoGetArgumentType(): more arguments than format string specifiers?

----
I would expect %s to be a char * and %S to be a wchar *
That's what the manual for printf says and AFAIK the wxWidgets documentation doesn't say otherwise.
I'd say 2.9 is correct in this case.

### 5
  unsigned i;
  for (i=0;i<s.Length();i++) { printf (" 0x%x",s[i]); } printf ("\n");
on wx-2.8.12 works, prints the unicode values of the string characters
on wx-2.9.3
 * gcc says: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘wxUniCharRef’
 * prints a series of pointers, I think

for (i=0;i<s.Length();i++) { printf (" 0x%x",(int)s[i]); } printf ("\n");
* works correctly on wx-2.9.3

===============

IMHO explicit casting should be avoided because it inhibits the compiler's ability to warn about errors. If I use type casts the compiler assumes I know what I'm doing, but since I'm new at wx, half the time I don't know, the rest of the time I'm copying code. I'd rather the compiler assumes I'm a monkey typing and teach me the error of my ways.

Vadim Zeitlin

Jun 28, 2012, 7:38:47 AM
to wx-...@googlegroups.com
On Thu, 28 Jun 2012 04:11:40 +0300 Martin Papik wrote:

MP> Common to all code:
MP> #define wxUSE_UNICODE 1
MP> char * str; str=strdup("test");
MP> wxString s;
MP>
MP> ### 1
MP> s.FromUTF8(str);
MP> s.FromUTF8(str,strlen(str));
MP> s.FromAscii(str);
MP> In all cases s will remain unchanged. Whatever string was there will remain
MP> untouched.
MP> I would expect it to contain the unicode representation of test. Behaviour
MP> exists both on wx-2.8.12 and wx-2.9.3.
MP> Is this a bug or am I severely misreading the documentation?

You are. Both FromAscii() and FromUTF8() are static functions returning
a new wxString, so it doesn't make sense to call them like above.
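
To illustrate why the calls in ### 1 compile yet leave s untouched, here is a minimal sketch in standard C++. The class Text and the helpers CallAndDiscard and CallAndAssign are hypothetical stand-ins, not wxWidgets code; they only show the language rule that a static function may be called through an instance and that its return value may be silently dropped. With wxString, the form that keeps the result is the one used later in this thread, wxstr = wxString::FromUTF8(input_data).

```cpp
#include <string>

// Hypothetical stand-in for a string class; this is NOT wxString.
class Text {
public:
    // Static factory: builds and returns a new object. It has no "this",
    // so it cannot modify the instance it happens to be called through.
    static Text FromUtf8(const char* utf8) {
        Text t;
        t.data_ = utf8;
        return t;
    }

    const std::string& str() const { return data_; }

private:
    std::string data_;
};

// Calls the static function through an instance and discards the result.
// C++ accepts this syntax, which is why the compiler stays silent.
inline Text CallAndDiscard(const char* input) {
    Text s;
    s.FromUtf8(input);  // returned temporary is thrown away; s is unchanged
    return s;
}

// Keeps the returned object by assigning it.
inline Text CallAndAssign(const char* input) {
    Text s;
    s = Text::FromUtf8(input);
    return s;
}
```

CallAndDiscard("test") returns an empty Text, while CallAndAssign("test") returns one holding "test".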

MP> ### 2
MP> s = wxT("test");
MP> printf ("%s\n",s.c_str());
MP> on wx-2.8.12 gcc says: warning: format ‘%s’ expects argument of type
MP> ‘char*’, but argument 2 has type ‘const wxChar* {aka const wchar_t*}’
MP> [-Wformat]
MP> on wx-2.9.3 gcc says: warning: format ‘%s’ expects argument of type
MP> ‘char*’, but argument 2 has type ‘const wxChar* {aka const wchar_t*}’
MP> [-Wformat]
MP> +++ warning: format ‘%s’ expects argument of type ‘char*’, but
MP> argument 2 has type ‘const wxChar* {aka const wchar_t*}’ [-Wformat]

Use either

printf("%s\n", static_cast<const char*>(s.mb_str()));

or, preferred,

wxPrintf("%s\n", s);

This is discussed in docs/changes.txt, search for "wxString::c_str()".

MP> ### 3
MP> s = wxT("test");
MP> printf ("%s\n",(char*)s.c_str());
MP> on wx-2.8.12 works, but I would argue type casting shouldn't be needed.

It doesn't work. In Unicode build of 2.8 c_str() returns a wchar_t*
pointer, casting it to char* may make it compile but it definitely doesn't
make it work as expected.
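
A small sketch of why the cast compiles but does not do what was intended, assuming a platform where wchar_t is wider than char (32 bits on typical Linux builds). The helper name NarrowLengthOfWide is made up for this illustration. It measures how much of a wide string a narrow-string function such as printf("%s") would see after a plain pointer cast.

```cpp
#include <cstddef>
#include <cstring>

// Length that a narrow-string function would see if a wide string were
// simply cast to const char*.
inline std::size_t NarrowLengthOfWide(const wchar_t* wide) {
    // Reading an object's bytes through const char* is permitted, so this
    // is well-defined; it is just not the text the caller meant.
    const char* bytes = reinterpret_cast<const char*>(wide);
    return std::strlen(bytes);
}
```

For L"test" on a little-endian machine the first wide character is stored as the bytes 74 00 00 00, so the narrow view ends after the single character 't'. On a big-endian machine the first byte is already zero and the narrow view is empty.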

MP> on wx-2.9.3 gcc says: error: invalid cast from type ‘wxCStrData’ to type
MP> ‘char*’ same if I cast it to (const char *)

No, not same. Casting to "char*" doesn't work, casting to "const char*"
does work, please check again.

MP> I didn't find any details about wxCStrData in the documentation for
MP> wx-2.9.3.
MP>
MP> ### 4
MP> s.Printf(wxT("%s"),str);

In 2.8 you can only pass wchar_t strings to Printf("%s") in Unicode build.
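
The garbled character 0x74736574 reported for 2.8.12 is consistent with the four bytes of the narrow string "test" being read as one 32-bit wide character. The following sketch reproduces that arithmetic in standard C++; the function name FirstFourBytesAsOneUnit is hypothetical, and the sketch assumes the input holds at least four bytes.

```cpp
#include <cstdint>
#include <cstring>

// Packs the first four bytes of a narrow string into one 32-bit value, the
// way a function expecting 32-bit wide characters would read them.
// The input must be at least four bytes long.
inline std::uint32_t FirstFourBytesAsOneUnit(const char* narrow) {
    std::uint32_t unit = 0;
    std::memcpy(&unit, narrow, sizeof unit);  // memcpy avoids alignment issues
    return unit;
}
```

On a little-endian machine the bytes 't' (0x74), 'e' (0x65), 's' (0x73), 't' (0x74) combine to 0x74736574, which matches the value observed above.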

MP> on wx-2.9.3 the text is copied correctly
MP> s.Printf(wxT("%S"),str);
MP> on wx-2.8.12 the text is copied correctly
MP> on wx-2.9.3 I get an assertion --> src/common/strvararg.cpp(646): assert "n
MP> <= parser.nargs" failed in DoGetArgumentType(): more arguments than format
MP> string specifiers?

Don't use non-portable "%S" with wxPrintf() and similar functions.

MP> I would expect %s to be a char * and %S to be a wchar *

This is not how it works. In 2.9, "%s" can be either char* or wchar_t*
when using wx functions and "%S" is not supported.


MP> IMHO explicit casting should be avoided because it inhibits the compiler's
MP> ability to warn about errors.

Casting is only needed when using vararg functions which don't do any
error checking in the first place. The simplest solution is to use
wxPrintf() and such instead (they look like vararg functions but actually
they are pseudo-vararg templates and are type-safe). Another possibility is
to not use vararg functions at all.
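
To show what "looks like vararg but is a template" can mean, here is a rough sketch in standard C++11. It is not how wxPrintf() is implemented; Format and FormatInto are hypothetical names, and the sketch uses "{}" placeholders instead of printf-style specifiers to keep it short. The point is only that every argument keeps its real type, so an object such as std::string can be passed without a cast.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left, copy the rest of the format verbatim.
inline void FormatInto(std::ostringstream& out, const char* fmt) {
    out << fmt;
}

// Replaces the next "{}" in fmt with `first`, then handles the remaining
// arguments recursively. Arguments without a placeholder are ignored.
template <typename T, typename... Rest>
void FormatInto(std::ostringstream& out, const char* fmt,
                const T& first, const Rest&... rest) {
    for (; *fmt != '\0'; ++fmt) {
        if (fmt[0] == '{' && fmt[1] == '}') {
            out << first;  // the compiler knows T, so no cast is needed
            FormatInto(out, fmt + 2, rest...);
            return;
        }
        out << *fmt;
    }
}

// Entry point: looks like a vararg call at the call site.
template <typename... Args>
std::string Format(const char* fmt, const Args&... args) {
    std::ostringstream out;
    FormatInto(out, fmt, args...);
    return out.str();
}
```

A call such as Format("{} has {} chars", std::string("test"), 4) yields "test has 4 chars", whereas passing a std::string through a C-style "..." parameter is not something the compiler can check.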


In any case, definitely avoid printf() and other vararg functions in new
code using wxString. They can be used correctly with it, but it has never
been convenient to use printf() with objects and still isn't. Using
wxPrintf() OTOH is very nice and you need neither the casts nor even
.c_str().

Regards,
VZ

Martin Papik

Jun 28, 2012, 11:48:15 AM
to wx-...@googlegroups.com
Dear Vadim

Thank you for your help, I'll make a note of all of that. I just *really* wish I wouldn't have to bother you with all of this. It really does seem broken to me, but a lot of the time I'm just using it wrong. It would help if there was better documentation, examples. Then I could just get on with my coding and not have to wage wars on casts and widgets.

If I have a wxString and a long UTF-8 text, say 1 MB, and I do wxstr = wxString::FromUTF8(input_data), doesn't that do unnecessary copying?

What about the for (i=0;i<length;i++) printf ("%d",s[i]);.... what am I doing wrong there?

Martin

Vadim Zeitlin

Jun 28, 2012, 1:22:36 PM
to wx-...@googlegroups.com
On Thu, 28 Jun 2012 18:48:15 +0300 Martin Papik wrote:

MP> Thank you for your help, I'll make a note of all of that. I just really
MP> wish I wouldn't have to bother you with all of this. It really does seem
MP> broken to me, but a lot of the time I'm just using it wrong. It would help
MP> if there was better documentation, examples. Then I could just get on with
MP> my coding and not have to wage wars on casts and widgets.

The documentation is good enough when you know what you're doing. When you
don't, it often can be very useful to have a look at the samples. Just try
grepping them for a class or function you're interested in.

MP> If I have a wxString, and I have a long text, utf8, if I do wxstr =
MP> wxString::FromUTF8(input_data), doesn't that do unnecessary copying?

It does copy the data but there is no way to avoid it. Whatever is the way
in which you create a wxString you always copy the data into it anyhow. Of
course, this is the same for practically all the other string classes.

MP> What about the for (i=0;i<length;i++) printf ("%d",s[i]);.... what am I
MP> doing wrong there?

Not using wxPrintf(). As with c_str(), s[i] returns a dual-use object
(wxUniChar) that can be converted either to char or wchar_t, so you must
either use wxPrintf(), which is aware of it, or cast it explicitly.
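
The idea of a dual-use object can be sketched in standard C++ as follows. DualChar, AsNarrow, AsWide and HexOf are hypothetical names for this illustration and do not reproduce the wxWidgets classes. A typed parameter tells the compiler which conversion to apply; a C-style "..." parameter gives it no target type, so the value has to be extracted explicitly before it reaches printf().

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a "dual-use" character object; NOT wxUniChar.
class DualChar {
public:
    explicit DualChar(unsigned int code) : code_(code) {}
    operator char() const { return static_cast<char>(code_); }
    operator wchar_t() const { return static_cast<wchar_t>(code_); }
    unsigned int GetValue() const { return code_; }

private:
    unsigned int code_;
};

// Typed parameters: the compiler picks the matching conversion by itself.
inline char AsNarrow(char c) { return c; }
inline wchar_t AsWide(wchar_t c) { return c; }

// Vararg function: pass a plain integer explicitly, never the object.
inline std::string HexOf(const DualChar& c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%x", c.GetValue());
    return buf;
}
```

With DualChar c(0x74), AsNarrow(c) gives 't', AsWide(c) gives L't', and HexOf(c) gives "0x74".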

Regards,
VZ

Martin Papik

Jun 28, 2012, 1:38:25 PM
to wx-...@googlegroups.com

Thank you Vadim

Stefa...@t-online.de

Jun 29, 2012, 4:50:14 AM
to wx-...@googlegroups.com
Hi,


VZ> MP> If I have a wxString, and I have a long text, utf8, if I do
VZ> MP> wxstr = wxString::FromUTF8(input_data), doesn't that
VZ> MP> do unnecessary copying?
VZ>
VZ> It does copy the data but there is no way to avoid it. Whatever is
VZ> the way in which you create a wxString you always copy the data into
VZ> it anyhow.

Yes, but the second copy happening in the assignment operator
really is unnecessary, isn't it? From that point of view a non-static
member function where you could use wxstr.FromUTF8(input_data)
would make sense, so I understand Martin's confusion. OTOH
doing the conversion in the constructor like
wxString wxstr(input_data, wxConvUTF8);
seems to avoid that additional copy ...

Regards,
Stefan


Vadim Zeitlin

Jun 29, 2012, 6:53:07 AM
to wx-...@googlegroups.com
On Fri, 29 Jun 2012 10:50:14 +0200 Stefa...@t-online.de wrote:

So> Yes, but the second copy happening in the assignment operator
So> really is unnecessary, isn't it?

If you use the assignment operator, yes. But you could use copy ctor and
then hopefully the compiler would use RVO (although I admit I didn't check
it did).
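
The difference between initialising from a returned value and assigning it afterwards can be checked with a small counting class. This is a sketch in standard C++ with hypothetical names (Counted, FromText and the two helpers); it says nothing about wxString internals, only about the language rule. C++17 guarantees the elision in the initialisation case, and older compilers usually perform it as well unless told not to.

```cpp
#include <string>

// Hypothetical stand-in that counts how often its contents are copied.
struct Counted {
    static int copies;
    std::string data;

    Counted() {}
    explicit Counted(const char* s) : data(s) {}
    Counted(const Counted& other) : data(other.data) { ++copies; }
    Counted& operator=(const Counted& other) {
        data = other.data;
        ++copies;
        return *this;
    }
};
int Counted::copies = 0;

// Factory returning a new object by value, like a static From...() function.
inline Counted FromText(const char* s) { return Counted(s); }

// Initialises a new object directly from the returned value.
inline int CopiesWhenInitialising(const char* s) {
    Counted::copies = 0;
    Counted c = FromText(s);  // constructed in place, no copy expected
    (void)c;
    return Counted::copies;
}

// Default-constructs first, then assigns the returned value.
inline int CopiesWhenAssigning(const char* s) {
    Counted::copies = 0;
    Counted c;
    c = FromText(s);  // copy assignment from the temporary
    return Counted::copies;
}
```

CopiesWhenInitialising("test") reports no copies, while CopiesWhenAssigning("test") reports one. A class with a move assignment operator would turn that one copy into a cheaper move.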

So> OTOH doing the conversion in the constructor like
So> wxString wxstr(input_data, wxConvUTF8);
So> seems to avoid that additional copy ...

This works too.

Regards,
VZ