wxString in 2.9


Armel

Mar 22, 2012, 11:41:04 AM
to wx-...@googlegroups.com
Hi everybody,
 
I have been developing with wx for years now, and I believe it is important to let you know how sad I am about the move toward the new wxStrings.
Basically, they are the most complex piece of code I have ever seen for such a simple job.
 
And the bad news is that they are more error-prone than is acceptable.
The attempt to provide "compatibility" has led to a tremendous waste of effort, just to produce something that the compiler cannot check.
 
Let's take an example: const char *time = dt.ParseDate("2012/01/01 18:00:00"); seems perfectly correct, doesn't it?
Bad news: time points to garbage in this case. Why? Because the returned value is in fact a pointer into the memory of the temporary wxString object created to COPY the date string.
 
Let's take another example: wxString mysubstr(mybigstr, myiterator - mybigstr.begin())
Bad news again: it works randomly. It works in the wide-char build but not in the UTF-8 build.
 
Note that wxString::length() returns a count of wxStringCharType units in the wide-char build but a count of code points in the UTF-8 build. So in UTF-8 it is still called 'length', although it is no longer what the wxString constructor expects as a length (by the way, the meaning has changed since 2.6/2.8).
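To make the mismatch concrete: in a UTF-8 string, a byte offset and a code point count diverge as soon as a non-ASCII character appears, so it matters whether a given API counts one or the other. A minimal sketch in standard C++ only (the helper names are hypothetical, not wx API), counting code points by skipping UTF-8 continuation bytes, which have the bit pattern 10xxxxxx:

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string: every byte that is not a
// continuation byte (10xxxxxx) starts a new code point.
std::size_t Utf8CodePointCount(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Convert a byte offset into a UTF-8 string to a code point index,
// i.e. how many code points start before that byte.
std::size_t Utf8ByteOffsetToIndex(const std::string& s, std::size_t byteOffset) {
    return Utf8CodePointCount(s.substr(0, byteOffset));
}
```

For "h\xC3\xA9llo" ("héllo"), the string is 6 bytes but 5 code points, and the first 'l' sits at byte offset 3 but code point index 2, so passing one where the other is expected silently picks the wrong substring.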
 
Another problem besides the inconsistencies is how sluggish wx has become: going from wx 2.6 to wx 2.9, startup time for the same program has roughly doubled.
The executable is nearly 40% bigger with zero features added.
 
A string iterator in UTF-8 is now an object with chaining (6 pointers!), and all that only to support the replace() function. Most modern frameworks simply have NO in-place replace(). Why? Because in-place replace() is a hack that is rarely usable. Instead, strings are immutable and the concept of an efficient string builder is used.
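A sketch of the builder-style alternative described above, assuming plain std::string rather than wxString: instead of mutating the source in place, a new string is assembled in a single left-to-right pass. ReplaceAll is a hypothetical helper, not a wx or standard library function.

```cpp
#include <string>

// Builder-style replacement: the source is never modified; a new string
// is assembled by appending unchanged runs and replacement text.
std::string ReplaceAll(const std::string& src,
                       const std::string& from,
                       const std::string& to) {
    if (from.empty())
        return src;  // nothing sensible to match
    std::string out;
    out.reserve(src.size());
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type hit = src.find(from, pos);
        if (hit == std::string::npos)
            break;
        out.append(src, pos, hit - pos);  // unchanged run before the match
        out += to;                         // replacement text
        pos = hit + from.size();
    }
    out.append(src, pos, std::string::npos);  // tail after the last match
    return out;
}
```

Because the input is only read, no iterator into it is ever invalidated, which is the property that makes immutable strings plus a builder simpler to reason about.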
 
All of that to avoid a few wxT() macros.
 
Sorry for the rant, but please realize that many, many developers will eventually upgrade from the 2.8 to the 3.0 branch one day, and those developers do not care AT ALL about this tiny wxT() but will definitely care about their programs no longer working for really obscure reasons. I fear that most of them will simply go away, throwing their wx code in the bin and going the Qt or GTK way.
 
If someone wants ideas about how to do it right, I have plenty of them.
 
Please note that I really love wxWidgets. But a bad decision is a bad decision, no matter how much effort was involved.
Best regards
Armel
 

Scott

Mar 22, 2012, 12:13:50 PM
to wx-...@googlegroups.com
On 03/22/2012 08:41 AM, Armel wrote:
> ...uncheckable by the compiler.
The debugger on two out of my three platforms cannot show me the contents
of a string:

p str_variable

makes the debugger error out.

Vadim Zeitlin

Mar 22, 2012, 12:23:19 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 08:41:04 -0700 (PDT) Armel wrote:

A> I have been now developing for years with wx and I believe it is important
A> to let you know how sad I am with the move toward new wxStrings.
A> basically they are the most complex piece of code I ever saw for such a
A> simple job.

Unfortunately the job is far from simple. I'm unhappy about wxString
complexity as well but I continue to think that we would have *no* chance
to convince people to upgrade to 3.0 without these changes, short of
preserving ANSI mode as a first class citizen, which is even worse in the
long term.

In general, my advice is very simple: do *not* use wxString for textual
manipulation. Use std::string or std::wstring for this and only use
wxString to get data to or from the GUI.
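A minimal sketch of that split, assuming the string logic lives in a hypothetical non-GUI helper (TrimAscii) that only ever sees std::string, with conversion confined to the GUI layer. The wx calls are left as comments so the block compiles without wx; wxString::FromUTF8() and wxString::ToUTF8() are assumed to be available as in 2.9, and the text control pointer is hypothetical.

```cpp
#include <string>

// Non-GUI logic written against std::string only (hypothetical helper):
// trims ASCII whitespace from both ends of a UTF-8 string. Multi-byte
// UTF-8 sequences never contain these bytes, so they are left intact.
std::string TrimAscii(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// At the GUI boundary only (not compiled here; assumes a wxTextCtrl* ctrl):
//   std::string in(ctrl->GetValue().ToUTF8().data());
//   ctrl->SetValue(wxString::FromUTF8(TrimAscii(in).c_str()));
```

The design point is that TrimAscii can be unit-tested and reused in code that never links with wx; only the two commented lines know that wxString exists.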


A> Let's take an example: const char *time = dt.ParseDate("2012/01/01
A> 18:00:00"); seems perfectly correct, doesn't it?
A> Bad news: time points to garbage in this case. Why? Because the returned
A> value is in fact a pointer into the memory of the temporary wxString
A> object created to COPY the date string.

This is indeed a serious problem that we seem to have missed. We probably
need to provide separate overloads taking "const char*" and "const
wchar_t*". Or we could drop the overloads not returning the iterator at the
cost of breaking backwards compatibility. Doing the former would be
preferable but we need to find resources to do it :-(
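A standard-C++ sketch of the problem and of the overload fix proposed here, using a hypothetical SkipDigits() parser in place of ParseDate(): when the only overload takes a const string reference, a literal argument creates a temporary string, and a pointer into that temporary dangles once the full expression ends. A separate const char* overload returns a pointer into the caller's own buffer instead. std::string stands in for wxString; this is not the actual wx implementation.

```cpp
#include <cctype>
#include <string>

// Hypothetical parser (not a wx API): skips leading digits and returns a
// pointer to the first unconsumed character of its argument.
const char* SkipDigits(const std::string& s) {
    std::string::size_type i = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        ++i;
    // Points into s's buffer: valid only while s itself is alive.
    return s.c_str() + i;
}

// Separate overload for C strings. Without it, SkipDigits("2012/...")
// would bind a temporary std::string to the reference above, and the
// returned pointer would dangle once the full expression ends.
const char* SkipDigits(const char* s) {
    while (std::isdigit(static_cast<unsigned char>(*s)))
        ++s;
    return s;  // points into the caller's own buffer
}
```

Overload resolution prefers the const char* version for literals and char pointers (no user-defined conversion needed), so existing call sites pick up the safe behaviour without source changes.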

A> Let's take another example: wxString mysubstr(mybigstr, myiterator -
A> mybigstr.begin())
A> Bad news again: it works randomly. It works in the wide-char build but
A> not in the UTF-8 build.

What is "mybigstr" here? If it's a wxString it really should work, I don't
see why it doesn't. Could you please explain?

A> Note that wxString::length() returns a count of wxStringCharType units
A> in the wide-char build, and a count of code points in the UTF-8 build,

But this is supposed to be the same thing. I.e. length() returns the
number of "characters" in the string, not "bytes". Of course, there are
further problems (composed vs decomposed and so on) but I don't understand
what the difference between wchar_t and UTF-8 builds is here. Again, could
you please give more details?
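One case where the two builds can legitimately disagree is when wchar_t is 16 bits wide, as on Windows: a wide-char string then holds UTF-16 code units, and a character outside the Basic Multilingual Plane occupies two of them (a surrogate pair), while a code point count sees a single character. A standard-C++ sketch using std::u16string to model the 16-bit case (the helper name is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Count code points in UTF-16: a high surrogate (D800-DBFF) followed by a
// low surrogate (DC00-DFFF) forms one code point and is counted once.
std::size_t Utf16CodePointCount(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        if (isHigh && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half of the surrogate pair
        ++n;
    }
    return n;
}
```

For u"a\U0001F600b", size() is 4 code units but there are 3 code points; on platforms with a 32-bit wchar_t the wide-char count and the code point count coincide.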

A> Another problem besides the inconsistencies is how sluggish wx has become:
A> going from wx 2.6 to wx 2.9, startup time for the same program has roughly
A> doubled.

Is this only in UTF-8 build or with wchar_t build too?

A> the executable is nearly 40% bigger with zero features added.

This is a serious problem but it has nothing to do with wxString (or at
least not much). It's due to wrong/bad/mixed up link dependencies in wx.
E.g. now all wx programs link in wxGraphicsContext code, even if they
absolutely don't use it. Once again, this is a problem but unfortunately
it's not easy to solve at all (i.e. basically I have no idea what to do
about it realistically). And it's completely unrelated to wxString/Unicode
changes which added at most a few dozen KB to the binary size, which is
completely negligible compared to other things.


A> A string iterator in UTF-8 is now an object with chaining (6 pointers!),
A> and all that only to support the replace() function. Most modern
A> frameworks simply have NO in-place replace(). Why? Because in-place
A> replace() is a hack that is rarely usable. Instead, strings are immutable
A> and the concept of an efficient string builder is used.

You're not speaking about C++ here... Anyhow, realistically what do you
suggest? Making wxString immutable? Whatever you think about immutable
strings this just can't happen.

A> all of that to avoid a few wxT() macros.

No, all this to get rid of separate ANSI build mode and finally make sure
that we have a single official wxWidgets build.

A> Sorry for the rant, but please realize that many, many developers
A> will eventually upgrade from the 2.8 to the 3.0 branch one day, and those
A> developers do not care AT ALL about this tiny wxT()

Oh but they do care about getting tens of thousands of compilation errors
which they'd get when attempting to rebuild their code with 3.0 without
these changes. It doesn't matter how simple each of these errors is to fix,
*nobody* would do it and wxWidgets 3.0 would be dead.

A> but will definitely care about their programs no longer working for
A> really obscure reasons. I fear that most of them will simply go away,
A> throwing their wx code in the bin and going the Qt or GTK way.

This is exactly and certainly what would happen if we didn't provide
compatibility (for 99%, it's not perfect) between 2.8 and 3.0.

A> If someone wants ideas about how to do it right, I have plenty of them.

Yes, please share.

A> Please note that I really love wxWidgets. But a bad decision is a bad
A> decision, no matter how much effort was involved.

I don't think it's a bad decision. There are bugs (ParseDate() one is a
bad one, we do need to do something about it) but globally the transition
from 2.8 ANSI build has worked much better than I thought it would.

Regards,
VZ

Vadim Zeitlin

Mar 22, 2012, 12:24:15 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 09:13:50 -0700 Scott wrote:

S> On 03/22/2012 08:41 AM, Armel wrote:
S> > ...uncheckable by the compiler.
S> The debugger on two out of my three platforms can not give me the contents
S> of a string:
S>
S> p str_variable
S>
S> errors out the debugger

Change the debugger. Both MSVC and gdb are perfectly fine.

Please let's keep this discussion concentrated on really important points
instead of your local misconfiguration problems.

VZ

Julian Smart

Mar 22, 2012, 12:30:03 PM
to wx-...@googlegroups.com
I'm not very comfortable with the idea of having different
representations of string on different platforms and personally I'd
rather have just stuck with the 2.8-style implementation where each
character is what you think it is (and not potentially split into 2 or
more characters). I don't think the ASCII/Unicode unification of builds
is such a huge advantage that we needed to complicate things so much.
For translated strings you still need to use a macro around the string
anyway. However, I could be wrong, maybe my use cases are different from
most other people's. It just leaves me with an uneasy feeling though,
that at some point I'm going to come across some inefficiency that's
hard to work around, or a string-related bug that's hard to diagnose.
Having said that, from cursory testing my applications on 2.9 don't seem
to have ground to a halt - maybe a little bit slower in some cases e.g.
when loading large XML files.

Certainly there's a size increase in 2.9 but I don't think much of that
can be laid at the door of wxString. The size increase for gcc can be
mitigated somewhat by linker optimizations (compile with
-ffunction-sections -fdata-sections, link with --gc-sections).

Regards,

Julian

M Gagnon

Mar 22, 2012, 12:32:03 PM
to wx-...@googlegroups.com
Hi,


I'm not a core dev, but here are my 2 cents


>
> Let's take an example: const char *time = dt.ParseDate("2012/01/01
> 18:00:00"); seems perfectly correct, doesn't it?
> Bad news: time points to garbage in this case. Why? Because the returned
> value is in fact a pointer into the memory of the temporary wxString
> object created to COPY the date string.

I don't think storing the result of a conversion from wxString to char* ever worked; IMO
this is not new with wx 2.9. The right way is always to use wxString everywhere;
conversion to char* is usable for calling a function but not for storing. I agree
this is a little confusing, but you just need to lose the habit of using char*.


>
> Let's take another example: wxString mysubstr(mybigstr, myiterator -
> mybigstr.begin())

> Bad news again: it works randomly. It works in the wide-char build but
> not in the UTF-8 build.
>
> Note that wxString::length() returns a count of wxStringCharType units in
> the wide-char build but a count of code points in the UTF-8 build. So in
> UTF-8 it is still called 'length', although it is no longer what the
> wxString constructor expects as a length (by the way, the meaning has
> changed since 2.6/2.8).
>

That I can't comment on, but indeed dealing with UTF code points is always a little tough.


> Another problem besides the inconsistencies is how sluggish wx has become:
> going from wx 2.6 to wx 2.9, startup time for the same program has roughly
> doubled.

> The executable is nearly 40% bigger with zero features added.

Maybe you can measure exactly what is slower; I suspect that you may be passing strings around by copy, and wx 2.8 optimised copies by using ref-counting.

One big added feature of the change in question (removing ref-counting in strings) is that strings are now much less tricky to use from threads, since passing by copy will now actually make a copy, which is good for threads.

So if you pass wxStrings around by copy, the fix is simple: pass them by reference instead and you should get the speed back.
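A standard-C++ sketch of the difference between the two calling conventions, using a hypothetical CountedString wrapper that counts its copy operations. std::string stands in for wxString here, and nothing is claimed about wxString's actual copy cost; the point is only that a by-value parameter copies the argument and a const reference does not.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper that counts how many times it is copied.
struct CountedString {
    static int copies;
    std::string value;
    explicit CountedString(std::string v) : value(std::move(v)) {}
    CountedString(const CountedString& other) : value(other.value) { ++copies; }
    CountedString& operator=(const CountedString& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
int CountedString::copies = 0;

// By value: the argument is copied on every call.
std::size_t LengthByValue(CountedString s) { return s.value.size(); }

// By const reference: no copy, the caller's object is used directly.
std::size_t LengthByRef(const CountedString& s) { return s.value.size(); }
```

Calling LengthByValue(s) on an existing object increments the copy counter once; LengthByRef(s) leaves it unchanged.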

-- Auria

Armel

Mar 22, 2012, 12:35:35 PM
to wx-...@googlegroups.com
Yes, this is a bug in gdb 7.0 up to 7.2. On Debian you can work around it by adding a "squeeze-backports" package source and forcing the GDB version to 7.3.

Julian Smart

Mar 22, 2012, 12:50:22 PM
to wx-...@googlegroups.com
On 22/03/2012 16:23, Vadim Zeitlin wrote:
> In general, my advice is very simple: do *not* use wxString for textual
> manipulation. Use std::string or std::wstring for this and only use
> wxString to get data to or from the GUI.
This is possibly the scariest thing I've heard in 2012 so far. I use
wxString *extensively* for data storage and manipulation. All the time,
for almost every class I ever write. It's a basic building block, and I
can't comprehend how the developer of a programming framework can say
"don't use our string class". Especially now that it's such a goliath of a
class, and you don't want me to use it except for GUI trivia? Has wxWidgets
2.9 been built upon the assumption that no-one has used wxString for
serious work?

I'm really lost for words. I don't want to use 2 or even 3 different
kinds of string. I just want to use the wxString kindly provided for us,
so I can be sure it'll be compatible with wxWidgets functions and
isolated from any standard library strangeness that might exist for any
given compiler, and I certainly don't have to worry about converting
between different classes of string, or whether I should be using a wide
character string or not.

Am I mad to be using wxString? Well, it's a bit late now that's for sure...

Julian

Vadim Zeitlin

Mar 22, 2012, 12:55:00 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 16:30:03 +0000 Julian Smart wrote:

JS> I'm not very comfortable with the idea of having different
JS> representations of string on different platforms

But this is not what is being discussed here. The possibility to use UTF-8
internally in wxString is almost completely orthogonal to merging of the
ANSI and Unicode builds. IMHO using UTF-8 internally just makes more sense
for wxGTK (and wxDFB for which it was originally done) but it's not such a
huge deal and, in fact, can be easily changed with a single configure
switch.

The idea of having wxString representing either/both narrow or/and wide
character string is much more fundamental and is the real reason of the
code complexity and the bugs pointed out in the original message.

JS> I don't think the ASCII/Unicode unification of builds is such a huge
JS> advantage that we needed to complicate things so much.

Of course, it's a huge advantage. If nothing else, having a version of the
library not supporting Unicode in 21st century was just an embarrassment.
Also, for people comparing wx with Qt or GTK it was clearly a big
disadvantage to wx right there as neither of these libraries had this idea
of using separate ANSI and Unicode builds. And it definitely saves me
time, as a wx maintainer.

Regards,
VZ

Paul Cornett

Mar 22, 2012, 12:56:02 PM
to wx-...@googlegroups.com
Vadim Zeitlin wrote:
> A> the executable is nearly 40% bigger with zero features added.
>
> This is a serious problem but it has nothing to do with wxString (or
> at least not much). It's due to wrong/bad/mixed up link dependencies
> in wx.

The link dependency problems were there before 2.9, it doesn't really
explain the large size increase. I investigated this some time ago, and
my conclusion is the size increase is mainly due to the more extensive
use of templates, much of which is wxString-related. Templates are
inline, and the generated code just bloats.

Also note that assertions are now enabled even in release builds unless
explicitly disabled.

Vadim Zeitlin

Mar 22, 2012, 1:03:35 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 16:50:22 +0000 Julian Smart wrote:

JS> On 22/03/2012 16:23, Vadim Zeitlin wrote:
JS> > In general, my advice is very simple: do not use wxString for textual
JS> > manipulation. Use std::string or std::wstring for this and only use
JS> > wxString to get data to or from the GUI.
JS> This is possibly the scariest thing I've heard in 2012 so far. I use
JS> wxString extensively for data storage and manipulation. All the time,
JS> for almost every class I ever write. It's a basic building block, and I
JS> can't comprehend how the developer of a programming framework can say
JS> "don't use our string class".

Because he finally woke up and noticed it was 2012 outside the window and
C++11 standard was ratified and the standard library was available on all
platforms for at least the last 10 years or so?

Have you ever seen people inventing their own string classes in Java?
Python? Perl? Why should you advise them to do this in C++ which also has a
perfectly usable standard string class already? I can't comprehend how
you can seriously think that advising people to use our string class
instead of the standard one is a reasonable recommendation.

JS> Am I mad to be using wxString? Well, it's a bit late now that's for sure...

You would be wrong to use it in new code. Obviously huge amounts of the
existing code use it and clearly it should continue to work. But equally
clearly you should use std::[w]string in any new code you write because
there is absolutely no freaking reason your GUI framework should dictate
your choice of the string class, if only because strings are commonly used
in the code that has nothing to do with GUI at all and so might not even
link with wx.

You're free to ignore my recommendations, you're good enough to make your
code work with whatever string class you use but for newbies who learn to
use std::string in their C++ tutorials/classes I do strongly recommend that
they keep using it in their wxWidgets programs too and just avoid all
wxString-related complications. There is absolutely no reason whatsoever to
prefer wxString to std::string when writing new code.

Regards,
VZ

Vadim Zeitlin

Mar 22, 2012, 1:08:59 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 09:56:02 -0700 Paul Cornett wrote:

PC> Vadim Zeitlin wrote:
PC> > A> the executable is nearly 40% bigger with zero features added.
PC> >
PC> > This is a serious problem but it has nothing to do with wxString (or
PC> > at least not much). It's due to wrong/bad/mixed up link dependencies
PC> > in wx.
PC>
PC> The link dependency problems were there before 2.9, it doesn't really
PC> explain the large size increase.

It does, at least partially, because a lot of new code was added and this
code is being linked in even when it's not used because of these
dependencies. In fact every time I rebuild Mahogany with latest wx it gains
a couple of hundred KB, even when there were no fundamental changes in
the few weeks since the last rebuild and no changes in Mahogany itself.
This is really unfortunate but as I said, I just don't know what to do
about it. Perhaps our next GSoC project could be to fight the code bloat...

PC> I investigated this some time ago, and my conclusion is the size
PC> increase is mainly due to the more extensive use of templates, much of
PC> which is wxString-related. Templates are inline, and the generated code
PC> just bloats.

Perhaps it's worse with g++ than with MSVC. And in any case I don't think
the difference between 2.8 and 2.9.0 was that big, I think it was < 10% on
average. However there was a big increase with every subsequent release
too, even though wxString didn't change any more.

I only took the time to find out the reason for the big size increase of
the minimal sample for 2.9.3 and I found out that it was due to linking in
of all the wxGC stuff due to my wxTopLevelWindow::SetShape() changes. This
alone increased the minimal.exe size by ~500KB IIRC. And, again, it would
be great if this could be avoided but I don't know how.

PC> Also note that assertions are now enabled even in release builds unless
PC> explicitly disabled.

This was ~5% in my tests, so not a big deal for such a useful feature.

Regards,
VZ

Vadim Zeitlin

Mar 22, 2012, 1:12:33 PM
to wx-...@googlegroups.com
To all those who doubt that Unicode/ANSI build merge was beneficial, let
me present you this message:

------ Forwarded message ------
From: Vanangamudi <gang.of....@gmail.com>
Date: Thu, 22 Mar 2012 09:54:52 -0700 (PDT)
Subject: What is the difference??
To: wx-u...@googlegroups.com


Here is the error I got:

Compiling: AppIDE.cpp
/home/arulmozhi/Programming/Arduino/Arduino01/AppIDE.cpp: In member
function ‘virtual bool AppIDE::OnInit()’:
/home/arulmozhi/Programming/Arduino/Arduino01/AppIDE.cpp:7: error:
conversion from ‘const char [4]’ to ‘wxString’ is ambiguous
/usr/include/wx-2.8/wx/string.h:692: note: candidates are:
wxString::wxString(wxChar, size_t) <near match>
/usr/include/wx-2.8/wx/string.h:682: note:
wxString::wxString(int) <near match>
...
-------- End of message -------

that I received less than 3 minutes ago (and no, I didn't pay a gang of
convicts to send it to the list at just the right moment, even though I
could hardly have done better if I did).

If you think not having such problems any more in 2.9 is not worth the
effort, you're living in a different world from me. In particular you're
definitely not providing support for wx on the mailing list because if you
did, you'd be willing to give an arm and a leg to just never see such
questions again in the future -- after seeing 12,348,875 of them during the
last 15 years.

Regards,
VZ

Julian Smart

Mar 22, 2012, 1:27:48 PM
to wx-...@googlegroups.com
I think effectively deprecating [certain uses of] a fundamental
framework class is a bit of a strange and drastic step. We've always
been able to depend on wxString. Why bother using several string classes
when one would do? One of the advantages of wxWidgets is the longevity
of code written with it. If you have several hundred thousand lines of
code, you don't want to be told that much of it was based on what was
absolutely the right class to use then, but is no longer the right class
now because it's no longer efficient or trendy. Suddenly I should use a
different class just because other people do.

BTW I just read a forum post by an experienced Qt developer/advocate,
recommending people to use QString whenever they need to use a string.
It's just one post, but it may indicate that Qt doesn't consider QString
to be a second-class citizen. Certainly the QString documentation
doesn't flag up any efficiency concerns or recommendations to use a
different class.

Regards,

Julian


--
Julian Smart, Anthemion Software Ltd.
www.anthemion.co.uk | +44 (0)131 229 5306
Tools for writers: www.writerscafe.co.uk
Ebook creation: www.jutoh.com
wxWidgets RAD: www.dialogblocks.com

Vadim Zeitlin

Mar 22, 2012, 1:49:17 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 17:27:48 +0000 Julian Smart wrote:

JS> I think effectively deprecating [certain uses of] a fundamental
JS> framework class is a bit of a strange and drastic step. We've always
JS> been able to depend on wxString.

We also have been able to depend on wxList. And wxArray. And
wxIndividualLayoutConstraint. Does this mean we should continue to use them
forever?

JS> Why bother using several string classes when one would do?

In general because different classes have different advantages and this
applies here too. But you're asking the question about something I never
proposed. I don't think you should use different string classes, I think
you should only use one of them, namely std::basic_string. And conversions
to wxString should only happen on GUI code boundary and, ideally,
transparently (as they do happen in wxUSE_STL build).

JS> One of the advantages of wxWidgets is the longevity of code written
JS> with it.

Are you seriously suggesting wxString will survive past std::string?

JS> If you have several hundred thousand lines of code, you don't want to
JS> be told that much of it was based on what was absolutely the right
JS> class to use then

It wasn't for more than 15 years. I don't know why you suddenly took
issue with what I wrote today but I have been saying it for as long as I can
remember and it finally became practical with 2.9.0 (4 years ago?). The
non-GUI code should not use GUI classes and wxString is a GUI class.

JS> BTW I just read a forum post by an experienced Qt developer/advocate,
JS> recommending people to use QString whenever they need to use a string.

Qt has never been a C++ framework. Its basic premise is that it's better
to change the language than to adapt to it. FWIW this was also the main
reason I decided to use wx rather than Qt originally. The wxWidgets
approach should be different.

I know that some people disagree with this to which I can only answer: if
you believe wx should (for ever, no less!) use its own string, date,
thread, socket, database, ... classes please come and write and maintain
them. As long as I'm doing it, I'd rather decrease my amount of work on
such stuff and concentrate on wx core mission, i.e. GUI stuff.

Regards,
VZ

Eric Jensen

Mar 22, 2012, 1:51:08 PM
to wx-...@googlegroups.com
Hello Julian,

Thursday, March 22, 2012, 5:50:22 PM, you wrote:

JS> On 22/03/2012 16:23, Vadim Zeitlin wrote:
>> In general, my advice is very simple: do *not* use wxString for textual
>> manipulation. Use std::string or std::wstring for this and only use
>> wxString to get data to or from the GUI.

JS> This is possibly the scariest thing I've heard in 2012 so far. I use
JS> wxString *extensively* for data storage and manipulation. All the time,
JS> for almost every class I ever write.
I have to admit I'm pretty staggered by Vadim's statement myself,
especially since I preach to other wx users that they should always use
wxString inside the "wx world" and only use std::string when
communicating with other non-wx libraries.

I also use wxString everywhere for everything in my code and I don't
really like the idea of mixing std and wxString. One thing I prefer
about wxString is that I consider it a "black box" containing a Unicode
string, whereas std::string is just a "dumb" byte container and I
have to make sure myself to put the correct bytes into it and out of
it.

If I wanted to use std::string, what would be the equivalent of the
wxConv* classes, like wxConvUTF8 etc?

Regards,
Eric


Pete Bannister

Mar 22, 2012, 1:56:33 PM
to wx-...@googlegroups.com
And why wouldn't you use strings of the same (or derived) type used by
the application framework? What's the point of hitting the heap to pass
a string to the UI?
Also, why would you prefer to use std::string when the implementation is
not defined by the standard and you need to know what the performance
characteristics are? Is it reference counted or not? Who knows? How do
you convert it to utf8 or utf16 using the STL, without writing a
codecvt facet, when wxString simply has a function to do it?

wxString using std::string internally caused a few headaches for me when
I tried porting to 2.9. One of these was down to me using _SECURE_SCL=0
in my application build (msvc2008) - which then also needed to be
applied to wx to avoid violating the one definition rule and getting
random crashes. So every update, this is another build setting that
needs tweaking. Took a while to figure that one out.
Since sizers also appeared to work in a completely different way and
nearly all of my layouts were broken, I decided not to bother moving to
2.9. This was over a year ago so perhaps things have improved somewhat
since then.

Vadim Zeitlin

Mar 22, 2012, 2:13:49 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 18:51:08 +0100 Eric Jensen wrote:

EJ> I have to admit I'm pretty staggered by Vadim's statement myself,
EJ> especially since I preach to other wx users that they should always use
EJ> wxString inside the "wx world" and only use std::string when
EJ> communicating with other non-wx libraries.

Let me make this clear: my position is that this is a wrong approach and
that there is no such thing as "wx world" and that wx is just another of
the libraries you use and you shouldn't make your code hostage to it just
as you shouldn't make it hostage to any other 3rd party library. Now if
you're absolutely and definitely sure that you're never going to use
anything other than wx and don't care to keep your non-GUI code separated
from it, go ahead and use wxString to your heart's content, nobody prevents
you from doing this.

Next, and important, point is that if enough people do this, then we
probably should indeed make wchar_t build the default even under Unix as
O(N^2) performance of string iteration using indices is probably going to
surprise many of those people.
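Why index-based iteration is quadratic with a UTF-8 representation: finding the byte offset of the i-th code point requires scanning from the start of the buffer, so a loop over i = 0..N-1 does O(N^2) work in total, while a single forward scan (what an iterator does) is O(N). A sketch over std::string with hypothetical helper names; wx may cache positions internally, so this shows the naive worst case, not wxString's exact behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Byte offset of the index-th code point: must scan from the start,
// so each call costs O(index).
std::size_t Utf8OffsetOfIndex(const std::string& s, std::size_t index) {
    std::size_t seen = 0;
    for (std::size_t b = 0; b < s.size(); ++b) {
        if ((static_cast<unsigned char>(s[b]) & 0xC0) != 0x80) {
            if (seen == index) return b;
            ++seen;
        }
    }
    return s.size();  // index past the end
}

// Index-based loop: one O(i) lookup per index, O(N^2) overall.
std::vector<std::size_t> LeadBytesByIndex(const std::string& s, std::size_t count) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(Utf8OffsetOfIndex(s, i));
    return out;
}

// Single forward pass: O(N) overall.
std::vector<std::size_t> LeadBytesByScan(const std::string& s) {
    std::vector<std::size_t> out;
    for (std::size_t b = 0; b < s.size(); ++b)
        if ((static_cast<unsigned char>(s[b]) & 0xC0) != 0x80)
            out.push_back(b);
    return out;
}
```

Both functions find the same code point boundaries; only the amount of rescanning differs, which is what makes `s[i]` loops expensive on long UTF-8 strings.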

But my own recommendation remains to use standard classes whenever
possible and I see absolutely no reason to make exceptions for wxString. So
just as I recommend that you use std::vector<> instead of wxList, I also
recommend that you use std::basic_string<> instead of wxString. At least
you can't accuse me of inconsistency.

EJ> If I wanted to use std::string, what would be the equivalent of the
EJ> wxConv* classes, like wxConvUTF8 etc?

There are no equivalents. You're perfectly welcome to use wxConvUTF8
classes without using wxString or wxString with them if the final result
will be passed to (or from) wxString anyhow. But this doesn't fall into the
domain of my advice which, once again, is to use std::[w]string for storing
and manipulating narrow/wide strings respectively. It says absolutely
nothing about not using wxMBConv or wxString for converting between narrow
and wide strings because of course you will use them (unless you prefer to
use a specialized library such as ICU to do this instead).
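For illustration only, this is roughly what converting UTF-32 code points to UTF-8 involves when done by hand; in practice, as suggested above, wxMBConv/wxConvUTF8 or a library such as ICU would do this. EncodeUtf8 is a hypothetical helper, and replacing invalid code points (surrogates, values above U+10FFFF) with U+FFFD is one common policy, not the only one.

```cpp
#include <string>

// Encode UTF-32 code points as UTF-8 (1 to 4 bytes per code point).
std::string EncodeUtf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        // Surrogates and out-of-range values become U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

U+00E9 encodes as the two bytes C3 A9 and U+1F600 as F0 9F 98 80; decoding, validation of malformed input and UTF-16 output are where a tested library earns its keep.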

Regards,
VZ

Václav Slavík

Mar 22, 2012, 2:15:50 PM
to wx-...@googlegroups.com
Hi,

On 22 Mar 2012, at 18:56, Pete Bannister wrote:
> wxString using std::string internally caused a few headaches for me when I tried porting to 2.9. One of these was down to me using _SECURE_SCL=0 in my application build (msvc2008) - which then also needed to be applied to wx to avoid violating the one definition rule and getting random crashes

It's really unfair to blame this on wx changes. If you use a rare, ABI-affecting compiler setting, then it's to be expected that it has to be applied to all your code consistently. It's no different with, say, _BIND_TO_CURRENT_VCLIBS_VERSION.

Vaclav

Vadim Zeitlin

Mar 22, 2012, 2:17:21 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 17:56:33 +0000 Pete Bannister wrote:

PB> And why wouldn't you use strings of the same (or derived) type used by
PB> the application framework? What's the point of hitting the heap to pass
PB> a string to the UI?

For me the separation of non-GUI code from GUI is incomparably more
important than avoiding a heap allocation.

PB> Also, why would you prefer to use std::string when the implementation is
PB> not defined by the standard and you need to know what the performance
PB> characteristics are?

This is just wrong. The complexity of all string operations is defined.

PB> Is it reference counted or not? Who knows?

Who needs to know?

PB> How do you convert it to utf8 or utf16 using the STL, without writing a
PB> codecvt facet, when wxString simply has a function to do it?

See my reply to Eric Jensen about conversions.

PB> wxString using std::string internally caused a few headaches for me when
PB> I tried porting to 2.9. One of these was down to me using _SECURE_SCL=0
PB> in my application build (msvc2008) - which then also needed to be
PB> applied to wx to avoid violating the one definition rule and getting
PB> random crashes. So every update, this is another build setting that
PB> needs tweaking. Took a while to figure that one out.

Sorry to hear this but you're obviously supposed to compile wx and your
code with the same settings. There are innumerably many bad things that can
happen if you neglect this.

PB> Since sizers also appeared to work in a completely different way and
PB> nearly all of my layouts were broken, I decided not to bother moving to
PB> 2.9. This was over a year ago so perhaps things have improved somewhat
PB> since then.

There have been some changes in sizers although IIRC more than a year ago.
There have been no important changes to wxString since then.

Regards,
VZ

Carsten Fuchs

Mar 22, 2012, 2:37:02 PM
to wx-...@googlegroups.com
Hi all,

as I find this discussion very interesting (in the long-term
perspective) and sure to bring out the best eventually, here is my
personal opinion as a long-time wxWidgets user:

Most of my non-GUI code is written as standard-conforming and "pure" as
possible, that is, in this context, using std::string. In practice, when
used in GUI programs, this alone leads to frequent conversions to and
from wxString, at least more frequently than is desirable.

The only advantage of wxString over std::string (for my purposes / that
I can see) is that wxString comes with some convenience functions that
are not provided with std::string.

This is an easily fixed problem though, and overall I think that Vadim
is right, for the very reasons he mentioned: When given the choice, I'd
(almost) always prefer the standardized string, container, thread,
network etc. classes over the wxWidgets ones. And that even though I'd
call myself a conservative STL sceptic who makes doubly sure that the
newly used class is in fact supported on all targeted compilers and
platforms before employing it.

(In fact, if someday someone decided to do the radical thing, dropping all the
wxString, wxArray, wxList and similar classes in favour of a lean
wxWidgets API and replacing them with the STL equivalents, I'd welcome
this step and would not mind having my entire code base updated. ;-) )

The only real question that I have is how this combines with the existing
wxWidgets API: shouldn't there be a clear deprecation plan, and/or
overloads taking the STL equivalents?

Best regards,
Carsten

On 22.03.2012 18:49, Vadim Zeitlin wrote:
> I know that some people disagree with this to which I can only answer: if
> you believe wx should (for ever, no less!) use its own string, date,
> thread, socket, database, ... classes please come and write and maintain
> them. As long as I'm doing it, I'd rather decrease my amount of work on
> such stuff and concentrate on wx core mission, i.e. GUI stuff.
>
> Regards,
> VZ

--
Cafu - the open-source Game and Graphics Engine
for multiplayer, cross-platform, real-time 3D Action
Learn more at http://www.cafu.de

Bryan Petty

Mar 22, 2012, 2:52:23 PM
to wx-...@googlegroups.com
On Thu, Mar 22, 2012 at 11:12 AM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
>  If you think not having such problems any more in 2.9 is not worth the
> effort, you're living in a different world from me. In particular you're
> definitely not providing support for wx on the mailing list because if you
> did, you'd be willing to give an arm and a leg to just never see such
> questions again in the future -- after seeing 12,348,875 of them during the
> last 15 years.

Completely agreed, and you aren't even counting the number of times
I've seen this same stupid problem posted on the forums and the IRC
channel as well.

I can't tell you how relieved I was to see these builds combined,
removing the need for _T() / wxT() and leaving just _() for translation,
which also makes it much easier to tell at a glance which strings are
properly marked for translation.

Not only will the problems with linking to incorrect string builds be
removed, but it also cuts down on all those related support questions
coming from developers just confused about the differences between
those methods (especially since the 'T' in _T() / wxT() does NOT stand
for "translate" as so many developers have just assumed).

Regards,
Bryan Petty

Julian Smart

Mar 22, 2012, 2:53:54 PM
to wx-...@googlegroups.com
On 22/03/2012 18:37, Carsten Fuchs wrote:
> (In fact, if someday someone decided to do the radical thing, dropping all
> the wxString, wxArray, wxList and similar classes in favour of a
> lean wxWidgets API and replacing them with the STL equivalents, I'd
> welcome this step and would not mind having my entire code base
> updated. ;-) )
On the other hand, I think I have better things to do with my time than
replace one string class with another for absolutely no gain (for me). I
can see that with some other container classes, there might be
advantages, but - it's just a string... please just let me get on
with building on what I've done and trying to make a living rather than
throwing code grenades in my path.

Regards,

Julian

Václav Slavík

Mar 22, 2012, 3:16:22 PM
to wx-...@googlegroups.com

On 22 Mar 2012, at 19:53, Julian Smart wrote:
> On the other hand, I think I have better things to do with my time than replace one string class with another for absolutely no gain (for me).

As Vadim keeps saying: then use it. His recommendation (and FWIW, I fully agree with every word of it, and it applies equally to any other C++ library, not just wx) is that new code should use standard classes that are part of the language instead of some home-grown variants.

> trying to make a living rather than throwing code grenades in my path.

Don't you think you overblow it a little bit here?

Vaclav

Julian Smart

Mar 22, 2012, 3:29:07 PM
to wx-...@googlegroups.com
On 22/03/2012 19:16, Václav Slavík wrote:
>
>> trying to make a living rather than throwing code grenades in my path.
> Don't you think you overblow it a little bit here?
I'm hoping so :-)

Julian

Carsten Fuchs

Mar 22, 2012, 3:29:28 PM
to wx-...@googlegroups.com
Hi Julian,

On 22.03.2012 19:53, Julian Smart wrote:
> On the other hand, I think I have better things to do with my time than
> replace one string class with another for absolutely no gain (for me). I
> can see that with some other container classes, there might be
> advantages, but - it's just a string... please just let me get on
> with building on what I've done and trying to make a living rather than
> throwing code grenades in my path.

My mail was in no way meant to be offensive, and I apologize if it came across that way.

I also know from own experience that "needless
backwards-INcompatibility" can be horrible, especially for code that was
written and "forgotten" long ago and that you just want to continue to work.

But on the other hand, I found that standard classes significantly help
with new or changing members in the development team, and with code that
is actively developed or maintained.

Best regards,
Carsten

Stephan van den Akker

Mar 22, 2012, 5:10:18 PM
to wx-...@googlegroups.com
+1, Carsten.

Recently in Redmond an event was staged and Mr. Stroustrup was invited in as a hero to celebrate the new C++11 standard.

This means the standard has won! No more half-baked STL implementations in important C++ compilers. Times have changed...

wxWidgets is as important as ever for me, but more and more just to shield me against the differences between say GTK+ and WIN32 for GUI work.

Armel

Mar 22, 2012, 5:11:56 PM
to wx-...@googlegroups.com

On Thursday, March 22, 2012 17:23:19 UTC+1, VZ wrote:
On Thu, 22 Mar 2012 08:41:04 -0700 (PDT) Armel wrote:

A> I have been now developing for years with wx and I believe it is important
A> to let you know how sad I am with the move toward new wxStrings.
A> basically they are the most complex piece of code I ever saw for such a
A> simple job.

 Unfortunately the job is far from simple. I'm unhappy about wxString
complexity as well but I continue to think that we would have *no* chance
to convince people to upgrade to 3.0 without these changes, unless we
preserved ANSI mode as a first class citizen, which is even worse in the
long term.

 In general, my advice is very simple: do *not* use wxString for textual
manipulation. Use std::string or std::wstring for this and only use
wxString to get data to or from the GUI.

Why not, but there is a good deal of code, for example clipboard handling, which used to work and doesn't anymore (a problem on Cocoa, if I remember correctly, every time I try to paste Arabic text and/or file names); I'll have to create tickets, debug, make patches and so on.
 


A> let's take an example: const char *time = dt.ParseDate ("2012/01/01
A> 18:00:00"); seems perfectly correct isn't it ?
A> bad news time points to garbage in this case (why? because the returned
A> value is in fact a pointer to the memory of the temporarily created
A> wxString object COPYING the date string)

 This is indeed a serious problem that we seem to have missed. We probably
need to provide separate overloads taking "const char*" and "const
wchar_t*". Or we could drop the overloads not returning the iterator at the
cost of breaking backwards compatibility. Doing the former would be
preferable but we need to find resources to do it :-(

In my experience it is far better to break compatibility when keeping it would otherwise defeat compiler checks. This is my mantra: if compatibility is not possible, force a code rewrite (even of a single character) so that the developer is sure to notice.
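The lifetime problem behind the ParseDate() example, and the two remedies discussed here (a separate const char* overload, or making the compiler refuse the call outright), can be reproduced without wx. In this sketch all function names are hypothetical stand-ins rather than the wxDateTime API, std::string plays the role of wxString, and the "parser" merely skips a fixed-width YYYY/MM/DD prefix. The deleted rvalue overload is plain C++11, the same trick std::ref uses to reject temporaries.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-ins for a date parser; NOT the wxDateTime API.
// Each one "parses" the fixed-width "YYYY/MM/DD" prefix by skipping it and
// reports where parsing stopped.
constexpr std::size_t kDateLen = 10;

// The problematic shape: the result points into `date`. If the caller passes
// a string literal, `date` is a temporary destroyed at the end of the full
// expression, so
//     const char* rest = ParseDateBad("2012/01/01 18:00:00");
// leaves `rest` dangling, yet it compiles without complaint.
const char* ParseDateBad(const std::string& date)
{
    return date.c_str() + (date.size() < kDateLen ? date.size() : kDateLen);
}

// Remedy 1: an overload taking const char* returns a pointer into the
// caller's own buffer. A literal argument selects this overload because an
// exact match beats the user-defined conversion to std::string.
const char* ParseDateFixed(const char* date)
{
    const std::size_t len = std::strlen(date);
    return date + (len < kDateLen ? len : kDateLen);
}

// Remedy 2: report the end position as an iterator into the caller's string,
// and delete the rvalue overload so that passing a temporary (including a
// literal converted to std::string) becomes a compile-time error.
bool ParseDateIter(const std::string& date, std::string::const_iterator* end)
{
    if (date.size() < kDateLen)
        return false;
    *end = date.cbegin() + static_cast<std::string::difference_type>(kDateLen);
    return true;
}
bool ParseDateIter(const std::string&& date, std::string::const_iterator* end) = delete;
```

With the deleted overload in place, `ParseDateIter("2012/01/01 18:00:00", &it)` no longer compiles, which is the "force a rewrite" behaviour argued for above.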
 

A> let's take another example: wxString mysubstr(mybigstr, myiterator -
A> mybigstr.begin())
A> bad news again it works randomly: it works in wide-char build and not UTF8

 What is "mybigstr" here? If it's a wxString it really should work, I don't
see why it doesn't. Could you please explain?

Yes, a wxString. In addition, when the size_t len passed as the second argument is larger than the string length, different STL implementations seem to behave differently (or there is something else I missed): on gcc 4.4.5, wxString("", 4) gives me a string whose end() is at its start but whose length() is 4! Whoa, I'll let you imagine the beautiful fireworks I got in ParseTime (which uses wxString(time, timeString.length())). With the VS10 STL it seems to work... maybe again an artefact of the UTF-8 handling...
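To illustrate why a position counted in code points must never be used as a byte count (or vice versa) in UTF-8, here is a small standalone sketch with hypothetical helper names, using a std::string that holds UTF-8. It shows only one plausible way code like `myiterator - mybigstr.begin()` can behave differently between builds, not a diagnosis of wxString itself: in a wchar_t build, code points and code units coincide for BMP text, so such a mix-up stays invisible there.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for a std::string holding valid UTF-8.

// A continuation byte has the bit pattern 10xxxxxx.
bool IsUtf8Continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Number of code points: count every byte that starts a sequence.
std::size_t CodePointCount(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if (!IsUtf8Continuation(c))
            ++n;
    return n;
}

// Byte offset at which code point number cpIndex starts
// (utf8.size() if cpIndex is past the end).
std::size_t ByteOffsetOfCodePoint(const std::string& utf8, std::size_t cpIndex)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        if (IsUtf8Continuation(static_cast<unsigned char>(utf8[i])))
            continue;
        if (seen == cpIndex)
            return i;
        ++seen;
    }
    return utf8.size();
}
```

For "café au lait" in UTF-8 (13 bytes, 12 code points), the fifth code point starts at byte 5, so `substr(0, 5)` keeps "café" intact, while naively using the code-point index 4 as a byte count cuts "é" in half.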
 

A> note that wxString::length sometimes returned a count of wxStringCharType
A> (in wide char build), and a count of code point (in UTF8),

 But this is supposed to be the same thing. I.e. length() returns the
number of "characters" in the string, not "bytes". Of course, there are
further problems (composed vs decomposed and so on) but I don't understand
what is the difference between wchar_t and UTF-8 builds here. Again, could
you please give more details?

No, it does not return the number of characters; it is a number of code points, which is millions of light years away from characters (read about Indic scripts and you'll understand why I say that).
In UTF-8, it tries to be clever when it should not. In UTF-16 it is simply the difference between the basic_string::const_iterators returned by begin()/end(); by the way, this may be more a problem of the "string length cache" used in UTF-8 mode. In fact, I "hope" it is a simple difference in UTF-16, which means the result is NOT a count of code points either...
 
Unicode technical reports say explicitly that a "glyph" or "character" should always be treated as a string from a computer's point of view. Honestly, it is so hard, linguistically, to define what will occupy a single square of space on a sheet of paper that returning a length in "characters" is just a bold lie to the developer: we don't know how to do that. It takes tons of code in libraries such as FriBidi and its derived work (Cairo...).
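The three different counts argued about here (code units, code points and user-perceived characters) are easy to show with standard C++11 u16string literals. The helper below is a hypothetical sketch, not wxString::length(): it counts code points by skipping trailing surrogates, assuming well-formed UTF-16. Note that it still reports 2 for "é" written as "e" plus a combining accent, although a reader sees one character; counting those (grapheme clusters, per Unicode UAX #29) requires real segmentation tables.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-16 string.
// Every code unit counts except the low (trailing) half of a surrogate pair,
// 0xDC00..0xDFFF. Assumes well-formed UTF-16 (no unpaired surrogates).
std::size_t Utf16CodePoints(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t u : s)
        if (u < 0xDC00 || u > 0xDFFF)
            ++n;
    return n;
}
```

So u"e\u0301" has 2 code units and 2 code points but one visible character, while u"\U0001F600" has 2 code units and a single code point: `size()`, a code-point count and a character count are three different numbers.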
 

A> another problem beside the inconsistencies is how sluggish wx has become,
A> passing from wx 2.6 to wx 2.9, the startup for the same program is around
A> twice as long.

 Is this only in UTF-8 build or with wchar_t build too?

Both. One problem among others is that wxT() used to save us a conversion when going from a compiler static string to wxString; now every single static string needs various allocations, copies and so on just to be loaded. A good string class would simply reference such strings (this is what I have here at Ellié Computing).
It also seems partly due to XML handling: do you realize that every single test of an attribute or whatever involves at least one new/delete pair? A simple startup with XRC totally kills performance, and searching in wxHelp is now _dead_ slow.
 

A> the executable is nearly 40% bigger with zero features added.

 This is a serious problem but it has nothing to do with wxString (or at
least not much). It's due to wrong/bad/mixed up link dependencies in wx.
E.g. now all wx programs link in wxGraphicsContext code, even if they
absolutely don't use it. Once again, this is a problem but unfortunately
it's not easy to solve at all (i.e. basically I have no idea what to do
about it realistically). And it's completely unrelated to wxString/Unicode
changes which added at most a few dozen KB to the binary size, which is
completely negligible compared to other things.


A> a string iterator in UTF8 is now an object with chaining (6 pointers!), and
A> that only to care about the replace() function. most modern frameworks have
A> simply NO in place replace(). why? because in place replace() is just a
A> hack rarely usable. in place, string are immutable and the concept of an
A> efficient string_builder is used.

 You're not speaking about C++ here... Anyhow, realistically what do you
suggest? Making wxString immutable? Whatever you think about immutable
strings this just can't happen.

No, of course not, but replace() could invalidate iterators; that is perfectly acceptable. And a correct underlying implementation would not hurt: we've lived for years with an O(N^2) replace, so we could live with an implementation doing a few mallocs.
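For comparison with an in-place replace(), here is what the "string builder" approach can look like with plain std::string (a sketch; ReplaceAll is a hypothetical name). The output is assembled left to right in a separate buffer, so each input character is copied once and no iterators into the source are invalidated. Repeated in-place replace() calls, by contrast, may shift the remaining tail on every hit, which is where quadratic behaviour comes from.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to` by
// building a new string in one left-to-right pass over `src`.
std::string ReplaceAll(const std::string& src, const std::string& from,
                       const std::string& to)
{
    if (from.empty())
        return src;  // nothing sensible to match

    std::string out;
    out.reserve(src.size());  // a starting guess; append() grows as needed

    std::string::size_type pos = 0;
    for (;;)
    {
        const std::string::size_type hit = src.find(from, pos);
        if (hit == std::string::npos)
            break;
        out.append(src, pos, hit - pos);  // text before the match
        out += to;                        // the replacement
        pos = hit + from.size();          // continue after the match
    }
    out.append(src, pos, std::string::npos);  // trailing text
    return out;
}
```

For instance, `ReplaceAll("2012/01/01", "/", "-")` yields "2012-01-01", and the source string is left untouched.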
 

A> all of that to avoid a few wxT() macros.

 No, all this to get rid of separate ANSI build mode and finally make sure
that we have a single official wxWidgets build.

I always used ANSI/UTF8 on Unix and UTF16 on Windows, never had problems.
 

A> sorry for the rant, but please realize that many many programs developers
A> will eventually upgrade from 2.8 to 3.0 branch one day, and those
A> developers do not care AT ALL about this tiny wxT()

 Oh but they do care about getting tens of thousands of compilation errors
which they'd get when attempting to rebuild their code with 3.0 without
these changes. It doesn't matter how simple each of these errors is to fix,
*nobody* would do it and wxWidgets 3.0 would be dead.

I don't see what problem there would be when upgrading without that: I went from 2.6 to 3.0 and most of the problems I get are _because_ of this.
 

A> but will definitely care about the programs not working any more for
A> really obscure reasons. I fear that most of them will simply go away,
A> putting their wx stuff to the bin and going the qt or gtk way.

 This is exactly and certainly what would happen if we didn't provide
compatibility (for 99%, it's not perfect) between 2.8 and 3.0.

But it only fakes compatibility. Unfortunately, there is such a mishmash of constructors, cast operators, proxy classes and whatnot that the compiler now accepts almost anything, even if it's meaningless!
 

A> if someone want ideas about how to do it right, I have plenty of them.

 Yes, please share.

First, a wxString object should really not contain 3 strings, and an iterator should not be chained to other iterators (I wonder how constant access from many iterators in different threads would work; OK, I know you'll say I just have to copy, but sometimes the same objects have to be accessed from several threads, and then it is really necessary that const objects are never altered).
Second, just because newbies don't like or don't understand wxT() doesn't mean it should not be there.
Third, please let us use static strings straight from the program's constant memory, with a simple macro around the text. It would be trivial to make a constructor for that, and provided that it creates strings directly in the correct representation, it would avoid copies completely for ALL static text.
Have reference-counted strings vanished, or are they still the default non-STL implementation? Refcounted strings are pretty good (with an atomic counter, of course).
 

A> please note that I really love wxWidgets. but a bad decision is a bad
A> decision, not matter how much efforts were involved.

 I don't think it's a bad decision. There are bugs (ParseDate() one is a
bad one, we do need to do something about it) but globally the transition
from 2.8 ANSI build has worked much better than I thought it would.

The problem for me is that programs that were correct with 2.6 are no longer correct with 2.9: they still compile, but they crash here and there...
I prefer having to change two thousand lines of code and getting something that works with certainty to having nothing to change and having to debug for months.
 
Best regards
Armel
 

Armel

Mar 22, 2012, 5:25:06 PM
to wx-...@googlegroups.com

On Thursday, March 22, 2012 17:32:03 UTC+1, Marianne Gagnon wrote:
Hi,


I'm not a core dev, but here's my 2 cents
I'm not core either :-)

>
> let's take an example: const char *time = dt.ParseDate ("2012/01/01
> 18:00:00"); seems perfectly correct isn't it ?
> bad news time points to garbage in this case (why? because the returned
> value is in fact a pointer to the memory of the temporarily created
> wxString object COPYING the date string)

I don't think storing the result of a conversion from wxString to char* ever worked, IMO
this is not new with wx 2.9. The good way is always to use wxString everywhere,
conversion to char* is usable for calling a function but not to store. I agree
this is a little confusing but you just need to lose the habit of using char*
My problem here is that it compiles silently, and it was THE way to do it before 2.9... It's exactly the kind of thing a compiler could flag immediately, BUT unfortunately the new strings defeat the compiler.
 

>
> let's take another example: wxString mysubstr(mybigstr, myiterator -
> mybigstr.begin())
> bad news again it works randomly: it works in wide-char build and not UTF8
>
> note that wxString::length sometimes returned a count of wxStringCharType
> (in wide char build), and a count of code point (in UTF8), so it is called
> 'length' in UTF8 although this is not what is expected anymore for a length
> to wxString constructor (by the way the meaning changed since 2.6/2.8).
>

That I can't comment on, but indeed dealing with UTF code points is always a little tough
Indeed, and as it is far too complex for wxString, it should not handle them at all in length()-related functions, rather than believing it does something clever while just making our life impossible.
 


> another problem beside the inconsistencies is how sluggish wx has become,
> passing from wx 2.6 to wx 2.9, the startup for the same program is around
> twice as long.
> the executable is nearly 40% bigger with zero features added.

Maybe you can measure exactly what is slower; I suspect that you may be passing strings by copy around, and wx 2.8 optimised copy by using ref-counting

Honestly, almost everything is slower (for example, it seems that event map structures which used to be truly static are now half-baked: they are no longer built by the compiler at compile time but by global initialization code; XML/XRC is totally sluggish; help/search is dead slow; strings are converting and doing new/delete all the time). A bit slower, I would not have cared, but this much is just not acceptable.
 

one big added feature of the change in question (removing ref-counting in strings) is that they are now much less tricky to use from threads, since passing by copy will now actually make a copy, which is good for threads.

Tricky? Why? Our own tool ECMerge is heavily parallelized and I don't have a single problem with ref counting (OK, I had to hack wxString here to make it use atomic increments in our 2.6 version; I don't remember if it was accepted as a patch, BTW).

> so if you pass wxStrings around by copy, the change is simple, pass them by reference instead and you should get the speed back
There are so many copies everywhere with wxScopedBuffers and things like that, I'm just puzzled. Does anyone still count new/delete calls? They are the two slowest functions in the computing realm...

> -- Auria
Regards
Armel

Brian Ravnsgaard Riis

Mar 22, 2012, 5:30:45 PM
to wx-...@googlegroups.com
On 22-03-2012 19:17, Vadim Zeitlin wrote:
> PB> Also, why would you prefer to use std::string when the implementation is
> PB> not defined by the standard and you need to know what the performance
> PB> characteristics are?
>
> This is just wrong. The complexity of all string operations is defined.

Yes, but the size of the string object itself may have various overhead
associated, and this is ok by the standard.

>
> PB> Is it reference counted or not? Who knows?
>
> Who needs to know?

Two things: As mentioned by Herb Sutter a long time ago, a
reference-counted string implementation is an optimization that isn't
[an optimization], at least if you write multi-threaded code.

Second (and in part because of this) in C++11 you *do* know: A C++11
compliant std::string is not permitted to be reference-counted.
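To make the trade-off concrete, here is a minimal copy-on-write sketch built on std::shared_ptr (CowString is a hypothetical name, not a wx or standard class). Each copy costs an atomic reference-count update, which is the hidden per-copy price at the heart of that argument. To stay simple and avoid relying on use_count(), whose value is only approximate while other threads hold copies, this version always clones on write; a classic COW string would skip the clone when unshared, and getting that check right across threads is precisely the hard part.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string sketch (not wxString, not std::string).
// Copies share one immutable buffer. shared_ptr updates its reference count
// atomically, so several threads may copy the same CowString concurrently,
// but each copy pays for that atomic operation. Calling append() on one
// object while another thread reads that same object is still a data race.
class CowString
{
public:
    explicit CowString(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    const std::string& str() const { return *buf_; }

    // Writes never touch the shared buffer: build a new one and repoint.
    // Other CowString objects still sharing the old buffer see no change.
    void append(const std::string& more)
    {
        buf_ = std::make_shared<std::string>(*buf_ + more);
    }

    bool sharesBufferWith(const CowString& other) const { return buf_ == other.buf_; }

private:
    std::shared_ptr<const std::string> buf_;
};
```

After `CowString b = a;` both objects point at the same buffer; the first `b.append(...)` gives `b` its own buffer and leaves `a` unchanged.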

/Brian

Vadim Zeitlin

Mar 22, 2012, 5:41:44 PM
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 14:11:56 -0700 (PDT) Armel wrote:

A> On Thursday, March 22, 2012 17:23:19 UTC+1, VZ wrote:
A> >
A> > In general, my advice is very simple: do not use wxString for textual
A> > manipulation. Use std::string or std::wstring for this and only use
A> > wxString to get data to or from the GUI.
A> >
A> why not, but there is a good deal of code which used to work in clipboard
A> for example which does not anymore (problem on Cocoa if i remember well,
A> every time I try to paste arabic text and/or file names) , i'll have to
A> create tickets, debug, make patch and so on.

Please let's not mix up everything. The changes to wxString you started
discussing were intentional. The problem with the clipboard under Cocoa is
clearly a bug and is definitely not intentional.

A> > A> let's take an example: const char *time = dt.ParseDate ("2012/01/01
A> > A> 18:00:00"); seems perfectly correct isn't it ?
A> > A> bad news time points to garbage in this case (why? because the returned
A> > A> value is in fact a pointer to the memory of the temporarily created
A> > A> wxString object COPYING the date string)
A> >
A> > This is indeed a serious problem that we seem to have missed. We probably
A> > need to provide separate overloads taking "const char*" and "const
A> > wchar_t*". Or we could drop the overloads not returning the iterator at the
A> > cost of breaking backwards compatibility. Doing the former would be
A> > preferable but we need to find resources to do it :-(
A> >
A> in my experience it is far better to break compatibility when we would
A> otherwise defeat compiler checks. this is my mantra: if compatibility is not
A> possible, force code rewrite (even one char) be sure the developer knows.

Unfortunately it's not one char. Replacing pointers with iterators is not
that simple, even if it's usually straightforward enough.

Anyhow, hopefully providing the above mentioned overloads should fix this
bug. Could you please open a ticket for it so that I don't forget to do it?

A> > A> let's take another example: wxString mysubstr(mybigstr, myiterator -
A> > A> mybigstr.begin())
A> > A> bad news again it works randomly: it works in wide-char build and not UTF8
A> >
A> > What is "mybigstr" here? If it's a wxString it really should work, I don't
A> > see why it doesn't. Could you please explain?
A> >
A> yes a wxString, in addition when the size_t len passed in second argument
A> is larger than the string length it seems that different stl implementation
A> behaves differently (or there is something else I missed): on gcc 4.4.5
A> with wxString("", 4) => I get a string whose end() is at start, and whose
A> length() is 4 ! who ooh, I let you imagine the beautiful firework I got in
A> ParseTime (using wxString(time, timeString.length()), on VS10 STL it seems
A> to work... maybe again an artefact of the UTF8 treatment...

Sorry, I don't understand anything here. First, I can't reproduce the
problem with the ctor you mention. Could you please make a patch to the
unit test showing it?

Second, the behaviour of wxString("", 4) is well defined, has nothing to
do with std::string and is never something you want because it's going to
create a string of length 4 with the initial character being '\0' and the
rest of them being whatever garbage happens to be in memory. Why would you
ever expect this to work, std::string or not and 2.8 or 2.9? This really
has nothing to do with any wx problem, it's just a bug in your code.
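The standard (pointer, count) constructor has the same contract: std::string(s, n) copies exactly n characters, and the behaviour is undefined if fewer are readable, so std::string("", 4) is just as wrong. When the count may exceed the actual text, one safe pattern is to measure the NUL-terminated text first and then clamp. A sketch with a hypothetical helper name, assuming C++17 for std::string_view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: copy at most maxLen characters of the NUL-terminated
// string s (which must not be null), never reading past its terminator.
// Unlike std::string(s, maxLen), this never reads beyond the actual text.
std::string BoundedCopy(const char* s, std::size_t maxLen)
{
    const std::string_view view(s);              // length found by scanning to '\0'
    return std::string(view.substr(0, maxLen));  // substr clamps to view.size()
}
```

So `BoundedCopy("", 4)` is simply an empty string, and `BoundedCopy("18:00:00", 5)` is "18:00".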


A> no it does not return the number of characters, this is a number of code
A> points which is millions of light year from characters (let's read
A> about Indic scripts and you'll understand why I say that).
A> in UTF8, it tries to be clever when it should not.

That's the point, it doesn't try to be clever. length() always returns the
number of code points.

A> In UTF16 this is simply the difference of the
A> basic_string::const_iterator returned by begin()/end(),

Which is also the number of code points, ignoring surrogates (which we
shouldn't ignore, of course, but, again, we did ignore them in 2.8 too, and
nothing changed in 2.9 concerning this so let's discuss this separately).

Where is the problem?

A> Unicode technical reports tell explicitly that a "glyph" or "character"
A> should always be considered a string from a computer point of view.

Which is absolutely correct and has no relationship to the changes in wx
2.9. wxString::length() never tried to do anything "clever" and always just
returned you the number of code points and to the best of my knowledge
still does. Again, I see a lot of anger in your post about some problem but
I have absolutely no idea what you are actually complaining about. Please
provide more details about what is wrong here because I just don't see it.

A> > A> another problem beside the inconsistencies is how sluggish wx has become,
A> > A> passing from wx 2.6 to wx 2.9, the startup for the same program is around
A> > A> twice as long.
A> >
A> > Is this only in UTF-8 build or with wchar_t build too?
A> >
A> both.

This is surprising as I'd definitely expect UTF-8 to introduce more
overhead.

A> one problem amongst other is that, wxT() used to save us a conversion
A> when going from compiler static string to wxString, now every single static
A> string needs various allocations, copies and so on just to load it.

You can use wxS() in performance sensitive places. You just don't *have
to*.

A> a good class of string would simply reference such strings (this is what
A> I have here at Ellié Computing).

You're perfectly free to use custom immutable string classes, always were
and always will be. wxString was never like this though, std::string or
not. Again, I probably seem very defensive but I just don't understand why
we have to mix everything up. Half of the things you say don't have
anything at all to do with 2.9 changes (or at least I don't see how they
do, and you don't say).

A> it seems also partly due to xml handling, do you realize that every single
A> test of an attribute or whatever revolve to at least a pair of new/delete?

We benchmarked XRC specifically and didn't find anything nearly close to
50% slowdown. Maybe some performance regression happened since then, of
course. But it's not immediately obvious to me what it could be and I don't
know which extra memory allocations you mean.

A> > A> all of that to avoid a few wxT() macros.
A> >
A> > No, all this to get rid of separate ANSI build mode and finally make sure
A> > that we have a single official wxWidgets build.
A> >
A> I always used ANSI/UTF8 on Unix and UTF16 on Windows, never had problems.

Sorry, I just don't understand at all what you mean. There was no
UTF-8 build of wx in 2.8. If you used the "ANSI" build of wx under Unix, then
you didn't have any Unicode support at all.

A> > A> but will definitely care about the programs not working any more for
A> > A> really obscure reasons. I fear that most of them will simply go away,
A> > A> putting their wx stuff to the bin and going the qt or gtk way.
A> >
A> > This is exactly and certainly what would happen if we didn't provide
A> > compatibility (for 99%, it's not perfect) between 2.8 and 3.0.
A> >
A> but it only fakes compatibility. unfortunately, there is such a mishmash
A> of constructors, cast operators, proxy classes and whatever that the
A> compiler now accepts almost anything even if it's meaningless!

This is just not true. You found a bug in wxDateTime::ParseDate(), true.
But this is an isolated case and not nearly the end of the world you make
it out to be.

A> > A> if someone want ideas about how to do it right, I have plenty of them.
A> >
A> > Yes, please share.
A> >
A> First a wxString object should really not contain 3 strings

This is unavoidable to be able to continue to handle both the code that
worked with ANSI and Unicode builds of wx 2.8 with the same library. We
need to be able to return it as either wchar_t or char string and pointers
into it must be persistent so we have to keep the conversion results. Of
course, wxString usually contains at most 2 strings and not 3...
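Why the conversion results must be kept can be shown in a few lines. In this hypothetical sketch (DualString is not wxString, and it stores UTF-32 purely for simplicity), the UTF-8 form is produced on demand and kept in a member, so the const char* returned by AsUtf8() stays valid as long as the object is not modified or destroyed, instead of pointing into a temporary. The lazily filled mutable cache also illustrates the threading concern raised earlier in the thread: two threads calling AsUtf8() on the same const object at the same time would race unless the cache were protected.

```cpp
#include <string>
#include <utility>

// Hypothetical class, NOT wxString: it keeps UTF-32 as the canonical text and
// caches a UTF-8 copy so that AsUtf8() can hand out a stable pointer.
// Assumes the text contains only valid code points (<= U+10FFFF, no surrogates).
class DualString
{
public:
    explicit DualString(std::u32string text) : text_(std::move(text)) {}

    // The pointer stays valid until the next Append() or destruction.
    const char* AsUtf8() const
    {
        if (!cacheValid_)
        {
            utf8Cache_.clear();
            for (char32_t cp : text_)
                AppendUtf8(utf8Cache_, cp);
            cacheValid_ = true;
        }
        return utf8Cache_.c_str();
    }

    void Append(const std::u32string& more)
    {
        text_ += more;
        cacheValid_ = false;  // callers must treat earlier AsUtf8() pointers as invalid
    }

private:
    static void AppendUtf8(std::string& out, char32_t cp)
    {
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::u32string text_;
    mutable std::string utf8Cache_;    // filled lazily, hence mutable
    mutable bool cacheValid_ = false;  // NOT safe for concurrent AsUtf8() calls
};
```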

A> and an iterator should not be chained to other iterators

This is unfortunate but also unavoidable (or let us know how to avoid it)
as you can see from the comment in the relevant part of the code.

A> Second, it's not because newbies don't like or understand wxT() that it
A> should not be there.

This is not "second", this is "zero" and I completely disagree with this.
We tried with wxT() for 15 years and it was still a daily source of
problems. Have you even seen the email I forwarded from wx-users?
Surprising that it didn't receive nearly the same number of replies as my
totally innocent email saying that using std::string is wiser than using
wxString...

Of course if you get rid of the main motivation for all these changes
things become much simpler. And wxWidgets loses 50% of its users forever.
Is it a good deal? I don't think so, so let's assume this requirement is
there. When I was asking you to share your thoughts it was about how to
implement it better, not about how to avoid implementing it at all.

A> Third, please let us use static strings right off the constant memory of
A> the program, with a simple macro around the text, it would trivial to make
A> a constructor for that, and provided that it makes strings directly correct
A> for the representation

See wxS().

A> it would avoid copies completely for ALL static text.

Of course it wouldn't, wxString always copies the text internally, whether
you use wxT(), wxS() or nothing at all. What you save with wxS() is the
conversion, that's all. But the heap allocation (unless std::string used
internally uses small string optimization and the string is indeed small)
and copying always happen, just as it did in 2.8. You're complaining about
something that absolutely never worked as you think it did!
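The distinction drawn here (wxS() saves the conversion, not the copy) is the same one standard C++ makes between an owning and a non-owning string. A small sketch, assuming C++17 std::string_view; kLabel and the two functions are hypothetical names. The view refers directly to the literal's static storage, while the std::string always holds its own copy. A view is only safe while the storage it refers to is alive, which is guaranteed for string literals.

```cpp
#include <string>
#include <string_view>

// Hypothetical example data: a string literal with static storage duration.
static const char kLabel[] = "Save changes?";

// Owning: the characters are copied into the std::string (into a heap buffer,
// or into the object itself for short strings with small-string optimization).
std::string OwningCopy() { return std::string(kLabel); }

// Non-owning: the view just records a pointer and a length into kLabel.
std::string_view NonOwningView() { return std::string_view(kLabel); }
```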

A> Do reference counted string vanished? or are they still the default non STL
A> implementation? refcounted strings are pretty good (with atomic counter of
A> course).

Ref counted strings turned out to be horribly bad which is why absolutely
nobody uses them any more. But yes, they're still provided if you set
wxUSE_STD_STRING to 0. Good luck with the bugs...

A> > A> please note that I really love wxWidgets. but a bad decision is a bad
A> > A> decision, not matter how much efforts were involved.
A> >
A> > I don't think it's a bad decision. There are bugs (ParseDate() one is a
A> > bad one, we do need to do something about it) but globally the transition
A> > from 2.8 ANSI build has worked much better than I thought it would.
A> >
A> the problem for me is that correct programs in 2.6 are no more correct
A> programs with 2.9, they still compile but it crashes here and there...

Please report bugs as you find them. For now I see exactly one bug which,
I agree, is a serious one but, again, this is not nearly as catastrophic as
could be surmised from reading your message. Is it?

VZ

Lauri Nurmi

Mar 22, 2012, 6:11:17 PM3/22/12
to wx-...@googlegroups.com
22.3.2012 23:41, Vadim Zeitlin wrote:
> On Thu, 22 Mar 2012 14:11:56 -0700 (PDT) Armel wrote:
>
> A> I always used ANSI/UTF8 on Unix and UTF16 on Windows, never had problems.
>
> Sorry, I just don't understand at all what do you mean. There had been no
> UTF-8 build of wx in 2.8. If you use "ANSI" build of wx under Unix, then
> you didn't have any Unicode support at all.

This is not quite true. If you use the ANSI build of wxGTK 2.8 and your
locale is UTF-8 (as it has been by default for 10 years), then your
application does have Unicode support. wx itself doesn't know it, but
the strings of the GUI are UTF-8, and showing and inputting e.g. Cyrillic
text and Latin letters with an umlaut at the same time is not a problem.
99% of things work nicely.
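Lauri's point holds because of a property of UTF-8 itself: every byte of a multi-byte sequence is 0x80 or above, so an ASCII byte never appears inside a non-ASCII character. A byte-oriented string can therefore carry UTF-8 unchanged, and operations that only look for ASCII delimiters keep working. Only byte counts and byte indices stop matching what a user would call characters. The sketch is in standard C++, and SplitBytes is a hypothetical helper. The non-ASCII text is written as escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// "Привет Müller" in UTF-8: each Cyrillic letter and 'ü' take 2 bytes, ASCII takes 1.
const std::string kMixed =
    "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"  // Привет: 6 letters, 12 bytes
    " "
    "M\xC3\xBCller";                                     // Müller: 6 letters, 7 bytes

// Splits on an ASCII delimiter by scanning bytes. This is safe for UTF-8 because
// an ASCII byte such as ' ' can never occur in the middle of a multi-byte sequence.
inline std::vector<std::string> SplitBytes(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (i == s.size() || s[i] == delim) {
            parts.push_back(s.substr(start, i - start));
            start = i + 1;
        }
    }
    return parts;
}
```

Here the 13 visible characters, including the space, occupy 20 bytes. Splitting on ' ' still yields the two words intact.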


LN

Armel

Mar 23, 2012, 4:44:34 AM3/23/12
to wx-...@googlegroups.com

On Thursday, March 22, 2012 22:41:44 UTC+1, VZ wrote:
On Thu, 22 Mar 2012 14:11:56 -0700 (PDT) Armel wrote:

A> On Thursday, March 22, 2012 17:23:19 UTC+1, VZ wrote:
A> >
A> >  In general, my advice is very simple: do not use wxString for textual
A> > manipulation. Use std::string or std::wstring for this and only use
A> > wxString to get data to or from the GUI.
A> >
A> why not, but there is a good deal of code which used to work, in the clipboard
A> for example, and no longer does (a problem on Cocoa, if I remember correctly,
A> every time I try to paste Arabic text and/or file names); I'll have to
A> create tickets, debug, make patches and so on.

 Please let's not mix up everything. The changes to wxString you started
discussing were intentional. The problem with the clipboard under Cocoa is
clearly a bug and is definitely not intentional.

the problem here is that the number of unintended side effects won't go down, in particular when wx 2.8 users upgrade.
 
A> > A> let's take an example: const char *time = dt.ParseDate ("2012/01/01
A> > A> 18:00:00"); seems perfectly correct, doesn't it?
A> > A> bad news: time points to garbage in this case (why? because the returned
A> > A> value is in fact a pointer to the memory of the temporarily created
A> > A> wxString object COPYING the date string)
A> >
A> >  This is indeed a serious problem that we seem to have missed. We probably
A> > need to provide separate overloads taking "const char*" and "const
A> > wchar_t*". Or we could drop the overloads not returning the iterator at the
A> > cost of breaking backwards compatibility. Doing the former would be
A> > preferable but we need to find resources to do it :-(
A> >
A> in my experience it is far better to break compatibility when we would
A> otherwise defeat compiler checks. This is my mantra: if compatibility is not
A> possible, force a code rewrite (even of one char) so the developer knows.

 Unfortunately it's not one char. Replacing pointers with iterators is not
that simple, even if it's usually straightforward enough.

 Anyhow, hopefully providing the above mentioned overloads should fix this
bug. Could you please open a ticket for it so that I don't forget to do it?

I'll do it. By the way, should we deprecate those char*/wchar_t* functions? At least some code cleanup would have saved me these bugs.
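The dangling pointer and VZ's proposed fix (separate const char* and const wchar_t* overloads) can be sketched with a toy parser. ParseNumber and SkipDigits are hypothetical stand-ins for wxDateTime::ParseDate(), written in standard C++, and they only skip leading digits. The iterator-returning variant mirrors the other option VZ mentions.

```cpp
#include <string>

// Hypothetical helper: returns a pointer to the first character that is not an
// ASCII digit. The terminating NUL is not a digit, so the loop always stops.
template <typename CharT>
const CharT* SkipDigits(const CharT* p) {
    while (*p >= CharT('0') && *p <= CharT('9'))
        ++p;
    return p;
}

// Risky overload: if only this one existed, a string literal would first be
// converted to a temporary std::string, and the returned pointer would point
// into that temporary, which dies at the end of the full expression.
inline const char* ParseNumber(const std::string& s) { return SkipDigits(s.c_str()); }

// Safer overloads: the result points into the caller's own buffer. For a string
// literal these are exact matches, so overload resolution prefers them over the
// std::string version, which needs a user-defined conversion.
inline const char* ParseNumber(const char* s) { return SkipDigits(s); }
inline const wchar_t* ParseNumber(const wchar_t* s) { return SkipDigits(s); }

// Iterator-based variant: reports where parsing stopped as a position in a
// string the caller owns. Returns true if at least one digit was consumed.
inline bool ParseNumber(const std::string& s, std::string::const_iterator* end) {
    const char* stop = SkipDigits(s.c_str());
    *end = s.begin() + (stop - s.c_str());
    return stop != s.c_str();
}
```

With these overloads in place, existing calls that pass a literal silently switch to the safe overload, with no change to the calling code.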
 

A> > A> let's take another example: wxString mysubstr(mybigstr, myiterator -
A> > A> mybigstr.begin())
A> > A> bad news again: it works randomly; it works in the wide-char build and
A> > A> not in UTF8
A> >
A> >  What is "mybigstr" here? If it's a wxString it really should work, I don't
A> > see why it doesn't. Could you please explain?
A> >
A> yes, a wxString. In addition, when the size_t len passed as the second argument
A> is larger than the string length, different STL implementations seem to
A> behave differently (or there is something else I missed): on gcc 4.4.5
A> with wxString("", 4) => I get a string whose end() is at its start, and whose
A> length() is 4! Whoa, I'll let you imagine the beautiful fireworks I got in
A> ParseTime (using wxString(time, timeString.length())); with the VS10 STL it seems
A> to work... maybe again an artifact of the UTF8 handling...

 Sorry, I don't understand anything here. First, I can't reproduce the
problem with the ctor you mention. Could you please make a patch to the
unit test showing it?

I'll try to.
 

 Second, the behaviour of wxString("", 4) is well defined, has nothing to
do with std::string and is never something you want because it's going to
create a string of length 4 with the initial character being '\0' and the
rest of them being whatever garbage happens to be in memory. Why would you
ever expect this to work, std::string or not and 2.8 or 2.9? This really
has nothing to do with any wx problem, it's just a bug in your code.

OK, OK, I wrote it too fast; it is actually wxString mystr1(""); wxString mystr2(mystr1, 4);
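For comparison, this is how the standard library treats an oversized length in the substring constructor: std::string(str, pos, count) clamps count to the characters actually available. That is presumably the behaviour Armel expects from mystr2(mystr1, 4). FirstN and FirstNExplicit are hypothetical helpers, and the sketch says nothing about wxString's own implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// std::string(str, pos, count) copies at most str.size() - pos characters, so
// asking for 4 characters of an empty string yields an empty string instead of
// reading past the end of the source.
inline std::string FirstN(const std::string& s, std::size_t n) {
    return std::string(s, 0, n);
}

// The same clamping written out explicitly, for code that computes lengths itself.
inline std::string FirstNExplicit(const std::string& s, std::size_t n) {
    return s.substr(0, std::min(n, s.size()));
}
```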


A> no, it does not return the number of characters; this is a number of code
A> points, which is millions of light years from characters (read
A> about Indic scripts and you'll understand why I say that).
A> In UTF8, it tries to be clever when it should not.

 That's the point, it doesn't try to be clever. length() always returns the
number of code points.

it _should not_; see the explanation for UTF16.
 

A> In UTF16 this is simply the difference between the
A> basic_string::const_iterators returned by begin()/end(),

 Which is also the number of code points, ignoring surrogates (which we
shouldn't ignore, of course, but, again, we did ignore them in 2.8 too, and
nothing changed in 2.9 concerning this so let's discuss this separately).

so this is not the number of code points, it is a number of coding entities (uint16), and it absolutely should continue to be! The UTF8 or UTF32 variants want that as well. Please realize that even if few newbies get the wxT() stuff, I am really sure of one thing: none of them understand all that code points stuff.
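The disagreement is easier to follow with the Unicode terms side by side:

- A code unit is one byte in UTF-8 or one 16-bit value in UTF-16.
- A code point is one Unicode scalar value.
- A user-perceived character may span several code points.

This standard C++ sketch counts code points in both encodings, assuming well-formed input. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// UTF-8: continuation bytes have the form 10xxxxxx; every code point starts with
// exactly one byte that is not a continuation byte, so counting those gives the
// number of code points (assuming valid UTF-8).
inline std::size_t CodePointsUtf8(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}

// UTF-16: a code point outside the BMP is a high surrogate (D800-DBFF) followed
// by a low surrogate (DC00-DFFF); skipping low surrogates counts code points
// (assuming surrogates are correctly paired).
inline std::size_t CodePointsUtf16(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s)
        if (u < 0xDC00 || u > 0xDFFF) ++n;
    return n;
}
```

For "A", "é" and U+1F600, the UTF-8 form has 7 code units and the UTF-16 form 4, but both have 3 code points. A decomposed "é" (e followed by U+0301) is one user-perceived character, yet 2 code points.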
 
 

 Where is the problem?

the value returned by wxString::length() is inconsistent with what is called a "length" everywhere else in the wx code and API, and I have to put #if wxUSE_UNICODE_UTF8 / WCHAR_T in my code to work around what should be trivially defined but is complex, potentially slow and unusable.
 
so please make the length() of a UTF8 string just that: the number of bytes in it, as it is expected by string constructors for example.
 

A> Unicode technical reports tell explicitly that a "glyph" or "character"
A> should always be considered a string from a computer point of view.

 Which is absolutely correct and has no relationship to the changes in wx
2.9. wxString::length() never tried to do anything "clever" and always just
returned you the number of code points and to the best of my knowledge
still does. Again, I see a lot of anger in your post about some problem but
I have absolutely no idea what are you actually complaining about. Please
provide more details about what is wrong here because I just don't see it.

no it was not the number of code points, it was the number of bytes in ASCII and the number of uint16 on Windows: it was PERFECT like that
 
 

A> > A> another problem beside the inconsistencies is how sluggish wx has become,
A> > A> passing from wx 2.6 to wx 2.9, the startup for the same program is around
A> > A> twice as long.
A> >
A> >  Is this only in UTF-8 build or with wchar_t build too?
A> >
A> both.

 This is surprising as I'd definitely expect UTF-8 to introduce more
overhead.

A> one problem among others is that wxT() used to save us a conversion
A> when going from compiler static string to wxString, now every single static
A> string needs various allocations, copies and so on just to load it.

 You can use wxS() in performance sensitive places. You just don't *have
to*.

We cannot any more: e.g. some structures require const char*, which makes it impossible to use; some wxT() were removed; and most code is now written expecting wxString parameters, so it now requires a char -> wxString conversion on Windows, where before it required no conversion and no copy at all. Old code will be more impacted.
 

A> a good class of string would simply reference such strings (this is what
A> I have here at Ellié Computing).

 You're perfectly free to use custom immutable string classes, always were
and always will be. wxString was never like this though, std::string or
not. Again, I probably seem very defensive but I just don't understand why
do we have to mix everything up. Half of the things you say don't have
anything at all to do with 2.9 changes (or at least I don't see how do they
do and you don't say).

granted.
 
 

A> it also seems to be partly due to XML handling; do you realize that every single
A> test of an attribute or whatever results in at least a new/delete pair?

 We benchmarked XRC specifically and didn't find anything nearly close to
50% slowdown. Maybe some performance regression happened since then, of
course. But it's not immediately obvious to me what could it be and I don't
know which extra memory allocations do you mean.

A> > A> all of that to avoid a few wxT() macros.
A> >
A> >  No, all this to get rid of separate ANSI build mode and finally make sure
A> > that we have a single official wxWidgets build.
A> >
A> I always used ANSI/UTF8 on Unix and UTF16 on Windows, never had problems.

 Sorry, I just don't understand at all what do you mean. There had been no
UTF-8 build of wx in 2.8. If you use "ANSI" build of wx under Unix, then
you didn't have any Unicode support at all.

yes, no UTF8 handling at all worked better than what is there now, probably mostly because of the belief that length() should return the code point count; it should not. The 'ANSI' treatment ignored UTF8 and, thanks to that, it got it right.
 
 

A> > A> but will definitely care about the programs not working any more for
A> > A> really obscure reasons. I fear that most of them will simply go away,
A> > A> putting their wx stuff to the bin and going the qt or gtk way.
A> >
A> >  This is exactly and certainly what would happen if we didn't provide
A> > compatibility (for 99%, it's not perfect) between 2.8 and 3.0.
A> >
A> but it only fakes compatibility. Unfortunately, there is such a jumble
A> of constructors, cast operators, proxy classes and whatever that the
A> compiler now accepts almost anything, even if it's meaningless!

 This is just not true. You found a bug in wxDateTime::ParseDate(), true.
But this is an isolated case and not nearly the end of the world you make
it out to be.

it does not seem to be as isolated as you state: when I try to copy Arabic text on Mac, bang.
 
 

A> > A> if someone want ideas about how to do it right, I have plenty of them.
A> >
A> >  Yes, please share.
A> >
A> First a wxString object should really not contain 3 strings

 This is unavoidable to be able to continue to handle both the code that
worked with ANSI and Unicode builds of wx 2.8 with the same library. We
need to be able to return it as either wchar_t or char string and pointers
into it must be persistent so we have to keep the conversion results. Of
course, wxString usually contains at most 2 strings and not 3...
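A hypothetical CachedString class (not wxString) shows why the conversion results have to be kept. It stores UTF-32 text and returns a UTF-8 const char* on request. Callers may hold on to that pointer, so the converted bytes must live inside the object. That is exactly what makes a class offering both representations carry more than one buffer.

```cpp
#include <string>
#include <utility>

// Hypothetical string storing UTF-32 text and handing out UTF-8 pointers.
// The returned pointer stays valid until the string is modified or destroyed,
// because the converted bytes are kept in a cache member.
class CachedString {
public:
    explicit CachedString(std::u32string text) : text_(std::move(text)) {}

    void Append(char32_t cp) {
        text_.push_back(cp);
        cacheValid_ = false;  // previously returned pointers must not be used any more
    }

    const char* Utf8() const {
        if (!cacheValid_) {
            utf8Cache_.clear();
            for (char32_t cp : text_) EncodeUtf8(cp, utf8Cache_);
            cacheValid_ = true;
        }
        return utf8Cache_.c_str();
    }

private:
    // Appends the UTF-8 encoding of one code point (assumed valid: <= U+10FFFF,
    // not a surrogate).
    static void EncodeUtf8(char32_t cp, std::string& out) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::u32string text_;
    mutable std::string utf8Cache_;
    mutable bool cacheValid_ = false;
};
```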

A> and an iterator should not be chained to other iterators

 This is unfortunate but also unavoidable (or let us know how to avoid it)
as you can see from the comment in the relevant part of the code.

A> Second, it's not because newbies don't like or understand wxT() that it
A> should not be there.

 This is not "second", this is "zero" and I completely disagree with this.
We tried with wxT() for 15 years and it was still a daily source of
problems. Have you even seen the email I forwarded from wx-users?
Surprising that it didn't receive nearly the same number of replies as my
totally innocent email saying that using std::string is wiser than using
wxString...

the problem with std::string, I believe, is that it lacks this "encoding" aspect that helps us cope with file names (e.g. Mac), the clipboard, widget specifics... it is not really "multi-platform"; it is very good but does not take this reality into account for us.
 
 Of course if you get rid of the main motivation for all these changes
things become much simpler. And wxWidgets loses 50% of its users forever.
Is it a good deal? I don't think so, so let's assume this requirement is
there. When I was asking you to share your thoughts it was about how to
implement it better, not about how to avoid implementing it at all.
for one thing, make length() the number of bytes in UTF8 and operator [] return char in UTF8 as well: this way they are consistent and at least, it does not fool the user into thinking he handles characters.
 
 

A> Third, please let us use static strings right off the constant memory of
A> the program, with a simple macro around the text; it would be trivial to make
A> a constructor for that, and provided that it makes strings directly correct
A> for the representation

 See wxS().

A> it would avoid copies completely for ALL static text.

 Of course it wouldn't, wxString always copies the text internally, whether
you use wxT(), wxS() or nothing at all. What you save with wxS() is the
conversion, that's all. But the heap allocation (unless std::string used
internally uses small string optimization and the string is indeed small)
and copying always happen, just as it did in 2.8. You're complaining about
something that absolutely never worked as you think it did!

I know it never existed in wx. That does not mean it would not be worthwhile.
 

A> Have reference-counted strings vanished? Or are they still the default non-STL
A> implementation? Refcounted strings are pretty good (with an atomic counter of
A> course).

 Ref counted strings turned out to be horribly bad which is why absolutely
nobody uses them any more. But yes, they're still provided if you set
wxUSE_STD_STRING to 0. Good luck with the bugs...

Woohoo, which bugs? I used them in our previous build and never had any bugs with them, in spite of threaded code.
 

A> > A> please note that I really love wxWidgets. but a bad decision is a bad
A> > A> decision, no matter how much effort was involved.
A> >
A> >  I don't think it's a bad decision. There are bugs (ParseDate() one is a
A> > bad one, we do need to do something about it) but globally the transition
A> > from 2.8 ANSI build has worked much better than I thought it would.
A> >
A> the problem for me is that programs that were correct with 2.6 are no longer
A> correct with 2.9: they still compile, but they crash here and there...

 Please report bugs as you find them. For now I see exactly one bug which,
I agree, is a serious one but, again, this is not nearly as catastrophic as
could be surmised from reading your message. Is it?

I'll report the bugs no problem.
 
I believe this is more or less the disappointment of 2.9 being stuffed with far too many changes, with such a long release cycle: basically, if I want the features, I am forced to use a half-baked framework :-( with admittedly not that many bugs, but after a few days of gdb crashing every two minutes, I was a bit tired (it was caused by thread-local variables not being well supported by the stable version of gdb 7.0 in Debian, which could not print a wxString).
 
If only it had been possible to make a first wxString-only release and then go on with the features :-(
 
Best regards
Armel

Armel

Mar 23, 2012, 4:46:38 AM3/23/12
to wx-...@googlegroups.com
+1

Pete Bannister

Mar 23, 2012, 5:04:26 AM3/23/12
to wx-...@googlegroups.com
>> wxString using std::string internally caused a few headaches for me when I tried porting to 2.9. One of these was down to me using _SECURE_SCL=0 in my application build (msvc2008) - which then also needed to be applied to wx to avoid violating the one definition rule and getting random crashes
> It's really unfair to blame this on wx changes. If you use a rare, ABI-affecting compiler setting, then it's to be expected that it has to be applied to all your code consistently. It's no different with, say, _BIND_TO_CURRENT_VCLIBS_VERSION.
>
That is fair enough - just mentioning it as a complication, not meaning
to blame wx for this.

Pete Bannister

Mar 23, 2012, 5:33:50 AM3/23/12
to wx-...@googlegroups.com
Thanks for the reply Vadim. I don't mean to be overly critical - just
making a couple of observations.

> On Thu, 22 Mar 2012 17:56:33 +0000 Pete Bannister wrote:
>
> PB> And why wouldn't you use strings of the same (or derived) type used by
> PB> the application framework? What's the point of hitting the heap to pass
> PB> a string to the UI?
>
> For me the separation of non-GUI code from GUI is incomparably more
> important than avoiding a heap allocation.
In most cases I'd completely agree with you here, but if you are trying
to optimize GUI code that relies on non-GUI data then this can be a
problem. It's not really an issue for most things - copying a string to
set up the text of a label isn't exactly going to kill performance. If
you are doing something that requires high performance repainting of
many strings then you want to avoid allocating loads of stuff every
repaint. In which case you need to use wxString for non-GUI data.
Profiling has shown this to be a bottleneck for me in some cases.
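One middle ground is to convert once and cache the result per label, assuming the model keeps std::string and the drawing code needs another representation. Repaints then only pay for labels whose text changed. LabelCache is a hypothetical sketch. std::wstring stands in for whatever string type the GUI layer wants, and the ASCII-only Widen() is a placeholder conversion.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by label id: the model text is converted only when it
// changes, so redrawing an unchanged label does no conversion and no allocation.
class LabelCache {
public:
    const std::wstring& Get(std::size_t id, const std::string& text) {
        Entry& e = entries_[id];
        if (!e.valid || e.source != text) {
            e.source = text;
            e.display = Widen(text);
            e.valid = true;
            ++conversions_;
        }
        return e.display;
    }

    std::size_t conversions() const { return conversions_; }

private:
    struct Entry {
        std::string source;
        std::wstring display;
        bool valid = false;
    };

    // Placeholder conversion assuming ASCII input; real code would decode the
    // actual encoding of the model strings.
    static std::wstring Widen(const std::string& s) {
        return std::wstring(s.begin(), s.end());
    }

    std::unordered_map<std::size_t, Entry> entries_;
    std::size_t conversions_ = 0;
};
```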

> PB> Also, why would you prefer to use std::string when the implementation is
> PB> not defined by the standard and you need to know what the performance
> PB> characteristics are?
>
> This is just wrong. The complexity of all string operations is defined.

Not sure I agree with you here - the complexity of the copy operation is
different if it is reference counted and if it isn't. Yes, there are
several schools of thought on whether or not reference counted strings
outperform non-ref counted ones, particularly if the reference count is
thread safe. At least with the old wxString you know exactly how it
works on every platform. The same cannot be said for std::string in
this respect.


>
> PB> Is it reference counted or not? Who knows?
>
> Who needs to know?

It's better to know, surely? Again, in many cases it's a moot point but
occasionally it is important for performance critical code, and
additionally for thread safe code. If you KNOW that the string uses
reference counting and doesn't use a thread safe reference count, that
has very important implications.

Vadim Zeitlin

Mar 23, 2012, 8:22:40 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 09:33:50 +0000 Pete Bannister wrote:

PB> > On Thu, 22 Mar 2012 17:56:33 +0000 Pete Bannister wrote:
PB> >

PB> > PB> And why wouldn't you use strings of the same (or derived) type used by
PB> > PB> the application framework? What's the point of hitting the heap to pass
PB> > PB> a string to the UI?
PB> >
PB> > For me the separation of non-GUI code from GUI is incomparably more
PB> > important than avoiding a heap allocation.
PB> In most cases I'd completely agree with you here, but if you are trying
PB> to optimize GUI code that relies on non-GUI data then this can be a
PB> problem.

Yes, this is definitely true. And when profiling shows a bottleneck you
need to fix by avoiding conversions and, perhaps, using something else than
std::string (e.g. a custom data structure) for storing the text. But I'm
speaking about 99% of the normal use, i.e. setting labels or text of
controls with less than megabytes of text in which case this is totally
unnoticeable. And in this case you should, IMNSHO, use std::string by
default.

PB> > PB> Also, why would you prefer to use std::string when the implementation is
PB> > PB> not defined by the standard and you need to know what the performance
PB> > PB> characteristics are?
PB> >
PB> > This is just wrong. The complexity of all string operations is defined.
PB> Not sure I agree with you here - the complexity of the copy operation is
PB> different if it is reference counted and if it isn't. Yes, there are
PB> several schools of thought on whether or not reference counted strings
PB> outperform non-ref counted ones, particularly if the reference count is
PB> thread safe. At least with the old wxString you know exactly how it
PB> works on every platform. The same cannot be said for std::string in
PB> this respect.

By now std::string is not ref-counted under any platform any more... And
with C++11 there is no risk of it changing back again.

PB> > PB> Is it reference counted or not? Who knows?
PB> >
PB> > Who needs to know?
PB> It's better to know, surely?

I don't think so because you can't rely on this. It can use ref counting
in one version but not in the next one -- which is exactly what happened to
both wxString and std::string.

Regards,
VZ

Vadim Zeitlin

Mar 23, 2012, 9:08:42 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 01:44:34 -0700 (PDT) Armel wrote:

A> > Please let's not mix up everything. The changes to wxString you started
A> > discussing were intentional. The problem with the clipboard under Cocoa is
A> > clearly a bug and is definitely not intentional.
A> >
A> the problem here is that the number of unintended side effects won't go
A> down, in particular when wx 2.8 users upgrade.

Only if you upgrade to an entirely new port (wxOSX/Cocoa), one which didn't
exist at all in 2.8. It's unfortunate that it still has bugs, and believe me,
I wish it didn't. But it's completely unfair to claim that the bugs in the new
port are somehow regressions due to wxString changes.

A> I'll do it. By the way, should we deprecate those char*/wchar_t* functions? At
A> least some code cleanup would have saved me these bugs.

We don't deprecate 2.8 compatibility functions in 3.0 yet as getting
thousands of deprecation warnings when upgrading from 2.8 would be quite
annoying (even if better than getting thousands of errors). We could/will do
it in 3.2 probably.

A> > Second, the behaviour of wxString("", 4) is well defined, has nothing to
A> > do with std::string and is never something you want because it's going to
A> > create a string of length 4 with the initial character being '\0' and the
A> > rest of them being whatever garbage happens to be in memory. Why would you
A> > ever expect this to work, std::string or not and 2.8 or 2.9? This really
A> > has nothing to do with any wx problem, it's just a bug in your code.
A> >
A> OK, OK, I wrote it too fast; it is actually wxString mystr1(""); wxString
A> mystr2(mystr1, 4);

You're right, there is a bug here, and I see you already even tracked it
down in #14130. I'll try to check in a fix soon -- and thanks for reporting
this, it's really useful and much appreciated!

A> > That's the point, it doesn't try to be clever. length() always returns the
A> > number of code points.
A> >
A> it _should not_, see explanation for UTF16
...
A> > A> In UTF16 this is simply the difference of the
A> > A> basic_string::const_iterator returned by begin()/end(),
A> >
A> > Which is also the number of code points, ignoring surrogates (which we
A> > shouldn't ignore, of course, but, again, we did ignore them in 2.8 too, and
A> > nothing changed in 2.9 concerning this so let's discuss this separately).
A> >
A> so this is not the number of code points, it is a number of coding entities

Sorry, I don't understand your terminology. "Coding entities" is not
mentioned at http://unicode.org/glossary/ so what do you mean by this?
And how do you define it for UTF-8, UTF-16, UTF-32?

A> (uint16), and it absolutely should continue to be! The UTF8 or UTF32
A> variants want that as well. Please realize that even if few newbies get the
A> wxT() stuff, I am really sure of one thing: none of them understand all
A> that code points stuff.

I think I do understand it, however. And I still don't see your point. As
I wrote above, if we ignore surrogates (and it just so happens that there
is absolutely no support for surrogates in either 2.8 or 2.9), the
current length() seems fine to me. And yes, it would be better to add
support for surrogates...

Anyhow, once again, what do you think should length() return? The number
of bytes in the string representation? This would be perfectly useless (and
is already available by other means). While returning the number of code
points is not incredibly useful either -- as you surely know -- it at
least somewhat makes sense, e.g. if you always use NFKC. It is also
consistent with the meaning of String.length in just about every other
language I know.

Also, replying to a paragraph slightly further:

A> no it was not the number of code points, it was the number of bytes in
A> ASCII and the number of uint16 on Windows: it was PERFECT like that

In ASCII the number of bytes is exactly the same as number of code points,
by definition so the new behaviour is perfectly consistent with the old one
and I don't see how it can surprise anybody. OTOH your definition of length
as the "number of bytes in UTF-16 encoding of the string divided by two" is
not used by anybody else and so is very surprising.

A> > Where is the problem?
A> >
A> the value returned by wxString::length() is inconsistent with what is called a
A> "length" everywhere else in the wx code and API, and I have to put #if
A> wxUSE_UNICODE_UTF8 / WCHAR_T in my code to work around what should be trivially
A> defined but is complex, potentially slow and unusable.
A>
A> so please make the length() of a UTF8 string just that: the number of bytes
A> in it, as it is expected by string constructors for example.

This is just totally *WRONG*. The "length" argument taken by string
constructors is the length in the same sense as the length returned by
length() method. This is completely tautological, of course, but I felt I
had to mention this just to make it perfectly clear.

You shouldn't need any #ifs and there is absolutely no inconsistency
(well, again, unless there are more bugs somewhere, but it definitely was
intended to work like I say). I think this is the root cause of your
discontent. Of course if you think that you need to use different
functions/conversions in Windows and Unix builds things look pretty
horrible. But this is just not the case, the same code works on both
platforms. The _only_ implications can be at performance level but if you
care about this (and you should, in performance-critical places) you must
use string iterators and not indices.
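The performance remark follows from UTF-8's variable width. Finding the i-th code point means scanning from the start, so a loop over indices is quadratic. Decoding sequentially, which is what an iterator does, is linear. Below is a standard C++ sketch with hypothetical helpers; it is not wxString's iterator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Byte offset of the idx-th code point: scans from the start on every call, O(n),
// so calling it for idx = 0, 1, 2, ... in a loop costs O(n^2) overall.
inline std::size_t ByteOffsetOf(const std::string& s, std::size_t idx) {
    std::size_t pos = 0;
    while (idx > 0 && pos < s.size()) {
        ++pos;  // skip the lead byte, then its continuation bytes
        while (pos < s.size() && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
            ++pos;
        --idx;
    }
    return pos;
}

// Decodes the code point starting at pos (requires pos < s.size()) and advances
// pos past it. Invalid lead bytes, truncated sequences and missing continuation
// bytes yield U+FFFD; overlong forms are not checked in this sketch.
inline char32_t NextCodePoint(const std::string& s, std::size_t& pos) {
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t len;
    char32_t cp;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else                            { ++pos; return 0xFFFD; }

    if (pos + len > s.size()) { pos = s.size(); return 0xFFFD; }
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) { pos += i; return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3F);
    }
    pos += len;
    return cp;
}

// Sequential decoding visits each byte once: O(n) for the whole string.
inline std::vector<char32_t> DecodeAll(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t pos = 0;
    while (pos < s.size()) out.push_back(NextCodePoint(s, pos));
    return out;
}
```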


A> > Sorry, I just don't understand at all what do you mean. There had been no
A> > UTF-8 build of wx in 2.8. If you use "ANSI" build of wx under Unix, then
A> > you didn't have any Unicode support at all.
A> >
A> yes, no UTF8 handling at all worked better than what is there now, probably
A> mostly because of the belief that length() should return the code point
A> count; it should not. The 'ANSI' treatment ignored UTF8 and, thanks to that,
A> it got it right.

If you were happy with using ANSI build in Unicode-aware programs before
(although this was never a good idea as too many things didn't work
correctly and you definitely needed plenty of #ifs to distinguish from
Windows version which makes a bit of a mockery of a cross-platform
framework) you should be happy with using wxUSE_UTF8_LOCALE_ONLY now. This
avoids all the conversions and uses UTF-8 exclusively.

A> it does not seem to be as isolated as you state: when I try to copy Arabic
A> text on Mac, bang.

This has nothing to do with Unicode changes.

A> the problem with std::string, I believe, is that it lacks this "encoding"
A> aspect that helps us cope with file names (e.g. Mac), the clipboard, widget
A> specifics... it is not really "multi-platform"; it is very good but does
A> not take this reality into account for us.

std::string is for byte strings, i.e. either for 7 bit ASCII or for
non-textual data at all. For anything else you have std::wstring and you
don't care about its encoding.

A> > Of course if you get rid of the main motivation for all these changes
A> > things become much simpler. And wxWidgets loses 50% of its users forever.
A> > Is it a good deal? I don't think so, so let's assume this requirement is
A> > there. When I was asking you to share your thoughts it was about how to
A> > implement it better, not about how to avoid implementing it at all.
A> >
A> for one thing, make length() the number of bytes in UTF8 and operator []
A> return char in UTF8 as well: this way they are consistent and at least, it
A> does not fool the user into thinking he handles characters.

That would be totally unacceptable as a simple for loop over string
indices wouldn't work at all any more. I find it funny how you complain
about relatively minor compatibility breakage but propose to totally break
everything yourself.
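VZ's objection can be seen in a few lines. If indices counted bytes, cutting a UTF-8 string after "the first 2 characters" could land in the middle of a multi-byte sequence. The hypothetical TruncateCodePoints helper below cuts on code point boundaries instead. It is a standard C++ sketch that assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

inline bool IsContinuationByte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Keeps at most n code points of a UTF-8 string, never cutting a multi-byte
// sequence in half.
inline std::string TruncateCodePoints(const std::string& s, std::size_t n) {
    std::size_t pos = 0;
    for (std::size_t count = 0; count < n && pos < s.size(); ++count) {
        ++pos;  // the lead byte
        while (pos < s.size() && IsContinuationByte(s[pos]))
            ++pos;  // its continuation bytes
    }
    return s.substr(0, pos);
}
```

With a byte-based substr(0, 2), "Aéz" becomes "A" followed by half of "é", which is invalid UTF-8. TruncateCodePoints(word, 2) keeps "Aé" intact.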


A> If only it had been possible to make a first wxString-only release and then
A> go on with the features :-(

We did make 2.9.0 and invited comments back then. And we did receive them,
of course, and many bugs were fixed. But some apparently still remain.
Hopefully they will be fixed by 2.9.4 time and then 3.0 won't have them.

Regards,
VZ

Vadim Zeitlin

Mar 23, 2012, 9:48:24 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 01:44:34 -0700 (PDT) Armel wrote:

A> On Thursday, March 22, 2012 22:41:44 UTC+1, VZ wrote:
A> >
A> > Where is the problem?
A> >
A> the value returned by wxString::length() is inconsistent with what is called a
A> "length" everywhere else in the wx code and API, and I have to put #if
A> wxUSE_UNICODE_UTF8 / WCHAR_T in my code to work around what should be trivially
A> defined but is complex, potentially slow and unusable.

I have to amend my previous reply: it's true that the "length" argument of
the function overloads taking "const char*" is the number of bytes and so
it probably should be renamed to "size" to make it perfectly clear. Of
course, there is only a difference between them when you use UTF-8 and in
this case you should be using wxString::FromUTF8() &c because all the
functions taking "const char*" are not guaranteed to use UTF-8 at all so
they shouldn't be used when you know that your string is in UTF-8.

However it still remains that the "length" everywhere else, i.e. when
you're dealing with any string indices, is the number of code points (in
first approximation, i.e. forgetting about surrogates in UTF-16 builds) and
this is perfectly consistent with length() return value.

A> so please make the length() of a UTF8 string just that: the number of bytes
A> in it, as it is expected by string constructors for example.

String constructors taking wxString expect number of code points, *not*
bytes. String constructors taking "const char*" expect something in the
current locale encoding which can be UTF-8 or not and so must not be used
with UTF-8 strings.

Regards,
VZ

Armel

Mar 23, 2012, 9:51:38 AM3/23/12
to wx-...@googlegroups.com

[...]

 

A> > A> In UTF16 this is simply the difference of the
A> > A> basic_string::const_iterator  returned by begin()/end(),
A> >
A> >  Which is also the number of code points, ignoring surrogates (which we
A> > shouldn't ignore, of course, but, again, we did ignore them in 2.8 too, and
A> > nothing changed in 2.9 concerning this so let's discuss this separately).
A> >
A> so this is not the number of code points, it is a number of coding entities

 Sorry, I don't understand your terminology. "Coding entities" is not
mentioned at http://unicode.org/glossary/ so what do you mean by this?
And how do you define it for UTF-8, UTF-16, UTF-32?

OK, they call that a code unit.
 

A> (uint16), and it absolutely should continue to be! The UTF8 or UTF32
A> variants want that as well. Please realize that even if few newbies get the
A> wxT() stuff, I am really sure of one thing: none of them understand all
A> that code points stuff.

 I think I do understand it, however. And I still don't see your point. As
I wrote above, if we ignore surrogates (and it just so happens that there
is absolutely no support for surrogates in either 2.8 or 2.9), the
current length() seems fine to me. And yes, it would be better to add
support for surrogates...

 Anyhow, once again, what do you think should length() return? The number
of bytes in the string representation? This would be perfectly useless (and
is already available by other means). While returning the number of code
points is not incredibly useful either -- as you surely know -- it at
least somewhat makes sense, e.g. if you always use NFKC. It is also
consistent with the meaning of String.length in just about every other
language I know.

many languages do not propose composed characters at all, in particular those where it is typical to have several diacritic-like symbols attached to a single base character.
 

 Also, replying to a paragraph slightly further:

A> no it was not the number of code points, it was the number of bytes in
A> ASCII and the number of uint16 on Windows: it was PERFECT like that

 In ASCII the number of bytes is exactly the same as number of code points,
by definition so the new behaviour is perfectly consistent with the old one
and I don't see how it can surprise anybody. OTOH your definition of length
as the "number of bytes in UTF-16 encoding of the string divided by two" is
not used by anybody else and so is very surprising.

Javascript uses that definition, C#, Java as well. I'm pretty sure that std::wstring does as well. In fact, every single piece of code out there does. Are they all wrong? there is no O(1) way to return a string's length in code points; nobody does it.
yes it would work, as it always did: a simple operator[] referring to code units and that's all, as everybody expects it. indexed write access would allow replacing code units with code units, that's all.
I understand that you'd like something like mystring[22] = "é" to work the same in the UTF-8 and wchar_t builds, but causing us all those troubles just for that is wrong. just put an assert on wxUnicodeCharRef trying to put something non-ASCII using simple operator=() and that's all. sometimes there is no other way. by the way, try to find a single place in your code where you replace a character with a non-ASCII char using bare operator[](idx) = value... there is a 99.9% chance that you never do.
even with Unicode NFKC/NFC you are still not handling characters. so all of that, to _not_ handle characters? when something is not possible, don't fool users into thinking it is: we _cannot_ handle characters. and it really does not help at all to handle code points; it just makes things harder to understand, besides requiring a ton of unnecessary code.
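JavaScript, Java and C# report a string's length in UTF-16 code units. That value differs from a code point count only for characters outside the Basic Multilingual Plane, which are stored as surrogate pairs. The following standard C++ sketch shows the difference. It uses std::u16string rather than wxString, the helper names are hypothetical, and it counts an unpaired surrogate as one code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-16 high (leading) surrogate: U+D800..U+DBFF.
bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// UTF-16 low (trailing) surrogate: U+DC00..U+DFFF.
bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Count code points: a high surrogate followed by a low surrogate is one
// code point; any other unit (including an unpaired surrogate) counts as one.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i;  // skip the low half of the pair
        ++n;
    }
    return n;
}
```

For u"h\u00e9llo", size() and the code point count are both 5. For u"\U0001F600", a character outside the BMP, size() is 2 while the code point count is 1.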
 


A> If only it had been possible to make a first wxString-only release and then
A> go on with the features :-(

 We did make 2.9.0 and invited comments back then. And we did receive them,
of course, and many bugs were fixed. But some apparently still remain.
Hopefully they will be fixed by 2.9.4 time and then 3.0 won't have them.

a rule of thumb when developing is that if it is dead complex, then the design is bad.
I know, I do it all the time :-(
 
Regards
Armel
 

Vadim Zeitlin

Mar 23, 2012, 10:06:10 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 06:51:38 -0700 (PDT) Armel wrote:

A> > Sorry, I don't understand your terminology. "Coding entities" is not
A> > mentioned at http://unicode.org/glossary/ so what do you mean by this?
A> > And how do you define it for UTF-8, UTF-16, UTF-32?
A> >
A> OK, they call that a code unit.

Which, you'll notice, is perfectly useless for UTF-8.

A> many languages do not propose composed characters at all, in particular
A> those where it is typical to have several diacritic-like symbols attached to
A> a single base character.

I don't see this as a language property. How can they not "propose
composed characters" at all? If you get a string as an input, do they
automatically recompose it for you at _language_ level? I don't think so
but I don't know them enough to be sure.

A> Javascript uses that definition, C#, Java as well. I'm pretty sure that
A> std::wstring does as well. In fact, every single piece of code out there does.

Obviously, any single piece of code using UTF-16 (or, in case of
std::wstring under Unix, UTF-32). For which it makes perfect sense. For
UTF-8 it doesn't.

Now look at Python, Perl, and probably anything else in use under Unix and
you'll see that they all do it. You're coming from totally Windows-centric
position where everything is UTF-16 (and surrogates are swept under the
carpet) and you deem ridiculous anything doing things differently. You may
be surprised to know that there are plenty of Unix developers who are
equally sure that anybody not using UTF-8 is not worth talking to. We're
trying to reconcile the two positions.

A> > A> for one thing, make length() the number of bytes in UTF8 and
A> > A> operator [] return char in UTF8 as well: this way they are
A> > A> consistent and at least, it does not fool the user into thinking he
A> > A> handles characters.
A> >
A> > That would be totally unacceptable as a simple for loop over string
A> > indices wouldn't work at all any more. I find it funny how you complain
A> > about relatively minor compatibility breakage but propose to totally break
A> > everything yourself.
A> >
A> yes it would work, as it always did:

This is factually wrong. You never got invalid characters iterating over
the strings using indices before.

A> a simple operator[] referring to code units and that's all, as everybody
A> expects it.

Absolutely not for UTF-8.

A> I understand that you'd like something like mystring[22] = "é" to work the
A> same on UTF8 build and wchar_t. but causing us all those troubles just for
A> that is wrong.

Not allowing this to work is not even wrong. You'd be simply laughed out
of the door if you started explaining that this didn't work. I realize that
you like immutable strings and I sympathize with your feelings but the
simple fact is that we can't throw away 20 years of existing code.

Let me propose you a test: remove the possibility to modify wxString from
wx/string.h and try to recompile your program. Please let me know if you
still think it's a good idea after this.

A> just put an assert on wxUnicodeCharRef trying to put
A> something non ASCII using simple operator =() and that's all.

This is absolutely not all. You take the string "h\xc3\xa9llo" ("héllo").
Now you assign a perfectly valid ASCII character 'x' to s[2]. What do you
get with your approach? That's right, total garbage. Silently, without even
any run-time problems, let alone compile-time ones.
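Vadim's example can be reproduced with a plain std::string holding UTF-8, which is roughly what a byte-indexed operator[] would expose. The sketch below is standard C++ with a hypothetical helper name. It adds a minimal structural UTF-8 check: every lead byte must be followed by the number of continuation bytes its high bits announce. It deliberately does not reject overlong encodings or encoded surrogates, so it is weaker than a full validator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: each lead byte must be followed by the number of
// continuation bytes (10xxxxxx) it announces. Overlong forms and encoded
// surrogates are NOT rejected; this is only a sketch.
bool is_structurally_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)
            extra = 0;                  // ASCII
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                  // 110xxxxx: 2-byte sequence
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                  // 1110xxxx: 3-byte sequence
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                  // 11110xxx: 4-byte sequence
        else
            return false;               // stray continuation or invalid byte
        if (extra > s.size() - i - 1)
            return false;               // sequence truncated at end of string
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;           // expected a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

After assigning 'x' to byte 2 of "h\xc3\xa9llo", the lead byte 0xC3 is followed by 'x' instead of a continuation byte, so the check fails. The assignment itself reports nothing.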

A> a rule of thumb when developing is that if it is dead complex, then the
A> design is bad.

I'm unhappy about the complexity of this code too but this is the price to
pay for moving forward. The only other alternative is, again, to lose 50%
of the users who didn't use Unicode build of wxWidgets before. Yes, this
would be much simpler. Losing 100% of the users would be even simpler
probably. Does it mean it's a good idea?

Regards,
VZ

Armel

Mar 23, 2012, 10:48:50 AM3/23/12
to wx-dev


On 23 mar, 15:06, Vadim Zeitlin <va...@wxwidgets.org> wrote:
> On Fri, 23 Mar 2012 06:51:38 -0700 (PDT) Armel wrote:
>
> A> >  Sorry, I don't understand your terminology. "Coding entities" is not
> A> > mentioned at http://unicode.org/glossary/ so what do you mean by this?
> A> > And how do you define it for UTF-8, UTF-16, UTF-32?
> A> >
> A> OK, they call that a code unit.
>
>  Which, you'll notice, is perfectly useless for UTF-8.
no, it is not useless, because UTF-8 was carefully crafted to be
compatible with ASCII.

> A> many languages do not propose composed characters at all, in particular
> A> those where it is typical to have several diacritic-like symbols attached to
> A> a single base character.
>
>  I don't see this as a language property. How can they not "propose
> composed characters" at all? If you get a string as an input, do they
> automatically recompose it for you at _language_ level? I don't think so
> but I don't know them enough to be sure.
precomposed characters exist in Unicode only for compatibility
with old encodings; they are definitely not desirable from a linguistic
point of view, and they trick most developers into thinking that a code
point is a character.

>
> A> Javascript uses that definition, C#, Java as well. I'm pretty sure that
> A> std::wstring does as well. In fact, every single piece of code out there does.
>
>  Obviously, any single piece of code using UTF-16 (or, in case of
> std::wstring under Unix, UTF-32). For which it makes perfect sense. For
> UTF-8 it doesn't.
>
>  Now look at Python, Perl, and probably anything else in use under Unix and
> you'll see that they all do it. You're coming from totally Windows-centric
> position where everything is UTF-16 (and surrogates are swept under the
> carpet) and you deem ridiculous anything doing things differently. You may
> be surprised to know that there are plenty of Unix developers who are
> equally sure that anybody not using UTF-8 is not worth talking to. We're
> trying to reconcile the two positions.
I would not do almost all my development in UTF-8 if I thought that.
maybe you're not used enough to UTF-8 to realize that only new code
with -iterators- benefits from code point orientation, and that old
algorithms that think they are handling ASCII can remain unchanged most
of the time when treating UTF-8 like ASCII.
indeed, string search, comparison, tokenizing and so on do not care
about UTF-8, and the string in between? you'll do an 'entire'
treatment on it most of the time, so your program does not care whether
it contains leading, trailing or ASCII bytes. and counting everything in
code points will just slow everything down for zero advantage.

>
> A> > A> for one thing, make length() the number of bytes in UTF8 and
> A> > A> operator [] return char in UTF8 as well: this way they are
> A> > A> consistent and at least, it does not fool the user into thinking he
> A> > A> handles characters.
> A> >
> A> >  That would be totally unacceptable as a simple for loop over string
> A> > indices wouldn't work at all any more. I find it funny how you complain
> A> > about relatively minor compatibility breakage but propose to totally break
> A> > everything yourself.
> A> >
> A> yes it would work, as it always did:
>
>  This is factually wrong. You never got invalid characters iterating over
> the strings using indices before.
>
> A> a simple operator[] referring to code units and that's all, as everybody
> A> expects it.
>
>  Absolutely not for UTF-8.
>
> A> I understand that you'd like something like mystring[22] = "é" to work the
> A> same on UTF8 build and wchar_t. but causing us all those troubles just for
> A> that is wrong.
>
>  Not allowing this to work is not even wrong. You'd be simply laughed out
> of the door if you started explaining that this didn't work. I realize that
> you like immutable strings and I sympathize with your feelings but the
> simple fact is that we can't throw away 20 years of existing code.
no, I don't particularly like them; our string class here is copy-on-write.
but our replacement function does check that you ask for something
possible, or automagically switches to string-oriented replacement,
that's all. and there is no character-by-character access directly on
the string: if you want it, you get the pointer and know you won't get
any fancy features.

>  Let me propose you a test: remove the possibility to modify wxString from
> wx/string.h and try to recompile your program. Please let me know if you
> still think it's a good idea after this.

> A> just put an assert on wxUnicodeCharRef trying to put
> A> something non ASCII using simple operator =() and that's all.
>
>  This is absolutely not all. You take the string "h\xc3\xa9llo" ("héllo").
> Now you assign a perfectly valid ASCII character 'x' to s[2]. What do you
> get with your approach? That's right, total garbage. Silently, without even
> any run-time problems, let alone compile-time ones.
yeh, yeh, the assert is that the assignment should not change the "leading/
trailing/ascii" characteristic of the char. trivial to detect; my
experience is that it would never happen, because nobody does special
processing on non-ASCII chars (case handling might, but no user code
does it that way; they call the framework for that). of course,
iterator-based algorithms could still be code point oriented (I'm
thinking of regular expressions, for example).
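Armel's proposed guard can be sketched as a checked byte assignment on a std::string holding UTF-8. This is a hypothetical helper, not an existing wxWidgets API. It refuses to replace a byte with one of a different kind. One assumption goes beyond his "leading/trailing/ascii" description: lead bytes are also distinguished by the sequence length they announce. Without that, swapping a 2-byte lead (0xC3) for a 3-byte lead (0xE2) would pass the check and still corrupt the string. Like any such guard, it only fires at run time, for inputs that actually reach it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Kind of a UTF-8 byte. Lead bytes are split by the length of the sequence
// they start (an assumption beyond a plain lead/trailing/ASCII split).
enum class ByteKind { Ascii, Continuation, Lead2, Lead3, Lead4, Invalid };

ByteKind byte_kind(unsigned char c) {
    if (c < 0x80) return ByteKind::Ascii;
    if ((c & 0xC0) == 0x80) return ByteKind::Continuation;
    if ((c & 0xE0) == 0xC0) return ByteKind::Lead2;
    if ((c & 0xF0) == 0xE0) return ByteKind::Lead3;
    if ((c & 0xF8) == 0xF0) return ByteKind::Lead4;
    return ByteKind::Invalid;
}

// Hypothetical checked assignment: performs s[i] = value only if the new
// byte has the same kind as the old one. Otherwise it returns false and
// leaves s unchanged (a debug build could assert here instead).
bool checked_assign_byte(std::string& s, std::size_t i, char value) {
    assert(i < s.size());
    const ByteKind old_kind = byte_kind(static_cast<unsigned char>(s[i]));
    const ByteKind new_kind = byte_kind(static_cast<unsigned char>(value));
    if (new_kind == ByteKind::Invalid || new_kind != old_kind)
        return false;
    s[i] = value;
    return true;
}
```

With this check, Vadim's s[2] = 'x' on "h\xc3\xa9llo" is refused. Replacing one ASCII byte with another is still allowed.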

>
> A> a rule of thumb when developing is that if it is dead complex, then the
> A> design is bad.
>
>  I'm unhappy about the complexity of this code too but this is the price to
> pay for moving forward. The only other alternative is, again, to lose 50%
> of the users who didn't use Unicode build of wxWidgets before. Yes, this
> would be much simpler. Losing 100% of the users would be even simpler
> probably. Does it mean it's a good idea?
we would not lose anybody, because these features serve only
hypothetical uses. please do the test: search your code for indexed-char
replacements with user-provided characters. you won't find any. it
simply does not happen. and with non-ASCII chars? it does not happen
either -in old code-.

Regards
Armel

Vadim Zeitlin

Mar 23, 2012, 10:58:00 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 07:48:50 -0700 (PDT) Armel wrote:

A> On 23 mar, 15:06, Vadim Zeitlin <va...@wxwidgets.org> wrote:


A> > On Fri, 23 Mar 2012 06:51:38 -0700 (PDT) Armel wrote:
A> >

A> > A> >  Sorry, I don't understand your terminology. "Coding entities" is not
A> > A> > mentioned at http://unicode.org/glossary/ so what do you mean by this?
A> > A> > And how do you define it for UTF-8, UTF-16, UTF-32?


A> > A> >
A> > A> OK, they call that a code unit.

A> >
A> >  Which, you'll notice, is perfectly useless for UTF-8.
A> no, it is not useless, because UTF-8 was carefully crafted to be
A> compatible with ASCII.

Thanks, I do know this. It doesn't change what I wrote: having the length
of UTF-8 string in bytes (sorry, code units) is useless.

A> > A> many languages do not propose composed characters at all, in particular
A> > A> those where it is typical to have several diacritic-like symbols attached to
A> > A> a single base character.
A> >
A> >  I don't see this as a language property. How can they not "propose
A> > composed characters" at all? If you get a string as an input, do they
A> > automatically recompose it for you at language level? I don't think so
A> > but I don't know them enough to be sure.
A> precomposed characters exist in Unicode only for compatibility
A> with old encodings; they are definitely not desirable from a linguistic
A> point of view, and they trick most developers into thinking that a code
A> point is a character.

They're also what 99% of people use in practice. Now I don't know if it's
a good idea but we're not going to change this.

A> you'll do an 'entire' treatment on it most of the time

The trouble is with the rest of the cases.


A> > A> I understand that you'd like something like mystring[22] = "é" to work the
A> > A> same on UTF8 build and wchar_t. but causing us all those troubles just for
A> > A> that is wrong.
A> >
A> >  Not allowing this to work is not even wrong. You'd be simply laughed out
A> > of the door if you started explaining that this didn't work. I realize that
A> > you like immutable strings and I sympathize with your feelings but the
A> > simple fact is that we can't throw away 20 years of existing code.
A> no, I don't particularly like them; our string class here is copy-on-write.
A> but our replacement function does check that you ask for something
A> possible, or automagically switches to string-oriented replacement,
A> that's all. and there is no character-by-character access directly on
A> the string: if you want it, you get the pointer and know you won't get
A> any fancy features.

OK, again, what about a little consistency? You complain about relatively
minor changes while what you suggest, i.e. basically removing operator[],
would break *all* the existing wxWidgets code. Do you really
think it's a good idea?

A> >  Let me propose you a test: remove the possibility to modify wxString from
A> > wx/string.h and try to recompile your program. Please let me know if you
A> > still think it's a good idea after this.

Did you do this by chance?


A> > A> just put an assert on wxUnicodeCharRef trying to put
A> > A> something non ASCII using simple operator =() and that's all.
A> >
A> >  This is absolutely not all. You take the string "h\xc3\xa9llo" ("héllo").
A> > Now you assign a perfectly valid ASCII character 'x' to s[2]. What do you
A> > get with your approach? That's right, total garbage. Silently, without even
A> > any run-time problems, let alone compile-time ones.
A> yeh, yeh, the assert is that it should not change the "leading/
A> trailing/ascii" characteristic of the char. trivial to detect;

It's not trivial and in any case it's a run-time assert. So you write your
code, test it and it works correctly -- with your test data. And as soon as
your users start using it they discover that sometimes it just completely
mangles their input. What's not to like...

A> my experience is that it would never happen,

Replace "never" with "almost never" and you will see exactly why it is so
dangerous.


A> we would not lose anybody, because these features serve only
A> hypothetical uses. please do the test: search your code for indexed-char
A> replacements with user-provided characters. you won't find any. it
A> simply does not happen. and with non ASCII chars? it does not happen
A> either -in old code-.

If we keep (non-const) operator[], it must work. Not "work in some cases
and fatally fail with data loss in some others" but just "work". If we
remove it, we do lose 50% of users or more.

Regards,
VZ

Julian Smart

Mar 23, 2012, 11:07:47 AM3/23/12
to wx-...@googlegroups.com
On 23/03/2012 14:48, Armel wrote:
> we would not lose anybody, because these features serve only
> hypothetical uses. please do the test: search your code for indexed-char
> replacements with user-provided characters. you won't find any. it
> simply does not happen. and with non ASCII chars? it does not happen
> either -in old code-.
I know this is all hypothetical because wxString isn't going to change,
but out of curiosity, are you really suggesting that this code should no
longer work?

// Change horizontal bar to em dash
wxString s; // initialised somehow
for (size_t i = 0; i < s.Length(); i++)
    if (s[i] == (wxChar) 8213)
        s[i] = (wxChar) 8212;
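For comparison, the same edit can be done on a UTF-8 encoded std::string without any per-index decoding. This is the kind of byte-oriented processing Armel argues for. U+2015 (8213, HORIZONTAL BAR) is encoded as E2 80 95, and U+2014 (8212, EM DASH) as E2 80 94. In valid UTF-8, the complete encoding of one character can never match starting in the middle of another character, so a plain substring search is safe. Because both encodings have the same length, the replacement can also be done in place in one pass. This is a standard C++ sketch with a hypothetical helper, not wxString code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` with `to` in a UTF-8 string, where both
// are complete encoded characters of the same byte length. Assumes valid
// UTF-8, so a byte-level match always starts on a character boundary.
void replace_same_length(std::string& s, const std::string& from,
                         const std::string& to) {
    assert(!from.empty() && from.size() == to.size());
    std::size_t pos = s.find(from);
    while (pos != std::string::npos) {
        s.replace(pos, from.size(), to);         // same size: no shifting
        pos = s.find(from, pos + to.size());     // continue after the match
    }
}
```

Usage: `replace_same_length(s, "\xE2\x80\x95", "\xE2\x80\x94");` turns every horizontal bar in s into an em dash.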

I'm sure you can't be suggesting that this should not work since that
would make no sense, but it sounds like it. And if you are, what code
would replace it? Why should the programmer have to care about however
the characters happen to be implemented, whether multiple bytes or one?

Regards,

Julian

--
Julian Smart, Anthemion Software Ltd.
www.anthemion.co.uk | +44 (0)131 229 5306
Tools for writers: www.writerscafe.co.uk
Ebook creation: www.jutoh.com
wxWidgets RAD: www.dialogblocks.com

Vadim Zeitlin

Mar 23, 2012, 11:12:27 AM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 15:07:47 +0000 Julian Smart wrote:

JS> I know this is all hypothetical because wxString isn't going to change,

FWIW I'd be more than willing to change it to improve things while still
satisfying the main requirement which is to allow migration from 2.8.

JS> but out of curiosity, are you really suggesting that this code should no
JS> longer work?
JS>
JS> // Change horizontal bar to em dash
JS> wxString s; // initialised somehow
JS> for (size_t i = 0; i < s.Length(); i++)
JS>     if (s[i] == (wxChar) 8213)
JS>         s[i] = (wxChar) 8212;
JS>
JS> I'm sure you can't be suggesting that this should not work since that
JS> would make no sense, but it sounds like it. And if you are, what code
JS> would replace it? Why should the programmer have to care about however
JS> the characters happen to be implemented, whether multiple bytes or one?

Let me defend Armel a little here: what he says is that such code doesn't
exist. And it's true that it's rare and is never written with explicit
wxChar values like this. However it does exist when wxChar value is
actually a variable which may come from somewhere else so I still agree
with you that it would be totally unacceptable to break it (Armel, you did
manage to perform a miracle here -- you made me agree with Julian. Maybe
you have a chance of fixing wxString too after all...).

Regards,
VZ

Armel

Mar 23, 2012, 11:27:48 AM3/23/12
to wx-dev
>  OK, again, what about a little consistency? You complain about relatively
> minor changes while what you suggest, i.e. basically removing operator[],
> would break *all* the existing wxWidgets code. Do you really
> think it's a good idea?
I never proposed removing operator[]; only you did.

> A> >  Let me propose you a test: remove the possibility to modify wxString from
> A> > wx/string.h and try to recompile your program. Please let me know if you
> A> > still think it's a good idea after this.
>
>  Did you do this by chance?
>
> A> > A> just put an assert on wxUnicodeCharRef trying to put
> A> > A> something non ASCII using simple operator =() and that's all.
> A> >
> A> >  This is absolutely not all. You take the string "h\xc3\xa9llo" ("héllo").
> A> > Now you assign a perfectly valid ASCII character 'x' to s[2]. What do you
> A> > get with your approach? That's right, total garbage. Silently, without even
> A> > any run-time problems, let alone compile-time ones.
> A> yeh, yeh, the assert is that it should not change the "leading/
> A> trailing/ascii" characteristic of the char. trivial to detect;
>
>  It's not trivial and in any case it's a run-time assert. So you write your
> code, test it and it works correctly -- with your test data. And as soon as
> your users start using it they discover that sometimes it just completely
> mangles their input. What's not to like...
what you mean by 'test' seems to me much more like 'loosely test',
'test only the passing case' or 'don't test'.

> A> my experience is that it would never happen,
>
>  Replace "never" with "almost never" and you will see exactly why it is so
> dangerous.
>
> A> we would not lose anybody, because these features serve only
> A> hypothetical uses. please do the test: search your code for indexed-char
> A> replacements with user-provided characters. you won't find any. it
> A> simply does not happen. and with non ASCII chars? it does not happen
> A> either -in old code-.
>
>  If we keep (non-const) operator[], it must work. Not "work in some cases
> and fatally fail with data loss in some others" but just "work". If we
> remove it, we do lose 50% of users or more.

I don't understand why you are convinced of that, but I'm clearly not
going to persuade you, so let's at least make this code work as stated.

Armel

Julian Smart

Mar 23, 2012, 11:30:23 AM3/23/12
to wx-...@googlegroups.com
On 23/03/2012 15:12, Vadim Zeitlin wrote:
> On Fri, 23 Mar 2012 15:07:47 +0000 Julian Smart wrote:
>
> JS> I know this is all hypothetical because wxString isn't going to change,
>
> FWIW I'd be more than willing to change it to improve things while still
> satisfying the main requirement which is to allow migration from 2.8.
>
> JS> but out of curiosity, are you really suggesting that this code should no
> JS> longer work?
> JS>
> JS> // Change horizontal bar to em dash
> JS> wxString s; // initialised somehow
> JS> for (size_t i = 0; i < s.Length(); i++)
> JS>     if (s[i] == (wxChar) 8213)
> JS>         s[i] = (wxChar) 8212;
> JS>
> JS> I'm sure you can't be suggesting that this should not work since that
> JS> would make no sense, but it sounds like it. And if you are, what code
> JS> would replace it? Why should the programmer have to care about however
> JS> the characters happen to be implemented, whether multiple bytes or one?
>
> Let me defend Armel a little here: what he says is that such code doesn't
> exist. And it's true that it's rare and is never written with explicit
> wxChar values like this.
Huh? Why 'never'? Can you explain the problem with this code and what
the alternative should be? Thanks.

Julian

Armel

Mar 23, 2012, 11:39:11 AM3/23/12
to wx-dev


On 23 mar, 16:07, Julian Smart <jul...@anthemion.co.uk> wrote:
> On 23/03/2012 14:48, Armel wrote:
> > we would not lose anybody, because these features serve only
> > hypothetical uses. please do the test: search your code for indexed-char
> > replacements with user-provided characters. you won't find any. it
> > simply does not happen. and with non ASCII chars? it does not happen
> > either -in old code-.
>
> I know this is all hypothetical because wxString isn't going to change,
> but out of curiosity, are you really suggesting that this code should no
> longer work?
>
> // Change horizontal bar to em dash
> wxString s; // initialised somehow
> for (size_t i = 0; i < s.Length(); i++)
>      if (s[i] == (wxChar) 8213)
>          s[i] = (wxChar) 8212;
>
> I'm sure you can't be suggesting that this should not work since that
> would make no sense, but it sounds like it. And if you are, what code
> would replace it? Why should the programmer have to care about however
> the characters happen to be implemented, whether multiple bytes or one?
so when you're processing a 100 MB file you do that? you do not care
that a single replacement is O(N) (let's consider s[i] = '.') and
your algo might take a year to terminate (replacing almost all chars)?
I do.
wx users writing real-world apps will care, and will not like at all
discovering the performance hit when faced with real data (any more
than they would like the assert I proposed).

granted, for toy strings it's ok. I don't think the overheads in
code, memory and execution time are worth it, but my opinion does not
matter; I'll live with that.

Armel

Julian Smart

Mar 23, 2012, 11:43:41 AM3/23/12
to wx-...@googlegroups.com
I don't have a 100MB file and I'm assuming that Vadim's time spent
optimizing the string class for this kind of situation has not been in
vain. Otherwise, I'll want to configure wxWidgets to use the wide
character implementation.

Out of interest, what is the correct way to do the above?

Thanks.

Julian

Václav Slavík

Mar 23, 2012, 12:03:51 PM3/23/12
to wx-...@googlegroups.com
Hi,

On 23 Mar 2012, at 16:39, Armel wrote:
> so when you're treating a 100 MB file you do that? you do not care
> that a single replacement is in O(N) (let's consider s[i] = '.') and
> your algo might take a year to terminate (replacing almost all chars)?

What you do is

1. Think hard about the appropriateness of storing _that_ in a string.
2. If it really is appropriate, quickly spot the bottleneck (simply breaking in the debugger would do).
3. Because you did read changes.txt, you quickly take note of that operator[] and update the code to use iterators. Or force the wchar_t build if you insist on keeping it.

Observe that a "fix hundreds of errors in formerly 2.8/ANSI code that no longer compiles" step is conspicuously missing, because almost all of that was handled automatically.

Vaclav

Armel

Mar 23, 2012, 12:10:20 PM3/23/12
to wx-dev
I believe that the standard approach would be to make a 'result' string,
reserve the right amount of space in it, then append source spans and
replacements in turn until reaching the end of the string. that is the
.replace() behaviour, isn't it? it is always O(N) and not somewhere
between O(N) and O(N^2).
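Armel's result-string approach can be sketched in standard C++ on a std::string holding UTF-8. The helper name is hypothetical. It does not modify characters one index at a time. It appends the unchanged span before each match and then the replacement, so every input byte is copied once. Unlike his description, it reserves only the input size rather than computing the exact output size, which still avoids most reallocations. The same character-boundary argument as for any UTF-8 substring search applies, assuming valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a new string: copy the unchanged span before each match, then the
// replacement, until the end of the input. Each input byte is copied once;
// replacements may be longer or shorter than what they replace.
std::string replace_all(const std::string& s, const std::string& from,
                        const std::string& to) {
    assert(!from.empty());
    std::string result;
    result.reserve(s.size());  // exact only when to.size() == from.size()
    std::size_t start = 0;
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, start)) {
        result.append(s, start, pos - start);  // unchanged span
        result += to;                          // replacement
        start = pos + from.size();
    }
    result.append(s, start, std::string::npos);  // tail after last match
    return result;
}
```

For example, `replace_all("a.b.c", ".", "::")` produces "a::b::c".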

Armel

Vadim Zeitlin

Mar 23, 2012, 12:28:36 PM3/23/12
to wx-...@googlegroups.com
On Fri, 23 Mar 2012 09:10:20 -0700 (PDT) Armel wrote:

A> I believe that the standard approach would be to make a 'result' string,
A> reserve the right amount of space in it, then concat source spans then
A> replacements until reaching the end of string. that is .replace()
A> behaviour isn't it? it is always in O(N) and not somewhere between
A> O(N) and O(N^2)

Which is a quite non-trivial change and, worse, nothing tells you that you
need to do it, as an error message would.

Not to mention that this approach is obviously much simpler than doing an
operation per character in a loop...


Now I don't say that your approach is without merit nor that you are not
partially right about subtle bugs that occur because people confuse code
points and characters all the time. But you really should realize that not
letting people iterate over characters^W code points in an indexed for
loop is just not a solution. There are many people who feel just as strongly as
you do about this but in the reverse direction and I think they also
outnumber you by at least 10 to 1 because, for better or worse, this is
what people are used to.

Regards,
VZ

Armel

Mar 23, 2012, 4:25:59 PM3/23/12
to wx-dev
I did not mean that was _the_ solution for replacement, just a
solution.
I've already given up on the iterators and all the rest, by the way.
I just have to cross fingers and hope that my app won't be called
SnailMerge that's all.

Armel

Vadim Zeitlin

Mar 25, 2012, 12:46:20 PM3/25/12
to wx-...@googlegroups.com
On Thu, 22 Mar 2012 08:41:04 -0700 (PDT) Armel wrote:

A> let's take an example: const char *time = dt.ParseDate ("2012/01/01
A> 18:00:00"); seems perfectly correct isn't it ?
A> bad news time points to garbage in this case (why? because the returned
A> value is in fact a pointer to the memory of the temporarily created
A> wxString object COPYING the date string)

For the record, this was fixed in r70996.

Regards,
VZ

Dang Le

Mar 25, 2012, 10:55:26 PM3/25/12
to wx-...@googlegroups.com
Hi all,
I find this thread interesting, but it leaves me a little bit confused.
We used wxWidgets from version 2.8, so most of the code is using
wxString for all string-related operations.
And recently upgraded to wx 2.9.2.
We have always built wx with Unicode and STL for Windows.
Now I have some concerning questions:
- Is there a performance penalty for using wxString over std::wstring?
- According to Vadim's suggestion, we should use std::wstring for new
code, but all wx classes and functions use wxString as their parameters
or returned value. Do we have to convert from std::wstring to wxString
before and after calling to those wx functions/methods?
I think that will affect the performance. Is there any better way for
mixing wxString and std::wstring?

Thank you in advance!


On 3/23/2012 12:03 AM, Vadim Zeitlin wrote:
> On Thu, 22 Mar 2012 16:50:22 +0000 Julian Smart wrote:
>
> JS> On 22/03/2012 16:23, Vadim Zeitlin wrote:
> JS> > In general, my advice is very simple: do not use wxString for textual
> JS> > manipulation. Use std::string or std::wstring for this and only use
> JS> > wxString to get data to or from the GUI.
> JS> This is possibly the scariest thing I've heard in 2012 so far. I use
> JS> wxString extensively for data storage and manipulation. All the time,
> JS> for almost every class I ever write. It's a basic building block, and I
> JS> can't comprehend how the developer of a programming framework can say
> JS> "don't use our string class".
>
> Because he finally woke up and noticed it was 2012 outside the window and
> C++11 standard was ratified and the standard library was available on all
> platforms for at least the last 10 years or so?
>
> Have you ever seen people inventing their own string classes in Java?
> Python? Perl? Why should you advise them to do this in C++ which also has a
> perfectly usable standard string class already? I can't comprehend how
> you can seriously think that advising people to use our string class
> instead of the standard one makes sense.
>
> JS> Am I mad to be using wxString? Well, it's a bit late now that's for sure...
>
> You would be wrong to use it in new code. Obviously huge amounts of the
> existing code use it and clearly it should continue to work. But equally
> clearly you should use std::[w]string in any new code you write because
> there is absolutely no freaking reason your GUI framework should dictate
> your choice of the string class, if only because strings are commonly used
> in the code that has nothing to do with GUI at all and so might not even
> link with wx.
>
> You're free to ignore my recommendations, you're good enough to make your
> code work with whatever string class you use but for newbies who learn to
> use std::string in their C++ tutorials/classes I do strongly recommend that
> they keep using it in their wxWidgets programs too and just avoid all
> wxString-related complications. There is absolutely no reason whatsoever to
> prefer wxString to std::string when writing new code.
>
> Regards,
> VZ

Vadim Zeitlin

Mar 26, 2012, 7:42:58 AM
to wx-...@googlegroups.com
On Mon, 26 Mar 2012 09:55:26 +0700 Dang Le wrote:

DL> We have used wxWidgets since version 2.8, so most of our code uses
DL> wxString for all string-related operations,
DL> and we recently upgraded to wx 2.9.2.
DL> We have always built wx with Unicode and STL for Windows.
DL> Now I have a few concerns:
DL> - Is there a performance penalty for using wxString over std::wstring?

This depends on the operations you do. For most of them there is none (or
it is too small to measure), but accessing the N-th character of the string
directly when using the UTF-8 internal representation is now O(N) instead of
O(1), so this is a definite penalty.
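To make the O(N) point concrete, the following std-only C++ sketch shows what locating the N-th character involves when text is stored as UTF-8. It is not wxString's actual implementation: utf8_offset_of is a hypothetical helper, and it assumes valid UTF-8 input. Because each code point takes 1 to 4 bytes, the only way to find the N-th one is to scan from the start, whereas a fixed-width std::u32string can be indexed directly in O(1).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wx): return the byte offset at which the
// n-th code point (0-based) of a UTF-8 encoded string starts.
// UTF-8 is variable-width, so there is no way to jump straight to the n-th
// code point: every byte before it has to be examined, making this O(n).
// Assumes valid UTF-8; returns s.size() if n is past the last code point.
std::size_t utf8_offset_of(const std::string& s, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a
        // new code point.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            if (count == n)
                return i;
            ++count;
        }
    }
    return s.size();
}
```

For example, in the 4-byte string "a\xC3\xA9z" ("aéz"), the third character 'z' starts at byte offset 3 rather than 2, and finding that out required looking at every byte before it.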

DL> - According to Vadim's suggestion, we should use std::wstring for new
DL> code, but all wx classes and functions use wxString for their parameters
DL> and return values. Do we have to convert from std::wstring to wxString
DL> before and after calling those wx functions/methods?

The conversion from std::[w]string to wxString is implicit in 2.9, so you
can always just pass std::wstring to any wx function. Moreover, in the
build using wchar_t for wxString storage internally (this is the default
under Windows) the conversion will be relatively efficient as it just
involves copying this std::wstring, without converting it.

If you build in STL mode, then the conversion to std::wstring is implicit as
well, i.e. you can just assign the return value of any wx function to it
(but in a non-STL build you have to call wxString::ToStdWstring() explicitly).

Regards,
VZ

Andy Robinson

Mar 26, 2012, 8:37:01 AM
to wx-...@googlegroups.com
On 26/03/12 12:42, Vadim Zeitlin wrote:
> On Mon, 26 Mar 2012 09:55:26 +0700 Dang Le wrote:
>
> DL> We have used wxWidgets since version 2.8, so most of our code uses
> DL> wxString for all string-related operations,
> DL> and we recently upgraded to wx 2.9.2.
> DL> We have always built wx with Unicode and STL for Windows.
> DL> Now I have a few concerns:
> DL> - Is there a performance penalty for using wxString over std::wstring?
>
> This depends on the operations you do. For most of them there is none (or
> it is too small to measure), but accessing the N-th character of the string
> directly when using the UTF-8 internal representation is now O(N) instead of
> O(1), so this is a definite penalty.

It seems to me that most people should avoid this problem by using
--enable-unicode --disable-utf8

Which seems simple and harmless to me. I'm not bothered that my strings
use a bit more memory, as I don't have any huge (multi-megabyte) strings.

Regards,
Andy Robinson

Václav Slavík

Mar 26, 2012, 9:02:46 AM
to wx-...@googlegroups.com
Hi,

On 26 Mar 2012, at 14:37, Andy Robinson wrote:
> It seems to me that most people should avoid this problem by using
> --enable-unicode --disable-utf8
>
> Which seems simple and harmless to me.

It's not. The reason for using the UTF-8 build on some platforms is not memory efficiency, but the fact that the native toolkit uses UTF-8. So if you configure wxGTK with --disable-utf8, then every call to GTK+ (and any change in the UI state involves some) needs to allocate a new string and convert the wxString data to UTF-8. Moreover, you don't always control the wx build settings (think of Linux distributions that ship wxGTK), so it's better to update the performance-sensitive parts of your code to use iterators.
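The advice to use iterators comes down to sequential versus random access. The sketch below is plain C++ with a hypothetical decode_utf8 function, not wxString's own iterator: it walks a UTF-8 string once and decodes each code point starting from the current position, so a full pass costs O(N). Using an O(N) index-based lookup for every character inside a loop would instead cost O(N^2). It assumes well-formed UTF-8 and does no validation beyond stopping at a truncated final sequence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function (not part of wx): decode a UTF-8 string into code
// points in a single forward pass, the way an iterator advances.
// Each step only touches the bytes of the current code point, so the whole
// pass is O(N). Assumes well-formed UTF-8.
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // The lead byte encodes the sequence length:
        // 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)             { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x6) { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0xE) { len = 3; cp = lead & 0x0F; }
        else                         { len = 4; cp = lead & 0x07; }
        if (i + len > s.size())
            break; // truncated sequence: stop instead of reading past the end
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len; // move past this code point only; nothing is rescanned
    }
    return out;
}
```

Decoding "a\xC3\xA9z\xE2\x82\xAC" ("aéz€") this way yields four code points after looking at each of its seven bytes exactly once.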

Vaclav

Dang Le

Mar 26, 2012, 11:12:50 AM
to wx-...@googlegroups.com
Thank you very much, Vadim!
I'm quite relieved now.
However, the conversion always requires at least a memory copy.
So do you think we should gradually change all non-GUI functions/methods
to use std::string directly instead of wxString?

Regards,
Le Hong Dang.

Vadim Zeitlin

Mar 26, 2012, 11:16:48 AM
to wx-...@googlegroups.com
On Mon, 26 Mar 2012 22:12:50 +0700 Dang Le wrote:

DL> However, the conversion always requires at least a memory copy.

Yes, but in many cases this would be true of wxString without reference
counting too (and reference counting has its own problems). The C++11
solution to this is moving instead of copying, but of course this won't
cover all cases.
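As a minimal illustration of the C++11 move mentioned here, the sketch below uses a hypothetical Label class (not a wx type) that takes a std::string by value and moves it into its member. A caller passing a temporary, or calling std::move() on a string it no longer needs, can hand over the string's heap buffer instead of having the characters copied (very short strings stored inline may still be copied, which is cheap).

```cpp
#include <string>
#include <utility>

// Hypothetical class, for illustration only.
// The constructor takes its argument by value and moves it into the member:
// an lvalue argument is copied once into the parameter, while an rvalue
// (a temporary or a std::move()d string) is moved, so no character data is
// duplicated for long strings.
class Label {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }

private:
    std::string text_;
};
```

Moving only helps when the source and destination are the same type and the caller gives up the original. Converting between different representations, for example from std::wstring to a UTF-8 based string, still has to build a new buffer, which is one reason moving does not cover every case.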

DL> So do you think we should gradually change all non-GUI functions/methods
DL> to use std::string directly instead of wxString?

Do you mean in your code or in wx? In your code I'd definitely avoid
wxString unless necessary. In wx this won't be possible, unfortunately.

Regards,
VZ

Dang Le

Mar 26, 2012, 12:22:13 PM
to wx-...@googlegroups.com
On 3/26/2012 10:16 PM, Vadim Zeitlin wrote:
> Do you mean in your code or in wx? In your code I'd definitely avoid
> wxString unless necessary. In wx this won't be possible, unfortunately.
>
I meant in wx.
So it won't be possible and we have to live with this.
Thank you for clarifying this.

Regards,
Le Hong Dang.

MortenMacFly

Apr 22, 2012, 8:53:31 AM
to wx-...@googlegroups.com
Dear all,

I am new here but have been using wxWidgets extensively since v2.6. As a user I was a bit surprised to see this discussion about such a core component as wxString. In fact, most of my code's libraries depend only on the STL. But my GUI code, and (!) code that is easier to maintain using wxWidgets/wxString (but is not necessarily UI code), uses wxString. I also ported some classes to use wxString instead of std::string because it made things easier.

So from my (and only my) user point of view, whatever you do: I can live well with the advice that new code should preferably depend on the standard library (this discussion pushes me further in that direction), but I also strongly support what JS said: please keep wxString so that I don't have to change all my code one day. If it makes maintenance easier, deprecate some ugly hacks step by step, but as long as wxString provides so many functions that remove the need for my own implementation of yet another string abstraction layer, please also consider that users do use it in non-GUI code if/because it makes life easier. I don't believe there is or should be anything wrong with that (even after carefully reading all the arguments made here).