Simple wxString question

Ryan Norton

unread,

Jun 23, 2004, 2:31:55 PM6/23/04

to wx-...@lists.wxwindows.org

Hi,

This has been bugging me for a while -
is there a reason why wxString uses strchr over memchr?
Since it has the length of the string, there shouldn't be...

The MS STL uses memchr, etc...

Thanks,
Ryan

Ryan Norton

unread,

Jun 23, 2004, 2:43:35 PM6/23/04

to wx-...@lists.wxwidgets.org

> RN> This has been bugging me for a while -
> RN> is there a reason why wxString uses strchr over memchr?
> RN> Since it has the length of the string, there shouldn't be...
>
> No, there is not. Looks like a bug to me...
Well, I don't know if I'd call it a "bug"...

anyway I'll make a patch.

Ryan

Vadim Zeitlin

unread,

Jun 23, 2004, 2:30:39 PM6/23/04

to wx-...@lists.wxwidgets.org

On Wed, 23 Jun 2004 11:31:55 -0700 Ryan Norton <wxpro...@comcast.net> wrote:

RN> This has been bugging me for a while -
RN> is there a reason why wxString uses strchr over memchr?
RN> Since it has the length of the string, there shouldn't be...

No, there is not. Looks like a bug to me...

VZ

Ryan Norton

unread,

Jun 24, 2004, 11:25:48 PM6/24/04

to wx-...@lists.wxwidgets.org

Hi,

OK - I took a look at the docs for open watcom and it doesn't
appear to have a wide character variant of memchr (shouldn't
surprise me I guess...) - so for it and all others except gcc and msvc
I'll do something like this -

const wxChar* wxMemchr(const wxChar* s, const wxChar& c, const int& l)
{ return wxStrchr(s,c); }

Compilers worth their salt should inline it so no performance loss
there.
In ANSI mode I'll use normal memchr - even open watcom has that -
interestingly enough wmemchr is in the POSIX standard, but
not in tchar.h.

Ryan

Olly Betts

unread,

Jun 24, 2004, 11:30:32 PM6/24/04

to wx-...@lists.wxwidgets.org

On Thu, Jun 24, 2004 at 08:25:48PM -0700, Ryan Norton wrote:
> OK - I took a look at the docs for open watcom and it doesn't
> appear to have a wide character variant of memchr (shouldn't
> surprise me I guess...) - so for it and all others except gcc and msvc
> I'll do something like this -
>
> const wxChar* wxMemchr(const wxChar* s, const wxChar& c, const int& l)
> { return wxStrchr(s,c); }

That's not the same though - if character c isn't found, memchr will
stop after l characters, while strchr will stop when it reaches a zero.
If there's a zero in less than l characters, this hack may miss a match.
Or if there is no zero within l character, it may find a false match
after more than l characters. Or if there's no zero in memory until you
get to an unmapped page, this will seg fault (or GPF or whatever it's
called on your OS).

I think you'll have to bite the bullet and roll your own wmemchr...

Cheers,
Olly

Ryan Norton

unread,

Jun 25, 2004, 12:29:22 AM6/25/04

to wx-...@lists.wxwidgets.org

Hi Olly,

> That's not the same though - if character c isn't found, memchr will
> stop after l characters, while strchr will stop when it reaches a
zero.
> If there's a zero in less than l characters, this hack may miss a
match.
> Or if there is no zero within l character, it may find a false match
> after more than l characters. Or if there's no zero in memory until
you
> get to an unmapped page, this will seg fault (or GPF or whatever
it's
> called on your OS).

Well, that's a good point - I mainly meant it for internal use -
it which case I'll always be using the length of the string...

One idea is use an ifdef in string.cpp and
roll my own for wxchar.h (as you'll see below it's
a performance. However, after some tests with
my own on gcc, it may just be better to use that
instead of the native wcschr...

> I think you'll have to bite the bullet and roll your own wmemchr...

const wchar_t* mymemchr(const wchar_t* s, wchar_t c, size_t l)
{
for(;l-- && *s != c;++s) {}

if(l!=-1)
return s;
return NULL;
}

It's about 2.5 times the speed of the native wcschr on MSVC...
However, on Dev-Cpp with all optimizations my version is a
bit more than 10% faster (try the app below)...

#include <stdio.h>
#include <time.h>
#include <wchar.h>
#include <tchar.h>

#define memchr mymemchr
int main(int argc, char* argv[])
{
//Put something for STRING (I did 1000 characters)
#define STRING L
#define CHAR 't'
argc = (int&)argv[0] = 0;
int i;
time_t t;
#define NUM 500000
const wchar_t* szFound = NULL;
int STRINGLENGTH = wcslen(STRING);
printf("S:%i\n", STRINGLENGTH);
t = clock();
for(i = 0; i < NUM ; ++i)
szFound = wcschr(STRING, CHAR);
printf("\n%c Pos:%i\nSTRCHR_TIME:%i\n\n", CHAR, szFound - STRING,
clock() - t);

t = clock();
for(i = 0; i < NUM ; ++i)
szFound = (wchar_t*) memchr(STRING, CHAR, STRINGLENGTH);
printf("\n%c Pos:%i\nMEMCHR_TIME:%i\n\n", CHAR, szFound - STRING,
clock() - t);

return 0;
}

> Cheers,
> Olly
Thanks,
Ryan

ABX

unread,

Jun 25, 2004, 2:24:47 AM6/25/04

to wx-...@lists.wxwidgets.org

"Ryan Norton" <wxpro...@comcast.net>:

> I took a look at the docs for open watcom and it doesn't
> appear to have a wide character variant of memchr

Would there work call to memchar with number of chars multiplied by
sizeof(widechar)/sizeof(char) ?

ABX

Ryan Norton

unread,

Jun 25, 2004, 4:00:57 AM6/25/04

to wx-...@lists.wxwidgets.org, a...@abx.art.pl

Hi ABX,

Unfortunentaly not. Normal memchr goes byte-by-byte -
and in practically all cases wchar_t is going to more
than one byte.

So, in other words as long as we don't use values above 255
we could use the normal memchr (ENDIAN-NESS may come
into play too). Of course we do use values above 255 for
non-english languages...

Ryan

Olly Betts

unread,

Jun 25, 2004, 7:55:59 AM6/25/04

to wx-...@lists.wxwidgets.org

On Thu, Jun 24, 2004 at 09:29:22PM -0700, Ryan Norton wrote:
> Well, that's a good point - I mainly meant it for internal use -
> it which case I'll always be using the length of the string...

So you're relying on there always being a zero after the string
contents because that's how wxString stores the contents?

That's still not safe - there may be an embedded zero character in the
string, with the first occurence of the character sought after this.
Or perhaps a simpler example, consider *looking* for a zero character.

Cheers,
Olly

Ryan Norton

unread,

Jun 26, 2004, 12:20:44 PM6/26/04

to wx-...@lists.wxwidgets.org

Hi,

> > Well, that's a good point - I mainly meant it for internal use -
> > it which case I'll always be using the length of the string...
>
> So you're relying on there always being a zero after the string
> contents because that's how wxString stores the contents?

Well, I hope so - I could be wrong though :).

>
> That's still not safe - there may be an embedded zero character in
the
> string, with the first occurence of the character sought after this.

If you run the following in microsoft visual c++ -

#include <string>
#include <wx/string.h>

int main(int argc, char* argv[])
{

wxString wxs = "12345";
std::string sts = "12345";

printf ("WX LOC:%i\n", wxs.find('3'));
printf ("STL LOC:%i\n", sts.find('3'));

wxs[(wxString::size_type)1] = '\0';
sts[1] = '\0';

printf ("WX LOC2:%i\n", wxs.find('3'));
printf ("STL LOC2:%i\n", sts.find('3'));

return 0;
}

You'll get -
WX LOC:2
STL LOC:2
WX LOC2:-101
STL LOC2:2

That's because the microsoft STL _does_ use memchr variants,
and does _NOT_ check for spurious null characters (mainly
because MS's memchr variant is an extreamly optimized assembly
version and puts most other implementations to shame).

The SGI STL does however - in fact it doesn't use any functions at
all -
it just goes commando and does a simple for loop looking for the null
char.

> Or perhaps a simpler example, consider *looking* for a zero
character.

? That would defeat the whole purpose of memchr, right?

Thanks for the comments!
Ryan

Olly Betts

unread,

Jun 26, 2004, 12:57:46 PM6/26/04

to wx-...@lists.wxwidgets.org

MSVC is correct - a nul character doesn't terminate a C++ string. One
of their advantages is they can contain any character, including nul.
So the correct answer is 2, even after you insert the nul.

Checking the documentation, wxStrings are meant to behave this way to
(I'd be very suprised if they didn't frankly), so this is a bug in wx:

wxString is a class which represents a character string of arbitrary
length (limited by MAX_INT which is usually 2147483647 on 32 bit
machines) and containing arbitrary characters. The ASCII NUL
character is allowed, although care should be taken when passing
strings containing it to other functions.

(from http://www.wxwindows.org/manuals/2.4.2/wx456.htm ).

Cheers,
Olly

Olly Betts

unread,

Jun 26, 2004, 3:21:58 PM6/26/04

to wx-...@lists.wxwidgets.org

On Sat, Jun 26, 2004 at 12:05:41PM -0700, Ryan Norton wrote:
> Right! So I guess my question is which behavior do we want? The
> old behaviour, or the STL C++ behaviour?

The STL C++ behaviour, without question. It's what wxString is
currently documented to do, and I believe transition to using STL
strings, etc is a longer term goal, so choosing subtlely different
semantics at this point is extremely counterproductive.

Even if this isn't a goal, the STL is the C++ standard, so you'll catch
C++ programmers out if you make the behaviour different. You'll also
probably suprise users of wxPerl and wxPython - Perl and Python both
allow zero bytes in strings.

> Plus it breaks backwards-compatability...

Not according to the documentation, which is what most users will
have coded to.

Cheers,
Olly

Ryan Norton

unread,

Jun 26, 2004, 3:05:41 PM6/26/04

to wx-...@lists.wxwidgets.org

Hi Olly,

> MSVC is correct - a nul character doesn't terminate a C++ string.
One
> of their advantages is they can contain any character, including
nul.
> So the correct answer is 2, even after you insert the nul.
>
> Checking the documentation, wxStrings are meant to behave this way
to
> (I'd be very suprised if they didn't frankly), so this is a bug in
wx:
>
> wxString is a class which represents a character string of
arbitrary
> length (limited by MAX_INT which is usually 2147483647 on 32 bit
> machines) and containing arbitrary characters. The ASCII NUL
> character is allowed, although care should be taken when passing
> strings containing it to other functions.
>
> (from http://www.wxwindows.org/manuals/2.4.2/wx456.htm ).
>
> Cheers,
> Olly

Right! So I guess my question is which behavior do we want? The
old behaviour, or the STL C++ behaviour?

If we do go the C++ route that means I'll have to get rid of a lot of
strXXX
calls and replace them with the mem equivs - i.e.
wxString::find was wxStrstr -> wxMemchr then wxMemcmp

In order to have the correct behaviour across function calls - the
downside
is that we will have more compiler problems (I'll have to implement
all the wmemXXX functions by hand for compilers other than gcc and
msvc), and all the case-insesitive functions will have to be made by
hand - which might be slow for compilers that have assembly-optimized
versions like msvc. Plus it breaks backwards-compatability...

The upside is of course that it will overall be faster, and actually
be compatible with the STL.

What's the verdict?

Best Regards,
Ryan

Vadim Zeitlin

unread,

Jun 26, 2004, 6:26:30 PM6/26/04

to wx-...@lists.wxwidgets.org

On Sat, 26 Jun 2004 20:21:58 +0100 Olly Betts <ol...@survex.com> wrote:

OB> On Sat, Jun 26, 2004 at 12:05:41PM -0700, Ryan Norton wrote:
OB> > Right! So I guess my question is which behavior do we want? The
OB> > old behaviour, or the STL C++ behaviour?
OB>
OB> The STL C++ behaviour, without question.

Yes. There are quite a few places in the current code where NULs are not
handled correctly but ideally they should be fixed -- this had never been
done in practice just because it's not that easy (thanks for looking into
this, Ryan!) and not many people are affected by these bugs -- but, still
these are bugs.

Thanks,
VZ

Ryan Norton

unread,

Jun 26, 2004, 8:56:09 PM6/26/04

to wx-...@lists.wxwidgets.org

OK - here's what I'm going to do -

I'm going to create a new class called wxCharTraits after the STL
char_traits,
that will contain my version of the memXXX calls
for non-gcc and non-msvc compilers .

There a many reasons for doing this -
1. The user might want to call these routines
in his/her app without going through wxString
(it provides a way to access wmemchr for example).

2. Since none of the memXXX variants are NOT in tchar.h, it
would get messy to put the compiler ifdefs in wxchar.h.

3. Obviously if stl is used wxString will automatically use the STL
char_traits.

In the STL it'd defed as traits_type in string.

For those not familiar -
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcstdlib/html/vclrf_string_chartraits_members.asp

Anyway, while I'm at it I'll add a couple more funcs for dealing with
regexs.

Finally - how do you want me to handle the source files, i.e.
do we want a seperate header for wxCharTraits (wxchar.h?).
Where should I put the implementation for it (wxchar.cpp?).

I'm shooting for 3.0 on this - this is a big change :).

Regards,
Ryan

> --------------------------------------------------------------------
-
> To unsubscribe, e-mail: wx-dev-un...@lists.wxwidgets.org
> For additional commands, e-mail: wx-de...@lists.wxwidgets.org
>

Vadim Zeitlin

unread,

Jun 27, 2004, 5:39:27 PM6/27/04

to wx-...@lists.wxwidgets.org

On Sat, 26 Jun 2004 17:56:09 -0700 Ryan Norton <wxpro...@comcast.net> wrote:

RN> I'm going to create a new class called wxCharTraits after the STL
RN> char_traits, that will contain my version of the memXXX calls
RN> for non-gcc and non-msvc compilers .

This looks like a good idea, but just to be sure:

RN> There a many reasons for doing this -
RN> 1. The user might want to call these routines
RN> in his/her app without going through wxString
RN> (it provides a way to access wmemchr for example).

Well, this would be the same if you just defined wxWmemchr().

RN> 2. Since none of the memXXX variants are NOT in tchar.h, it
RN> would get messy to put the compiler ifdefs in wxchar.h.

wx/wxchar.h already:

a) includes many other headers
b) is quite messy

RN> 3. Obviously if stl is used wxString will automatically use the STL
RN> char_traits.

So basically this is the only real/good reason to do it. Of course, this
means that it should really have the same methods (and typedefs) as the
standard class and, also, that it shouldn't have any more of them as they
would be "lost" when transiting to STL.

RN> Anyway, while I'm at it I'll add a couple more funcs for dealing with
RN> regexs.

And because of this, adding other functions doesn't seem like a good idea.

RN> Finally - how do you want me to handle the source files, i.e.
RN> do we want a seperate header for wxCharTraits (wxchar.h?).
RN> Where should I put the implementation for it (wxchar.cpp?).

I'd use separate header and .cpp.

RN> I'm shooting for 3.0 on this - this is a big change :).

Why? Normally I don't see anything here which would prevent us from
introducing it in a backwards compatible way.

Thanks,
VZ

Robert Roebling

unread,

Jun 27, 2004, 2:26:09 PM6/27/04

to wx-...@lists.wxwidgets.org

Ryan Norton wrote:

> OK - here's what I'm going to do -
>
> I'm going to create a new class called wxCharTraits after the
> STL char_traits, that will contain my version of the memXXX
> calls for non-gcc and non-msvc compilers .

That seems like a good idea.

> Finally - how do you want me to handle the source files, i.e.
> do we want a seperate header for wxCharTraits (wxchar.h?).
> Where should I put the implementation for it (wxchar.cpp?).

why not a new file chartrt.xxx?

> I'm shooting for 3.0 on this - this is a big change :).

Although it seems like a big change, would it break
backward compatilibity other than by correcting the
current behaviour and making it more STL-like?

Robert

Ryan Norton

unread,

Jun 27, 2004, 6:32:56 PM6/27/04

to wx-...@lists.wxwidgets.org

Hi Robert,

> Although it seems like a big change, would it break
> backward compatilibity other than by correcting the
> current behaviour and making it more STL-like?

None really (people who've stuck null characters in
the middle of the string will have a bad day though).

Ryan

Ryan Norton

unread,

Jun 27, 2004, 6:44:42 PM6/27/04

to wx-...@lists.wxwidgets.org

Hi Vadim,

> RN> 3. Obviously if stl is used wxString will automatically use the
STL
> RN> char_traits.
>
> So basically this is the only real/good reason to do it. Of course,
this
> means that it should really have the same methods (and typedefs) as
the
> standard class and, also, that it shouldn't have any more of them as
they
> would be "lost" when transiting to STL.

I think that would defeat the whole purpose of it if I didn't give it
the
same API.

>
> RN> Anyway, while I'm at it I'll add a couple more funcs for dealing
with
> RN> regexs.
>
> And because of this, adding other functions doesn't seem like a
good idea.

Actually, I meant to wxString itself - but I won't do this anyway...

> I'd use separate header and .cpp.

Done.

> RN> I'm shooting for 3.0 on this - this is a big change :).
>
> Why? Normally I don't see anything here which would prevent us from
> introducing it in a backwards compatible way.

Well, I got some resistance earlier - but you may be right.

Alright - here's my final thoughts -
1. I've been wanting to put in a function for backwards-compatability
people. I was thinking Cutoff which would set the length of the
string
to the first null character it finds...
2. Never occured to me before - any need for an stl-compatable(i.e.
sorted) map?
I have one that I use here when I don't feel like using the stl...

Regards,
Ryan

Olly Betts

unread,

Jun 27, 2004, 7:27:44 PM6/27/04

to wx-...@lists.wxwidgets.org

On Sun, Jun 27, 2004 at 03:44:42PM -0700, Ryan Norton wrote:
> 1. I've been wanting to put in a function for backwards-compatability
> people. I was thinking Cutoff which would set the length of the
> string to the first null character it finds...

You can do that pretty trivially already - wxString(string.c_str()) -
it doesn't seem a likely thing to want to do anyway...

If you wanted to truncate a wxString, rather than poking in a nul and
calling this new function, it would be more efficient to just call
Truncate() with the new length (resize() is the STL near-equivalent).

Note that you can't currently actually truncate a wxString by poking in
a nul anyway - although some methods buggily treat it as the new end of
the string, some definitely won't - length() being perhaps the most
notable.

Cheers,
Olly

Ryan Norton

unread,

Jun 25, 2004, 12:35:57 AM6/25/04

to wx-...@lists.wxwidgets.org

Hi,

> > I think you'll have to bite the bullet and roll your own
wmemchr...
> const wchar_t* mymemchr(const wchar_t* s, wchar_t c, size_t l)
> {
> for(;l-- && *s != c;++s) {}
>
> if(l!=-1)
> return s;
> return NULL;
> }

Whoops -

const wchar_t* mymemchr(const wchar_t* s, wchar_t c, size_t l)
{

for(;l && *s != c;--l, ++s) {}

if(l)
return s;
return NULL;
}

Same speed - I have to admit GCC optimizes for loops better than
MSVC...

Reply all

Reply to author

Forward