
std::string crash on multi cpu machine


Paul Dubuc

Dec 3, 2002, 6:12:22 PM
Gil Sudai wrote:
>
> Using gcc 2.95.1 on Linux, on a dual-CPU machine.
> I have simple test code that spawns several threads, and in each
> thread I do simple std::string manipulations (like calling
> operator+=).
> Each thread has its own std::string variable.
> This simple code crashes.
> After digging inside the STL code I came to the conclusion that the
> cause of the problem is the static variable 'nilRep' and
> basic_string::Rep::grab() when operating on this 'nilRep'.
>
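
For readers who want to try this themselves, here is a minimal sketch of the kind of test described above. It is not the original poster's code: it uses C++11 std::thread rather than the pthreads a gcc 2.95 program would have used, and the function name `run_string_threads`, the thread count, and the reset threshold of 64 are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reproduction helper: each thread owns its own std::string
// and repeatedly appends to it. No string object is shared between threads.
std::vector<std::size_t> run_string_threads(int num_threads, int iterations) {
    std::vector<std::size_t> lengths(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([t, iterations, &lengths] {
            std::string s;                  // starts out empty
            for (int i = 0; i < iterations; ++i) {
                s += "x";                   // the operator+= mentioned in the report
                if (s.size() > 64) {
                    s = std::string();      // go back to an empty string
                }
            }
            lengths[static_cast<std::size_t>(t)] = s.size();  // own slot only
        });
    }
    for (auto& w : workers) w.join();
    return lengths;
}
```

Calling `run_string_threads(4, 10)` returns four lengths of 10. On a library whose empty strings share one unprotected reference-counted representation, this is the pattern that would be expected to misbehave on multiprocessor hardware, even though no string is visibly shared.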

That's right. The std::string implementation is reference-counted COW
(copy on write). This is a big problem in multithreaded (MT)
applications on multiprocessor (MP) hardware.
See http://www.sgi.com/tech/stl/string_discussion.html.

> Is this a known limitation?

Apparently so. Though some of us have found out the hard way.

> Can't I write multi threaded apps using std::string in GCC 2.95.1?

We did it (with 2.95.3) by ripping out the reference counting and COW
code in bastring.h and bastring.cc and rebuilding the library. We also
put

#define __USE_MALLOC

in stl_config.h (before rebuilding) to get rid of the caching memory
allocator and use the simple malloc allocator instead. There have been
repeated reports of memory leaks in MT MP environments with that
allocator. These changes probably hurt performance, but avoiding memory
leaks and core dumps was more important for us.
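
To illustrate what "use the simple malloc allocator" means, here is a rough modern analogue. It is not the `__USE_MALLOC` mechanism itself, which is a build-time switch in the old SGI-derived library. It is a sketch of a C++11 allocator that forwards every request straight to malloc/free, with no caching layer. The names `MallocAllocator` and `MallocString` are made up for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical minimal allocator that forwards straight to malloc/free,
// with no per-size free lists or caching in between.
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// All instances are interchangeable because they hold no state.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// A string type whose storage always comes from malloc.
using MallocString =
    std::basic_string<char, std::char_traits<char>, MallocAllocator<char>>;
```

A `MallocString` is used like an ordinary string, for example `MallocString s("abc"); s += "def";`. The trade-off is the one mentioned above: every allocation pays the full cost of malloc, in exchange for not depending on the thread safety of a caching layer.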

> Is this problem solved in newer versions?

Supposedly so. Beginning with the 3.0 version, the C++ library has been
completely rewritten and I've read claims from the GCC developers that
the string implementation "should be thread safe". But the 3.2.1
(latest) implementation of std::string is still reference counted and has

// XXX MT

comments in the source (see basic_string.h and basic_string.tcc). I'd
do some extensive testing before migrating to it. You may also want to
consider STLport (http://www.stlport.org). I think they have a thread
safe string implementation.

--
Paul M. Dubuc

Gil Sudai

Dec 4, 2002, 3:50:08 AM
Paul Dubuc <pdu...@cas.org> wrote in message news:<3DED3A56...@cas.org>...

Thanks for the comprehensive reply. I will check gcc 3.2.1.
Did you completely remove the reference counting feature of string?
Why not simply check 'this' against nilRep when changing the reference count?
Can you please elaborate some more about the allocator problem?

Thanks again
Gil

Paul Dubuc

Dec 4, 2002, 12:50:57 PM
to Gil Sudai
Gil Sudai wrote:
>
> Thanks for the comprehensive reply. I will check gcc 3.2.1.
> Did you completely remove the reference counting feature of string?
> Why not simply check 'this' against nilRep when changing the reference count?

Yes, we removed it so that each string makes its own copy of the data
rather than trying to share it. (I think it's better in C++ to pass
strings around by reference to avoid unnecessary copying rather than rely
on a string implementation to do it for you.) If I remember right the
string's 'this' isn't the same as nilRep. Empty strings contain a
shared, static nilRep as the representation of the string data. When
the string is not nil, nilRep is replaced by something else (allocated
with new). nilRep is initialized with a ref count of 1, so
(theoretically) it should never go to 0. But that's what we saw
happening, and the string was trying to free the nilRep when that
happened. The problem is in the reference counting itself. The
increment and decrement of the ref count is not protected by a lock
(which would probably add more overhead than it's worth). If the
problem happens with the nilRep, it can happen with an allocated Rep as
well. It probably shows up on nilRep because it's shared (if only
briefly) by all nil strings in your program so its ref count is modified
most often.
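
The failure described above can be pictured with a small sketch. With a plain int counter, an increment or decrement is a separate read, modify, and write, so two CPUs can interleave and lose an update, which is how a count that "should never go to 0" can get there. One alternative to a full lock is an atomic counter, sketched below with C++11 std::atomic. This is an illustration only, not the bastring.h code and not a claim about how any GCC release addressed the issue. `SharedRep` and `hammer_shared_rep` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared string representation such as nilRep.
struct SharedRep {
    std::atomic<int> refs{1};   // starts at 1, like the nilRep described above

    void grab() { refs.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller dropped the last reference.
    bool release() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Hammer one shared rep from several threads; every grab is paired with a
// release. Returns the final count, or -1 if any release reported that the
// last reference was dropped (which would indicate a lost update).
int hammer_shared_rep(int num_threads, int iterations) {
    SharedRep rep;
    std::atomic<bool> freed_early{false};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&rep, &freed_early, iterations] {
            for (int i = 0; i < iterations; ++i) {
                rep.grab();
                if (rep.release()) freed_early = true;
            }
        });
    }
    for (auto& w : workers) w.join();
    return freed_early ? -1 : rep.refs.load();
}
```

With atomic updates, `hammer_shared_rep(4, 100000)` returns 1: the count ends where it started and never touches 0. Replacing `std::atomic<int>` with a plain `int` would make the same loop a data race, which is the situation described above.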

> Can you please elaborate some more about the allocator problem?

I don't know much more about it than what I've said already. Someone
posted a test program a while back that demonstrated the memory leak
problem in a MT program running on a MP system. I think I was able to
verify that the problem existed on Solaris using that program and that
it went away when __USE_MALLOC was put in.

This link might help:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=9qe5om%24nu64v%241%40ID-63733.news.dfncis.de&rnum=12&prev=/groups%3Fq%3D__USE_MALLOC%26start%3D10%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D9qe5om%2524nu64v%25241%2540ID-63733.news.dfncis.de%26rnum%3D12

If not, just go to http://groups.google.com and search for __USE_MALLOC.

--
Paul M. Dubuc
