I'm using vc++ 2005 and I've just started playing around with
multithreading.
I quickly realised that multi-threading opens up a whole new slew of bugs
for the novice, but there is one area that is really confusing me.
I have a small class that uses an std::vector. I create a new worker
thread with AfxBeginThread - this thread will expand/contract the vector
(and do some other stuff) and then return. The problem is, I seem to get
heap corruption errors when I try to access the memory allocated in the
worker thread after that thread has finished. Is this to be expected or
should I be looking elsewhere for the bug?
Any general comments on memory allocation in a multi-threaded environment?
Thanks
Jeff
Jeff,
It's not expected, but without more details it's nigh on impossible to
know where your error lies. Try the same code in a single threaded
test and see if you suffer the same issue.
>Any general comments on memory allocation in a multi-threaded environment?
Memory allocation as such isn't usually an issue - the issues are all
to do with multiple threads accessing the same data at the same time.
Dave
Thanks a lot for that, I guess the issue must be somewhere else then.
The particular example concerns reading and processing data from a URL file.
A worker thread is created to read the entire file into a buffer whilst the
main thread begins processing the data as it becomes available. The program
was working fine on my local host where the file was "downloaded" almost
before the processing had begun. When I try it on a remote file however the
heap corruption problem arises - I suspect because the processing is now
catching up with the download. I'm using a local data member to synchronise
reading and resizing of the buffer. In particular the buffer can only be
resized if it isn't being read and can only be read if it isn't being
resized - since a resize could move the buffer. I assume that the buffer can
be read and written to simultaneously.
If I do the download before (and in the same thread as) the processing then
I have no problem even with a remote URL.
Does anyone have a pointer to a simple program that does this type of
simultaneous buffer and process operation?
Thanks again
Jeff
Hold on...
>The particular example concerns reading and processing data from a URL file.
>A worker thread is created to read the entire file into a buffer whilst the
>main thread begins processing the data as it becomes available.
So you do have a scenario where 2 threads are trying to access the
same data.
>The program
>was working fine on my local host where the file was "downloaded" almost
>before the processing had begun. When I try it on a remote file however the
>heap corruption problem arises - I suspect because the processing is now
>catching up with the download.
Sounds like you either have some problem with the algorithm that
wasn't being exercised before, or you have a synchronisation issue.
Either way, without a lot more clues it's difficult to know where the
problem is.
>I'm using a local data member to synchronise
>reading and resizing of the buffer.
How?
>In particular the buffer can only be
>resized if it isn't being read and can only be read if it isn't being
>resized - since a resize could move the buffer. I assume that the buffer can
>be read and written to simultaneously.
How does the reader know what's been written? There's got to be some
thread synchronisation there as well surely?
Dave
>
>"David Lowndes" <Dav...@example.invalid> wrote in message
>news:dnkn44l5vhk751e51...@4ax.com...
>> >I quickly realised that multi-threading opens up a whole new slew of bugs
>>>for the novice, but there is one area that is really confusing me.
>>>I have a small class that uses an std::vector. I create a new worker
>>>thread with AfxBeginThread - this thread will expand/contract the vector
>>>(and do some other stuff) and then return. The problem is, I seem to get
>>>heap corruption errors when I try to access the memory allocated in the
>>>worker thread after that thread has finished. Is this to be expected or
>>>should I be looking elsewhere for the bug?
>>
>> Jeff,
>>
>> It's not expected, but without more details it's nigh on impossible to
>> know where your error lies. Try the same code in a single threaded
>> test and see if you suffer the same issue.
****
In general, I believe if you are sharing the same data between two threads, the design is
already hopeless and needs to be redone. See, for example, my essay "the best
synchronization is no synchronization", the point being that you should not create designs
where synchronization is required for correct behavior.
It would not at all be surprising to see heap corruption errors if you are sharing a
std::vector between two threads. It has nothing to do with "storage allocation" as such,
but a LOT to do with the fact that none of the STL containers are "thread-safe" and
therefore should not be used by one thread unless the entire container is locked against
access by all other threads. This is usually a design error, because it can mean lengthy
locks, and therefore is usually a Very Bad Idea.
For example, imagine you are iterating over a std::vector of n elements. While you are
doing the iteration, and modifying the elements, some other thread does a push_back. Alas,
the first thread, which cleverly has a *pointer* to the array position, will find itself
writing to unallocated memory, because the push_back could have freed the old vector
contents after copying to the new vector.
This means you must never allow ANY access to the vector that can change it while another
thread is using it.
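Joe's point about push_back is easy to demonstrate even in a single thread: once a push_back exceeds the vector's capacity, the storage is reallocated and any saved pointer into the old block is dangling. A minimal standard C++ sketch (the function name is mine, not from the thread):

```cpp
#include <cstddef>
#include <vector>

// Returns true if push_back beyond capacity reallocated the storage -
// i.e. any raw pointer another thread saved into the vector is now stale.
bool push_back_reallocates()
{
    std::vector<int> v;
    v.push_back(1);
    std::size_t oldCap = v.capacity();
    // Fill exactly to capacity, then one more element forces reallocation.
    while (v.size() < oldCap)
        v.push_back(0);
    v.push_back(2);               // capacity exceeded -> new block allocated
    return v.capacity() > oldCap; // storage moved; old pointers are dangling
}
```

In the two-thread scenario above, the "other" thread doesn't get this warning; it just writes through its stale pointer into freed heap memory.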
*****
>>
>
>Thanks a lot for that, I guess the issue must be somewhere else then.
>
>The particular example concerns reading and processing data from a URL file.
>A worker thread is created to read the entire file into a buffer whilst the
>main thread begins processing the data as it becomes available. The program
>was working fine on my local host where the file was "downloaded" almost
>before the processing had begun. When I try it on a remote file however the
>heap corruption problem arises - I suspect because the processing is now
>catching up with the download. I'm using a local data member to synchronise
>reading and resizing of the buffer. In particular the buffer can only be
>resized if it isn't being read and can only be read if it isn't being
>resized - since a resize could move the buffer. I assume that the buffer can
>be read and written to simultaneously.
*****
This sounds like a really bad design. If you want to handle partial downloads, this is
about the worst possible way to do it. What you should do is have a thread that is taking
the partial download and filling a buffer; when the buffer is filled, you
PostMessage/PostThreadMessage/PostQueuedCompletionStatus this buffer to the thread that is
doing the analysis. It performs the analysis as far as it can, then, if it has run out of
things to do, waits for more information to come down (and this is not as simple as it
sounds). When more information arrives, you resume processing using the new information.
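The hand-off scheme Joe describes - fill a buffer, then post the whole buffer to the analysis thread - can be sketched without the MFC/Win32 posting calls he names. This is my own rough illustration of that shape, using a standard C++11 condition-variable queue as a portable stand-in for PostThreadMessage/PostQueuedCompletionStatus:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Each filled buffer is handed off whole; the two threads never
// touch a buffer that is still being written.
class BufferQueue {
public:
    void post(std::vector<char> buf) {           // producer side
        { std::lock_guard<std::mutex> lk(m_);
          q_.push(std::move(buf)); }
        cv_.notify_one();
    }
    std::vector<char> wait_and_take() {          // consumer side
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this]{ return !q_.empty(); });
        std::vector<char> buf = std::move(q_.front());
        q_.pop();
        return buf;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::vector<char>> q_;
};

// Tiny demo: a producer posts two chunks, the consumer concatenates them.
std::string demo()
{
    BufferQueue q;
    std::thread producer([&]{
        q.post({'a', 'b'});
        q.post({'c', 'd'});
    });
    std::string out;
    for (int i = 0; i < 2; ++i) {
        std::vector<char> buf = q.wait_and_take();
        out.append(buf.begin(), buf.end());
    }
    producer.join();
    return out;
}
```

The key property is ownership transfer: once a buffer is posted, the producer never touches it again, so no locking of the buffer contents is ever needed.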
I would never consider any possibility of doing this by using a single shared buffer
between the threads... it is far too difficult to get right (and I've been doing
multithreaded programming since 1968). Your assumption that the buffer can be read and
written simultaneously is very suspect. I would work with partial buffers which were
always treated as being independent; and I would use a model called "distributed finite
state machine" or "distributed context free grammar" to handle the parsing. It looks
harder, but it ultimately is not, and is more robust. Key here is to not confuse state
with stack depth, that is, state is contained in variables, and is not part of the call
stack at all. So at any time, you can return to the main data-fetching loop and all the
state is there. If you've run out of data, you just block waiting for more data to be
delivered, and you can resume the parse at any time because everything you need is stored
as part of the state of your parse.
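The "state in variables, not in the call stack" idea can be sketched with a toy resumable parser. This is my own illustrative example, not code from the thread: it sums whitespace-separated integers, and because everything it needs survives in members it can stop at any chunk boundary and resume when the next chunk arrives.

```cpp
#include <string>

// A resumable parser: all parse state lives in members, none on the
// call stack, so feed() can return between any two chunks and the
// parse resumes exactly where it left off.
class NumberSummer {
public:
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            if (c >= '0' && c <= '9') {
                cur_ = cur_ * 10 + (c - '0');  // extend current number
                inNumber_ = true;
            } else if (inNumber_) {
                sum_ += cur_;                  // number ended at delimiter
                cur_ = 0;
                inNumber_ = false;
            }
        }
    }
    long finish() {                            // flush a number ending at EOF
        if (inNumber_) { sum_ += cur_; cur_ = 0; inNumber_ = false; }
        return sum_;
    }
private:
    long sum_ = 0;
    long cur_ = 0;
    bool inNumber_ = false;
};
```

Note that a number split across two chunks ("2 3" arriving as "2 " then "3") parses correctly, because the partial value is carried in `cur_` rather than in local variables of a recursive descent.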
If you have to wait to resize until the processing is done, or delay the processing until
the resize is done, you have given away most of the advantage of multiple threading,
because you have forced the threads to run in "lock-step", one or the other but not both.
I consider this bad design. Create your threads to run completely independently and
asynchronously, with no interlocks between them because none are required by such a
design.
joe
*****
>
>If I do the download before (and in the same thread as) the processing then
>I have no problem even with a remote Url.
****
This says that your fundamental design has flaws, and you need to redesign.
****
>
>Does anyone have a pointer to a simple program that does this type of
>simultaneous buffer and process operation?
*****
It's called the "producer-consumer" model and is one of the oldest and simplest of
designs. There is a discussion of doing this on my Web site; see my essay on semaphores.
But my own preference is to, whenever possible, avoid events, mutexes, and semaphores,
ESPECIALLY mutexes and semaphores, and use the built-in primitives. See my essays on
worker threads, on UI threads, and the use of I/O Completion Ports for interthread
queuing, all on my MVP Tips site.
joe
*****
>
>Thanks again
>Jeff
>
Joseph M. Newcomer [MVP]
email: newc...@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
>
> How does the reader know what's been written? There's got to be some
> thread synchronisation there as well surely?
>
Thanks a lot for your help - I do now have a working system which seems to
be perfectly stable - both on a local fileserver and a very slow remote one.
Let me summarise what I'm doing.
I want to stream in a remote file and begin processing the bytes as soon as
possible. To do this I have two threads, one is the processor thread (the
consumer if you like) and the other is a streaming thread (the producer if
you like). These two threads both have access to an expandable buffer which
is based on an std::vector object - only the streaming thread is able to
expand this buffer however. I use a member variable as a token to ensure
that the buffer is never expanded (and potentially moved) whilst it is being
read and to ensure that it is never read whilst it is being expanded. I also
ensure that I never use any pointer copies from the stack when using the
buffer since these could go stale.
The streaming thread only needs to expand the buffer very occasionally
(since it is usually waiting for bytes from the URL) and the processor
thread only needs to read the buffer very occasionally (since it takes a
quick memcpy of a short section and then spends a lot of time processing
this). This means that there are very few if any occasions when a contention
(or lock-out) occurs - and when such a contention does occur it is well
handled.
It turned out that I wasn't being as careful about locking as I thought I
was being.
Thanks again for your help.
Jeff
Therein lies the difficulty of such designs. As others have mentioned,
it's often better to design it such that you don't have any threading
issues to contend with (or reduce them to a bare minimum at least).
Dave
Thanks for your response - it's really appreciated.
As I've already detailed in another response, I do now have a working system
which appears to be both fast and stable. I am however very interested in
your comments especially given your experience in this area - I have just
about enough experience now to know that these problems can be hard.
In my specific case I want to have a streaming thread whose sole purpose is
to download a file as quickly as possible - ie with as few interruptions as
possible. I then have another thread which wants to process this data as
soon as it is available - ie as it is being downloaded. In some cases (eg a
local file system) the download will probably finish before the first bytes
have been processed. In other cases (eg a slow remote file) the processor
will spend a lot of time waiting for bytes to become available.
One thing that confuses me in your response is your comment that sharing the
same data between threads is a bad idea. In the above scenario the two
threads have to share the same data - that's the whole point. Could you
describe a solution to this problem which would not involve the two threads
accessing the same memory location at some point?
Since this specific problem must be encountered fairly often, I imagine
there is already a well optimised stable solution. I'd be really grateful if
someone could point me at it.
Thanks again.
Jeff
May I ask if you have any strings in those vectors? If so, where does the
string value come from?
Whether or not this is your problem, you may want to be aware of a potential
problem with strings in a multithreaded environment - definitely with STL
strings, and I suspect with CStrings, though I am not sufficiently familiar
with the implementation of CString to know for sure whether the issue exists
with them.
I'll illustrate a problem situation by use of a globally declared string that
is read (but not written) by multiple threads:
For example, say there is a global string such as:
std::string g_strBlurb;
Suppose that multiple threads do something like:
void CThingamajig::vDoStuff()
{
std::string strZing = g_strBlurb; // CAUSES MEMORY CORRUPTION!
.
.
}
The above code can cause memory corruption (even if vDoStuff() does nothing
except return after the string assignment statement).
How does this cause memory corruption? After all, the above code doesn't
modify the shared string (g_strBlurb).
Or does it?
In order to be memory efficient, the above assignment statement doesn't
allocate memory for, and copy the text of the string (g_strBlurb).
Instead, a reference counter is incremented (it is located in memory just
before the string text).
Think of the string assignment code as doing something like this:
1) Read string reference count from memory into Register A
2) Add 1 to Register A
3) Write Register A to the string reference count in memory
.
.
Suppose that the string has a reference count of 1 (say it's been assigned
to at program startup, and not referenced since).
Now suppose thread 1 performs step 1, reading a ref count of 1 into Register
A. Suppose thread 1 is then suspended, and thread 2 begins executing. Thread 2
then performs steps 1 thru 3, storing a ref count of 2 into memory.
Suppose thread 2 is then suspended before returning from vDoStuff(), and
thread 1 continues.
Thread 1 continues at step 2 above, also storing a ref count of 2 into
memory for the string (the correct value is 3, but thread 1 was interrupted
during the ref count increment process).
Eventually, vDoStuff() is exited in each thread, causing the destructor for
strZing to decrement the string ref count, causing it to become zero (in
whichever thread last exits vDoStuff()). When the ref count becomes zero, the
string self destructs (and the allocated memory is deallocated).
Subsequent access of g_strBlurb will access memory that has been returned to
the heap.
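The lost update in steps 1 thru 3 can be replayed deterministically in a single thread by simulating the two registers. A sketch (the interleaving follows the scenario above; the function and variable names are mine):

```cpp
// Deterministic replay of the interleaving described above: two
// "threads" each read the count into a register, add 1, and write it
// back, with thread 2 running entirely inside thread 1's
// read-modify-write window.
int lost_update_demo()
{
    int refCount = 1;        // the shared string's reference count
    int regA_t1 = refCount;  // thread 1, step 1: read into Register A
    // thread 1 is pre-empted here; thread 2 runs steps 1 thru 3
    int regA_t2 = refCount;  // thread 2, step 1
    regA_t2 += 1;            // thread 2, step 2
    refCount = regA_t2;      // thread 2, step 3: refCount is now 2
    // thread 1 resumes with its stale register value
    regA_t1 += 1;            // thread 1, step 2
    refCount = regA_t1;      // thread 1, step 3: refCount is 2, should be 3
    return refCount;
}
```

One increment has been lost, which is exactly why the eventual decrements drive the count to zero one reference too early.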
One way to get around this (other than using access control entities, such
as critical sections) is to create a function such as this:
inline std::string strThreadSafeStrCopy(const std::string& str)
{
    std::string _str = str.c_str();
    return _str;
}
Then, you can use a statement such as:
std::string strZing = strThreadSafeStrCopy( g_strBlurb );
By making it a function (rather than directly invoking the c_str()
method everyplace you want a thread safe copy):
* it is very easy to locate every line of code where a thread safe string
copy has been coded for
* the implementation of thread safety (such as using the c_str() method)
could be changed
* need/non-need for thread safe copy might be detected at compile time, so
that either efficient or thread safe code could be generated appropriately
* it's more obvious to other programmers why a string copy is being done in
a particular manner
>May I ask if you have any strings in those vectors? If so, where does the
>string value come from?
I don't get the question.
>Whether or not this is your problem, you may want to be aware of a potential
>problem with strings in a multithreaded environment - definitely with STL
>strings, and I suspect with CStrings, though I am not sufficiently familiar
>with the implementation of CString to know for sure whether the issue exists
>with them.
>
>I'll illustrate a problem situation by use of globally declared string that
>is read (but not written) by multiple threads:
>
>For example, say there is a global string such as:
>
>std::string g_strBlurb;
>
>Suppose that multiple threads do something like:
>
>void CThingamajig::vDoStuff()
>{
> std::string strZing = g_strBlurb; // CAUSES MEMORY CORRUPTION!
> .
> .
>}
>
>The above code can cause memory corruption (even if vDoStuff() does nothing
>except return after the string assignment statement) .
>
>How does this cause memory corruption? After all, the above code doesn't
>modify the shared string (g_strBlurb).
>
>Or does it?
>
>In order to be memory efficient, the above assignment statement doesn't
>allocate memory for, and copy the text of the string (g_strBlurb).
>
>Instead, a reference counter is incremented (it is located in memory just
>before the string text)
The last version of VC to use reference-counting for std::string was VC6,
and it ceased to be a problem in VC.NET 2002.
What you've shown should not be a problem for CString, which uses
InterlockedXXX to manage reference counts. However, read this whole thread
for a discussion on problems with CString's copy-on-write method:
I think the memory leak I posited is still possible, though it appears they
fixed the problem I described with MFC 4.1's CopyBeforeWrite function in
its MFC9 counterpart, Fork, and I would bet much earlier than that.
Again, reference-counting std::string has been recognized for a long time
as a losing idea due to the difficulty in doing it in a thread-safe way,
and the fact that the C++ Standard requires string::reference to be a real
reference, not a proxy class, which makes it impossible to use real
copy-on-write. The standard committee tried to please everybody, and they
ended up with a badly flawed specification, which besides failing to
describe a genuine copy-on-write string class, imposed lots of weird little
rules that were necessary to make it almost sorta kinda work. You don't
need to do the sorts of things represented in your strThreadSafeStrCopy in
VC.NET 2002 and later, which abandoned reference counting for a "short
string" optimization that doesn't have these problems. As for VC6, see
"Fixes to <xstring>" here:
http://www.dinkumware.com/vc_fixes.html
As for MFC's CString, it avoids the atomicity problem you described by
using Interlocked(Increment|Decrement). However, because MFC9 continues to
do things like check the refcount and make copy-on-write decisions without
synchronizing the whole operation from checking to forking, it appears it
remains susceptible to the memory leak I described in the thread I linked
to above. See my last message in that thread for a couple of suggestions
from an MS guy that should still be valid for dealing with that.
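The difference Interlocked(Increment|Decrement) makes can be sketched with std::atomic as a portable stand-in (my illustration, not MFC's actual code): the whole read-modify-write becomes one indivisible operation, so no interleaving can lose an increment.

```cpp
#include <atomic>
#include <thread>

// Two threads each bump a shared reference count. Because fetch_add is
// an indivisible read-modify-write (the role InterlockedIncrement plays
// in CString), the result is always 3 - the lost-update interleaving
// shown earlier in the thread cannot occur.
int atomic_increment_demo()
{
    std::atomic<int> refCount(1);
    auto bump = [&refCount]{ refCount.fetch_add(1); };
    std::thread t1(bump), t2(bump);
    t1.join();
    t2.join();
    return refCount.load();
}
```

As Doug notes, though, atomic counts alone don't make the whole copy-on-write decision atomic; the check-then-fork sequence still needs its own synchronization.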
--
Doug Harrison
Visual C++ MVP