Microsoft's attempt to make it safe to copy CStrings between threads
looks extremely doubtful. For example:
CString::CString(const CString& stringSrc)
{
ASSERT(stringSrc.GetData()->nRefs != 0);
if (stringSrc.GetData()->nRefs >= 0)
{ // HAZARD
ASSERT(stringSrc.GetData() != afxDataNil); // DANGER
m_pchData = stringSrc.m_pchData; // HAZARD
InterlockedIncrement(&GetData()->nRefs);
}
else
{
Init();
*this = stringSrc.m_pchData;
}
}
In the HAZARD area this thread is copying a pointer m_pchData which
is also used by another thread. However, since it has yet to increment
the reference count on the structure, the other thread may choose to
delete the thing that m_pchData points at. So by the time
InterlockedIncrement gets called, m_pchData may be invalid.
Similar errors can be found elsewhere in the CString part of the MFC
library.
Perhaps this explains those odd MSDEV and IEXPLORE crashes I get
occasionally...
Anybody got any good workarounds for passing CStrings between threads?
-- Mike --
Mike Bell
Albion Research Ltd.
Search for "cstring synchronization" in Books Online, and read the
article "Multithreading: Programming Tips" (it's the 2nd hit in VC++
4.1). In brief, MS hasn't attempted to make CString thread-safe in
general, and they say so in that article. However, the use of
InterlockedIncrement (and presumably InterlockedDecrement, used
consistently throughout) on the reference count does appear to allow
multiple threads to safely use copies of some CString that exists
before, during, and after the threads' lifetimes ("lifetime" being the
extent of their use of copies of the original CString); the copies
would of course be copy-on-write aliases for the original. Given
coherent updates of the reference count, the continuous existence of
the original object would prevent the reference count from ever
dropping to zero, as the threads create and destroy copies of it. In
fact, the article cited above says, "... you can have two separate
threads manipulating two different CString objects, but not two
threads manipulating the same CString object", and I take that to mean
that CString can successfully uniquify objects in this limited
scenario.
But you're right; there's no protection for thread A making a COW copy
of a CString that is concurrently being deallocated in thread B. You
have to guard against that with explicit synchronization. And unless
you can ensure that the reference count never goes to zero in the
participating threads, as described above, it's probably a good idea
to disallow COW aliasing altogether between threads. That is, you
would synchronize around copying of the actual char data, instead of
copying of CString objects.
--
Doug Harrison
dHar...@worldnet.att.net
Actually, in the simple scenario below, I think you can get away with
copying the CString object, as opposed to the underlying char data
(but see caveat at end):
Thread A:
Somehow communicate a CString pointer to thread B.
Wait for thread B to acknowledge.
Thread B:
Receive A's CString pointer 'p'.
Initialize my CString _object_ from *p.
Signal A.
This avoids making a copy of *p's data; B constructs a COW alias of
*p. That leaves the question, "what happens when copy-on-write
decisions need to be made?" MFC doesn't synchronize access to the
reference count, except when it increments or decrements it. That
means the refcount can change immediately after the COW function (say,
cow()) checks it.
Suppose the refcount is 2, and cow() is called. If the refcount
decreases after cow() checks it, because an alias in another thread is
destroyed, then cow() will unnecessarily uniquify the representation.
This may result in a memory leak, but it should be rare(?).
Now, suppose the refcount increases right after cow() checks it. This
only matters if the refcount is 1, in which case cow() will fail to
copy-on-write, and a new "copy" of the CString will attach to a
representation that's about to change. But this cannot happen if COW
aliasing of CStrings between threads is synchronized, since if the
refcount is 1, then the thread in which cow() is executing in effect
owns the CString, and only it can send it to another thread.
So, does it make sense to guard access to the refcount, say, with a
critical section? The only benefit would seem to be the avoidance of a
potential memory leak under rare circumstances. It would not relax the
synchronization requirement when passing CStrings between threads.
Checking the refcount is a very frequent operation, and in a class as
heavily used as CString, I might argue that it's reasonable to trade a
rare memory leak for faster execution, especially since the memory
leak is avoidable if you disallow inter-thread COW aliasing of CString
objects and copy only char data when passing between threads.
Of course, the above assumes CString does everything properly.
Consider MFC 4.1:
void CString::CopyBeforeWrite()
{
if (GetData()->nRefs > 1)
{
CStringData* pData = GetData();
Release();
AllocBuffer(pData->nDataLength);
memcpy(m_pchData, pData->data(),
(pData->nDataLength+1)*sizeof(TCHAR));
}
ASSERT(GetData()->nRefs <= 1);
}
The function Release() calls InterlockedDecrement, freeing the
representation returned by GetData() when the refcount goes to zero.
Suppose you have two CString objects, 'a' in thread A and 'b' in
thread B, both aliased to a single COW representation. Thread A does
something on 'a' causing COW to occur, while in thread B, 'b' is
destroyed just before Release() is called in A. When A calls
Release(), the refcount goes to zero, the representation is deleted,
and pData becomes invalid! This should be corrected so that the
releasing is done only after the copy is made.
So, unless I've missed something, I'll have to back-pedal on my
back-pedalling, and conclude that it isn't safe at all to use
COW-aliased CStrings between threads, if there's ever a chance you'll
modify them, causing copy-on-write to occur.
--
Doug Harrison
dHar...@worldnet.att.net
[Excellent analysis of problem deleted]
Yes,
>
>So, unless I've missed something, I'll have to back-pedal on my
>back-pedalling, and conclude that it isn't safe at all to use
>COW-aliased CStrings between threads, if there's ever a chance you'll
>modify them, causing copy-on-write to occur.
What is particularly interesting about Microsoft's style of coding
here is that they half solve the problem What's the point of
InterlockedIncrement(&refcount) if you call it *after* you've created
another reference, rather than before!
What I was hoping to do is guard all accesses to a database with a
critical section.
e.g.
CString CDatabase::GetSomething()
{
CSingleLock( &m_criticalSection, TRUE );
return m_stringInDatabase;
};
CString CDatabase::PutString( CString s )
{
CSingleLock( &m_criticalSection, TRUE );
m_stringInDatabase = s;
}
This ensures that all the sharing of CStrings occurs inside a critical
section. However, as the CopyOnWrite feature is unsafe, it looks like
I'll have to do something more like:
CString CDatabase::GetSomething()
{
CSingleLock( &m_criticalSection, TRUE );
CString result = m_stringInDatabase;
result.LockBuffer(); // make sure result unshared
result.ReleaseBuffer();
return m_stringInDatabase;
};
CString CDatabase::PutString( CString s )
{
CSingleLock( &m_criticalSection, TRUE );
m_stringInDatabase = s;
m_stringInDatabase.LockBuffer(); // unshare
m_stringInDatabase.ReleaseBuffer();
}
I *think* this is safe, but I'm still trying to convince myself. It's
a pity the CopyBeforeWrite procedure isn't exported.
Regards,
-- Mike --
Mike Bell | Albion Research Ltd.
Managing Consultant | Kanata, Ontario, Canada
"We turn ideas into software products"
>I *think* this is safe, but I'm still trying to convince myself. It's
>a pity the CopyBeforeWrite procedure isn't exported.
Mike,
Concerning my earlier messages, I received the following in mail from
Dean McCrory at MS, who suggested I might post it here:
>You are correct. MFC 4.x's ref-counted CString was not designed to be
>used as you have indicated. You have really two choices:
>
>1) make sure one thread keeps an anchor on the string (ie. never
>allowing it to go away until the string is no longer shared between
>multiple threads, by holding on to a reference to it in at least one
>thread for the duration of the said "sharing").
>2) use LockBuffer to avoid that particular string being accidentally
>shared (this is obviously the safer route).
--
Doug Harrison
dHar...@worldnet.att.net