
Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756


dick.dunbar

Apr 30, 2007, 3:14:58 PM
Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

Our code runs flawlessly in 32-bit mode.
During a 24-hour torture test of our 64-bit implementation, the
system mysteriously crashed 10 hours into the run. Hardware
watchpoints weren't much help in isolating the crash, so we spent
several weeks writing small testcases to reproduce the problem.
The resulting test case will crash within a few seconds.

Hint: _Tag _M_tag:8; is the line that causes the problem; an 8-bit
holder of enum data.

Test Case Essentials:
--------------------
class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;  // 32-bit int used in tcpip to 32-bit clients
    _Tag _M_tag:8;              // 8-bit char is as large as we need
    int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
    int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
};

void *ThreadFunc(void *threadid) {
    for (int i = 0; i < 100000000; ++i) {
        gs._M_incr();                      // atomically increment
        gs._M_tag=NQNString::_S_concat;    // set it to one
        if(gs._M_decr() == 0) ++*(int*)0;  // ref_count should never go to zero!
    }
    return 0;
}

The Bug:
-------
1. ref_count is incremented/decremented consistently using AIX
   atomic ops.

2. The _M_tag assignment causes both ref_count and _tag to be loaded
   into a 64-bit register, updated by twiddling bits, and stored
   non-atomically. ref_count is modified when only _tag was specified
   in the source.

3. If multiple threads execute this code, it is only a matter of time
   before a thread updates the count (1 -> 2), and another thread
   loads the same count (1), updates the _Tag field, and stores the
   1 back. When the first thread decrements, the count now goes from
   1 to zero, when the expectation was from 2 to 1.

4. The reason for this is an IBM compiler construct called
   BITCONTAINER, where the volatile int and bit-field are manipulated
   as a unit.

PMR 26241,756.
-------------
IBM identified a source workaround and claimed the problem was our
source. A TechNote was published, but it fails to relate to our
64-bit experience.

http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159

Problem
-------
In a multithreaded program, updating a bitfield variable may affect
the values of other variables.

Cause
-----
Memory is logically partitioned into containers for storing data.
The size of the container depends on the alignment of the structure.
For example,

struct foo {
    short x;
    int y : 8;
};

In this case, a 4-byte container will be created:
foo::x will occupy the first and second bytes of the container
foo::y will occupy the third byte

Assume that struct foo is used in a multithreaded program, and
each thread updates both foo::x and foo::y.
An unexpected value may potentially be passed to foo::x when multiple
threads are processing data in struct foo at the same time because the
memory manipulation of bit-field variables can potentially
"load and store" other variables in the same container.

Solution
--------
When updating a variable, users should use a thread mutex lock to
ensure only one thread is updating it. This will avoid the issue.
Alternatively, users may insert "int :0;" before the bit-field.
This will change the layout of the struct so that the container for
the bit-fields does not overlap the first member.
------- End of IBM TechNote ----------------

I cited section 9.6 of the C++ standard as justification for our
coding.

ISO 14882 C++ Standard Section: 9.6 bit-fields (class.bit)
------------------------------------------------------------
* id:constant ... represents bits. "constant" may be larger than
  needed, for padding
* Allocation of bit-fields within a class object is
  implementation-defined
* Alignment of bit-fields is implementation-defined
* Bit-fields are packed into some addressable allocation unit
* Bit-fields straddle allocation units on some machines and not on
  others
* Bit-fields are assigned right-to-left on some machines,
  left-to-right on others
* Unnamed bit-fields are not members and cannot be initialized
* Special case: an unnamed bit-field with width zero specifies
  alignment of the next bit-field at an allocation unit boundary.
  Only unnamed bit-fields may use :0
* A bool value can successfully be stored in a bit-field of any
  nonzero size.

The phrase "allocation unit" is only used in section 9.6, and never
defined.
------------- End of Standard snippets ------

IBM Response:
------------
Section 9.6 of ISO 14882 does not specify the mandatory bit-field
implementation in a multithreaded environment.
As a result, we have not violated the current C++ bit-field standard.
Currently we do not have plans to revise the bit-field implementation
of the compiler.

Response:
--------
The standard expects you to do something rational on your hardware.
IBM can do anything they like to bit-fields, but to subsume
non-bitfield POD data into your BITCONTAINER is wrong.

Modifying two pieces of data, when only one object was specified in
our source code, is about the worst kind of compiler error I can
think of.

To avoid the compiler bug, we changed _M_tag to remove the implication
of bits, and therefore stop IBM from "inventing this BITCONTAINER
construct" when none was intended or implied in our code.

_Tag _M_tag:8;   // Fails: single byte of enum data
char _M_tag;     // Works: but no longer associates _M_tag with the enum

This code only fails on IBM compilers (Version 7 and 8), and on no
other platform.

So what do you think? Our bug or theirs?

dick.dunbar

Apr 30, 2007, 7:59:31 PM
xlC_r -q64 -qsource -qlist -c tc.cpp
vi tc.lst
/17|   finds the generated code for stmt 17:  gs.c = NQNString::_S_concat;


--- tc.cpp ---
#include <sys/atomic_op.h>

class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;

    _Tag t:8; bool b:8; int i:8; char c:8;
    char* p;    // Remove this, and correct code is generated
    NQNString():_M_ref_count(1) { }
    int _M_incr() {return fetch_and_add((int*)&_M_ref_count, 1);}
    int _M_decr() {return fetch_and_add((int*)&_M_ref_count,-1);}
};

NQNString gs; // ref_count gets initialized to 1


void *ThreadFunc(void *threadid) {
    for (int i = 0; i < 100000000; ++i) {
        gs._M_incr();

        gs.c=NQNString::_S_concat;         // gs.t | gs.b | gs.i | gs.c
        if(gs._M_decr() == 0) ++*(int*)0;  // ref_count should never be zero
    }
    return 0;
}

ref_count and c are in a single 8-byte "BITCONTAINER". char *p is in
a second container.
ref_count gets atomically updated by "fetch_and_add()".
Modification of gs.c is performed by an 8-byte load of ref_count and
gs.c, maneuvering some bits into place for gs.c, and an 8-byte store.

Change gs.c from "char c:8" to "char c", and you get the expected
"store byte" instruction.

If you remove the "char *p" from the class (structure), the compiler
then manipulates a single word of storage, not a double word.
Inefficient when all that is required is a byte store, but
non-destructive for our purposes.

Barry Kelly

Apr 30, 2007, 8:44:51 PM
dick.dunbar wrote:

> [big snip]


> This code only fails on IBM compilers (Version 7 and 8), and no other
> platform.
>
> So what do you think? Our bug or theirs?

My first thought: the C++ standard says next to nothing about threads,
so you can't rely on it much for supporting evidence. Second thing, if a
machine doesn't have simple byte-addressable memory, then a compiler
will need to do the kind of stuff it sounds like the IBM compiler is
doing: load a larger unit, modify it, then save it back down. This works
transparently in single-threaded code.

In summary: I think it's questionable that the compiler acts differently
for a bitfield of width 8 bits (and doesn't straddle different bytes)
and an 8-bit char, but I can't say that it is incorrect.

-- Barry

--
http://barrkel.blogspot.com/

Thomas Richter

Apr 30, 2007, 9:46:45 PM
dick.dunbar <dick....@gmail.com> wrote:

> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

> Hint: _Tag _M_tag:8; is the line that causes the problem; an 8-bit


> holder of enum data.
>
> Test Case Essentials:
> --------------------
> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
> };
>
> void *ThreadFunc(void *threadid) {
> for (int i = 0; i < 100000000; ++i) {
> gs._M_incr(); // atomicly increment
> gs._M_tag=NQNString::_S_concat; // set it to one
> if(gs._M_decr() == 0 ) ++*(int*)0; // ref_count should never go
> to zero!
> }
> return 0;
> }

First of all, please note that C++ makes no definitions about threads
at all, and provides no guarantees whatsoever concerning thread-safety.
Thus, it's purely an extension your vendor provides, reliable or unreliable.
Make sure you know what you're doing, but the C++ standard offers you nothing
here.

> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX atomic
> ops

Thus, fetch_and_add() works according to what was intended by the
vendor.

> 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> a 64-bit register, updated by twiddling bits, and stored non-
> atomicly.

Well, if that is the only way to access an 8-bit field on the
architecture, that's it. If you want thread-safe access to a
structure, use a mutex, hopefully provided by the implementation.

> PMR 26241,756.
> -------------
> IBM identified a source work-around, and claimed the problem was our
> source.
> A TechNote was published but fails to relate to our 64-bit experience.
>
> http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159
>
> Problem
> -------
> In a multithreaded program, updating a bitfield variable may affect
> the values of other variables.

That is definitely a problem. It is also a problem the C++ standardization
committee is aware of, see the discussion in comp.lang.c++.moderated.
I think it is really that: On some architectures, it is unreasonable
to expect that you can access structure units atomically (without adding
much overhead). If you need to be thread safe, you should protect the whole
thing by a mutex.

> I cited section 9.6 of the C++ standard as justification for our
> coding.

I don't think the C++ standard has a word to say about threads. Note
that a C++ compiler is always free to operate on an "as if" basis, i.e.
its implementation might do whatever it wants as long as the observed
result is the same. And as C++ only addresses single-threaded programs,
observed behaviour is only defined in this sense. Thus, your program
has been proven to work correctly for single-threaded by the compiler,
and thus the compiler is fine.

> IBM Response:
> ------------
> Section 9.6 of ISO14882 does not specify the mandatory bit-field
> implementation in multithreaded environment.

A bit "off", I would call that. The problem is that the standard does
not say anything about multithreading, let alone how bitfields are
supposed to work.

> The standard expects you to do something rational on your hardware.
> IBM can do anything they like to bit-fields, but to subsume
> non-bitfield POD data into your BITCONTAINER is wrong.

No, why? You seem to imply that an "allocation unit" itself is something
that can be addressed in a thread-safe way. There is no such guarantee
in the standard. An allocation unit might well be a byte, even if the
processor has no provisions to access that unit atomically.

> To avoid the compiler bug, we changed _M_tag to remove the
> implication of bits,
> and therefore stop IBM from "inventing this BITCONTAINER construct"
> when none was
> intended or implied in our code.

A "bitcontainer" is nothing specified in C++. Compilers may use that
*if* they can prove correct behavior on the virtual machine C++
defines. And that one is single-threaded. Thus, *formally* the code
is running into undefined-behaviour land. IBM provides additional
guarantees to you when allowing you to run multi-threaded code. As it
seems, they don't express them very clearly.

> So what do you think? Our bug or theirs?

Your bug, to depend on something that isn't specified. But
unfortunately, C++ doesn't specify anything here, and still one needs
multithreading.

Thus, IBM may specify whatever they find reasonable. Apparently,
that's different from what you expected or think is reasonable.
Thus, probably bad documentation.

So long,
Thomas

David Schwartz

Apr 30, 2007, 10:53:09 PM
On Apr 30, 12:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
>
> };

Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
object. How is it the compiler's fault that this is not what you
meant?

Suppose you had a platform that did not have atomic 8-bit operations,
but you did this:
volatile char a;
char b;

What could 'a=8;' do but read in both a and b, modify the 'a' part,
and write it back? What could 'b=8;' do but read in both a and b,
modify the 'b' part, and write it back?

Even if the platform has small atomic operations, it is never required
to use them. That's why you *had* to use atomic operations for
_M_ref_count. Surely the platform wasn't required to somehow determine
that it was important to you that it do so for _M_tag too?

Sorry, it's your bug. You assumed that manipulating one part of an
object wouldn't affect another part. In analogous cases, it is clear
that this will never be the case. (Consider successive character
variables on a platform that has no 8-bit atomic operations.)

So you relied on something that was not guaranteed by anything.

DO NOT EVER DO THAT. The optimization is not worth the ultimate
suffering when the assumption turns out not to be true on some other
platform or with some other compiler.

If you needed to do this as an optimization because of a platform
where it was proven to be required and safe, why did you enable it on
*this* platform?

No standard guarantees this, hence it's a platform-specific
optimization to be used only where it is known to help and known to be
safe.

DS

dick.dunbar

Apr 30, 2007, 11:17:41 PM
On Apr 30, 4:44 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:

> My first thought: the C++ standard says next to nothing about threads,
> so you can't rely on it much for supporting evidence. Second thing, if a
> machine doesn't have simple byte-addressable memory, then a compiler
> will need to do the kind of stuff it sounds like the IBM compiler is
> doing: load a larger unit, modify it, then save it back down. This works
> transparently in single-threaded code.

The compiler didn't need to (and doesn't) combine these fields in an
unexpected way. It has byte-addressable storage.
I did _not_ expect the compiler to modify two fields when the source
only named one field.

>
> In summary: I think it's questionable that the compiler acts differently
> for a bitfield of width 8 bits (and doesn't straddle different bytes)
> and an 8-bit char, but I can't say that it is incorrect.

See my second post in this thread. If we declare the object as
"char c" and not "char c:8", then the compiler generates the expected
code.


dick.dunbar

Apr 30, 2007, 11:29:34 PM
On Apr 30, 5:46 pm, Thomas Richter <t...@mersenne.math.TU-Berlin.DE>
wrote:

> > The Bug:
> > -------
> > 1. ref_count is incremented/decremented consistently using AIX atomic
> > ops
>
> Thus, test_and_set() works according to what was intented by the
> vendor.

Yes, of course. That's my point.

>
> > 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> > a 64-bit register, updated by twiddling bits, and stored non-
> > atomicly.
>
> Well, if that is the only way how to access an 8 bit field on the
> architecture, that's it. If you want thread-safe access on a
> structure, use a mutex, hopefully provided by the implementation.

It is not the only way to accomplish this. If we stop defining things
in terms of the number of bits required for storage, the compiler
generates the proper byte-oriented modifications to the field we
specified in our source.


> > -------
> > In a multithreaded program, updating a bitfield variable may affect
> > the values of other variables.
>
> That is definitely a problem. It is also a problem the C++ standardization
> committee is aware of, see the discussion in comp.lang.c++.moderated.
> I think it is really that: On some architectures, it is unreasonable
> to expect that you can access structure units atomically (without adding
> much overhead). If you need to be thread safe, you should protect the whole
> thing by a mutex.

But the compiler does have other options for updating byte fields,
and it uses those techniques as long as we don't declare a bit-field
in a structure.
Not only is the generated code to store one byte of constant data
destructive of other fields ... it is also enormously inefficient.
I haven't posted the generated code.


> I don't think the C++ standard has a word to say about threads. Note
> that a C++ compiler is always free to operate on an "as if" basis, i.e.
> its implementation might do whatever it wants as long as the observed
> result is the same. And as C++ only addresses single-threaded programs,
> observed behaviour is only defined in this sense. Thus, your program
> has been proven to work correctly for single-threaded by the compiler,
> and thus the compiler is fine.

I'm still flabbergasted that reasonable people come to this
conclusion.

Note: The problem isn't confined to threaded programs. If you were to
build a 64-bit device driver that modified adapter storage in a
particular way, it would be completely unexpected for the compiler to
modify two fields when one was specified.

>
> > IBM Response:
> > ------------
> > Section 9.6 of ISO14882 does not specify the mandatory bit-field
> > implementation in multithreaded environment.
>
> A bit "off" I would call that. The problem is that the standard does
> not say anything about multithreading. Leave alone how bitfields are
> supposed to work..

I agree. I didn't bring the C++ standard into the discussion; IBM
did.

>
> > The standard expects you to do something rational on your hardware.
> > IBM can do anything they like to bit-fields, but to subsume
> > non-bitfield POD data into your BITCONTAINER is wrong.
>
> No, why? You seem to imply that an "allocation unit" itself is something
> that can be addressed in a thread-safe way. There is no such guarantee
> in the standard. An allocation unit might well be a byte, even if the
> processor has no provisions to access that unit atomically.

Byte modification is allowed in the PowerPC architecture.
The compiler had no reason to use an algorithm like that.


>
> > So what do you think? Our bug or theirs?
>
> Your bug to depend on something that isn't specified. But unfortunately,
> C++ doesn't specify anything here, and still one needs multithreading.

Wow. Forget the standard. This code runs flawlessly on 32-bit AIX,
and in 32/64-bit on every other platform.

You think specifying the alteration of a single character of storage
is depending on some undefined behaviour? How about common sense
here??

dick.dunbar

Apr 30, 2007, 11:40:43 PM
On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:
>
> Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
> object. How is it the compiler's fault that this is not what you
> meant?

Huh. I meant EXACTLY that. Class objects in C++ are an abstraction.
We "lump together" the things that it makes sense to group, given the
description of the problem space we're defining.

>
> Suppose you had a platform that did not have atomic 8-bit operations,
> but you did this:
> volatile char a;
> char b;

Let's not get into wild "Suppose" speculations here.
The PowerPC does not have 8-bit atomic operations, except by clever
usage of lwarx/stwcx.

And no atomic modification of the 8-bit char object was expressed or
implied in the source code. We meant for a single byte to be modified
to a specific value when we know we are the only owner of such an
object.

>
> What could 'a=8;' do but read in both a and b, modify the 'a' part,
> and write it back? What could 'b=8;' do but read in both a and b,
> modify the 'b' part, and write it back?

On PowerPC, it would:
li  R0,1        // Load Immediate into Register 0 the value of 1
stb R0,address  // Store that single byte into the field specified by the source

>
> Even if the platform has small atomic operations, it is never required
> to use them. That's why you *had* to use atomic operations for
> _M_ref_count. Surely the platform wasn't required to somehow determine
> that it was important to you that it do so for _M_tag too?

I didn't expect it to do any atomic operations on _M_tag at all.
This is part of making an application "thread efficient".

The problem is the compiler modified two fields when only one was
specified in our source code. I expect it to modify the single byte
specified by the source, without modifying anything else. And to do
this non-atomically. Atomic update is not required at all.

If every field in a threaded program required mutex or atomic
semantics for modification, nothing would work efficiently.

>
> Sorry, it's your bug. You assumed that manipulating one part of an
> object wouldn't affect another part. In analogous cases, it is clear
> that this will never be the case. (Consider successive character
> variables on a platform that has no 8-bit atomic operations.)
>
> So you relied on something that was not guaranteed by anything.

Exactly what do you think I relied on? That the compiler should alter
the byte I specified and nothing else. Oh yes, I rely on that every
day.

>
> DO NOT EVER DO THAT. The optimization is not worth the ultimate
> suffering when the assumption turns out not to be true on some other
> platform or with some other compiler.

This is not an optimization. Our code works exactly as we intended on
AIX 32-bit, and on every other platform in 64-bit. The implication is
that we did something wrong in our source. I don't believe it.

>
> If you needed to do this as an optimization because of a platform
> where it was proven to be required and safe, why did you enable it on
> *this* platform?

Sorry, you completely lost me. We made no such assumptions for AIX.
We didn't enable anything specific for this platform.

And you make the same argument that I've heard from a number of IBM
contacts:

"If your code doesn't work in a threaded environment, it must be
something you did wrong, because our compiler doesn't make any
mistakes."

Wow. Talk about naive.

Barry Kelly

May 1, 2007, 12:25:26 AM
dick.dunbar wrote:

> Exactly what do you think I relied on? That the compiler should
> alter
> the byte I specified and nothing else. Oh yes, I rely on that every
> day.

And that's what it does do, when single-threaded, which is all C++
guarantees for you. I think IBM's case is pretty air-tight, when viewed
as a defect. Without knowing all the details, I suspect it may not be
optimal, however.

> > DO NOT EVER DO THAT. The optimization is not worth the ultimate
> > suffering when the assumption turns out not to be true on some other
> > platform or with some other compiler.
>
> This not an optimization. Our code works exactly as we intended on
> AIX 32-bit
> and every other platform in 64-bit. The implication is that we did
> something
> wrong in our source. I don't believe it.

I'm fairly sure you're not going to win this battle. You need to accept
that sometimes you make mistakes too :)

> And, you make the same argument that I've heard from a number of IBM
> contacts:
>
> "If your code doesn't work in a threaded environment, it must be
> something you
> did wrong, because our compiler doesn't make any mistakes."
>
> Wow. Talk about naiive.

I'm pretty sure that isn't a verbatim quote, because I can't imagine a
compiler engineer or QA engineer saying "our compiler doesn't make any
mistakes" - shipping commercial compilers have hundreds, if not
thousands, of very minor corner case bugs, and it's even more true when
the language is as subtle and complex as C++. I say that working as I do
on the Delphi compiler at Borland.

But it's also ironic, because it seems that you yourself cannot admit
the possibility of error on your own part!

rfis...@gmail.com

May 1, 2007, 4:33:42 AM
dick.dunbar wrote:

> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.
>
> Our code runs flawlessly in 32-bit mode.

[...]

> _Tag _M_tag:8; // 8-bit char is as large as

There's your problem! Looks like you're using bitfields.
Those things are basically undefined. Those people who
do use them usually do so to get sub-byte ints, but yours is 1 byte...
Why not just use a 1 byte int, something like u_int8_t from types.h?

P.S. I'm pretty sure it was Bjarne who said bitfields
were for "idiots and children".

RF

--
felix: lightweight threads, HOFs and game scripting.
svn co https://svn.sourceforge.net/svnroot/felix/felix/trunk felix

Gil Hamilton

May 1, 2007, 7:48:05 AM
"dick.dunbar" <dick....@gmail.com> wrote in
news:1177960498.3...@o5g2000hsb.googlegroups.com:

> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as

[snip]
> };

> 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> a 64-bit register, updated by twiddling bits, and stored non-
> atomicly.
> ref_count is modified when only _tag was specified in the source.

OK. I'm going to go against the flow here and express sympathy for your
position. :-)

I think IBM has erred with their cavalier treatment of a volatile. From
the C++ standard at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2135.pdf
(section 1.9):

6 The observable behavior of the abstract machine is its
sequence of reads and writes to volatile data and calls
to library I/O functions.(8)

7 Accessing an object designated by a volatile lvalue
[other side effects] are all side effects, which are
changes in the state of the execution environment.

There is also this from section 7.1.5.1 paragraph 7:

[ Note: volatile is a hint to the implementation to avoid
aggressive optimization involving the object because the
value of the object might be changed by means undetectable
by an implementation. See 1.9 for detailed semantics. In
general, the semantics of volatile are intended to be the
same in C++ as they are in C. -- end note ]

Which would lead to this reference in section 6.7.3 paragraph 6 of the C
standard (May 6, 2005 draft):

6 An object that has volatile-qualified type may be modified
in ways unknown to the implementation or have other unknown
side effects. Therefore any expression referring to such an
object shall be evaluated strictly according to the rules of
the abstract machine, as described in 5.1.2.3. Furthermore,
at every sequence point the value last stored in the object
shall agree with that prescribed by the abstract machine,
except as modified by the unknown factors mentioned
previously.(114) What constitutes an access to an object that
has volatile-qualified type is implementation-defined.

By loading up and modifying _M_ref_count along with _M_tag, they are
violating the "rules of the abstract machine" with respect to the
volatile variable; i.e. it has been modified in ways other than those
prescribed by the abstract machine with a resulting change to the
"observable behavior".

Now everything in the standard(s) concerning implementation of volatile
leaves plenty of wiggle room for interpretation so it's difficult to say
that they have actually violated the standard even so. However, I'm
quite comfortable saying "it's broken".

Frankly, I don't understand why a compiler implementer would ever allow a
volatile to share an "atomic" data container with another variable at
all. The way to handle this in the compiler is simply to align the bit
field (or whatever the next variable is) to the next atomic boundary.
Structure alignment padding is something programmers know to expect and
account for while having a volatile loaded up along with another field
and stored back under the covers violates the law of programmer
expectations. It's just a stupid thing for the compiler developer to do.

As much as I sympathize, however, the bottom line is that I don't think
you're going to win the battle. You probably need to move on after
placing a big warning comment in the code (wherein you say rude things
about IBM's implementation).

GH

Thomas Richter

May 1, 2007, 9:33:23 AM
dick.dunbar <dick....@gmail.com> wrote:

>> Well, if that is the only way how to access an 8 bit field on the
>> architecture, that's it. If you want thread-safe access on a
>> structure, use a mutex, hopefully provided by the implementation.
>
> It is not the only way to accomplish this, and if we stop defining
> things in
> terms of the number of bits required for storage, the compiler
> generates
> the proper byte-oriented modifications to the field we specified in
> our source.

Probably, but it's probably the most efficient way of doing it. And
note, the compiler *is* allowed to do that by acting on the basis of
the "as-if" rule.

> But the compiler does have other options for updating byte fields,
> and using those techniques as long as we don't declare a bit-field in
> a structure.

But there is no requirement to choose one specific method over
another.

> Not only is the generated code to store one byte of constant data
> destructive
> of other fields ... it is also enormously inefficient. I haven't
> posted the generated code.

I don't know. Maybe not. On some architectures, an aligned 64bit
transfer is easier than an unaligned byte transfer.

>> I don't think the C++ standard has a word to say about threads. Note
>> that a C++ compiler is always free to operate on an "as if" basis, i.e.
>> its implementation might do whatever it wants as long as the observed
>> result is the same. And as C++ only addresses single-threaded programs,
>> observed behaviour is only defined in this sense. Thus, your program
>> has been proven to work correctly for single-threaded by the compiler,
>> and thus the compiler is fine.
>
> I'm still flabbergasted that reasonable people come to this
> conclusion.

On what grounds would you call this a compiler bug?
A compiler bug is the failure of the compiler to comply with the
language definition. The language definition, however, defines only
a single-threaded execution model. A compiler can do anything it
likes for multithreading as far as the standard is concerned. Thus,
the compiler is well-behaving *in terms of standard C++*.

If the vendor provides any guarantees beyond that standard, and the
compiler breaks those rules, then you have also found a bug. But
apparently bitfields are, according to the IBM specs, not expected
to work reasonably with multithreaded code; there is no atomic
access.

> Note: The problem isn't confined to threaded programs, if you were to
> build a 64-bit device driver that modified adapter storage in a
> particular
> way, this would be completely unexpected that the compiler should
> modify
> two fields when one was specified.

And again, the standard says nothing about that. If you access any
I/O devices through pointers, you're on your own, relying on whatever
your vendor guarantees. The virtual machine defined by the C++
standard again doesn't say a word about I/O devices. You *may* get
around this problem by "hinting" the compiler with the "volatile"
keyword, but again, you're still in "undefined" land.

> I agree. I didn't bring the C++ standard into the discussion, IBM
> did.

They did, but that's *one* basis on which one can decide whether a
compiler is working correctly or not. And by that measure, it *is*
conforming to the C++ standard. If you want to do something that is
not covered by the standard, well, you're on your own.

Maybe the compiler isn't conforming to its own (IBM) specs, but to
show that, you need to point IBM to the place in their own manual
where they (potentially) claim how it handles multithreading. But
again, IBM likely did not specify anything at all in this corner.

>> > The standard expects you to do something rational on your hardware.
>> > IBM can do anything they like to bit-fields, but to subsume
>> > non-bitfield POD data into your BITCONTAINER is wrong.
>>
>> No, why? You seem to imply that an "allocation unit" itself is something
>> that can be addressed in a thread-safe way. There is no such guarantee
>> in the standard. An allocation unit might well be a byte, even if the
>> processor has no provisions to access that unit atomically.
>
> Byte modification is allowed in the PowerPC architecture.
> The compiler had no reason to use an algorithm like that.

Apparently, it had. It is probably an optimization strategy. Even
then, it does not need a reason, provided it can prove that the code
works on the C++ "virtual machine". And that it does: the VM is
single-threaded (and has no I/O devices).

>> > So what do you think? Our bug or theirs?
>>
>> Your bug to depend on something that isn't specified. But unfortunately,
>> C++ doesn't specify anything here, and still one needs multithreading.
>
> Wow. Forget the standard. This code runs flawlessly on 32-bit AIX,
> and 32/64-bit on every other platform.

Ok, but on what other grounds do you want to claim that the compiler
has a bug? The only other way I can think of is to point IBM to their
own manual, but I doubt you'll find anything. Thus, you had
expectations of the compiler that were never written down and spelled
out, and it only failed to comply with your own expectations. Bad
luck.

> You think specifying the alteration of a single character of storage
> is depending
> on some undefined behaviour?

No. I'm saying that simultaneous access from several threads to a single
character is undefined behaviour. And if your program were single-threaded,
it would have worked fine.

> How about common sense here??

It's not. It is a failure of the standard that it does not cover
multithreaded code. That is being worked on, as said. From the
discussions I followed in comp.lang.c++.moderated (I'm not on the C++
ISO committee), I got the impression that even in the enhanced
standard, atomic access to bit-fields is not something that will be
covered, though. The reason is that C++ compilers may want to apply
optimization techniques to them, and on several platforms there is
also no way to access one field of a structure without also accessing
neighbouring fields.

So long,
Thomas

David Hopwood

unread,
May 1, 2007, 12:06:04 PM5/1/07
to
dick.dunbar wrote:
> --------------------
> class NQNString {
> public:
>   enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
>   volatile int _M_ref_count;  // 32-bit int used in tcpip to 32-bit clients
>   _Tag _M_tag:8;              // 8-bit char is as large as we need
>   int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
>   int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
> };
[...]

> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX
>    atomic ops
>
> 2. _M_tag assignment causes both ref_count and _tag to be loaded
>    into a 64-bit register, updated by twiddling bits, and stored
>    non-atomically. ref_count is modified when only _tag was
>    specified in the source.
>
> 3. If multiple threads execute this code, it is only a matter of
>    time before a thread updates the count (1 -> 2), and another
>    thread loads the same count (1), updates the _Tag field, and
>    stores the 1 back. When the first thread decrements, the count
>    now goes from 1 to zero, when the expectation was from 2 to 1.
>
> 4. The reason for this is an IBM compiler construct called
> BITCONTAINER, where the volatile int and bit-field are manipulated
> as a unit.
>
> PMR 26241,756.
> -------------
> IBM identified a source work-around, and claimed the problem was our
> source.

It isn't. I think it is fairly clear that the compiler is violating the
intended semantics of 'volatile' in POSIX and/or C++.

(Strictly speaking, there is no applicable standard for multi-threaded C++,
but it's likely that the C compiler does the same thing, and that the
C++ compiler does the same thing for single-threaded programs.)

The standard line in this newsgroup is that 'volatile' has no well-defined
semantics under multi-threading. While that may be true, the problem here
has nothing specifically to do with multi-threading. The 'volatile'
declaration is supposed to guarantee that there are no accesses to
_M_ref_count that are not explicit in the program. The only wiggle room
that an implementor has here is that "What constitutes an access to an object
that has volatile-qualified type is implementation-defined" (that is the
wording in section 6.7.3 #6 of the C99 standard, but section 7.1.5.1 of
the C++ standard says: "In general, the semantics of volatile are intended
to be the same in C++ as they are in C.")

So, go look for the documentation that specifies what constitutes a volatile
access on this platform (ask IBM if you can't find it):

- if there is no such documentation, then the C99 and C++ standards are
being violated, since they require implementation-defined things to
be documented.

- if the documentation says something to the effect that in this case, an
  access means that some machine instruction reads or writes the virtual
  memory that holds the representation of _M_ref_count, then it's clear
  that the semantics of volatile (C99 section 6.7.3) are being violated.

- if the documentation essentially makes 'volatile' meaningless, then that's
a big problem for the AIX platform, since many, many programs rely on
it.

> A TechNote was published but fails to relate to our 64-bit experience.
>
> http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159

The TechNote doesn't cover the 'volatile' case. It is clear that if
_M_ref_count were not declared volatile, then the word tearing would be
allowed.

That's the wrong track. Sections 7.1.5.1 of C++ and 6.7.3 of C99 are what
you want.

--
David Hopwood <david....@industrial-designers.co.uk>

Peter Dimov

unread,
May 1, 2007, 12:11:32 PM5/1/07
to
On Apr 30, 10:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> class NQNString {
> public:
>   enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
>   volatile int _M_ref_count;  // 32-bit int used in tcpip to 32-bit clients
>   _Tag _M_tag:8;              // 8-bit char is as large as we need
>   int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
>   int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
> };

...

> So what do you think? Our bug or theirs?

Theirs because of the volatile, as Gil Hamilton pointed out.

In the upcoming C++0x standard it would likely be required to work
even without a volatile:

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2171.htm#location

You are using reserved identifiers, by the way. The compiler is
allowed to have a macro _M_incr or _Tag and break your code.

dick.dunbar

unread,
May 1, 2007, 12:20:34 PM5/1/07
to
"First things first, but not necessarily in that order." - Doctor Who

I am trying to convince others this is a serious bug, but it doesn't
really matter if you agree or not. This was a _serious_ bug that held
up shipment of our code for a few weeks. I didn't write the code ...
in fact, it comes from STLport, but I don't see how I could have
expressed the intent any better and have it work on all compilers.

Let's simplify, and show 3 code generations by changing the class
declarations.

The three relevant data items in the class are:

volatile int Ref_count;  // Updated with atomic routines
_Tag T:8;                // An enumeration container 1 byte wide
char* P;                 // Pointer to a char string

And the source:

gs.T = NQNString::_S_concat;  // Set T to a constant (enum) value

_Tag T:8 ... contains the values of an enumeration. The :8 tells the
compiler to use a byte.

The IBM compiler treats this as a bit-field, and creates two internal
BITCONTAINER objects:
+0 Ref_count BITCONTAINER@0
+4 T
+5 3 bytes of Padding
+8 64-bit pointer to char BITCONTAINER@8

The generated code for "gs.T = " is 5 instructions, 2 storage
touches, and 8 bytes modified non-atomically:

L8 gr5=gs.BITCONTAINER@0(gr3,0)
LI gr4=1
RN8 gr5=gr5,56,0xFFFFFFFFFFFFFFFF
RI8 gr4=gr5,8,gr4,0xFFFFFFFFFFFFFF00
ST8 gs.BITCONTAINER@0(gr3,0)=gr4

If you remove the "char *p" pointer from the class, the generated
code:
4 instructions, 2 storage touches, 4 bytes altered

L4 gr5=gs.BITCONTAINER@4(gr3,4)
LI gr4=1
RI4 gr4=gr5,8,gr4,0xFFFFFF00
ST4 gs.BITCONTAINER@4(gr3,4)=gr4

Obvious question: why did the first code manipulate 8 bytes and this
code manipulate 4, using exactly the same source line? If the
original generated code had manipulated 4 bytes instead of 8, we
never would have discovered another error (next post).

If you change the declaration of T:

char T;

The generated code becomes:

LI gr4=1
STB gs.c@7(gr3,7)=gr4

Clearly, the intent of our source was to modify a single byte with a
constant, and this does it in 2 instructions with one storage touch.

You still don't see a bug? Bools Beware.

dick.dunbar

unread,
May 1, 2007, 12:37:33 PM5/1/07
to
The light at the end of the tunnel has been extinguished to save
energy.

There is a happier ending to this story than you might imagine, and
my reason for posting is to share the discovery.

Once we discovered the BITCONTAINER compiler construct, I compiled our
application with xlC_r flags -qsource -qlist.

This shows the generated code, which I posted above. I used grep to
investigate all of the BITCONTAINER references, particularly if it
involved a LoadDouble (LD or L8) instruction.

It turned out to be a whole bunch, the bulk due to this single-byte
width specification for the enumeration.

When we rebuilt our application and ran it under stress, an
astonishing thing happened ... for the first time, the application
actually scaled on AIX similar to our other hardware platforms.

The lack of scaling was observed by using AIX "vmstat" command.
The first two columns are "r" (Runnable), and "b" (Blocked).
Prior to this change using a 200-thread pool, 175+ threads
would be in the "Run" queue.

Clearly, we're not going to scale with that kind of processor demand.
What could be wrong? I hunted and hunted through our use of threaded
constructs, and never found anything I could pin the fault to.

With this single change, the number of threads in the run queue is now
about 5-6 ... what we expected from our application.

Such things are hard to observe, but it is hard to avoid the
conclusion that these inefficient storage-modification algorithms are
the root cause of processor contention and stalls in the processor
pipeline.

Is it a bug? Who cares, there's a happy ending. Happy Hunting.

ma...@pulsesoft.com

unread,
May 1, 2007, 12:49:11 PM5/1/07
to
Peter Dimov <pdi...@gmail.com> writes:

Care to cite the relevant part of the standard on this? This indeed
violates rules imposed by the _C_ standard, but the last time I tried
to read the relevant parts of the C++ draft, I remember only
identifiers with a double underscore (regardless of position) being
reserved.

--
vale

dick.dunbar

unread,
May 1, 2007, 1:15:26 PM5/1/07
to
On May 1, 4:48 am, Gil Hamilton <gil_hamil...@hotmail.com> wrote:

> "dick.dunbar" <dick.dun...@gmail.com> wrote in news:1177960498.3...@o5g2000hsb.googlegroups.com:
> I think IBM has erred with their cavalier treatment of a volatile. From
> the C++ standard at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2135.pdf

> (section 1.9):
>
> 6 The observable behavior of the abstract machine is its
> sequence of reads and writes to volatile data and calls
> to library I/O functions.(8)
>
> 7 Accessing an object designated by a volatile lvalue
> [other side effects] are all side effects, which are
> changes in the state of the execution environment.
>
> There is also this from section 7.1.5.1 paragraph 7:
>
> [ Note: volatile is a hint to the implementation to avoid
> aggressive optimization involving the object because the
> value of the object might be changed by means undetectable
> by an implementation. See 1.9 for detailed semantics. In
> general, the semantics of volatile are intended to be the
> same in C++ as they are in C. -end note ]

>
> Which would lead to this reference in section 6.7.3 paragraph 6 of the C
> standard (May 6, 2005 draft):
>
> 6 An object that has volatile-qualified type may be modified
> in ways unknown to the implementation or have other unknown
> side effects. Therefore any expression referring to such an
> object shall be evaluated strictly according to the rules of
> the abstract machine, as described in 5.1.2.3. Furthermore,
> at every sequence point the value last stored in the object
> shall agree with that prescribed by the abstract machine,
> except as modified by the unknown factors mentioned
> previously.(114) What constitutes an access to an object that
> has volatile-qualified type is implementation-defined.
>
> By loading up and modifying _M_ref_count along with _M_tag, they are
> violating the "rules of the abstract machine" with respect to the
> volatile variable; i.e. it has been modified in ways other than those
> prescribed by the abstract machine with a resulting change to the
> "observable behavior".
>
> Now everything in the standard(s) concerning implementation of volatile
> leaves plenty of wiggle room for interpretation so it's difficult to say
> that they have actually violated the standard even so. However, I'm
> quite comfortable saying "it's broken".

Thank you for your insight on this. I went down the "volatile"
argument path with them, but didn't have the
"standard-legal-background" to pursue it further. Those are valuable
references.

>
> Frankly, I don't understand why a compiler implementer would ever allow a
> volatile to share an "atomic" data container with another variable at
> all. The way to handle this in the compiler is simply to align the bit
> field (or whatever the next variable is) to the next atomic boundary.
> Structure alignment padding is something programmers know to expect and
> account for while having a volatile loaded up along with another field
> and stored back under the covers violates the law of programmer
> expectations. It's just a stupid thing for the compiler developer to do.

Agreed. Long email exchanges around this point; they weren't
convinced. But I'm not in agreement about the alignment issue,
because padding in such a manner enlarges the application's
footprint. I think they just didn't use good algorithms in the
generation of the code. They had the ability to manipulate a word;
they just didn't do it.

It's mostly the efficiency that I object to. Stupid to read data that
you can modify directly.

>
> As much as I sympathize, however, the bottom line is that I don't think
> you're going to win the battle. You probably need to move on after
> placing a big warning comment in the code (wherein you say rude things
> about IBM's implementation).

I would do that if I hadn't worked 30 years for IBM. Things have gone
downhill since I left :-)


dick.dunbar

unread,
May 1, 2007, 1:20:27 PM5/1/07
to
On May 1, 9:49 am, m...@pulsesoft.com wrote:
> Peter Dimov <pdi...@gmail.com> writes:
> > On Apr 30, 10:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:
>
> > > class NQNString {
> > > public:
> > > enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> > > volatile int _M_ref_count; // 32-bit int used in tcpip to
> > > 32-bit clients
> > > _Tag _M_tag:8; // 8-bit char is as large as
> > > we need
> > > int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> > > int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
>
> > > };
>
> > ...
>
> > > So what do you think? Our bug or theirs?
>
> > Theirs because of the volatile, as Gil Hamilton pointed out.
>
> > In the upcoming C++0x standard it would likely be required to work
> > even without a volatile:
>
> >http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2171.htm#loc...

>
> > You are using reserved identifiers, by the way. The compiler is
> > allowed to have a macro _M_incr or _Tag and break your code.
>
> Care to cite relevant part of standard on this? This indeed violates
> rules imposed by _C_ standard, but last time i tried to read relevant
> parts of C++ draft i remember only identifiers with double underscore
> (regardless of position) being reserved.

I do remember the same thing about reserved words to the compiler.
The STLport code was intended to be adopted as part of a compiler
component, so I don't fault the author for choosing this mechanism.
Such things are easily discovered using compiler diagnostics,
and we protect ourselves by using a unique namespace.
With nightly builds, this issue doesn't stay buried for long.

dick.dunbar

unread,
May 1, 2007, 1:34:52 PM5/1/07
to
On May 1, 1:33 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> dick.dunbar ha scritto:

> > _Tag _M_tag:8; // 8-bit char is as large as
>
> There's your problem! Looks like you're using bitfields.
> Those things are basically undefined. Those people who
> do use them usually do so to get sub-byte ints, but yours is 1 byte...
> Why not just use a 1 byte int, something like u_int8_t from types.h?

Indeed it is the problem. See my 3 code examples.

The reason for using "_Tag" is to clearly indicate this field will
only contain the values of an enumeration. Valuable to know when
you're reading code.

I believe the problem is that this designation is used for two
different purposes:
- To define bit fields ... had I used int or bool, I would clearly
  be doing that.
- To define container widths ... that's what we are doing here.

The compiler has the freedom to define whatever width it wants for
an enumeration. We know our data and there are fewer than 255
values, so :8 is a clear choice for this purpose.

There's a compiler option "-qenum=1", but obviously we can't adopt
that for our whole application ... and then there are these other
platforms to worry about. _Tag T:8 is the clearest, most uniform way
to express the intent.

The fact that the compiler used this as a bit-field, without
investigating the usage further ... and specifically, that the entire
field was being assigned a constant ... is a thought-error.

>
> P.S. I'm pretty sure it was Bjarne who said bitfields
> were for "idiots and children".

Bjarne Stroustrup, C++ Programming Language, Special Edition.

pg 125: Convenient bit manipulation can be very important, but for
reliability, maintainability, portability, etc., it should be kept at
low levels of the system.

16.3.11 vector<bool> implemented with int's.
17.5.3 bitset ... addressing a single bit

dick.dunbar

unread,
May 1, 2007, 1:49:05 PM5/1/07
to
On Apr 30, 9:25 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:

> But it's also ironic, because it seems that you yourself cannot admit
> the possibility of error on your own part!

If you were to explain the error that I have made, I'd be happy to
adjust.

The reason for putting this out in a public forum is to gain
community input. No need to attack me (yet).

It isn't even my code, so I don't have nearly the attachment to it
that you seem to imply.

What we have done is to change our code for the personality
of a particular compiler, and I am always opposed to that,
particularly when it obscures the semantics of the code.

Our code base is about 99% generic ... builds and runs everywhere.

dick.dunbar

unread,
May 1, 2007, 1:56:50 PM5/1/07
to
On May 1, 6:33 am, Thomas Richter <t...@mersenne.math.TU-Berlin.DE>
wrote:

> > You think specifying the alteration of a single character of storage
> > is depending on some undefined behaviour?
>
> No. I'm saying that simultaneous access from several threads to a single
> character is undefined behaviour. And if your program were single-threaded,
> it would have worked fine.

Thomas, this is a very thoughtful contribution. Thanks.
We agree, except for the snippets above.

Simultaneous access to the single byte was never an issue.
IBM kept trying to move the discussion in that direction ... that
I've altered a single byte without using a mutex, but in fact, on the
PowerPC this is perfectly allowable and safe.

The issue was that in the process of modifying that single byte, they
also modified a volatile int that we did not intend to be modified.

That is the crux of this problem. Not atomic updates to a single
byte, but the alteration of adjacent fields.

Peter Dimov

unread,
May 1, 2007, 2:11:51 PM5/1/07
to
On May 1, 7:49 pm, m...@pulsesoft.com wrote:
> Peter Dimov <pdi...@gmail.com> writes:

> > You are using reserved identifiers, by the way. The compiler is
> > allowed to have a macro _M_incr or _Tag and break your code.
>
> Care to cite relevant part of standard on this? This indeed violates
> rules imposed by _C_ standard, but last time i tried to read relevant
> parts of C++ draft i remember only identifiers with double underscore
> (regardless of position) being reserved.

"Each name that contains a double underscore _ _ or begins with an
underscore followed by an uppercase letter (2.11) is reserved to the
implementation for any use." - 17.4.3.1.2 [global.names].

I didn't realize that the code originated from STLPort, though. It
makes sense for a standard library replacement to use such a
convention. Sorry for the distraction.

David Schwartz

unread,
May 1, 2007, 2:18:21 PM5/1/07
to
On Apr 30, 8:40 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:

> > Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
> > object. How is it the compiler's fault that this is not what you
> > meant?

> Huh. I meant EXACTLY that. Class objects in C++ are an abstraction.
> We "lump things together" that it makes sense to, from the description
> of the problem space we're defining.

No, that's not what you meant. You meant to be able to modify one
without having any effect on the other. Again, consider:

char a, b;

Is it reasonable to expect the compiler to be able to modify 'a'
without reading and writing back 'b'?

The compiler is violating a dependency that it cannot possibly know
about. How is that the compiler's fault?

> > Suppose you had a platform that did not have atomic 8-bit operations,
> > but you did this:
> > volatile char a;
> > char b;
>
> Let's not get into wild "Suppose" speculations here.
> The PowerPC does not have 8-bit atomic operations, except by clever
> usage of lwarx/stwcx.

You are missing my point. The issue is whether or not your code is
guaranteed to work. If perfectly analogous code on another platform is
not guaranteed to work then your code is not guaranteed to work on
your platform. Either you have a guarantee or you do not.

> And no atomic modification of the 8-bit char object was expressed or
> implied in the source code. We meant for a single byte to be modified
> to a specific value when we know we are the only owner of such an
> object.

Except that you had a dependency that the compiler did not know about.
You could have told the compiler about that dependency (by using
mutexes or separate storage objects) but you did not.

> > What could 'a=8;' do but read in both a and b, modify the 'a' part,
> > and write it back? What could 'b=8;' do but read in both a and b,
> > modify the 'b' part, and write it back?

> On PowerPC, it would:
> li R0,1 // Load Immediate into Register 0 the value of 1
> stb R0,address // Store that single byte into the field specified by
> the source

This isn't a PowerPC question. This is a compiler question. We are
asking about what the compiler is required to do, not what it could
do. Yes, if the compiler somehow knew about this secret dependency, it
is possible for it to do the right thing. But there are cases where
even if it did, it could not do the right thing, so it can't be
required to.

> > Even if the platform has small atomic operations, it is never required
> > to use them. That's why you *had* to use atomic operations for
> > _M_ref_count. Surely the platform wasn't required to somehow determine
> > that it was important to you that it do so for _M_tag too?

> I didn't expect it to do any atomic operations on _M_tag at all.
> This is part of making an application "thread efficient".

Yes, you did expect it to.

> The problem is the compiler modified two fields when only one was
> specified in our source code. I expect it to modify the single byte
> specified by the source, without modifying anything else. And to do
> this non-atomically. Atomic update is not required at all.

That's what it did, so why are you complaining? *YOU* modified the
other byte, not the compiler. (And you did so without following the
rules.)

> If every field in a threaded program required mutex or atomic
> semantics
> for modification, nothing would work efficiently.

I'm afraid that is the requirement, at least in a platform-neutral
sense. There might be ways you can make optimizations for particular
platforms, but you can only do so where they are known to be safe for
a particular platform. You are outside the standards here.

> > Sorry, it's your bug. You assumed that manipulating one part of an
> > object wouldn't affect another part. In analogous cases, it is clear
> > that this will never be the case. (Consider successive character
> > variables on a platform that has no 8-bit atomic operations.)

> > So you relied on something that was not guaranteed by anything.

> Exactly what do you think I relied on? That the compiler should
> alter the byte I specified and nothing else. Oh yes, I rely on that
> every day.

The compiler altered the byte you specified and nothing else. The
problem is, *you* modified another byte in the same storage object
while the compiler was accessing it.

You place the blame on the compiler, but it takes two to tango. You
did something that affected what the compiler was doing, and the
combination was unsafe. What the compiler did is perfectly safe in
isolation and there is no rule that prohibits the compiler from doing
it.

> > DO NOT EVER DO THAT. The optimization is not worth the ultimate
> > suffering when the assumption turns out not to be true on some other
> > platform or with some other compiler.

> This is not an optimization. Our code works exactly as we intended
> on AIX 32-bit and every other platform in 64-bit. The implication
> is that we did something wrong in our source. I don't believe it.

Don't believe it, but it's true.

The compiler can do anything it wants that doesn't break code that
complies with the standards. You failed to comply with the standards
by not protecting an access with mutexes.

Again, if you did the same thing with two character objects on a
platform that didn't have atomic operations on a single byte, would
you still insist your code was valid? If not, then why is it
guaranteed to work on your platform? Because it is possible for the
compiler to make it work if it knew what you were doing, which it
doesn't?

> > If you needed to do this as an optimization because of a platform
> > where it was proven to be required and safe, why did you enable it on
> > *this* platform?

> Sorry, you completely lost me. We made no such assumptions for AIX.
> We didn't enable anything specific for this platform.

That's your problem. You are making platform-specific optimizations as
if they were portable.

> And, you make the same argument that I've heard from a number of
> IBM contacts:
>
> "If your code doesn't work in a threaded environment, it must be
> something you did wrong, because our compiler doesn't make any
> mistakes."
>
> Wow. Talk about naive.

I have identified a specific defect in your code that is the cause of
your problem. How is that the same as saying that there must be a
defect in your code without finding one?

Your code is broken. Really.

DS

David Schwartz

unread,
May 1, 2007, 2:20:06 PM5/1/07
to
On May 1, 10:49 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> If you were to explain the error that I have made, I'd be happy to
> adjust.

The error is really this simple -- you modify an object in one thread
while another thread is or might be accessing that same storage object
and you haven't protected the accesses with mutexes. This is what the
pthreads standard requires. The fix is either to use mutexes or to put
the two variables in separate storage objects.

DS

David Schwartz

unread,
May 1, 2007, 2:25:27 PM5/1/07
to
On May 1, 4:48 am, Gil Hamilton <gil_hamil...@hotmail.com> wrote:

> Frankly, I don't understand why a compiler implementer would ever allow a
> volatile to share an "atomic" data container with another variable at
> all. The way to handle this in the compiler is simply to align the bit
> field (or whatever the next variable is) to the next atomic boundary.
> Structure alignment padding is something programmers know to expect and
> account for while having a volatile loaded up along with another field
> and stored back under the covers violates the law of programmer
> expectations. It's just a stupid thing for the compiler developer to do.

Consider:

volatile char array[1024];

Suppose this platform needs separate storage objects to be at least 64-
bits. If the compiler puts each of my array entries in a separate
storage object, this code will break:

strcpy(array, "hello there");

But if it doesn't, then this code will break because it may affect
neighboring volatiles:

array[52]=0;

How is the compiler supposed to know what I'm going to do with the
array when it lays it out in memory? And what if the 'volatile'
appears in the form of a cast?

DS

David Schwartz

unread,
May 1, 2007, 2:27:07 PM5/1/07
to
On May 1, 9:11 am, Peter Dimov <pdi...@gmail.com> wrote:

> Theirs because of the volatile, as Gil Hamilton pointed out.

Then explain to me what:

volatile char foo[1024];
foo[1]=0;

is supposed to do on a platform that only has 32-bit operations.

DS

dick.dunbar

unread,
May 1, 2007, 2:31:21 PM5/1/07
to

Not exactly ... my source modified one field, and the compiler
modified two. I have no problem with the field that was modified.

We know our code. The sample program conveniently lumps everything
together so you can see the issue in the generated code.

The byte alteration is made in an entirely different part of the
code, where we know we "own" the object, and we're only leaving
bread-crumbs behind when another atomic decrement occurs to take the
count to zero.

Please review my generated code examples. Without changing our code
at all to involve the mutexes you seem to think are essential, the
code now works perfectly ... and far more efficiently.

We got two fixes with one simple declaration change ...
thread safety and thread efficiency ... 32 and 64-bit.

I just don't know how to factor in your suggestions.


dick.dunbar

unread,
May 1, 2007, 2:38:13 PM5/1/07
to
On May 1, 11:18 am, David Schwartz <dav...@webmaster.com> wrote:
> Your code is broken. Really.

Our code is now un-broken on AIX (really) with a simple declaration
change:

    _Tag T:8;   became   char T;

The code is also unbroken on every other platform: win, linux, hpux,
solaris, aix.


Markus Elfring

May 1, 2007, 3:07:29 PM
> Our code is now un-broken on AIX (Really) with a simple declaration change.
> _Tag T:8;
> char T;

Do you know if this variable or any adjacent members of your data structure will
be simultaneously accessed by multiple threads in your programs?

Regards,
Markus

David Schwartz

May 1, 2007, 3:53:29 PM

There you go. That's the way non-portable code works: you have to tweak it for each new platform.

Your mistake was in not realizing that the code wasn't portable and
not having a portable version to use for platforms where you didn't
know that the optimization was safe.

DS

Peter Dimov

May 1, 2007, 3:58:35 PM

I see two possibilities.

1. Define char as 32 bits.
2. Violate the strict interpretation of the standard and write to
foo[0], foo[2] and foo[3].

Our case is less problematic, though. The compiler can do the right
thing, it just chooses not to.

Peter Dimov

May 1, 2007, 4:17:00 PM

_M_ref_count and _M_tag are separate objects. They are both subobjects
of NQNString which is the "complete object" in this case. Interpreting
the pthreads standard - which talks about "memory locations" - to only
apply to complete objects outlaws many sensible use cases and seems
contrary to its intent.

The problem here is that the compiler is inventing conflicting writes
to a memory location behind the programmer's back. The pthreads
standard doesn't, and cannot, say what the programmer should do in
this case except complain to the compiler vendor. This will hopefully
be clarified on the C++ side in a few years.

dick.dunbar

May 1, 2007, 4:54:12 PM

Of course. The RefPointer is atomically updated by threads all the time.

The _Tag byte is modified, without conflict among threads, as a way to leave state. It is fine if two threads simultaneously modify this field, and we don't care who wins the race. It represents work to be done.

The only time this flag will be accessed is when the RefCount goes to zero. Only a single thread will see this action, and the result is to destroy the object this structure represents.

And that was the difficult thing to find ... that this object would be referenced by a thread _after_ it had been destroyed. No clear pattern of failure, since the memory once owned by that object could get reused.

And the reason the count went to zero is that the compiler modified this field "behind our back". Studying the code produced no clues to this fact. Lots of hard work and very fast machines were the only way to focus in on the failing component.

Reference-counted objects are a well-known, well-understood, well-behaved design pattern in the C++ world.

Marcin 'Qrczak' Kowalczyk

May 1, 2007, 6:44:33 PM
On Tue, 01-05-2007 at 11:25 -0700, David Schwartz wrote:

> volatile char array[1024];
>
> Suppose this platform needs separate storage objects to be at least 64-
> bits. If the compiler puts each of my array entries in a separate
> storage object, this code will break:
>
> strcpy(array, "hello there");

It's not supposed to work because strcpy takes a non-volatile pointer.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Chris Thomasson

May 1, 2007, 7:22:29 PM
"dick.dunbar" <dick....@gmail.com> wrote in message
news:1177960498.3...@o5g2000hsb.googlegroups.com...
> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

[...]
> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX atomic
> ops

[...]

> void *ThreadFunc(void *threadid) {
> for (int i = 0; i < 100000000; ++i) {
> gs._M_incr(); // atomicly increment
> gs._M_tag=NQNString::_S_concat; // set it to one
> if(gs._M_decr() == 0 ) ++*(int*)0; // ref_count should never go to
> zero!
> }
> return 0;
> }


> So what do you think?

Humm... You know, I hesitated for a moment before I posted this query because I don't really think it applies to your particular scenario, but I just felt the need to mention it anyway! Let's put it this way: I hope that I am wrong and your reference counting algorithm is correct!

So please try to bear with me here...

;^)


> Our bug or theirs?

Well, before I answer that, I have one very quick question:

------------------------------------
Does any thread (e.g., ThreadFunc) in your application have the ability to
increment the reference count of any shared instance of NQNString (e.g.,
'gs' in your example), if it does not already own a previous reference?
------------------------------------

If so, you just may be experiencing a fairly nasty race-condition wrt the
reference counting algorithm itself. As you probably know, you must use some
form of atomic reference counting algorithm whenever a thread needs to
increment the reference count of any shared object that it does NOT
previously own a reference to. Please note that "atomically thread-safe"
reference counting is completely different than "normal thread-safe"
counting. For instance, Boost's "current" shared_ptr algorithm cannot be
used as a "standalone" solution for this because it's only a normal
thread-safe counted pointer. In order to make correct use of Boost
shared_ptr, a thread MUST own a reference to a shared data-structure BEFORE
it tries to acquire another one.


If you want a thread to be able to grab a reference to an object that it does not own, you need to use some form of atomic counting. FWIW, here are some links to some full-blown atomic counted pointers:

http://atomic-ptr-plus.sourceforge.net/

http://appcore.home.comcast.net/vzoom/refcount/


Any thoughts?

P.S.

If this is not applicable to your situation, please forgive me for wasting
your time!

:^0

rfis...@gmail.com

May 1, 2007, 7:46:05 PM
dick.dunbar wrote:

> > Why not just use a 1 byte int, something like u_int8_t from types.h?
>
> Indeed it is the problem. See my 3 code examples.
>
> The reason for using "_Tag" is to clearly indicate this field will
> only
> contain the values of an enumeration. Valuable to know when you're
> reading code.

A comment could do that.
char _M_tag; // assumes values of enum _Tag.

[...]

> There's a compiler option "-qenum=1", but obviously we can't adopt
> that
> for our whole application ... and then there are these other platforms
> to worry about. _Tag T:8 is the clearest, most uniform way to
> express the intent.

I disagree - a char + comment expresses the intent just as well. This problem was an excruciating intersection of standard haziness, enum size, bit-field layout/manipulation, multithreading and an IBM compiler for PowerPC. Knowing what you know now, you're going to have to put an essay-sized comment in anyway, so why not use a byte and remove the possibility once and for all of anything like this happening ever again? I may seem over-conservative, but in fact I'm just superstitious. That bit-field caused so much trouble that I'd go as far as saying that it's malign, even evil. It's cursed code, inauspicious, and ill-boding. It's as unlucky as women on ships. And it needs to be cleansed. With fire.

> > P.S. I'm pretty sure it was Bjarne who said bitfields
> > were for "idiots and children".
>
> Bjarne Stroustrup, C++ Programming Language, Special Edition.

[... I love bit-fields ...]

That would be some backflip! Ok, in that case it was, uh, one of
Jon Bentley's pearls. Or was it Dennis Ritchie? Ken Thompson?

RF

--
felix: lightweight threads, HOFs and game scripting.
svn co https://svn.sourceforge.net/svnroot/felix/felix/trunk felix

dick.dunbar

May 2, 2007, 1:12:24 AM
On May 1, 4:46 pm, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> dick.dunbar wrote:
>
> A comment could do that.
> char _M_tag; // assumes values of enum _Tag.

Of course.

> > There's a compiler option "-qenum=1", but obviously we can't adopt
> > that
> > for our whole application ... and then there are these other platforms
> > to worry about. _Tag T:8 is the clearest, most uniform way to
> > express the intent.
>
> I disagree - a char + comment expresses the intent just as well.

Well, yes ... but that's only when you know the answer to the problem. I was giving credit to the STLport author for using the most obvious construct for the task, which previously worked fine.

I just hate coding to the personality of some Frankenstein compiler. In this case, it was pretty easy to "just give up" because it also produced a huge performance boost.

> Knowing what you know now you're going to have to put an essay sized comment
> in anyway, so why not use a byte and remove the possibility once and for all
> of anything like this happening ever again? I may seem over- conservative,

That's exactly what we've chosen to do; use char.
I closed my open issue with IBM today.

> And it needs to be cleansed. With fire.

I think the exorcism you seek is for the IBM compiler.


dick.dunbar

May 2, 2007, 1:31:33 AM
On May 1, 4:22 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "dick.dunbar" <dick.dun...@gmail.com> wrote in message

> ------------------------------------
> Does any thread (e.g., ThreadFunc) in your application have the ability to
> increment the reference count of any shared instance of NQNString (e.g.,
> 'gs' in your example), if it does not already own a previous reference?
> ------------------------------------

Yes. It always uses the member function to do so, which is an atomic update.

>
> If so, you just may be experiencing a fairly nasty race-condition wrt the
> reference counting algorithm itself.

You're correct that it's a race condition, but it was not from our algorithm. We always update the counter atomically; the compiler updated the counter without our knowledge ... we were storing a constant into a single byte in an adjacent word. Who would guess that would alter the RefCounter? The only way to discover such things is to be able to read assembler.

In fact, I should probably disclose that we don't use the AIX fetch_and_add routines at all. I wrote inline assembly functions to Increment/Decrement an atomic pointer by 1.

Consider this: you want to update a counter. On PowerPC, that would be load/add/store instructions. Now you want to do this atomically as fast as possible, and it is just two more instructions if you use assembler. Here are the functions:

#pragma mc_func AtomicInc { "7c641b78" "7c602028" "38630001" "7c60212d" "4082fff4" }
#pragma reg_killed_by AtomicInc gr3-gr4, cr0
#pragma mc_func AtomicDec { "7c641b78" "7c602028" "3863ffff" "7c60212d" "4082fff4" }
#pragma reg_killed_by AtomicDec gr3-gr4, cr0

> If this is not applicable to your situation, please forgive me for wasting your time!

Not a waste of time. Open dialog was what I was seeking.

Markus Elfring

May 2, 2007, 5:13:39 AM
> The _Tag byte is modified, without conflict among threads, as a way
> to leave state. It is fine if two threads simultaneously modify
> this field, and we don't care who wins the race.

Do you really want to ignore race conditions at this place?


> It represents work to be done.

Are you going to add any synchronisation for bigger memory regions?


> And that was the difficult thing to find ... that this object should
> be referenced by a thread _after_ it had been destroyed.
> No clear pattern of failure, since the memory once owned by that
> object could get reused.
>
> And the reason the count went to zero is the compiler modified this
> field "behind our back".

Would you like to add any checks or guarantees to protect against unexpected updates to several fields in the same data structure?
When is a "mass update" of a few word-sized places in memory quicker than a write access to a single byte?
Are the trade-offs between performance/efficiency and correct concurrent behaviour resolved in your use case for all target platforms now?


> Reference Counted objects are a well known, well understood, well
> behaved design pattern in the C++ world.

It seems that a few special cases must still be considered in their implementations.

Regards,
Markus

rfis...@gmail.com

May 2, 2007, 6:18:24 AM

dick.dunbar wrote:

Yay! Don't forget the big fat comment!

rfis...@gmail.com

May 2, 2007, 6:23:19 AM
David Schwartz wrote:

> On May 1, 11:38 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:
> > On May 1, 11:18 am, David Schwartz <dav...@webmaster.com> wrote:
> >
> > > Your code is broken. Really.
> >
> > Our code is now un-broken on AIX (Really) with a simple declaration
> > change.
> > _Tag T:8;
> > char T;
> >
> > The code is also unbroken on every other platform: win, linux, hpux,
> > solaris, aix
>
> There you go. That's the way non-portable code works, you have to
> tweak it for each new platform.

But wait, the tweak (->char) is portable now, isn't it?
Or maybe not, we're still talking about threads without mutexes,
so "portable" probably isn't the right word to use. Could we say
that the char version presents fewer surprises?
At the very least it's luckier, right?

Chris Thomasson

May 2, 2007, 6:37:38 AM
"dick.dunbar" <dick....@gmail.com> wrote in message
news:1178083893....@c35g2000hsg.googlegroups.com...

> On May 1, 4:22 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "dick.dunbar" <dick.dun...@gmail.com> wrote in message
>
>> ------------------------------------
>> Does any thread (e.g., ThreadFunc) in your application have the ability
>> to
>> increment the reference count of any shared instance of NQNString (e.g.,
>> 'gs' in your example), if it does not already own a previous reference?
>> ------------------------------------
>
> Yes. It always uses the member function to do so, which is an atomic
> update.

Well, FWIW, using fetch_and_add alone is not strong enough to implement
atomically thread-safe reference counting. You could use fetch_and_add to
create a thread-safe counter like Boost shared_ptr, however, you can't use
it as-is to create an atomically thread-safe counter...


rfis...@gmail.com

May 2, 2007, 7:49:03 AM
dick.dunbar wrote:

> > I disagree - a char + comment expresses the intent just as well.
>
> Well, yes ... but that's only when you know the answer to the problem.

Yah, sorry. 20-20 hindsight speaking here.

> I was giving credit to the STLport author, for using the
> most obvious construct for the task, which previously worked fine.

I'd've tracked the author down and asked for damages for the several weeks of my life flushed away, or at least tried to convince him as a library writer to write more conservative code. However, you were pushing it quite a bit expecting the library to just happen to be threadsafe. Or did it purport to be threadsafe?

Bit-fields are one of those things that I see, and especially use, so rarely that I'd have to look up the notation, so to me they're not an obvious solution. I didn't even know you could declare an enum type as a bitfield. Is that the only occurrence of such a thing in the code? Seeing exactly one occurrence of something outlandish in a large code-base is a red flag. Bad juju fer sure.

> > Knowing what you know now you're going to have to put an essay sized comment
> > in anyway, so why not use a byte and remove the possibility once and for all
> > of anything like this happening ever again? I may seem over- conservative,
>
> That's exactly what we've chosen to do; use char.
> I closed my open issue with IBM today.
>
> > And it needs to be cleansed. With fire.
>
> I think the exorcism you seek is for the IBM compiler.

Dunno; personally I'd never've run into that problem, I'm too middle of the road.
Interesting thread, though!

dick.dunbar

May 2, 2007, 12:09:17 PM
On May 2, 2:13 am, Markus Elfring <Markus.Elfr...@web.de> wrote:

> It seems that a few special cases must still be considered in their implementations.


You mean, like, checking for a buggy compiler? Sure, I agree it's a special case :-)

The focus still seems to be on our thread designs. Honestly, they work fine. We don't need help in this area. The thrust of this exchange was to alert others to the bizarre behaviour of a single compiler in a special situation. This error never would have happened in a 32-bit world, because the BITCONTAINERS and the Atomic fields are both 32 bits. No chance to destroy an adjacent word by an update.

Problem solved. Our application runs as expected, with no race conditions and no unexpected behaviour related to threading.

dick.dunbar

May 2, 2007, 12:18:37 PM
On May 2, 3:23 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> David Schwartz wrote:

> But wait, the tweak (->char) is portable now, isn't it?
> Or maybe not, we're still talking about threads without mutexes,
> so "portable" probably isn't the right word to use. Could we say
> that the char version presents fewer surprises?
> At the very least it's luckier, right?

It is portable, and it accepts an assignment of an enumeration, so no code changes are required to add (cast)s.

I would say that the C/C++ syntax is "overloaded" by usage.

char c;
char c:8;

They're the same size, but they have different semantics to some compilers. IBM's compiler thinks we want to deal with bits ... our code only intended to set a particular size for an enumeration.

Assignment to any of these objects would have triggered the failure.

char c:8;
int i:8;
bool b:8;
_Tag t:8;
size_t s:16; // STLport also uses this form of declaration.

And it is only a problem in a 64-bit arena.

We made the change to "trick" one compiler; thankfully, the other compilers know how to deal with the trick (it's portable).


dick.dunbar

May 2, 2007, 12:25:20 PM
On May 2, 3:37 am, "Chris Thomasson" <cris...@comcast.net> wrote:

> using fetch_and_add ... can't be used as-is to create an atomically thread-safe counter...

Ok, I'm intrigued ... why do you make this claim?

fetch_and_add(int* data, value)

will use these PowerPC instructions:

a: lwarx rx, data // load and reserve
add rx, value // add the value (1 or -1)
stwcx. rx, data // store and release
bne a // loop back if the reservation was lost

The loop continues until our thread is the last one to touch the data address.

dick.dunbar

May 2, 2007, 12:44:22 PM
On May 2, 4:49 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:


> > I was giving credit to the STLport author.
>
> I'd've tracked the author down and asked for damages

Damages? On open source? The damage was caused by the IBM compiler.

> library writer to write more conservative code.

I fail to see how the original author could have anticipated a bug in a specific compiler running 64-bit.

> expecting the library to just happen to be threadsafe.

It is thread safe ... but not without a lot of work on our end to plug some race conditions. The failure wasn't one of those.

> Or did it purport to be threadsafe?
>
> Bit-fields are one of those things that I see and especially
> use, so rarely that I'd have to look up the notation, so to
> me they're not an obvious solution.

My background lends itself to bit twiddling, but I was surprised
when we found the error related to manipulation of bits
when none was intended by the code.


> I didn't even know you could declare an enum type as a bitfield.

Neither did I. I had to look up the syntax again, because I'd never
seen it used in that context. That's what I mean about an overloaded
usage meaning ... the author intended to specify object size, not
bit semantics.

> Is that the only occurence of such a thing in the code? Seeing exactly one
> occurence of something outlandish in a large code-base is
> a red-flag. Bad juju fer sure.

This point cranks my engine. Any large code base, written by hundreds of programmers and adopting multiple styles from open source, will, without a doubt, contain anomalies.

It is _impossible_ for me to read C++ code and understand what it does. My only recourse is to rely on compiler-generated code to see what bizarre class warfare has prepared for me today.

When the goal is to get product (function, business value) out the door, and make money at it, academic quibbles over coding style will go unheeded. They have to, or we'd never ship anything.

"If it works, ship it."
"If you claim it doesn't work, prove it with a test case."

That's the reality in a commercial software world.
We delayed shipment by more than a month because of this bug.
That's lost revenue.

The backup plan was to withhold the 64-bit AIX version.
No customer would put up with a 10-hour mean-time-to-failure.
That's lost customer confidence.

David Schwartz

May 2, 2007, 12:34:47 PM
On May 1, 3:44 pm, Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:

> > strcpy(array, "hello there");

> It's not supposed to work because strcpy takes a non-volatile pointer.

That really doesn't matter. You can write your own 'strcpy' that takes
a 'volatile' pointer and you'll have the same problem. This would also
make casting to or from 'volatile' an undefined operation (because the
packaging would be different).

My point stands -- a volatile object cannot be packed differently from
a non-volatile object without breaking tons of code that currently
works.

A 'volatile' char can't be packed so as to prevent conflict with
adjacent volatile objects, so it cannot be required for any type to do
so.

DS

David Schwartz

May 2, 2007, 12:38:19 PM
On May 2, 9:25 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> Ok, I'm intrigued ... why do you make this claim?
>
> fetch_and_add(int* data, value)
>
> will use these PowerPC instructions:
>
> a: lwarx rx, data // load and reserve
> add rx, value // add the value (1 or -1)
> stwcx. rx, data // store and release
> bne a: // loop back if reservation is lost

Your use of the phrase "will use" strongly suggests that you have not
learned anything from this experience and fully intend to continue to
write broken code. That is kind of sad.

DS

dick.dunbar

May 2, 2007, 1:08:37 PM
On May 2, 9:38 am, David Schwartz <dav...@webmaster.com> wrote:

> That is kind of sad.

Indeed, but cheer up.
I'd rather be President; there are no consequences for being wrong.


David Hopwood

May 2, 2007, 8:49:30 PM
Thomas Richter wrote:
> dick.dunbar <dick....@gmail.com> wrote:
>
>>>I don't think the C++ standard has a word to say about threads. Note
>>>that a C++ compiler is always free to operate on an "as if" basis, i.e.
>>>its implementation might do whatever it wants as long as the observed
>>>result is the same. And as C++ only addresses single-threaded programs,
>>>observed behaviour is only defined in this sense. Thus, your program
>>>has been proven to work correctly for single-threaded by the compiler,
>>>and thus the compiler is fine.
>>
>>I'm still flabbergasted that reasonable people come to this
>>conclusion.

Me too. It's as though some people think that, because multi-threaded C++ is not defined by any Standard, we cannot say anything about what it is *reasonable* for a compiler to do.

>>Note: The problem isn't confined to threaded programs, if you were to
>>build a 64-bit device driver that modified adapter storage in a
>>particular way, this would be completely unexpected that the compiler
>>should modify two fields when one was specified.
>
> And again, the standard says nothing about that. If you access any
> I/O devices thru pointers, you're up to yourself and what your
> vendor guarantees. The virtual machine defined by the C++ standard
> again doesn't say a word about I/O devices.

Section 7.1.5.1: "In general, the semantics of volatile are intended
to be the same in C++ as they are in C."

Everyone knows how volatile is supposed to work in C; it's supposed to
inhibit any compiler optimizations that affect how a volatile-qualified
object is accessed -- including the optimization that is at issue in this
thread. It is true that the only way you can observe such optimizations
is by non-strictly-conforming code, but so what? The whole point of
"volatile" is to constrain the *implementation* of accesses to volatile
objects, so that non-strictly-conforming programs can rely on those
constraints.

> You *may* get around
> this problem then by "hinting" the compiler by the "volatile" keyword,
> but again, you're still in "undefined" land.

No, implementation-defined. The vendor is supposed to document what
constitutes a volatile access. So if IBM have explicitly documented
somewhere that a volatile object can be written when a nearby
(but not overlapping) object is accessed, then they're following the
standard; otherwise they are violating it. But in the former case,
I think many C and C++ programmers would not be at all happy with that
implementation.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

May 2, 2007, 9:00:03 PM
David Schwartz wrote:
> On May 1, 9:11 am, Peter Dimov <pdi...@gmail.com> wrote:
>
>>Theirs because of the volatile, as Gil Hamilton pointed out.
>
> Then explain to me what:
>
> volatile char foo[1024];
> foo[1]=0;
>
> is supposed to do on a platform that only has 32-bit operations.

Whatever the required platform documentation as to "what constitutes an
access to an object that has volatile-qualified type" says.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

May 3, 2007, 12:37:25 PM
dick.dunbar wrote:
> On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:
>
>>Sorry, it's your bug. You assumed that manipulating one part of an
>>object wouldn't affect another part. In analogous cases, it is clear
>>that this will never be the case. (Consider successive character
>>variables on a platform that has no 8-bit atomic operations.)
>>
>>So you relied on something that was not guaranteed by anything.
>
> Exactly what do you think I relied on? That the compiler should
> alter the byte I specified and nothing else.

More precisely: and no other volatile-qualified objects.

I found the AIX documentation of "what constitutes an access to a
volatile-qualified object", such as it is:

<http://www-1.ibm.com/support/docview.wss?uid=swg27005417>

# Abstract
# The C/C++ Language Reference describes the syntax, semantics, and
# IBM implementation of the C and C++ programming languages.
[...]
# The volatile Type Qualifier
#
# The volatile qualifier maintains consistency of memory access to data
# objects. Volatile objects are read from memory each time their value
# is needed, and written back to memory each time they are changed. The
# volatile qualifier declares a data object that can have its value
# changed in ways outside the control or detection of the compiler (such
# as a variable updated by the system clock). The compiler is thereby
# notified not to apply certain optimizations to code referring to the
# object.
#
# Accessing any lvalue expression that is volatile-qualified produces a
# side effect. A side effect means that the state of the execution
# environment changes.

If there were any exception allowing an access to a nonvolatile object
to affect nearby volatile fields, then this documentation would have to
mention it. Since it doesn't, the behaviour is an implementation bug.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

May 3, 2007, 12:45:45 PM
David Schwartz wrote:
> On May 1, 10:49 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:
>
>>If you were to explain the error that I have made, I'd be happy to
>>adjust.
>
> The error is really this simple -- you modify an object in one thread
> while another thread is or might be accessing that same storage object
> and you haven't protected the accesses with mutexes. This is what the
> pthreads standard requires.

True, but beside the point, since this implementation behaviour would be
a bug for single-threaded code given the 'volatile' declaration.

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar

May 3, 2007, 1:56:50 PM
On May 3, 9:37 am, David Hopwood <david.hopw...@industrial-designers.co.uk> wrote:

> > Exactly what do you think I relied on? That the compiler should
> > alter the byte I specified and nothing else.
>
> More precisely: and no other volatile-qualified objects.
>

> <http://www-1.ibm.com/support/docview.wss?uid=swg27005417>

> # The volatile Type Qualifier
> #
> # The volatile qualifier maintains consistency of memory access to data
> # objects. Volatile objects are read from memory each time their value
> # is needed, and written back to memory each time they are changed. The
> # volatile qualifier declares a data object that can have its value
> # changed in ways outside the control or detection of the compiler (such
> # as a variable updated by the system clock). The compiler is thereby
> # notified not to apply certain optimizations to code referring to the
> # object.

The version 8 language manual, pages 56-57, has similar (narrower) language.
http://www.elink.ibmlink.ibm.com/publications/servlet/pbi.wss?CTY=US&FNC=SRX&PBL=SC09-8004-00#
------
The volatile qualifier declares a data object that can have its value changed in ways outside the control or detection of the compiler (such as a variable updated by the system clock or by another program). This prevents the compiler from optimizing code referring to the object by storing the object's value in a register and re-reading it from there, rather than from memory, where it may have changed.
-------
The V7 sentence is changed:
# The compiler is thereby notified not to apply "certain optimizations" to code referring to the object.

Acknowledging that a volatile variable could be "changed in ways outside the control/detection of the compiler" gives the impression that side-effect modification of this variable by a <dubious> optimization might be problematic.

Thanks

<aside> English is a hopeless language to describe computer expectations. I guess that's why we have so many lawyers in the USA. How certain are we that IBM knew what they were doing when they wrote the V7 manual? :-)

http://tinyurl.com/268j7z

* certain(a): definite but not specified or identified; "set aside a
certain sum each week"; "to a certain degree"; "certain members have
not paid their dues"; "a certain Mrs. Jones"

* certain(p): having or feeling no doubt or uncertainty; confident and
assured; "felt certain of success"; "was sure (or certain) she had
seen it";

* certain(p): established beyond doubt or question; definitely known;
"what is certain is that every effect must have a cause"; "it is
certain that they were on the bus"; "his fate is certain"; "the date
for the invasion is certain"

* certain to occur; destined or inevitable; "he was certain to fail";
"his fate is certain";
"In this life nothing is certain but death and taxes"- Benjamin
Franklin;

* reliable in operation or effect; "a quick and certain remedy";

* exercising or taking care great enough to bring assurance; "be
certain to disconnect the iron when you are through";

rfis...@gmail.com

May 3, 2007, 4:00:47 PM
On 3 May, 18:45, David Hopwood <david.hopw...@industrial-designers.co.uk> wrote:

Hey, sorry, my last post went away for some reason. I've come around to dick's way of thinking. That compiler's just wrong.

I would say the following topics in this thread are complete red herrings: bit-fields, the volatile specifier, C++ standards, and atomic ops.

That leaves a compiler that writes to members when it shouldn't. Our tacit expectation of thread "reasonableness" in C++ says that you should be able to use a mutex to serialise access to a single member; however, the compiler shouldn't generate code that modifies that member when you change another, unrelated one. It's like the famous crossroads here with traffic lights in only one direction (here = Arezzo): it's just wrong - if it were right, the implication would be that you can only ever serialise access to the entire class and not just a part of it.

If I'm wrong about this then I've got an awful lot of code to review. I hope I'm not wrong.

RF

David Schwartz
May 3, 2007, 5:30:06 PM
On May 3, 1:00 pm, "rfist...@gmail.com" <rfist...@gmail.com> wrote:

> Hey, sorry, my last post went away for some reason. I've come around
> to dick's way of thinking. That compiler's just wrong.

Nope, your reasoning is.

> I would say the following topics in this thread are complete red
> herrings:
> bit-fields, the volatile specifier, C++ standards, and atomic ops.

I agree about that part.

> That leaves a compiler that writes to members when it shouldn't.

There is no reason it shouldn't. For example, if you have:

char j[4];

There is no reason a compiler couldn't optimize 'j[2]=0;' as a 32-bit
fetch for all four values, a bit mask to zero out the third byte, and
then a 32-bit store. If that was faster on that platform, a compiler
*should* do this.

> Our tacit expectation of thread "reasonableness" in c++ says that
> you should be able to use a mutex to serialise access to a single
> member, however the compiler shouldn't generate code that modifies
> that member when you change another unrelated one. It's like
> the famous crossroads here with traffic lights in only one direction
> (here=Arezzo): it's just wrong - if it were right the implication
> would
> be that you can only ever serialise access to the entire class and
> not just a part of it.

That is correct. You can only serialize access to an entire storage
object. Many platforms don't have atomic 8-bit operations or any 8-bit
operations. Consider:

char j[4];
j[2]=0;

Are you seriously arguing that this should be done under a bus lock on
such a platform just on the off chance that another thread might
read-modify-write j[1] at the same time?

> If I'm wrong about this then I've got an awful lot of code to review.
> I hope I'm not wrong.

I'm afraid you are. No standard makes any guarantees about accesses to
one part of a storage object not affecting other parts of that same
storage object against concurrent accesses that the compiler has no
idea about.

This is why reference count classes allocate the reference count as a
separate storage object.

DS

David Schwartz
May 3, 2007, 5:35:12 PM
On May 3, 9:45 am, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> True, but beside the point, since this implementation behaviour would be
> a bug for single-threaded code given the 'volatile' declaration.

So how do you think a platform should handle an allocation like this
one if it can only implement 8-bit operations as 32-bit operations:

char j[8];

Are you seriously arguing that it has to put each character in its own
32-bit storage unit just so that '*(volatile char *)(&j[2])=0;' works
the way you think it should?

Or are you suggesting that
char j[8];
and
volatile char j[8];
should produce different memory layouts and that casting to and from
'volatile' should cause you to access the wrong memory area?

Essentially, you are arguing that a compiler must put every variable
in its own storage unit because any variable might be a target of a
'volatile' operation. (Or that casts to and from 'volatile' can cause
the wrong memory area to be accessed and basically just have to be
illegal.)

I doubt you will find many sane people who agree with you.

DS

Chris Thomasson
May 3, 2007, 5:42:23 PM
"dick.dunbar" <dick....@gmail.com> wrote in message
news:1178123120.1...@u30g2000hsc.googlegroups.com...

> On May 2, 3:37 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> using fetch_and_add ... can't be used as-is to create an atomically
>> thread-safe counter...
>
> Ok, I'm intrigued ... why do you make this claim?
>
> fetch_and_add(int* data, value)
>
[...]

Well, if you want strong thread-safety (e.g., atomically thread-safe) you
have to design the algorithm to handle the following race-condition:

1. Threads A-C own references to Object A
2. Thread D loads pointer to Object A from a shared location
3. Threads A-C drop references to Object A; therefore, dtor on A
4: Thread D tries to increment count of Object A
5: Thread D goes boom.

Neither the "current" shared_ptr nor normal thread-safe reference
counting using fetch-and-add can handle this situation.

David Schwartz
May 3, 2007, 6:49:12 PM
On May 3, 9:37 am, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> If there were any exception allowing an access to a nonvolatile object
> to affect nearby volatile fields, then this documentation would have to
> mention it. Since it doesn't, the behaviour is an implementation bug.

I was with you up to this last part. Why would the documentation have
to mention such an exception?

DS


David Schwartz
May 3, 2007, 6:53:25 PM
On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
> have to design the algorithm to handle the following race-condition:
>
> 1. Threads A-C own references to Object A
> 2. Thread D loads pointer to Object A from a shared location
> 3. Threads A-C drop references to Object A; therefore, dtor on A
> 4: Thread D tries to increment count of Object A
> 5: Thread D goes boom.

This should be impossible. If an object can be found in a shared
location, then that shared location/structure must have a reference to
that object, a reference it may not release after step '2' but before
step '4'.

A 'shared location' typically provides the following semantics:

1) Make an object visible to other threads. This results in the
location having a reference to the object.

2) Make an object invisible to other threads. This releases the
location's reference.

3) Find an object. This finds an object and gives the caller a
reference to it atomically.

A 'shared location' that doesn't provide atomic 'find and reference'
semantics is broken by design.

DS

David Hopwood
May 3, 2007, 7:31:25 PM
David Schwartz wrote:
> No standard makes any guarantees about accesses to
> one part of a storage object not affecting other parts of that same
> storage object against concurrent accesses that the compiler has no
> idea about.
>
> This is why reference count classes allocate the reference count as a
> separate storage object.

What is a "storage object"? I cannot find any reference to such a thing
in C99+TC2.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood
May 3, 2007, 7:45:03 PM
David Schwartz wrote:
> On May 3, 9:45 am, David Hopwood <david.hopw...@industrial-
> designers.co.uk> wrote:
>
>>True, but beside the point, since this implementation behaviour would be
>>a bug for single-threaded code given the 'volatile' declaration.
>
> So how do you think a platform should handle an allocation like this
> one if it can only implement 8-bit operations as 32-bit operations:
>
> char j[8];
>
> Are you seriously arguing that it has to put each character in its own
> 32-bit storage unit just so that '*(volatile char *)(&j[2])=0;' works
> the way you think it should?

No, I think that the documentation of that platform must make clear that
"what constitutes an access to a volatile-qualified object" is a read or
write of some 32-bit-aligned word overlapping the object, and not necessarily
of a byte within the object. To be precise, I think that such documentation
is required as a consequence of C99 section 6.7.3.

Also, I think that as a quality-of-implementation matter, only platforms
that *actually* cannot support 8-bit operations should do this. I don't
believe that the existence of such (arguably broken) platforms is intended
as a license for *any* implementation to effectively ignore 'volatile' in
some cases that are clearly within the scope of its intended use.

AIX-on-PowerPC can support independent reads and writes of 8-bit bytes
perfectly well, and it doesn't (AFAICS) have any such documentation.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood
May 3, 2007, 7:53:14 PM

C99 6.7.3:
# What constitutes an access to an object that has volatile-qualified type
# is implementation-defined.

C99 3.4.1:
# implementation-defined behavior
# unspecified behavior where each implementation documents how the choice
# is made

--
David Hopwood <david....@industrial-designers.co.uk>

David Schwartz
May 3, 2007, 9:14:48 PM
On May 3, 4:53 pm, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> David Schwartz wrote:

> > On May 3, 9:37 am, David Hopwood <david.hopw...@industrial-
> > designers.co.uk> wrote:

> >>If there were any exception allowing an access to a nonvolatile object
> >>to affect nearby volatile fields, then this documentation would have to
> >>mention it. Since it doesn't, the behaviour is an implementation bug.

> > I was with you up to this last part. Why would the documentation have
> > to mention such an exception?

> C99 6.7.3:
> # What constitutes an access to an object that has volatile-qualified type
> # is implementation-defined.
>
> C99 3.4.1:
> # implementation-defined behavior
> # unspecified behavior where each implementation documents how the choice
> # is made

I don't think it's plausible to read this as mandating that every
possible way to access something that can also be accessed as a
volatile-qualified type must be documented. This would imply that
failure to document a debugger's ability to access such a variable is
a bug, which it obviously is not.

However, I think this is probably the strongest argument available.

DS

Chris Thomasson
May 4, 2007, 12:37:32 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178232805....@y5g2000hsa.googlegroups.com...

> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
>>
>> 1. Threads A-C own references to Object A
>> 2. Thread D loads pointer to Object A from a shared location
>> 3. Threads A-C drop references to Object A; therefore, dtor on A
>> 4: Thread D tries to increment count of Object A
>> 5: Thread D goes boom.
>
> This should be impossible.

It is possible. Read this:

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220060037026%22.PGNR.&OS=DN/20060037026&RS=DN/20060037026

See?


Chris Thomasson
May 4, 2007, 12:41:15 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178232805....@y5g2000hsa.googlegroups.com...
> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
>>
>> 1. Threads A-C own references to Object A
>> 2. Thread D loads pointer to Object A from a shared location
>> 3. Threads A-C drop references to Object A; therefore, dtor on A
>> 4: Thread D tries to increment count of Object A
>> 5: Thread D goes boom.
>

You need to atomically modify a pointer location in conjunction with a
reference counter location. Seems like you need DCAS, but there are other
algorithms out there.

Please read this:

http://citeseer.ist.psu.edu/cache/papers/cs/25408/http:zSzzSzwww.sun.comzSzresearchzSzpeoplezSzdetlefszSzpaperszSz01-podc.pdf/detlefs01lockfree.pdf

This explains how the race-condition works in more detail.


Chris Thomasson
May 4, 2007, 12:47:24 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178232805....@y5g2000hsa.googlegroups.com...
> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
>>
>> 1. Threads A-C own references to Object A
>> 2. Thread D loads pointer to Object A from a shared location
>> 3. Threads A-C drop references to Object A; therefore, dtor on A
>> 4: Thread D tries to increment count of Object A
>> 5: Thread D goes boom.
>

The classic race-condition basically boils down to the reference counter
getting destroyed before it can be incremented. This condition is real.
Normal thread-safe reference counting cannot be used to handle the above
data race.


Chris Thomasson
May 4, 2007, 1:01:02 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178232805....@y5g2000hsa.googlegroups.com...
> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
>>
>> 1. Threads A-C own references to Object A
>> 2. Thread D loads pointer to Object A from a shared location
>> 3. Threads A-C drop references to Object A; therefore, dtor on A
>> 4: Thread D tries to increment count of Object A
>> 5: Thread D goes boom.
>
> This should be impossible.
[...]

Sorry for making so many posts! ;^(...

Anyway, here is another example of the race-condition:


static refs_t *g_refs = new refs_t(1); // refcount == 1

1. Thread A loads pointer to g_refs into la_refs
2. Thread B atomic swaps g_refs with NULL, stores prev value into lb_refs
3. Thread B decrements lb_refs and calls dtor
4. Thread A tries to increment la_refs
5. Thread A seg-faults!


See? Step 2 transferred the count from the shared location to thread B, so
the reference count of 1 still stands because g_refs is now NULL. BUT,
remember that in Step 1 Thread A loaded a pointer... Step 3 destroys the
object because the result of the reference count decrement was zero. Steps
4-5 are the end result.

The above scenario can and will occur with shared_ptr, or any other normal
thread-safe counting. You HAVE to follow the rules. If you are using a
counting algorithm that does not provide strong thread safety, the rule is
that a thread may only increment a reference counter it already owns,
period.

This rule does not apply to atomically thread-safe counting.

Does this better explain the race-condition?
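The interleaving above can be replayed deterministically in one thread. In this sketch (the layout of refs_t is my own invention) a 'destroyed' flag stands in for running the destructor, so step 4's use-after-free can be observed rather than crashed on:

```cpp
#include <atomic>

struct refs_t {
    std::atomic<int> count;
    bool destroyed = false;                        // stands in for the dtor
    explicit refs_t(int n) : count(n) {}
};

bool replay_race() {
    refs_t obj(1);                                 // refcount == 1
    std::atomic<refs_t*> g_refs(&obj);

    refs_t* la_refs = g_refs.load();               // 1. A loads the pointer
    refs_t* lb_refs = g_refs.exchange(nullptr);    // 2. B swaps in NULL
    if (lb_refs->count.fetch_sub(1) == 1)          // 3. B drops the last ref
        lb_refs->destroyed = true;                 //    and "runs the dtor"
    return la_refs->destroyed;                     // 4-5. true: A's increment
                                                   //      would hit a dead object
}
```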


dick.dunbar
May 4, 2007, 2:47:15 AM
On May 3, 4:45 pm, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> AIX-on-PowerPC can support independent reads and writes of 8-bit bytes
> perfectly well, and it doesn't (AFAICS) have any such documentation.

The "load-and-reserve" / "store-conditional" instructions are flexible
enough to roll your own algorithms.

AIX provides 16-bit (short) atomic updates, but you could
just as easily provide a byte atomic update.

The restriction here is that the load-reserve / store-conditional
instructions only operate on word boundaries (in their 32-bit and
64-bit word forms).

In /usr/include/sys/atomic_op.h
typedef ushort *atomic_h;
boolean_t test_and_setsp(atomic_h, short);
ushort fetch_and_add_h(atomic_h,int);
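A sketch of such a roll-your-own atomic byte store, built the way the lwarx/stwcx. pair would build it: a retry loop over the containing 32-bit word. The GCC/Clang __atomic builtins stand in for the PowerPC assembly, and the lane shift assumes a little-endian byte order. Unlike the compiler bug in this thread, a concurrent update to a neighbouring byte just makes the CAS fail and retry; it is never lost.

```cpp
#include <cstdint>

void atomic_store_byte(volatile std::uint8_t* p, std::uint8_t v) {
    auto addr  = reinterpret_cast<std::uintptr_t>(p);
    auto* word = reinterpret_cast<volatile std::uint32_t*>(
                     addr & ~std::uintptr_t(3));    // containing 32-bit word
    unsigned shift = unsigned(addr & 3) * 8;        // little-endian byte lane
    std::uint32_t mask = std::uint32_t(0xFF) << shift;
    std::uint32_t oldw = __atomic_load_n(word, __ATOMIC_RELAXED);
    std::uint32_t neww;
    do {                                            // retry if the word moved
        neww = (oldw & ~mask) | (std::uint32_t(v) << shift);
    } while (!__atomic_compare_exchange_n(word, &oldw, neww, false,
                                          __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}
```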


rfis...@gmail.com
May 4, 2007, 5:49:01 AM
On May 3, 23:30, David Schwartz <dav...@webmaster.com> wrote:

> > be that you can only ever serialise access to the entire class and
> > not just a part of it.
>
> That is correct. You can only serialize access to an entire storage
> object.

Holy crap! I did not know that.

> Are you seriously arguing that this should be done under a bus lock on
> such a platform just on the off chance that another thread might read-
> modify-write j[1] at the same time?

I don't know! Maybe!?

> > If I'm wrong about this then I've got an awful lot of code to review.
> > I hope I'm not wrong.
>
> I'm afraid you are. No standard makes any guarantees about accesses to
> one part of a storage object not affecting other parts of that same
> storage object against concurrent accesses that the compiler has no
> idea about.


Ok - so is it enough to wrap the object in a struct?

typedef struct {
char j[4];
} a_storage_object;

Can accesses to the above struct inside another object sneak outside
of its boundaries? If not it won't be too hard to remedy.

> This is why reference count classes allocate the reference count as a
> separate storage object.

How does one get oneself a separate storage object?

RF

Gil Hamilton
May 4, 2007, 9:54:24 AM
David Schwartz <dav...@webmaster.com> wrote in
news:1178228112.0...@c35g2000hsg.googlegroups.com:

> On May 3, 9:45 am, David Hopwood <david.hopw...@industrial-
> designers.co.uk> wrote:
>
>> True, but beside the point, since this implementation behaviour would
>> be a bug for single-threaded code given the 'volatile' declaration.
>
> So how do you think a platform should handle an allocation like this
> one if it can only implement 8-bit operations as 32-bit operations:
>
> char j[8];
>
> Are you seriously arguing that it has to put each character in its own
> 32-bit storage unit just so that '*(volatile char *)(&j[2])=0;' works
> the way you think it should?

I would argue so. There's nothing in the standard saying that char must
be 8 bits. On the other hand, the specification seems pretty clear about
modifications to volatile objects being made "strictly according to the
rules of the abstract machine". Modifying j[0], j[1] and j[3] along with
j[2] above would clearly be outside the "rules of the abstract machine".

Also, I just noticed this note in the section (of the C standard) on
qualifiers which lends further support:
If the specification of an array type includes any type
qualifiers, the element type is so qualified, not the array
type.

So in the case you cite, the standard requires the individual array
elements to be treated as volatile not the array as a whole.


> Or are you suggesting that
> char j[8];
> and
> volatile char j[8];
> should produce different memory layouts and that casting to and from
> 'volatile' should cause you to access the wrong memory area?

This definitely isn't allowed. A volatile-qualified type is required to
have the same representation and alignment requirements as the same type
not so qualified.


> Essentially, you are arguing that a compiler must put every variable
> in its own storage unit because any variable might be a target of a
> 'volatile' operation. (Or that casts to and from 'volatile' can cause
> the wrong memory area to be accessed and basically just have to be
> illegal.)
>
> I doubt you will find many sane people who agree with you.

It is certainly the case that the semantics required of 'volatile' are
not particularly well-defined. However, there's really no support
whatsoever in the standards for your position.

Try removing multiple threads from the scenario. Consider instead a
single-threaded program that is accessing individual byte-sized registers
in memory-mapped hardware.

Finally, I wonder if you would make the same arguments for this
structure:
struct machine_regs {
    volatile uint32_t StatusReg;
    volatile uint32_t ControlReg1;
    volatile uint32_t ControlReg2;
    /* ... */
} *regs = (struct machine_regs *)0xbc800000;
regs->ControlReg1 = 0x20c074;

That is, is the compiler free to modify StatusReg here or ControlReg2?

GH

Marcin 'Qrczak' Kowalczyk
May 4, 2007, 10:48:30 AM
On Thu, 2007-05-03 at 14:30 -0700, David Schwartz wrote:

> There is no reason it shouldn't. For example, if you have:
>
> char j[4];
>
> There is no reason a compiler couldn't optimize 'j[2]=0;' as a 32-bit
> fetch for all four values, a bit mask to zero out the third byte, and
> then a 32-bit store. If that was faster on that platform, a compiler
> *should* do this.

It can't do this if it supports threads (unless it has determined that
in a particular case this is safe), because the programmer is allowed
to use separate mutexes for synchronizing accesses to separate elements
of this array, and they should not interfere with each other.

Moreover, if the processor allows this as the only option, then a
compiler for it can't support pthreads (without expensive workarounds
like taking a global lock for modifying bytes which might have adjacent
bytes used by other threads), and its support for volatile must break
the conventional meaning of volatile (even though the reinterpreted
meaning can be technically legal).
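A sketch of the contract at stake, with std::mutex standing in for a pthreads mutex and illustrative names throughout: the programmer uses two locks to serialise two different elements of one char array, which is exactly what a word-widening store breaks.

```cpp
#include <mutex>

static char j[4];
static std::mutex j1_lock;   // guards j[1] only
static std::mutex j2_lock;   // guards j[2] only

// May run concurrently with set_j2: the two stores touch different
// elements, each under its own lock, so neither may clobber the other.
void set_j1(char v) {
    std::lock_guard<std::mutex> g(j1_lock);
    j[1] = v;
}

void set_j2(char v) {
    std::lock_guard<std::mutex> g(j2_lock);
    j[2] = v;
}
```

If the compiler lowers either store to a read-modify-write of the whole word containing j[0..3], the two critical sections race with each other despite being correctly locked.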

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

David Hopwood
May 4, 2007, 12:15:51 PM
dick.dunbar wrote:
> On May 3, 4:45 pm, David Hopwood <david.hopw...@industrial-
> designers.co.uk> wrote:
>
>>AIX-on-PowerPC can support independent reads and writes of 8-bit bytes
>>perfectly well, and it doesn't (AFAICS) have any such documentation.
>
> The "load-and-reserve", "store-and-release" instructions are flexible
> to roll-your-own algorithms.
>
> AIX provides 16-bit (short) atomic updates, but you could
> just as easily provide a byte atomic update.

Remember that the original STLPort code under discussion in this thread
did not need 8- or 16-bit atomic updates. It needed 32-bit atomic
updates that were not interfered with by an *ordinary* update to an 8-bit
field stored nearby (but in a different 32-bit word).

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood
May 4, 2007, 12:20:31 PM
Gil Hamilton wrote:
> David Schwartz <dav...@webmaster.com> wrote in
> news:1178228112.0...@c35g2000hsg.googlegroups.com:
>
>>On May 3, 9:45 am, David Hopwood <david.hopw...@industrial-
>>designers.co.uk> wrote:
>>
>>> [...] this implementation behaviour would be
>>> a bug for single-threaded code given the 'volatile' declaration.
>>
>>So how do you think a platform should handle an allocation like this
>>one if it can only implement 8-bit operations as 32-bit operations:
>>
>>char j[8];
>>
>>Are you seriously arguing that it has to put each character in its own
>>32-bit storage unit just so that '*(volatile char *)(&j[2])=0;' works
>>the way you think it should?
>
> I would argue so. There's nothing in the standard saying that char must
> be 8 bits.

Not in C99 or C++, but there is (indirectly) in POSIX. Then again, a platform
that "can only implement 8-bit operations as 32-bit operations" would have quite
some difficulty conforming to POSIX.

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar
May 4, 2007, 1:38:48 PM
On May 4, 9:15 am, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:
> dick.dunbar wrote:

> Remember that the original STLPort code under discussion in this thread
> did not need 8- or 16-bit atomic updates. It needed 32-bit atomic
> updates that were not interfered with by an *ordinary* update to an 8-bit
> field stored nearby (but in a different 32-bit word).

Correct. I'm just trying to keep up with all the twists and turns
this thread has taken.


dick.dunbar
May 4, 2007, 1:58:50 PM
On May 3, 10:01 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> Anyway, here is another example of the race-condition:
>
> static refs_t *g_refs = new refs_t(1); // refcount == 1
>
> 1. Thread A loads pointer to g_refs into la_refs
> 2. Thread B atomic swaps g_refs with NULL, stores prev value into lb_refs
> 3. Thread B decrements lb_refs and calls dtor
> 4. Thread A tries to increment la_refs
> 5. Thread A seg-faults!
>
> See? Step 2 transferred the count from the shared location to thread B, so
> the reference count of 1 still stands because g_refs is now NULL. BUT,
> remember in Step 1 well, Thread A loaded a pointer... Step 3 destroys the
> pointer because the result of the reference count decrement was zero. Step
> 4-5 is the end result.
>
> This above scenario can and will occur with shared_ptr, or any other normal
> thread-safe counting. You HAVE to follow the rules. If you not using a
> counting algorithm that does not provide strong thread safety, well, the
> rule is that you can only allow a thread to increment a reference counter
> that is already owns period.
>
> This rule does not apply to atomically thread-safe counting.
>
> Does this better explain the race-condition?

Sorry, I still don't follow. In our test case, I only started the
static counter at 1 in order to demonstrate how a counter could reach
zero without the code explicitly doing so.

The application scenario goes more like this:
1. Thread A creates the object, and atomically increments the counter.
   RC=1
2. Thread B wishes to use the object, and atomically increments the
   counter. RC=2
3. Thread B performs some modification of the object, indicating
   object state.
4. Thread C wishes to use the object, and increments the counter. RC=3
5. Thread C does not view or use the modification by B.
6. Thread A (the creator) is done with the object, and decrements the
   counter. RC=2
7. Thread C is done, and decrements the counter. RC=1
8. Thread B is done, and decrements the counter. RC=0
   And now it is thread B's responsibility to free the object (not the
   creator).
9. All threads must see a non-zero count in order to acquire usage of
   the object.

Before the straw-man arguments start to fly, I did not write any of
this code.

I have observed it intensely on AIX using low-level tools, and it
operates as described. The only failure in this algorithm was in a
64-bit environment when the counter was modified by unrelated source
statements without following our rules.
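Sketched as code (the structure is mine; std::atomic's fetch_add stands in for AIX fetch_and_add), the protocol above is: every user increments before use, every user decrements when done, and whichever thread takes the count to zero owns the free, creator or not.

```cpp
#include <atomic>

struct Shared {
    std::atomic<int> rc{0};
};

void acquire(Shared* s) {               // steps 1, 2, 4: take a reference;
    s->rc.fetch_add(1);                 // per step 9, the caller must already
}                                       // see a non-zero count to get here

bool release(Shared* s) {               // steps 6-8: drop a reference;
    return s->rc.fetch_sub(1) == 1;     // true => this caller frees the object
}
```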

David Hopwood
May 4, 2007, 3:51:18 PM
dick.dunbar wrote:
> The application scenario goes more like this:
> 1. Thread A creates the object, and atomicly increments the counter.
> RC=1
> 2. Thread B wishes to use the object, and atomicly increments the
> counter. RC=2
> 3. Thread B performs some modification of the object, indicating
> object state.
> 4. Thread C wishes to use the object, and increments the counter. RC=3
> 5. Thread C does not view or use the modification by B.
> 6. Thread A (the creator) is done with the object, and decrements the
> counter RC=2
> 7. Thread C is done, and decrements the counter RC=1
> 8. Thread B is done, and decrements the counter RC=0
> And now it is thread B's responsibility to free the object (not the
> creator).
> 9. All threads must see a non-zero count in order to acquire usage of
> the object.

I'd just like to highlight a subtle point here. In addition to the
increments and decrements of the refcount being atomic, they must also
be serializing (act as bidirectional memory barriers). That is, it must
not be possible for a load or store of any other location by the same
thread to be hoisted above or sunk below a refcount update. This is not,
strictly speaking, implied by atomicity.
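A sketch of the distinction in C++11 atomics (anachronistic for this 2007 thread, but it names the orderings precisely): the decrement is a release operation, and the zero path adds an acquire fence, so accesses to the object can neither sink below the drop nor hoist above the destruction. Relaxed atomicity alone would permit both reorderings.

```cpp
#include <atomic>

struct obj_t {
    std::atomic<int> refs{1};
    int payload = 0;          // stands in for the rest of the object
};

void obj_release(obj_t* o) {
    // release: this thread's writes to payload are visible before the drop
    if (o->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // acquire: see every other releaser's writes before destroying
        std::atomic_thread_fence(std::memory_order_acquire);
        o->payload = 0;       // stand-in for running the destructor
    }
}
```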

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar
May 4, 2007, 4:41:46 PM
On May 4, 12:51 pm, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> I'd just like to highlight a subtle point here. In addition to the
> increments and decrements of the refcount being atomic, they must also
> be serializing (act as bidirectional memory barriers). That is, it must
> not be possible for a load or store of any other location by the same
> thread to be hoisted above or sunk below a refcount update. This is not,
> strictly speaking, implied by atomicity.

Right. We use the inline assembly code I posted above to do reference
counts. That code does not create the barrier you describe, but other
forms of memory synchronization on PowerPC do issue an "instruction
sync" (isync) to cause aggressive fetches in the pipeline to be
restarted. For our purposes, that wasn't necessary.

David Schwartz
May 4, 2007, 5:02:45 PM
On May 4, 2:49 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:

> Ok - so is it enough to wrap the object in a struct?
>
> typedef struct {
> char j[4];
>
> } a_storage_object;

I believe it is enough to wrap the object in a struct.

> > This is why reference count classes allocate the reference count as a
> > separate storage object.

> How does one get oneself a separate storage object?

The typical way is a call to 'malloc'.

DS

David Schwartz
May 4, 2007, 5:07:53 PM
On May 3, 9:47 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> The classic race-condition basically boils down to the reference counter
> getting destroyed before it can be incremented. This condition is real.
> Normal thread-safe reference counting cannot be used to handle the above
> data race.

I cannot imagine and have never seen any sane code where that race
condition could occur. Whoever or whatever is trying to increment the
reference counter must already have a reference to the object,
otherwise how would it know what object to increment the reference
counter of?

You can only get a reference to an object from something that already
has a reference. Otherwise, there is no reference (and hence no
object) to give you. Any method of finding or owning an object must
already have found/owned that object.

I honestly don't understand what this whole issue is about. The
claimed race is impossible under any reasonable scenario I can
imagine.

DS

David Schwartz
May 4, 2007, 5:17:40 PM
On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
> have to design the algorithm to handle the following race-condition:
>
> 1. Threads A-C own references to Object A
> 2. Thread D loads pointer to Object A from a shared location
> 3. Threads A-C drop references to Object A; therefore, dtor on A
> 4: Thread D tries to increment count of Object A
> 5: Thread D goes boom.

Just to restate my position, this race condition can only occur in
code that's so broken this is the least of your problems. The reason
the race condition cannot happen is because in step 2, when thread D
loaded the pointer, it got it from a "shared location". There are
three possible cases:

1) The shared location has a pointer to the object but does not hold a
reference. In this case, the implementation is fundamentally broken.
Anything that keeps a pointer to an object must hold a reference
because the reference is the only thing that keeps that pointer valid.

2) This shared location has a reference to the object, but the shared
location drops the reference after step 2 but before step 4. In this
case, the implementation is fundamentally broken. You must hold a
reference until you are done with the pointer to the object the
reference protects and cannot drop it in the middle of an operation on
that pointer or its reference.

3) This shared location has a reference and does not drop it while
it's in use. In this case, there is no race. Step 3 cannot release the
last reference because the shared location still has a reference to
the object and it will not release that reference while an operation
against that reference is still in progress (steps 2 and 4 are such an
operation).

So unless you are doing something so mind-bogglingly broken that you
defeat the whole point of reference counts, this race simply cannot
happen. The whole point of reference counts is to avoid exactly this
issue.

DS

rfis...@gmail.com
May 4, 2007, 7:23:18 PM
On May 4, 23:02, David Schwartz <dav...@webmaster.com> wrote:

> > typedef struct {
> > char j[4];
>
> > } a_storage_object;
>
> I believe it is enough to wrap the object in a struct.
>
> > > This is why reference count classes allocate the reference count as a
> > > separate storage object.
> > How does one get oneself a separate storage object?
>
> The typical way is a call to 'malloc'.

So reference count classes malloc the reference count?
Or do they keep the reference count in a struct?


David Schwartz
May 4, 2007, 10:03:25 PM
On May 4, 4:23 pm, "rfist...@gmail.com" <rfist...@gmail.com> wrote:

> > The typical way is a call to 'malloc'.

> So do reference count classes malloc the reference count?
> Or do they keep the reference count in a struct?

The typical way is to call 'malloc' and allocate enough space to
ensure that you get a cache line to yourself.

DS

David Hopwood
May 5, 2007, 12:48:43 PM

David Schwartz wrote:
> On May 4, 2:49 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
>
>>Ok - so is it enough to wrap the object in a struct?
>>
>>typedef struct {
>> char j[4];
>>
>>} a_storage_object;
>
> I believe it is enough to wrap the object in a struct.

Why?

That is, I agree that this probably avoids word tearing on most platforms,
but only by coincidence (because the alignment of a struct is typically
a multiple of the largest unit that is used to combine memory operations,
although there's no reason why it must be). What is your reasoning about
why it must work?

>>>This is why reference count classes allocate the reference count as a
>>>separate storage object.
>
>>How does one get oneself a separate storage object?
>
> The typical way is a call to 'malloc'.

I don't see anything in (for instance) POSIX, that would make malloc-allocated
objects different from other objects in this respect. Of course malloc allocates
objects that are aligned for any type, but again, the fact that this avoids
word tearing on most platforms is just a coincidence.

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar
May 5, 2007, 2:20:21 PM

On May 5, 9:48 am, David Hopwood <david.hopw...@industrial-designers.co.uk> wrote:
> Of course malloc allocates objects that are aligned for any type,

SUSV3 ....
http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html

"The pointer returned if the allocation succeeds shall be suitably
aligned so that it may be assigned to a pointer to any type of object
and then used to access such an object in the space allocated ..."

On a 64-bit platform, malloc must return storage that is 64-bit
aligned.

David Hopwood
May 5, 2007, 4:42:58 PM

dick.dunbar wrote:
> On May 5, 9:48 am, David Hopwood <david.hopw...@industrial-
> designers.co.uk> wrote:
>
>> Of course malloc allocates objects that are aligned for any type,
>
> SUSV3 ....
> http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html
>
> "The pointer returned if the allocation succeeds shall be suitably
> aligned so that it may be assigned to a pointer to any type of object
> and then used to access such an object in the space allocated ..."

Yes, thanks for finding the chapter and verse.

> On a 64-bit platform, malloc must return storage that is 64-bit
> aligned.

No. A "64-bit platform" (that is, one that efficiently supports
operations on 64-bit values, and/or a 64-bit user virtual address space)
doesn't *necessarily* require 64-bit alignment for any type. If it
doesn't, then malloc need not return 64-bit aligned storage.

Not that I can think of a concrete counterexample. Even on most 32-bit
platforms, malloc returns a pointer with 64-bit or larger alignment.

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar
May 5, 2007, 5:17:29 PM

On May 5, 1:42 pm, David Hopwood <david.hopw...@industrial-designers.co.uk> wrote:

> Not that I can think of a concrete counterexample. Even on most 32-bit
> platforms, malloc returns a pointer with 64-bit or larger alignment.

The reason for this is 8-byte floats. I don't know of any platforms that
"don't care" about alignment. PowerPC 601 was a "permissive"
CPU, which had to accommodate a lot of legacy code where structures
were packed without concern for boundaries.

Gradually, the PowerPC family became more restrictive ...
letting the kernel do the work of intercepting alignment faults,
moving the data to a boundary, and restarting the operation.
Slow, but your application kept running, and AIX had a monitor
to count how many times you encountered these faults.

If you don't play by the rules today, the application gets to
deal with the SIGnal faults that result from off-boundary ops.
It's just easier to fix the code.

David Schwartz
May 5, 2007, 6:03:06 PM

On May 5, 9:48 am, David Hopwood <david.hopw...@industrial-designers.co.uk> wrote:

> That is, I agree that this probably avoids word tearing on most platforms,
> but only by coincidence (because the alignment of a struct is typically
> a multiple of the largest unit that is used to combine memory operations,
> although there's no reason why it must be). What is your reasoning about
> why it must work?

It's not really a coincidence; it's one of those coincidences that,
because it holds, we don't have to find other ways to make things work.
Another one is that, by coincidence, any way another thread could access
a variable an external function could too, and since we don't tell the
compiler pthread_mutex_lock/unlock are special, we don't need to do
anything to flush values cached in registers. If not for that
"coincidence" we'd have to solve the problem some other way.

It's guaranteed to work, but since it works by coincidence, nothing
special needs to be done to make it work.

> >>>This is why reference count classes allocate the reference count as a
> >>>separate storage object.
>
> >>How does one get oneself a separate storage object?
>
> > The typical way is a call to 'malloc'.
>
> I don't see anything in (for instance) POSIX, that would make malloc-allocated
> objects different from other objects in this respect. Of course malloc allocates
> objects that are aligned for any type, but again, the fact that this avoids
> word tearing on most platforms is just a coincidence.

Other objects are not different in that respect.

The sequence 'char a; char b;' does not allocate two objects; it
allocates one object that contains two characters. The compiler is
free to optimize an access to 'a' or 'b' as an access to both of
them.

DS

Chris Thomasson
May 5, 2007, 7:07:29 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178313460....@y5g2000hsa.googlegroups.com...

> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
[...]

> Just to restate my position, this race condition can only occur in
> code that's so broken this is the least of your problems.

If you allow a thread to increment the reference counter of an object that
it does not previously own AND you're not using an atomically thread-safe
counting algorithm, then yes, the code is broken...

However, if you're using atomic counting, then the code is NOT broken in
any way, shape, or form.


Chris Thomasson
May 5, 2007, 7:10:18 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178312873.1...@n59g2000hsh.googlegroups.com...

> On May 3, 9:47 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> The classic race-condition basically boils down to the reference counter
>> getting destroyed before it can be incremented. This condition is real.
>> Normal thread-safe reference counting cannot be used to handle the above
>> data race.
>
> I cannot imagine and have never seen any sane code where that race
> condition could occur.

I take it that you have never messed around with Joe Seigh's atomic_ptr?


> Whoever or whatever is trying to increment the
> reference counter must already have a reference to the object,

[...]

This is true for normal thread-safe counting. It is NOT true for
atomic_ptr and other atomic counting algorithms.


> I honestly don't understand what this whole issue is about. The
> claimed race is impossible under any reasonable scenario I can
> imagine.

Well, did you read the papers I pointed you to? The scenario is real, and
there are solutions for it.

Chris Thomasson
May 5, 2007, 7:15:03 PM

"dick.dunbar" <dick....@gmail.com> wrote in message
news:1178301530.3...@y5g2000hsa.googlegroups.com...

> On May 3, 10:01 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> Anyway, here is another example of the race-condition:
>>
>> static refs_t *g_refs = new refs_t(1); // refcount == 1
[...]

>> Does this better explain the race-condition?
>
> Sorry, I still don't follow. In our test case, I only started the
> static counter at 1
> in order to demonstrate the problem of how a counter could reach zero
> without the code explicitly doing so.
>
> The application scenario goes more like this:
> 1. Thread A creates the object, and atomically increments the counter.
> RC=1

Okay.


> 2. Thread B wishes to use the object, and atomically increments the
> counter. RC=2

[...]

Well, just from line 2... It sure seems that thread B is allowed to acquire
a reference count to an object that it does not own. Thread A owns a
reference, Thread B does not... And you seem to be allowing Thread B to grab
a reference... So, IMO, the race-condition can apply to your scheme... This
is illegal counting even for shared_ptr.

So, if you want to avoid the race-condition AND not have to use a special
atomically thread-safe counting algorithm, then you must make Thread A
increment the reference count to 2 AND pass it to thread B. That way thread
B no longer has to take a reference to an object that it does not own. This
is because Thread A increments the counter on behalf of Thread B.


Chris Thomasson
May 5, 2007, 7:20:54 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178313460....@y5g2000hsa.googlegroups.com...

> On May 3, 2:42 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Well, if you want strong thread-safety (e.g., atomically thread-safe) you
>> have to design the algorithm to handle the following race-condition:
>>
[...]

> So unless you are doing something so mind-bogglingly broken that you
> defeat the whole point of reference counts, this race simply cannot
> happen.

This race cannot happen if you follow the rules for Boost shared_ptr,
which explicitly says that it does not allow for strong competing access.

Joe Seigh's atomic_ptr DOES allow for strong competing access...

-----------
It seems like you're not too familiar with this type of reference counting.

We have had many discussions on this type of stuff in this group over the
years...

P.S.

It seems like you're trying to say that there is absolutely NO need
whatsoever for algorithms like atomic_ptr? Am I way off base here?


Chris Thomasson
May 5, 2007, 8:20:17 PM

[...]

> Well, did you read the papers I pointed you to? The scenario is real, and
> there are solutions for it.

One solution is to bind yourself to the rules of shared_ptr.

Another solution is to use something like atomic_ptr, or this:

http://appcore.home.comcast.net/vzoom/refcount/


Are you saying that any other solution, one that is allowed to break the
shared_ptr rules, is no good? Or in other words, do you see a use for
strong thread-safety in reference counting algorithms?


David Schwartz
May 5, 2007, 10:40:48 PM

On May 5, 4:07 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> If you allow a thread to increment a reference counter of an object that it
> does not previously own AND your not using an atomically thread-safe
> counting algorithm, then yes the code is broken...

But this is impossible. In order for a thread to increment a reference
counter of an object, it must have a pointer to that object. The
pointer needs to be valid both before and after the reference count is
acquired. The only way to *keep* the pointer valid is to hold a
reference count. So the thread must already have a reference count.
Otherwise, your code is already broken.

I honestly can't imagine how someone can have this particular race
condition in code that is even close to sanely implemented. How can
you even *find* an object that nothing has a reference to? Any way of
finding an object must have a reference to it.

If I tried as hard as I could, I don't think I could produce sane-looking
code that suffered from this race.

DS

Chris Thomasson
May 5, 2007, 11:57:32 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178419248....@o5g2000hsb.googlegroups.com...

> On May 5, 4:07 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> If you allow a thread to increment a reference counter of an object that
>> it
>> does not previously own AND your not using an atomically thread-safe
>> counting algorithm, then yes the code is broken...
>
> But this is impossible. In order for a thread to increment a reference
> counter of an object, it must have a pointer to that object.

> The only way to *keep* the pointer valid is to hold a
> reference count.

No. You have to be able to both load a pointer AND increment the reference
count atomically. You seem to be saying that this is impossible, when it's
clearly not the case.

You can use DWCAS, DCAS, offset tricks, alignment tricks, KCSS, MCAS,
etc., etc...


Chris Thomasson
May 6, 2007, 12:04:22 AM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1178419248....@o5g2000hsb.googlegroups.com...

> On May 5, 4:07 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> If you allow a thread to increment a reference counter of an object that
>> it
>> does not previously own AND your not using an atomically thread-safe
>> counting algorithm, then yes the code is broken...
>
> But this is impossible. In order for a thread to increment a reference
> counter of an object, it must have a pointer to that object. The
> pointer needs to be valid both before and after the reference count is
> acquired. The only way to *keep* the pointer valid is to hold a
> reference count. So the thread must already have a reference count.
> Otherwise, your code is already broken.

It seems like you're confusing normal thread safety with strong thread
safety.
---

Again, the code is only broken if you're using normal reference counting
(e.g., shared_ptr) and you break the rules by allowing a thread to
increment the reference count of an object that it does not already own.
This rule is compatible with normal thread safety.

This rule does not apply to atomically thread-safe counting. Any thread
can try to grab a reference concurrently; this requires strong thread
safety.

There is a use for this type of counting... For instance, you can use it
to transform a lock-free algorithm that relies on garbage collection into
one that does not. Boost shared_ptr cannot be used for this. You need
atomically thread-safe counting to do this.

---
It seems like you're confusing normal thread safety with strong thread
safety.


Chris Thomasson
May 6, 2007, 12:05:36 AM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:lvSdncYYd9WgzqDb...@comcast.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:1178419248....@o5g2000hsb.googlegroups.com...
>> On May 5, 4:07 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

Perhaps I need some help from Joe Seigh wrt getting across the point of
atomic thread-safe counting vs. normal thread-safe counting...

;^)


Joe Seigh
May 6, 2007, 7:44:39 AM

Just say atomic or atomically. They either get it or they don't.

Atomic just means that any operations that are observable are observed
to happen as if they were applied discretely. The observed order doesn't
have to be the order they actually occurred in (if that has any meaning).

Thread-safe basically just means no hidden side effects (from internal
sharing, usually).

Thread-safety can be gotten by using locks. You have reentrant
thread-safety, which means thread-safety gotten without locks, usually by
not sharing. Also there's async-safe.

So the ultimate would be atomically thread-safe, async-safe, reentrant
functions. :)

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
