Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756


dick.dunbar
Apr 30, 2007, 3:14:58 PM

Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

Our code runs flawlessly in 32-bit mode.
During a 24-hour torture test of our 64-bit implementation, the system
mysteriously crashed 10 hours into the run. Hardware watchpoints weren't
much help in isolating the crash, so we spent several weeks writing small
testcases to reproduce the problem. The resulting test case will crash
within a few seconds.

Hint: _Tag _M_tag:8; is the line that causes the problem; an 8-bit
holder of enum data.

Test Case Essentials:
--------------------
class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;   // 32-bit int used in tcpip to 32-bit clients
    _Tag _M_tag:8;               // 8-bit char is as large as we need
    int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
    int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
};

void *ThreadFunc(void *threadid) {
    for (int i = 0; i < 100000000; ++i) {
        gs._M_incr();                       // atomically increment
        gs._M_tag=NQNString::_S_concat;     // set it to one
        if(gs._M_decr() == 0 ) ++*(int*)0;  // ref_count should never go to zero!
    }
    return 0;
}

The Bug:
-------
1. ref_count is incremented/decremented consistently using AIX atomic ops.

2. The _M_tag assignment causes both ref_count and _tag to be loaded into
   a 64-bit register, updated by twiddling bits, and stored non-atomically.
   ref_count is modified when only _tag was specified in the source.

3. If multiple threads execute this code, it is only a matter of time
   before one thread updates the count (1 -> 2), and another thread,
   having loaded the same count (1), updates the _Tag field and stores
   the 1 back. When the first thread decrements, the count now goes from
   1 to zero, when the expectation was from 2 to 1. (A simulation of this
   interleaving is sketched below.)

4. The reason for this is an IBM compiler construct called BITCONTAINER,
   where the volatile int and the bit-field are manipulated as a unit.
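
Here is a small stand-alone simulation of that lost update (my own sketch,
written with C++11 atomics and threads for brevity, nothing IBM-specific;
the 64-bit "container" stands in for the BITCONTAINER holding both fields):

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <thread>

static std::atomic<uint64_t> container{1};   // bits 0-31: ref_count, bits 32-39: tag

static int incr() { return int(container.fetch_add(1) & 0xFFFFFFFF); }        // atomic, like fetch_and_add
static int decr() { return int((container.fetch_sub(1) - 1) & 0xFFFFFFFF); }  // returns the new count

static void set_tag(uint8_t tag) {           // what the bit-field store effectively does:
    uint64_t old = container.load();         //   8-byte load of ref_count AND tag
    old = (old & ~(uint64_t(0xFF) << 32)) | (uint64_t(tag) << 32);
    container.store(old);                    //   8-byte store; can overwrite a concurrent increment
}

static void worker() {
    for (int i = 0; i < 10000000; ++i) {
        incr();
        set_tag(1);
        if (decr() == 0) { std::puts("ref_count hit zero: lost update"); std::abort(); }
    }
}

int main() {
    std::thread a(worker), b(worker);
    a.join(); b.join();
    return 0;
}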

PMR 26241,756.
-------------
IBM identified a source work-around, and claimed the problem was in our
source. A TechNote was published, but it fails to relate to our 64-bit
experience.

http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159

Problem
-------
In a multithreaded program, updating a bitfield variable may affect
the values of other variables.

Cause
-----
Memory is logically partitioned into containers for storing data.
The size of the container depends on the alignment of the structure.
For example,

struct foo {
short x;
int y : 8;
};

In this case, a 4-byte container will be created:
foo::x will occupy the first and second bytes of the container
foo::y will occupy the third byte

Assume that struct foo is used in a multithreaded program, and
each thread updates both foo::x and foo::y.
An unexpected value may potentially be passed to foo::x when multiple
threads are processing data in struct foo at the same time because the
memory manipulation of bit-field variables can potentially
"load and store" other variables in the same container.

Solution
--------
When updating a variable, users should use thread mutex lock to ensure
only one thread is updating it. This will avoid the issue.
Alternatively users may insert "int :0;" before the bitfield.
This will change the layout of the struct, so the container for the
bitfields does not overlap the first member.
------- End of IBM TechNote ----------------
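
To make the TechNote's "int :0;" suggestion concrete, here is what it
looks like on their own example (my sketch, not part of the TechNote):

struct foo_shared {        // original layout: x and the bit-field y share
    short x;               // one 4-byte container, so a read-modify-write
    int   y : 8;           // of y can also rewrite x
};

struct foo_padded {        // TechNote workaround: the unnamed zero-width
    short x;               // bit-field forces y's container to start at
    int       : 0;         // the next allocation-unit boundary, so it no
    int   y   : 8;         // longer overlaps x
};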

I cited section 9.6 of the C++ standard as justification for our
coding.

ISO 14882 C++ Standard Section: 9.6 bit-fields (class.bit)
------------------------------------------------------------
* id:constant ... represents bits. "constant" may be larger than needed,
  for padding
* allocation of bit-fields within a class object is implementation-defined
* alignment of bit-fields is implementation-defined
* bit-fields are packed into some addressable allocation unit
* bit-fields straddle allocation units on some machines and not on others
* bit-fields are assigned right-to-left on some machines, left-to-right
  on others
* unnamed bit-fields are not members and cannot be initialized
* special case: an unnamed bit-field with width zero specifies alignment
  of the next bit-field at an allocation unit boundary. Only unnamed
  bit-fields may use :0
* A bool value can successfully be stored in a bit-field of any nonzero
  size.
The phrase "allocation unit" is only used in section 9.6, and never
defined.
------------- End of Standard snippets ------

IBM Response:
------------
Section 9.6 of ISO14882 does not specify the mandatory bit-field
implementation in multithreaded environment.
As a result, we have not violated the current C++ bit-field standard.
Currently we do not have plans to revise the bit-field implementation
of the compiler.

Response:
--------
The standard expects you to do something rational on your hardware.
IBM can do anything they like to bit-fields, but to subsume
non-bitfield POD data into your BITCONTAINER is wrong.

Modifying two pieces of data, when only one object was specified in our
source code, is about the worst kind of compiler error I can think of.

To avoid the compiler bug, we changed _M_tag to remove the implication
of bits, and thereby stop IBM from "inventing this BITCONTAINER
construct" when none was intended or implied in our code.

_Tag _M_tag:8;   // Fails: single byte of enum data
char _M_tag;     // Works: but no longer associates _M_tag with the enum
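
(For illustration only: one way to keep the enum association without
handing the compiler a bit-field is a plain char plus typed accessors.
The member and accessor names here are mine, not what we shipped:)

class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;
private:
    char _M_tag_storage;                        // holds a _Tag value; one byte
public:
    _Tag _M_tag() const { return _Tag(_M_tag_storage); }
    void _M_set_tag(_Tag t) { _M_tag_storage = char(t); }   // plain store-byte
};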

This code only fails on IBM compilers (Version 7 and 8), and on no other
platform.

So what do you think? Our bug or theirs?

dick.dunbar
Apr 30, 2007, 7:59:31 PM

xlC_r -q64 -qsource -qlist -c tc.cpp
vi tc.lst
/17|   finds the generated code for stmt 17: gs.c = NQNString::_S_concat;


--- tc.cpp ---
#include <sys/atomic_op.h>

class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;

    _Tag t:8; bool b:8; int i:8; char c:8;
    char* p;                        // Remove this, and correct code is generated
    NQNString():_M_ref_count(1) { }
    int _M_incr() {return fetch_and_add((int*)&_M_ref_count, 1);}
    int _M_decr() {return fetch_and_add((int*)&_M_ref_count,-1);}
};

NQNString gs;                       // ref_count gets initialized to 1

void *ThreadFunc(void *threadid) {
    for (int i = 0; i < 100000000; ++i) {
        gs._M_incr();

        gs.c=NQNString::_S_concat;          // gs.t | gs.b | gs.i | gs.c
        if(gs._M_decr() == 0 ) ++*(int*)0;  // ref_count should never be zero
    }
    return 0;
}

ref_count and c are in a single 8-byte "BITCONTAINER". char *p is in a
second container.
ref_count gets atomically updated by "fetch_and_add()".
Modification of gs.c is performed by an 8-byte load of ref_count and gs.c,
maneuvering some bits into place for gs.c, and an 8-byte store.

Change gs.c from "char c:8" to "char c", and you get the expected "store
byte" instruction.

If you remove the "char *p" from the class (structure), the compiler then
manipulates a single word of storage, not a double word. Inefficient when
all that is required is a byte store, but non-destructive for our purposes.

Barry Kelly
Apr 30, 2007, 8:44:51 PM

dick.dunbar wrote:

> [big snip]


> This code only fails on IBM compilers (Version 7 and 8), and no other
> platform.
>
> So what do you think? Our bug or theirs?

My first thought: the C++ standard says next to nothing about threads,
so you can't rely on it much for supporting evidence. Second thing, if a
machine doesn't have simple byte-addressable memory, then a compiler
will need to do the kind of stuff it sounds like the IBM compiler is
doing: load a larger unit, modify it, then save it back down. This works
transparently in single-threaded code.

In summary: I think it's questionable that the compiler acts differently
for a bitfield of width 8 bits (and doesn't straddle different bytes)
and an 8-bit char, but I can't say that it is incorrect.

-- Barry

--
http://barrkel.blogspot.com/

Thomas Richter
Apr 30, 2007, 9:46:45 PM

dick.dunbar <dick....@gmail.com> wrote:

> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

> Hint: _Tag _M_tag:8; is the line that causes the problem; an 8-bit


> holder of enum data.
>
> Test Case Essentials:
> --------------------
> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
> };
>
> void *ThreadFunc(void *threadid) {
> for (int i = 0; i < 100000000; ++i) {
> gs._M_incr(); // atomicly increment
> gs._M_tag=NQNString::_S_concat; // set it to one
> if(gs._M_decr() == 0 ) ++*(int*)0; // ref_count should never go
> to zero!
> }
> return 0;
> }

First of all, please note that C++ makes no definitions about threads
at all, and provides no guarantees whatsoever concerning thread-safety.
Thus, it's purely an extension your vendor provides, reliable or unreliable.
Make sure you know what you're doing, but the C++ standard offers you nothing
here.

> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX atomic
> ops

Thus, test_and_set() works according to what was intended by the
vendor.

> 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> a 64-bit register, updated by twiddling bits, and stored non-
> atomicly.

Well, if that is the only way to access an 8-bit field on the
architecture, that's it. If you want thread-safe access to a
structure, use a mutex, hopefully provided by the implementation.

> PMR 26241,756.
> -------------
> IBM identified a source work-around, and claimed the problem was our
> source.
> A TechNote was published but fails to relate to our 64-bit experience.
>
> http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159
>
> Problem
> -------
> In a multithreaded program, updating a bitfield variable may affect
> the values of other variables.

That is definitely a problem. It is also a problem the C++ standardization
committee is aware of, see the discussion in comp.lang.c++.moderated.
I think it is really that: On some architectures, it is unreasonable
to expect that you can access structure units atomically (without adding
much overhead). If you need to be thread safe, you should protect the whole
thing by a mutex.

> I cited section 9.6 of the C++ standard as justification for our
> coding.

I don't think the C++ standard has a word to say about threads. Note
that a C++ compiler is always free to operate on an "as if" basis, i.e.
its implementation might do whatever it wants as long as the observed
result is the same. And as C++ only addresses single-threaded programs,
observed behaviour is only defined in this sense. Thus, your program
has been proven by the compiler to work correctly for single-threaded
execution, and thus the compiler is fine.

> IBM Response:
> ------------
> Section 9.6 of ISO14882 does not specify the mandatory bit-field
> implementation in multithreaded environment.

A bit "off" I would call that. The problem is that the standard does
not say anything about multithreading. Leave alone how bitfields are
supposed to work..

> The standard expects you to do something rational on your hardware.
> IBM can do anything they like to bit-fields, but to subsume
> non-bitfield POD data into your BITCONTAINER is wrong.

No, why? You seem to imply that an "allocation unit" itself is something
that can be addressed in a thread-safe way. There is no such guarantee
in the standard. An allocation unit might well be a byte, even if the
processor has no provisions to access that unit atomically.

> To avoid the compiler bug, we changed _M_tag to remove the
> implication of bits,
> and therefore stop IBM from "inventing this BITCONTAINER construct"
> when none was
> intended or implied in our code.

A "bitcontainer" is nothing specified in C++. Compilers may use that
*if* they can prove correct behaivor on the virtual machine C++ defines.
And that one is single-threaded. Thus, *formally* the code is running
into undefined-behaivour land. IBM provides additional guarantees to
you when allowing you to run multi-threaded code. As it seems, they don't
express them very clearly.

> So what do you think? Our bug or theirs?

Your bug to depend on something that isn't specified. But unfortunately,
C++ doesn't specify anything here, and still one needs multithreading.

Thus, IBM may specify whatever they found reasonable. Apparently,
that's different from what you expected or think is reasonable.
Thus, probably bad documentation.

So long,
Thomas

David Schwartz
Apr 30, 2007, 10:53:09 PM

On Apr 30, 12:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
>
> };

Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
object. How is it the compiler's fault that this is not what you
meant?

Suppose you had a platform that did not have atomic 8-bit operations,
but you did this:
volatile char a;
char b;

What could 'a=8;' do but read in both a and b, modify the 'a' part,
and write it back? What could 'b=8;' do but read in both a and b,
modify the 'b' part, and write it back?
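
In rough C++ terms, the best such a compiler could do for either
assignment is a whole-word read-modify-write, something like this
(an illustration; the function and parameter names are invented):

// The read-modify-write a compiler is forced to emit for a one-byte store
// on hardware that can only load and store aligned 32-bit words.
static void store_byte_via_word(unsigned *word_addr, int byte_shift,
                                unsigned char value) {
    unsigned word = *word_addr;               // reads the target byte AND its neighbours
    word &= ~(0xFFu << byte_shift);           // clear the target byte
    word |= (unsigned)value << byte_shift;    // splice in the new value
    *word_addr = word;                        // writes the neighbours back as well
}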

Even if the platform has small atomic operations, it is never required
to use them. That's why you *had* to use atomic operations for
_M_ref_count. Surely the platform wasn't required to somehow determine
that it was important to you that it do so for _M_tag too?

Sorry, it's your bug. You assumed that manipulating one part of an
object wouldn't affect another part. In analogous cases, it is clear
that this will never be the case. (Consider successive character
variables on a platform that has no 8-bit atomic operations.)

So you relied on something that was not guaranteed by anything.

DO NOT EVER DO THAT. The optimization is not worth the ultimate
suffering when the assumption turns out not to be true on some other
platform or with some other compiler.

If you needed to do this as an optimization because of a platform
where it was proven to be required and safe, why did you enable it on
*this* platform?

No standard guarantees this, hence it's a platform-specific
optimization to be used only where it is known to help and known to be
safe.

DS

dick.dunbar
Apr 30, 2007, 11:17:41 PM

On Apr 30, 4:44 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:

> My first thought: the C++ standard says next to nothing about threads,
> so you can't rely on it much for supporting evidence. Second thing, if a
> machine doesn't have simple byte-addressable memory, then a compiler
> will need to do the kind of stuff it sounds like the IBM compiler is
> doing: load a larger unit, modify it, then save it back down. This works
> transparently in single-threaded code.

The compiler didn't need to combine these fields in an unexpected way
(and, absent the bit-field, it doesn't). The hardware has byte-addressable
storage. I did _not_ expect the compiler to modify two fields when the
source named only one field.

>
> In summary: I think it's questionable that the compiler acts differently
> for a bitfield of width 8 bits (and doesn't straddle different bytes)
> and an 8-bit char, but I can't say that it is incorrect.

See my second post in this thread. If we declare the object as "char c"
and not "char c:8", then the compiler generates the expected code.


dick.dunbar
Apr 30, 2007, 11:29:34 PM

On Apr 30, 5:46 pm, Thomas Richter <t...@mersenne.math.TU-Berlin.DE>
wrote:

> > The Bug:
> > -------
> > 1. ref_count is incremented/decremented consistently using AIX atomic
> > ops
>
> Thus, test_and_set() works according to what was intented by the
> vendor.

Yes, of course. That's my point.

>
> > 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> > a 64-bit register, updated by twiddling bits, and stored non-
> > atomicly.
>
> Well, if that is the only way how to access an 8 bit field on the
> architecture, that's it. If you want thread-safe access on a
> structure, use a mutex, hopefully provided by the implementation.

It is not the only way to accomplish this, and if we stop defining things
in terms of the number of bits required for storage, the compiler
generates the proper byte-oriented modifications to the field we
specified in our source.


> > -------
> > In a multithreaded program, updating a bitfield variable may affect
> > the values of other variables.
>
> That is definitely a problem. It is also a problem the C++ standardization
> committee is aware of, see the discussion in comp.lang.c++.moderated.
> I think it is really that: On some architectures, it is unreasonable
> to expect that you can access structure units atomically (without adding
> much overhead). If you need to be thread safe, you should protect the whole
> thing by a mutex.

But the compiler does have other options for updating byte fields, and it
uses those techniques as long as we don't declare a bit-field in the
structure.
Not only is the generated code to store one byte of constant data
destructive of other fields ... it is also enormously inefficient. I
haven't posted the generated code.


> I don't think the C++ standard has a word to say about threads. Note
> that a C++ compiler is always free to operate on an "as if" basis, i.e.
> its implementation might do whatever it wants as long as the observed
> result is the same. And as C++ only addresses single-threaded programs,
> observed behaviour is only defined in this sense. Thus, your program
> has been proven to work correctly for single-threaded by the compiler,
> and thus the compiler is fine.

I'm still flabbergasted that reasonable people come to this conclusion.

Note: The problem isn't confined to threaded programs. If you were to
build a 64-bit device driver that modified adapter storage in a
particular way, it would be completely unexpected for the compiler to
modify two fields when one was specified.

>
> > IBM Response:
> > ------------
> > Section 9.6 of ISO14882 does not specify the mandatory bit-field
> > implementation in multithreaded environment.
>
> A bit "off" I would call that. The problem is that the standard does
> not say anything about multithreading. Leave alone how bitfields are
> supposed to work..

I agree. I didn't bring the C++ standard into the discussion, IBM
did.

>
> > The standard expects you to do something rational on your hardware.
> > IBM can do anything they like to bit-fields, but to subsume
> > non-bitfield POD data into your BITCONTAINER is wrong.
>
> No, why? You seem to imply that an "allocation unit" itself is something
> that can be addressed in a thread-safe way. There is no such guarantee
> in the standard. An allocation unit might well be a byte, even if the
> processor has no provisions to access that unit atomically.

Byte modification is allowed in the PowerPC architecture.
The compiler had no reason to use an algorithm like that.


>
> > So what do you think? Our bug or theirs?
>
> Your bug to depend on something that isn't specified. But unfortunately,
> C++ doesn't specify anything here, and still one needs multithreading.

Wow. Forget the standard. This code runs flawlessly on 32-bit AIX,
and 32/64-bit on every other platform.

You think specifying the alteration of a single character of storage
is depending on some undefined behaviour? How about common sense here??

dick.dunbar
Apr 30, 2007, 11:40:43 PM

On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:
>
> Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
> object. How is it the compiler's fault that this is not what you
> meant?

Huh. I meant EXACTLY that. Class objects in C++ are an abstraction.
We "lump things together" that it makes sense to, from the description
of the problem space we're defining.

>
> Suppose you had a platform that did not have atomic 8-bit operations,
> but you did this:
> volatile char a;
> char b;

Let's not get into wild "Suppose" speculations here.
The PowerPC does not have 8-bit atomic operations, except by clever
usage of lwarx/stwcx.

And no atomic modification of the 8-bit char object was expressed or
implied in the source code. We meant for a single byte to be modified
to a specific value when we know we are the only owner of such an
object.

>
> What could 'a=8;' do but read in both a and b, modify the 'a' part,
> and write it back? What could 'b=8;' do but read in both a and b,
> modify the 'b' part, and write it back?

On PowerPC, it would:
li  R0,1        // Load Immediate into Register 0 the value of 1
stb R0,address  // Store that single byte into the field specified by the source

>
> Even if the platform has small atomic operations, it is never required
> to use them. That's why you *had* to use atomic operations for
> _M_ref_count. Surely the platform wasn't required to somehow determine
> that it was important to you that it do so for _M_tag too?

I didn't expect it to do any atomic operations on _M_tag at all.
This is part of making an application "thread efficient".

The problem is the compiler modified two fields when only one was
specified in our source code. I expect it to modify the single byte
specified by the source, without modifying anything else, and to do
this non-atomically. Atomic update is not required at all.

If every field in a threaded program required mutex or atomic semantics
for modification, nothing would work efficiently.

>
> Sorry, it's your bug. You assumed that manipulating one part of an
> object wouldn't affect another part. In analogous cases, it is clear
> that this will never be the case. (Consider successive character
> variables on a platform that has no 8-bit atomic operations.)
>
> So you relied on something that was not guaranteed by anything.

Exactly what do you think I relied on? That the compiler should alter
the byte I specified and nothing else. Oh yes, I rely on that every day.

>
> DO NOT EVER DO THAT. The optimization is not worth the ultimate
> suffering when the assumption turns out not to be true on some other
> platform or with some other compiler.

This is not an optimization. Our code works exactly as we intended on
32-bit AIX, and in 64-bit on every other platform. The implication is
that we did something wrong in our source. I don't believe it.

>
> If you needed to do this as an optimization because of a platform
> where it was proven to be required and safe, why did you enable it on
> *this* platform?

Sorry, you completely lost me. We made no such assumptions for AIX.
We didn't enable anything specific for this platform.

And you make the same argument that I've heard from a number of IBM
contacts:

"If your code doesn't work in a threaded environment, it must be
something you did wrong, because our compiler doesn't make any mistakes."

Wow. Talk about naive.

Barry Kelly
May 1, 2007, 12:25:26 AM

dick.dunbar wrote:

> Exactly what do you think I relied on? That the compiler should
> alter
> the byte I specified and nothing else. Oh yes, I rely on that every
> day.

And that's what it does do, when single-threaded, which is all C++
guarantees for you. I think IBM's case is pretty air-tight, when viewed
as a defect. Without knowing all the details, I suspect it may not be
optimal, however.

> > DO NOT EVER DO THAT. The optimization is not worth the ultimate
> > suffering when the assumption turns out not to be true on some other
> > platform or with some other compiler.
>
> This not an optimization. Our code works exactly as we intended on
> AIX 32-bit
> and every other platform in 64-bit. The implication is that we did
> something
> wrong in our source. I don't believe it.

I'm fairly sure you're not going to win this battle. You need to accept
that sometimes you make mistakes too :)

> And, you make the same argument that I've heard from a number of IBM
> contacts:
>
> "If your code doesn't work in a threaded environment, it must be
> something you
> did wrong, because our compiler doesn't make any mistakes."
>
> Wow. Talk about naiive.

I'm pretty sure that isn't a verbatim quote, because I can't imagine a
compiler engineer or QA engineer saying "our compiler doesn't make any
mistakes" - shipping commercial compilers have hundreds, if not
thousands, of very minor corner case bugs, and it's even more true when
the language is as subtle and complex as C++. I say that working as I do
on the Delphi compiler at Borland.

But it's also ironic, because it seems that you yourself cannot admit
the possibility of error on your own part!

rfis...@gmail.com
May 1, 2007, 4:33:42 AM

dick.dunbar wrote:

> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.
>
> Our code runs flawlessly in 32-bit mode.

[...]

> _Tag _M_tag:8; // 8-bit char is as large as

There's your problem! Looks like you're using bitfields.
Those things are basically undefined. Those people who
do use them usually do so to get sub-byte ints, but yours is 1 byte...
Why not just use a 1 byte int, something like u_int8_t from types.h?

P.S. I'm pretty sure it was Bjarne who said bitfields
were for "idiots and children".

RF

--
felix: lightweight threads, HOFs and game scripting.
svn co https://svn.sourceforge.net/svnroot/felix/felix/trunk felix

Gil Hamilton
May 1, 2007, 7:48:05 AM

"dick.dunbar" <dick....@gmail.com> wrote in
news:1177960498.3...@o5g2000hsb.googlegroups.com:

> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as

[snip]
> };

> 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> a 64-bit register, updated by twiddling bits, and stored non-
> atomicly.
> ref_count is modified when only _tag was specified in the source.

OK. I'm going to go against the flow here and express sympathy for your
position. :-)

I think IBM has erred with their cavalier treatment of a volatile. From
the C++ standard at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2135.pdf
(section 1.9):

6 The observable behavior of the abstract machine is its
sequence of reads and writes to volatile data and calls
to library I/O functions.(8)

7 Accessing an object designated by a volatile lvalue
[other side effects] are all side effects, which are
changes in the state of the execution environment.

There is also this from section 7.1.5.1 paragraph 7:

[ Note: volatile is a hint to the implementation to avoid
aggressive optimization involving the object because the
value of the object might be changed by means undetectable
by an implementation. See 1.9 for detailed semantics. In
general, the semantics of volatile are intended to be the
same in C++ as they are in C. -end note ]

Which would lead to this reference in section 6.7.3 paragraph 6 of the C
standard (May 6, 2005 draft):

6 An object that has volatile-qualified type may be modified
in ways unknown to the implementation or have other unknown
side effects. Therefore any expression referring to such an
object shall be evaluated strictly according to the rules of
the abstract machine, as described in 5.1.2.3. Furthermore,
at every sequence point the value last stored in the object
shall agree with that prescribed by the abstract machine,
except as modified by the unknown factors mentioned
previously.(114) What constitutes an access to an object that
has volatile-qualified type is implementation-defined.

By loading up and modifying _M_ref_count along with _M_tag, they are
violating the "rules of the abstract machine" with respect to the
volatile variable; i.e. it has been modified in ways other than those
prescribed by the abstract machine with a resulting change to the
"observable behavior".

Now everything in the standard(s) concerning implementation of volatile
leaves plenty of wiggle room for interpretation so it's difficult to say
that they have actually violated the standard even so. However, I'm
quite comfortable saying "it's broken".

Frankly, I don't understand why a compiler implementer would ever allow a
volatile to share an "atomic" data container with another variable at
all. The way to handle this in the compiler is simply to align the bit
field (or whatever the next variable is) to the next atomic boundary.
Structure alignment padding is something programmers know to expect and
account for, whereas having a volatile loaded up along with another field
and stored back under the covers violates the law of programmer
expectations. It's just a stupid thing for the compiler developer to do.
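
At the source level, something like the following approximates what I
mean (an illustration only; the padding member is invented, and whether
the container the compiler then picks truly stays clear of the volatile
is still up to the implementation):

class NQNString {
public:
    enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
    volatile int _M_ref_count;    // bytes 0-3, in a region of its own
    int          _M_pad;          // bytes 4-7: keep any bit-field container
                                  // from reaching back over the volatile
    _Tag _M_tag  : 8;             // starts at byte 8; an 8-byte container
                                  // here cannot overlap _M_ref_count
};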

As much as I sympathize, however, the bottom line is that I don't think
you're going to win the battle. You probably need to move on after
placing a big warning comment in the code (wherein you say rude things
about IBM's implementation).

GH

Thomas Richter
May 1, 2007, 9:33:23 AM

dick.dunbar <dick....@gmail.com> wrote:

>> Well, if that is the only way how to access an 8 bit field on the
>> architecture, that's it. If you want thread-safe access on a
>> structure, use a mutex, hopefully provided by the implementation.
>
> It is not the only way to accomplish this, and if we stop defining
> things in
> terms of the number of bits required for storage, the compiler
> generates
> the proper byte-oriented modifications to the field we specified in
> our source.

Probably, but probably it's the most efficient way of doing it. And
note, the compiler *is* allowed to do that by acting on the basis of
the "as-if" rule.

> But the compiler does have other options for updating byte fields,
> and using those techniques as long as we don't declare a bit-field in
> a structure.

But there is no requirement to choose one specific method over
another.

> Not only is the generated code to store one byte of constant data
> destructive
> of other fields ... it is also enormously inefficient. I haven't
> posted the generated code.

I don't know. Maybe not. On some architectures, an aligned 64bit
transfer is easier than an unaligned byte transfer.

>> I don't think the C++ standard has a word to say about threads. Note
>> that a C++ compiler is always free to operate on an "as if" basis, i.e.
>> its implementation might do whatever it wants as long as the observed
>> result is the same. And as C++ only addresses single-threaded programs,
>> observed behaviour is only defined in this sense. Thus, your program
>> has been proven to work correctly for single-threaded by the compiler,
>> and thus the compiler is fine.
>
> I'm still flabbergasted that reasonable people come to this
> conclusion.

On what grounds would you call this a compiler bug?
A compiler bug is the failure of the compiler to comply with the
language definition. The language definition, however, defines only
a single-threaded execution model. A compiler can do anything it
likes for multithreading as far as the standard is concerned. Thus,
the compiler is well-behaved *in terms of standard C++*.

If the vendor provides any guarantees beyond that standard, and
the compiler breaks those rules, then you have also found a bug. But
apparently bitfields are, according to the IBM specs, not expected
to work reasonably with multithreaded code; there is no atomic
access.

> Note: The problem isn't confined to threaded programs, if you were to
> build a 64-bit device driver that modified adapter storage in a
> particular
> way, this would be completely unexpected that the compiler should
> modify
> two fields when one was specified.

And again, the standard says nothing about that. If you access any
I/O devices through pointers, you're on your own and dependent on what
your vendor guarantees. The virtual machine defined by the C++ standard
again doesn't say a word about I/O devices. You *may* get around this
problem by "hinting" the compiler with the "volatile" keyword, but
again, you're still in "undefined" land.

> I agree. I didn't bring the C++ standard into the discussion, IBM
> did.

They did, but that's *one* basis on which one can define whether a compiler
is working correctly or not. Thus, it *is* conforming to the C++ standard.
If you want to do something that is not covered by the standard, well,
you're on your own.

Maybe the compiler isn't conforming to its own (IBM) specs, but for that, you
need to point IBM to their own manual where they (potentially) claim
how it handles multithreading. But again, likely IBM did not specify
anything at all in this corner.

>> > The standard expects you to do something rational on your hardware.
>> > IBM can do anything they like to bit-fields, but to subsume
>> > non-bitfield POD data into your BITCONTAINER is wrong.
>>
>> No, why? You seem to imply that an "allocation unit" itself is something
>> that can be addressed in a thread-safe way. There is no such guarantee
>> in the standard. An allocation unit might well be a byte, even if the
>> processor has no provisions to access that unit atomically.
>
> Byte modification is allowed in the PowerPC architecture.
> The compiler had no reason to use an algorithm like that.

Apparently, it had. It is probably an optimization strategy. Even then,
it does not need a reason, provided it can prove that the code
works on the C++ "virtual machine". And that it does. The VM is
single-threaded (and has no I/O devices).

>> > So what do you think? Our bug or theirs?
>>
>> Your bug to depend on something that isn't specified. But unfortunately,
>> C++ doesn't specify anything here, and still one needs multithreading.
>
> Wow. Forget the standard. This code runs flawlessly on 32-bit AIX,
> and 32/64-bit on every other platform.

Ok, but on what other grounds do you want to claim that the compiler
has a bug? The only other way I can think of is to point IBM to their
own manual, but I doubt you'll find anything. Thus, you're holding
expectations of the compiler that were never written down and spelled
out, and it only failed to comply with your own expectations. Bad luck.

> You think specifying the alteration of a single character of storage
> is depending
> on some undefined behaviour?

No. I'm saying that simultaneous access from several threads to a single
character is undefined behaviour. And if your program were single-threaded,
it would have worked fine.

> How about common sense here??

It's not a matter of common sense. It is a failure of the standard that
it does not cover multithreaded code. It's being worked on, as said. From
the discussions I followed in comp.lang.c++.moderated (I'm not on the C++
ISO committee), I got the impression that even with the enhanced standard,
atomic access to bitfields is nothing that will be covered in the future,
though. The reason is that C++ compilers might want to apply optimization
techniques to them, and on several platforms, there is also no way to
access one field of a structure without also accessing neighbouring
fields.

So long,
Thomas

David Hopwood
May 1, 2007, 12:06:04 PM

dick.dunbar wrote:
> --------------------
> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
> };
[...]

> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX atomic
> ops
>
> 2. _M_tag assignment causes both ref_count and _tag to be loaded into
> a 64-bit register, updated by twiddling bits, and stored non-
> atomicly.
> ref_count is modified when only _tag was specified in the source.
>
> 3. If multiple threads execute this code, it is only a matter of time
> before a thread updates the count (1 -> 2), and another thread
> loads the same count (1), updates the _Tag field, and stores the
> 1 back. When the first thread decrements, the count now goes from
> 1 to zero, when the expectation was from 2 to 1.
>
> 4. The reason for this is an IBM compiler construct called
> BITCONTAINER, where the volatile int and bit-field are manipulated
> as a unit.
>
> PMR 26241,756.
> -------------
> IBM identified a source work-around, and claimed the problem was our
> source.

It isn't. I think it is fairly clear that the compiler is violating the
intended semantics of 'volatile' in POSIX and/or C++.

(Strictly speaking, there is no applicable standard for multi-threaded C++,
but it's likely that the C compiler does the same thing, and that the
C++ compiler does the same thing for single-threaded programs.)

The standard line in this newsgroup is that 'volatile' has no well-defined
semantics under multi-threading. While that may be true, the problem here
has nothing specifically to do with multi-threading. The 'volatile'
declaration is supposed to guarantee that there are no accesses to
_M_ref_count that are not explicit in the program. The only wiggle room
that an implementor has here is that "What constitutes an access to an object
that has volatile-qualified type is implementation-defined" (that is the
wording in section 6.7.3 #6 of the C99 standard, but section 7.1.5.1 of
the C++ standard says: "In general, the semantics of volatile are intended
to be the same in C++ as they are in C.")

So, go look for the documentation that specifies what constitutes a volatile
access on this platform (ask IBM if you can't find it):

- if there is no such documentation, then the C99 and C++ standards are
being violated, since they require implementation-defined things to
be documented.

- if the documentation says something to the effect that in this case, an
access means that some machine instruction reads or writes the virtual
memory that holds the representation of _M_ref_count, then it's clear
that semantics of volatile (C99 section 6.7.3) are being violated.

- if the documentation essentially makes 'volatile' meaningless, then that's
a big problem for the AIX platform, since many, many programs rely on
it.

> A TechNote was published but fails to relate to our 64-bit experience.
>
> http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21259159

The TechNote doesn't cover the 'volatile' case. It is clear that if
_M_ref_count were not declared volatile, then the word tearing would be
allowed.

That's the wrong track. Sections 7.1.5.1 of C++ and 6.7.3 of C99 are what
you want.

--
David Hopwood <david....@industrial-designers.co.uk>

Peter Dimov
May 1, 2007, 12:11:32 PM

On Apr 30, 10:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> class NQNString {
> public:
> enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> volatile int _M_ref_count; // 32-bit int used in tcpip to
> 32-bit clients
> _Tag _M_tag:8; // 8-bit char is as large as
> we need
> int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
>
> };

...

> So what do you think? Our bug or theirs?

Theirs because of the volatile, as Gil Hamilton pointed out.

In the upcoming C++0x standard it would likely be required to work
even without a volatile:

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2171.htm#location

You are using reserved identifiers, by the way. The compiler is
allowed to have a macro _M_incr or _Tag and break your code.

dick.dunbar
May 1, 2007, 12:20:34 PM

"First things first, but not necessarily in that order." - Doctor Who

I am trying to convince others this is a serious bug, but it doesn't
really matter if you agree or not. This was a _serious_ bug that held up
shipment of our code for a few weeks. I didn't write the code ... in
fact, it comes from STLport, but I don't see how I could have expressed
the intent any better and have it work on all compilers.

Let's simplify, and show 3 code generations by changing the class
declarations.

The three relevant data items in the class are:

volatile int Ref_count;   // Updated with atomic routines
_Tag T:8;                 // An enumeration container 1-byte wide
char* P;                  // Pointer to a char string

And the source:
gs.T = NQNString::_S_concat;   // Set T to a constant (enum) value

_Tag T:8 ... contains the values of an enumeration. The :8 tells the
compiler to use a byte.

The IBM compiler treats this as a bit-field, and creates two internal
BITCONTAINER objects:
+0 Ref_count BITCONTAINER@0
+4 T
+5 3 bytes of Padding
+8 64-bit pointer to char BITCONTAINER@8

The generated code for "gs.T = " is 5 instructions, 2 storage touches,
8 bytes modified non-atomically:

L8 gr5=gs.BITCONTAINER@0(gr3,0)
LI gr4=1
RN8 gr5=gr5,56,0xFFFFFFFFFFFFFFFF
RI8 gr4=gr5,8,gr4,0xFFFFFFFFFFFFFF00
ST8 gs.BITCONTAINER@0(gr3,0)=gr4

If you remove the "char *p" pointer from the class, the generated code
is 4 instructions, 2 storage touches, 4 bytes altered:

L4 gr5=gs.BITCONTAINER@4(gr3,4)
LI gr4=1
RI4 gr4=gr5,0,gr4,0xFFFFFF00
ST4 gs.BITCONTAINER@4(gr3,4)=gr4

Obvious question: so why did the first code manipulate 8 bytes and this
code manipulate 4, using exactly the same source line?
If the original generated code had manipulated 4 bytes instead of 8, we
never would have discovered another error (next post).

If you change the declaration of T:

char T;

The generated code becomes:

LI gr4=1
STB gs.c@7(gr3,7)=gr4

Clearly, the intent of our source was to modify a single byte with a
constant, and this does it in 2 instructions with one storage touch.

You still don't see a bug? Bools Beware.

dick.dunbar
May 1, 2007, 12:37:33 PM

The light at the end of the tunnel has been extinguished to save energy.

There is a happier ending to this story than you might imagine, and my
reason for posting is to share the discovery.

Once we discovered the BITCONTAINER compiler construct, I compiled our
application with the xlC_r flags -qsource -qlist.

This shows the generated code, which I posted above. I used grep to
investigate all of the BITCONTAINER references, particularly any that
involved a LoadDouble (LD or L8) instruction.

It turned out to be a whole bunch; the bulk due to this single-byte
width specification for the enumeration.

When we rebuilt our application and ran it under stress, an astonishing
thing happened ... for the first time the application actually scaled
on AIX similarly to our other hardware platforms.

The lack of scaling was observed using the AIX "vmstat" command.
The first two columns are "r" (Runnable) and "b" (Blocked).
Prior to this change, using a 200-thread pool, 175+ threads
would be in the "Run" queue.

Clearly, we're not going to scale with that kind of processor demand.
What could be wrong? I hunted and hunted through our use of threaded
constructs, and never found anything I could pin the fault to.

With this single change, the number of threads in the run queue is now
about 5-6 ... what we expected from our application.

Such things are hard to observe, but it is hard to avoid the conclusion
that these inefficient storage-modification algorithms are the root
cause of processor contention and stalls in the processor pipeline.

Is it a bug? Who cares, there's a happy ending. Happy Hunting.

ma...@pulsesoft.com
May 1, 2007, 12:49:11 PM

Peter Dimov <pdi...@gmail.com> writes:

Care to cite the relevant part of the standard on this? This indeed
violates rules imposed by the _C_ standard, but last time I tried to
read the relevant parts of the C++ draft I remember only identifiers
with a double underscore (regardless of position) being reserved.

--
vale

dick.dunbar
May 1, 2007, 1:15:26 PM

On May 1, 4:48 am, Gil Hamilton <gil_hamil...@hotmail.com> wrote:

> "dick.dunbar" <dick.dun...@gmail.com> wrote innews:1177960498.3...@o5g2000hsb.googlegroups.com:
> I think IBM has erred with their cavalier treatment of a volatile. From
> the C++ standard athttp://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2135.pdf

> (section 1.9):
>
> 6 The observable behavior of the abstract machine is its
> sequence of reads and writes to volatile data and calls
> to library I/O functions.(8)
>
> 7 Accessing an object designated by a volatile lvalue
> [other side effects] are all side effects, which are
> changes in the state of the execution environment.
>
> There is also this from section 7.1.5.1 paragraph 7:
>
> [ Note: volatile is a hint to the implementation to avoid
> aggressive optimization involving the object because the
> value of the object might be changed by means undetectable
> by an implementation. See 1.9 for detailed semantics. In
> general, the semantics of volatile are intended to be the
> same in C++ as they are in C. -end note ]

>
> Which would lead to this reference in section 6.7.3 paragraph 6 of the C
> standard (May 6, 2005 draft):
>
> 6 An object that has volatile-qualified type may be modified
> in ways unknown to the implementation or have other unknown
> side effects. Therefore any expression referring to such an
> object shall be evaluated strictly according to the rules of
> the abstract machine, as described in 5.1.2.3. Furthermore,
> at every sequence point the value last stored in the object
> shall agree with that prescribed by the abstract machine,
> except as modified by the unknown factors mentioned
> previously.(114) What constitutes an access to an object that
> has volatile-qualified type is implementation-defined.
>
> By loading up and modifying _M_ref_count along with _M_tag, they are
> violating the "rules of the abstract machine" with respect to the
> volatile variable; i.e. it has been modified in ways other than those
> prescribed by the abstract machine with a resulting change to the
> "observable behavior".
>
> Now everything in the standard(s) concerning implementation of volatile
> leaves plenty of wiggle room for interpretation so it's difficult to say
> that they have actually violated the standard even so. However, I'm
> quite comfortable saying "it's broken".

Thank you for your insight on this. I went down the "volatile" argument
path with them, but didn't have the "standard-legal-background" to
pursue it further. Those are valuable references.

>
> Frankly, I don't understand why a compiler implementer would ever allow a
> volatile to share an "atomic" data container with another variable at
> all. The way to handle this in the compiler is simply to align the bit
> field (or whatever the next variable is) to the next atomic boundary.
> Structure alignment padding is something programmers know to expect and
> account for while having a volatile loaded up along with another field
> and stored back under the covers violates the law of programmer
> expectations. It's just a stupid thing for the compiler developer to do.

Agreed. Long email exchanges around this point; they weren't convinced.
But I'm not in agreement about the alignment issue, because it involves
an enlarged footprint for the application if the compiler pads in such a
manner. I think they just didn't use good algorithms in the generation
of the code. They had the ability to manipulate a word; they just
didn't do it.

It's mostly the efficiency that I object to. Stupid to read data that
you can modify directly.

>
> As much as I sympathize, however, the bottom line is that I don't think
> you're going to win the battle. You probably need to move on after
> placing a big warning comment in the code (wherein you say rude things
> about IBM's implementation).

I would do that if I hadn't worked 30 years for IBM. Things have gone
downhill since I left :-)


dick.dunbar
May 1, 2007, 1:20:27 PM

On May 1, 9:49 am, m...@pulsesoft.com wrote:
> Peter Dimov <pdi...@gmail.com> writes:
> > On Apr 30, 10:14 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:
>
> > > class NQNString {
> > > public:
> > > enum _Tag {_S_leaf, _S_concat, _S_substringfn, _S_function};
> > > volatile int _M_ref_count; // 32-bit int used in tcpip to
> > > 32-bit clients
> > > _Tag _M_tag:8; // 8-bit char is as large as
> > > we need
> > > int _M_incr() {return fetch_and_add(&_M_ref_count, 1);}
> > > int _M_decr() {return fetch_and_add(&_M_ref_count,-1);}
>
> > > };
>
> > ...
>
> > > So what do you think? Our bug or theirs?
>
> > Theirs because of the volatile, as Gil Hamilton pointed out.
>
> > In the upcoming C++0x standard it would likely be required to work
> > even without a volatile:
>
> >http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2171.htm#loc...

>
> > You are using reserved identifiers, by the way. The compiler is
> > allowed to have a macro _M_incr or _Tag and break your code.
>
> Care to cite relevant part of standard on this? This indeed violates
> rules imposed by _C_ standard, but last time i tried to read relevant
> parts of C++ draft i remember only identifiers with double underscore
> (regardless of position) being reserved.

I do remember the same thing about reserved words to the compiler.
The STLport code was intended to be adopted as part of a compiler
component, so I don't fault the author for choosing this mechanism.
Such things are easily discovered using compiler diagnostics,
and we protect ourselves by using a unique namespace.
With nightly builds, this issue doesn't stay buried for long.

dick.dunbar
May 1, 2007, 1:34:52 PM

On May 1, 1:33 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> dick.dunbar ha scritto:

> > _Tag _M_tag:8; // 8-bit char is as large as
>
> There's your problem! Looks like you're using bitfields.
> Those things are basically undefined. Those people who
> do use them usually do so to get sub-byte ints, but yours is 1 byte...
> Why not just use a 1 byte int, something like u_int8_t from types.h?

Indeed it is the problem. See my 3 code examples.

The reason for using "_Tag" is to clearly indicate this field will only
contain the values of an enumeration. Valuable to know when you're
reading code.

I believe the problem is that this designation is used for two
different purposes:
- To define bit fields ... had I used int or bool, I would clearly be
  doing that.
- To define container widths ... that's what we are doing here.

The compiler has the freedom to define whatever width it wants for
an enumeration. We know our data and there are fewer than 255
values, so :8 is a clear choice for this purpose.

There's a compiler option "-qenum=1", but obviously we can't adopt that
for our whole application ... and then there are these other platforms
to worry about. _Tag T:8 is the clearest, most uniform way to express
the intent.
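
(For completeness: later C++ standards added a direct way to say exactly
this. A sketch only; fixed underlying types for enums are a C++11
feature and were not available in the compilers discussed here:)

class NQNString {
public:
    // ": unsigned char" fixes the enum's underlying type at one byte,
    // so _M_tag can be a plain member and still be typed as _Tag.
    enum _Tag : unsigned char { _S_leaf, _S_concat, _S_substringfn, _S_function };
    volatile int _M_ref_count;
    _Tag _M_tag;      // one byte, no bit-field, ordinary store-byte code
};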

The fact that the compiler used this as a bit-field, without
investigating the usage further ... and specifically, that the entire
field was being assigned a constant ... is a thought-error.

>
> P.S. I'm pretty sure it was Bjarne who said bitfields
> were for "idiots and children".

Bjarne Stroustrup, C++ Programming Language, Special Edition.

pg 125: Convenient bit manipulation can be very important, but for
reliability,
maintainability, portability, etc, it should be kept at low levels of
the system.

16.3.11 vector<bool> implemented with int's.
17.5.3 bitset ... addressing a single bit

dick.dunbar
May 1, 2007, 1:49:05 PM

On Apr 30, 9:25 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:

> But it's also ironic, because it seems that you yourself cannot admit
> the possibility of error on your own part!

If you were to explain the error that I have made, I'd be happy to
adjust.

The reason for putting this out in a public forum is to gain
community input. No need to attack me (yet).

It isn't even my code, so I don't have nearly the attachment to it
that you seem to imply.

What we have done is to change our code for the personality
of a particular compiler, and I am always opposed to that,
particularly when it obscures the semantics of the code.

Our code base is about 99% generic ... builds and runs everywhere.

dick.dunbar
May 1, 2007, 1:56:50 PM

On May 1, 6:33 am, Thomas Richter <t...@mersenne.math.TU-Berlin.DE>
wrote:

> > You think specifying the alteration of a single character of storage
> > is depending on some undefined behaviour?
>
> No. I'm saying that simultaneous access from several threads to a single
> character is undefined behaviour. And if your program were single-threaded,
> it would have worked fine.

Thomas, this is a very thoughtful contribution. Thanks.
We agree, except for the snippets above.

Simultaneous access to the single byte was never an issue.
IBM kept trying to move the discussion in that direction ... that I've
altered a single byte without using a mutex, but in fact, on the PowerPC
this is perfectly allowable and safe.

The issue was that in the process of modifying that single byte, they
also modified a volatile int that we did not intend to be modified.

That is the crux of this problem. Not atomic updates to a single byte,
but the alteration of adjacent fields.

Peter Dimov
May 1, 2007, 2:11:51 PM

On May 1, 7:49 pm, m...@pulsesoft.com wrote:
> Peter Dimov <pdi...@gmail.com> writes:

> > You are using reserved identifiers, by the way. The compiler is
> > allowed to have a macro _M_incr or _Tag and break your code.
>
> Care to cite relevant part of standard on this? This indeed violates
> rules imposed by _C_ standard, but last time i tried to read relevant
> parts of C++ draft i remember only identifiers with double underscore
> (regardless of position) being reserved.

"Each name that contains a double underscore _ _ or begins with an
underscore followed by an uppercase letter (2.11) is reserved to the
implementation for any use." - 17.4.3.1.2 [global.names].

I didn't realize that the code originated from STLPort, though. It
makes sense for a standard library replacement to use such a
convention. Sorry for the distraction.

David Schwartz
May 1, 2007, 2:18:21 PM

On Apr 30, 8:40 pm, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:

> > Umm, *you* made '_M_ref_count' and '_M_tag' parts of the same storage
> > object. How is it the compiler's fault that this is not what you
> > meant?

> Huh. I meant EXACTLY that. Class objects in C++ are an abstraction.
> We "lump things together" that it makes sense to, from the description
> of the problem space we're defining.

No, that's not what you meant. You meant to be able to modify one
without having any effect on the other. Again, consider:

char a, b;

Is it reasonable to expect the compiler to be able to modify 'a'
without reading and writing back 'b'?

The compiler is violating a dependency that it cannot possibly know
about. How is that the compiler's fault?

> > Suppose you had a platform that did not have atomic 8-bit operations,
> > but you did this:
> > volatile char a;
> > char b;
>
> Let's not get into wild "Suppose" speculations here.
> The PowerPC does not have 8-bit atomic operations, except by clever
> usage of lwarx/stwcx.

You are missing my point. The issue is whether or not your code is
guaranteed to work. If perfectly analogous code on another platform is
not guaranteed to work then your code is not guaranteed to work on
your platform. Either you have a guarantee or you do not.

> And no atomic modification of the 8-bit char object was expressed or
> implied in the source code. We meant for a single byte to be modified
> to a specific value when we know we are the only owner of such an
> object.

Except that you had a dependency that the compiler did not know about.
You could have told the compiler about that dependency (by using
mutexes or separate storage objects) but you did not.

> > What could 'a=8;' do but read in both a and b, modify the 'a' part,
> > and write it back? What could 'b=8;' do but read in both a and b,
> > modify the 'b' part, and write it back?

> On PowerPC, it would:
> li R0,1 // Load Immediate into Register 0 the value of 1
> stb R0,address // Store that single byte into the field specified by
> the source

This isn't a PowerPC question. This is a compiler question. We are
asking about what the compiler is required to do, not what it could
do. Yes, if the compiler somehow knew about this secret dependency, it
is possible for it to do the right thing. But there are cases where
even if it did, it could not do the right thing, so it can't be
required to.

> > Even if the platform has small atomic operations, it is never required
> > to use them. That's why you *had* to use atomic operations for
> > _M_ref_count. Surely the platform wasn't required to somehow determine
> > that it was important to you that it do so for _M_tag too?

> I didn't expect it to do any atomic operations on _M_tag at all.
> This is part of making an application "thread efficient".

Yes, you did expect it to.

> The problem is the compiler modified two fields when only one was
> specified in our source code. I expect it to modify the single byte
> specified by the source, without modifying anything else. And to do
> this non-atomicly. Atomic update is not required at all.

That's what it did, so why are you complaining? *YOU* modified the
other byte, not the compiler. (And you did so without following the
rules.)

> If every field in a threaded program required mutex or atomic
> semantics for modification, nothing would work efficiently.

I'm afraid that is the requirement, at least in a platform-neutral
way. There might be ways you can make optimizations for particular
platforms, but you can only do so where they are known to be safe for
a particular platform. You are outside the standards here.

> > Sorry, it's your bug. You assumed that manipulating one part of an
> > object wouldn't affect another part. In analogous cases, it is clear
> > that this will never be the case. (Consider successive character
> > variables on a platform that has no 8-bit atomic operations.)

> > So you relied on something that was not guaranteed by anything.

> Exactly what do you think I relied on? That the compiler should
> alter the byte I specified and nothing else. Oh yes, I rely on that
> every day.

The compiler altered the byte you specified and nothing else. The
problem is, *you* modified another byte in the same storage object
while the compiler was accessing it.

You place the blame on the compiler, but it takes two to tango. You
did something that affected what the compiler was doing, and the
combination was unsafe. What the compiler did is perfectly safe in
isolation and there is no rule that prohibits the compiler from doing
it.

> > DO NOT EVER DO THAT. The optimization is not worth the ultimate
> > suffering when the assumption turns out not to be true on some other
> > platform or with some other compiler.

> This is not an optimization. Our code works exactly as we intended on
> AIX 32-bit and every other platform in 64-bit. The implication is that
> we did something wrong in our source. I don't believe it.

Don't believe it, but it's true.

The compiler can do anything it wants that doesn't break code that
complies with the standards. You failed to comply with the standards
by not protecting an access with mutexes.

Again, if you did the same thing with two character objects on a
platform that didn't have atomic operations on a single byte, would
you still insist your code was valid? If not, then why is it
guaranteed to work on your platform? Because it would be possible for
the compiler to make it work if it knew what you were doing, which it
doesn't?

> > If you needed to do this as an optimization because of a platform
> > where it was proven to be required and safe, why did you enable it on
> > *this* platform?

> Sorry, you completely lost me. We made no such assumptions for AIX.
> We didn't enable anything specific for this platform.

That's your problem. You are making platform-specific optimizations as
if they were portable.

> And, you make the same argument that I've heard from a number of IBM
> contacts:
>
> "If your code doesn't work in a threaded environment, it must be
> something you did wrong, because our compiler doesn't make any
> mistakes."
>
> Wow. Talk about naive.

I have identified a specific defect in your code that is the cause of
your problem. How is that the same as saying that there must be a
defect in your code without finding one?

Your code is broken. Really.

DS

David Schwartz

unread,
May 1, 2007, 2:20:06 PM5/1/07
to
On May 1, 10:49 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> If you were to explain the error that I have made, I'd be happy to
> adjust.

The error is really this simple -- you modify an object in one thread
while another thread is or might be accessing that same storage object
and you haven't protected the accesses with mutexes. This is what the
pthreads standard requires. The fix is either to use mutexes or to put
the two variables in separate storage objects.
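
In rough outline, those two options might look like this (made-up
names; a sketch only, not code from the thread):

#include <pthread.h>

// Option 1: the strictly portable pthreads answer -- serialise every
// access to the shared storage with a mutex.
struct CountedLocked {
    int             ref_count;
    char            tag;
    pthread_mutex_t lock;      // initialised with pthread_mutex_init

    void set_tag(char t) {
        pthread_mutex_lock(&lock);
        tag = t;
        pthread_mutex_unlock(&lock);
    }
};

// Option 2: separate storage objects -- give the tag a full, naturally
// aligned word of its own instead of an 8-bit bit-field, so it no
// longer shares a bit-field container with the volatile count.
struct CountedSeparate {
    volatile int ref_count;    // maintained only with atomic operations
    int          tag;
};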

DS

David Schwartz

unread,
May 1, 2007, 2:25:27 PM5/1/07
to
On May 1, 4:48 am, Gil Hamilton <gil_hamil...@hotmail.com> wrote:

> Frankly, I don't understand why a compiler implementer would ever allow a
> volatile to share an "atomic" data container with another variable at
> all. The way to handle this in the compiler is simply to align the bit
> field (or whatever the next variable is) to the next atomic boundary.
> Structure alignment padding is something programmers know to expect and
> account for, while having a volatile loaded up along with another field
> and stored back under the covers violates the law of programmer
> expectations. It's just a stupid thing for the compiler developer to do.

Consider:

volatile char array[1024];

Suppose this platform needs separate storage objects to be at least
64 bits. If the compiler puts each of my array entries in a separate
storage object, this code will break:

strcpy(array, "hello there");

But if it doesn't, then this code will break because it may affect
neighboring volatiles:

array[52]=0;

How is the compiler supposed to know what I'm going to do with the
array when it lays it out in memory? And what if the 'volatile'
appears in the form of a cast?

DS

David Schwartz

unread,
May 1, 2007, 2:27:07 PM5/1/07
to
On May 1, 9:11 am, Peter Dimov <pdi...@gmail.com> wrote:

> Theirs because of the volatile, as Gil Hamilton pointed out.

Then explain to me what:

volatile char foo[1024];
foo[1]=0;

is supposed to do on a platform that only has 32-bit operations.

DS

dick.dunbar

unread,
May 1, 2007, 2:31:21 PM5/1/07
to

Not exactly ... my source modified one field, and the compiler
modified two. I have no problem with the field that was modified.

We know our code. The sample program conveniently lumps everything
together so you can see the issue in the generated code.

The byte alteration is made in an entirely different part of the code,
where we know we "own" the object, and we're only leaving bread-crumbs
behind when another atomic decrement occurs to take the count to zero.

Please review my generated code examples.
Without changing our code at all to involve the mutexes you seem to
think are essential, the code now works perfectly ... and far more
efficiently.

We got two fixes with one simple declaration change ...
thread safety and thread efficiency ... 32 and 64-bit.

I just don't know how to factor in your suggestions.


dick.dunbar

unread,
May 1, 2007, 2:38:13 PM5/1/07
to
On May 1, 11:18 am, David Schwartz <dav...@webmaster.com> wrote:
> Your code is broken. Really.

Our code is now un-broken on AIX (Really) with a simple declaration
change:
    _Tag T:8;   // before
    char T;     // after

The code is also unbroken on every other platform: win, linux, hpux,
solaris, aix


Markus Elfring

unread,
May 1, 2007, 3:07:29 PM5/1/07
to
> Our code is now un-broken on AIX (Really) with a simple declaration change.
> _Tag T:8;
> char T;

Do you know if this variable or any adjacent members of your data structure will
be simultaneously accessed by multiple threads in your programs?

Regards,
Markus

David Schwartz

unread,
May 1, 2007, 3:53:29 PM5/1/07
to

There you go. That's the way non-portable code works, you have to
tweak it for each new platform.

Your mistake was in not realizing that the code wasn't portable and
not having a portable version to use for platforms where you didn't
know that the optimization was safe.

DS

Peter Dimov

unread,
May 1, 2007, 3:58:35 PM5/1/07
to

I see two possibilities.

1. Define char as 32 bits.
2. Violate the strict interpretation of the standard and write to
foo[0], foo[2] and foo[3].

Our case is less problematic, though. The compiler can do the right
thing, it just chooses not to.

Peter Dimov

unread,
May 1, 2007, 4:17:00 PM5/1/07
to

_M_ref_count and _M_tag are separate objects. They are both subobjects
of NQNString which is the "complete object" in this case. Interpreting
the pthreads standard - which talks about "memory locations" - to only
apply to complete objects outlaws many sensible use cases and seems
contrary to its intent.

The problem here is that the compiler is inventing conflicting writes
to a memory location behind the programmer's back. The pthreads
standard doesn't, and cannot, say what the programmer should do in
this case except complain to the compiler vendor. This will hopefully
be clarified on the C++ side in a few years.

dick.dunbar

unread,
May 1, 2007, 4:54:12 PM5/1/07
to

Of course. The RefPointer is atomicly updated by threads all the
time.

The _Tag byte is modified, without conflict among threads, as a way to
leave state. It is fine if two threads simultaneously modify this
field, and we don't care who wins the race. It represents work to be
done.

The only time this flag will be accessed is when the RefCount goes to
zero. Only a single thread will see this action, and the result is to
destroy the object this structure represents.

And that was the difficult thing to find ... that this object should
be referenced by a thread _after_ it had been destroyed. No clear
pattern of failure, since the memory once owned by that object could
get reused.

And the reason the count went to zero is that the compiler modified
this field "behind our back". Studying the code produced no clues to
this fact. Lots of hard work and very fast machines were the only way
to focus in on the failing component.

Reference Counted objects are a well-known, well-understood,
well-behaved design pattern in the C++ world.

Marcin 'Qrczak' Kowalczyk

unread,
May 1, 2007, 6:44:33 PM5/1/07
to
On 01-05-2007, Tue, at 11:25 -0700, David Schwartz wrote:

> volatile char array[1024];
>
> Suppose this platform needs separate storage objects to be at least 64-
> bits. If the compiler puts each of my array entries in a separate
> storage object, this code will break:
>
> strcpy(array, "hello there");

It's not supposed to work because strcpy takes a non-volatile pointer.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Chris Thomasson

unread,
May 1, 2007, 7:22:29 PM5/1/07
to
"dick.dunbar" <dick....@gmail.com> wrote in message
news:1177960498.3...@o5g2000hsb.googlegroups.com...
> Boolean Buyers Beware ... AIX compiler bug --- PMR 26241,756.

[...]
> The Bug:
> -------
> 1. ref_count is incremented/decremented consistently using AIX atomic
> ops

[...]

> void *ThreadFunc(void *threadid) {
> for (int i = 0; i < 100000000; ++i) {
> gs._M_incr(); // atomicly increment
> gs._M_tag=NQNString::_S_concat; // set it to one
> if(gs._M_decr() == 0 ) ++*(int*)0; // ref_count should never go to
> zero!
> }
> return 0;
> }


> So what do you think?

Humm... You know, I hesitated for a moment before I posted this query
because I don't really think it applies to your particular scenario, but I
just felt the need to mention it anyway! Let's put it this way, I hope that I
am wrong and your reference counting algorithm is correct!

So please try to bear with me here...

;^)


> Our bug or theirs?

Well, before I answer that, I have one very quick question:

------------------------------------
Does any thread (e.g., ThreadFunc) in your application have the ability to
increment the reference count of any shared instance of NQNString (e.g.,
'gs' in your example), if it does not already own a previous reference?
------------------------------------

If so, you just may be experiencing a fairly nasty race-condition wrt the
reference counting algorithm itself. As you probably know, you must use some
form of atomic reference counting algorithm whenever a thread needs to
increment the reference count of any shared object that it does NOT
previously own a reference to. Please note that "atomically thread-safe"
reference counting is completely different than "normal thread-safe"
counting. For instance, Boost's "current" shared_ptr algorithm cannot be
used as a "standalone" solution for this because it's only a normal
thread-safe counted pointer. In order to make correct use of Boost
shared_ptr, a thread MUST own a reference to a shared data-structure BEFORE
it tries to acquire another one.


If you want a thread to be able to grab a reference to an object that it does
not own, you need to use some form of atomic counting. FWIW, here are some
links to some full-blown atomic counted pointers:

http://atomic-ptr-plus.sourceforge.net/

http://appcore.home.comcast.net/vzoom/refcount/


Any thoughts?

P.S.

If this is not applicable to your situation, please forgive me for wasting
your time!

:^0

rfis...@gmail.com

unread,
May 1, 2007, 7:46:05 PM5/1/07
to
dick.dunbar wrote:

> > Why not just use a 1 byte int, something like u_int8_t from types.h?
>
> Indeed it is the problem. See my 3 code examples.
>
> The reason for using "_Tag" is to clearly indicate this field will
> only
> contain the values of an enumeration. Valuable to know when you're
> reading code.

A comment could do that.
char _M_tag; // assumes values of enum _Tag.

[...]

> There's a compiler option "-qenum=1", but obviously we can't adopt
> that
> for our whole application ... and then there are these other platforms
> to worry about. _Tag T:8 is the clearest, most uniform way to
> express the intent.

I disagree - a char + comment expresses the intent just as well. This
problem was an excruciating intersection of standard haziness, enum
size, bit-field layout/manipulation, multithreading and an IBM compiler
for PowerPC.
Knowing what you know now you're going to have to put an essay-sized
comment in anyway, so why not use a byte and remove the possibility
once and for all of anything like this happening ever again? I may seem
over-conservative, but in fact I'm just superstitious. That bit-field
caused so much trouble that I'd go as far as saying that it's malign,
even evil. It's cursed code, inauspicious, and ill-boding. It's as
unlucky as women on ships. And it needs to be cleansed. With fire.

> > P.S. I'm pretty sure it was Bjarne who said bitfields
> > were for "idiots and children".
>
> Bjarne Stroustrup, C++ Programming Language, Special Edition.

[... I love bit-fields ...]

That would be some backflip! Ok, in that case it was, uh, one of
Jon Bentley's pearls. Or was it Dennis Ritchie? Ken Thompson?

RF

--
felix: lightweight threads, HOFs and game scripting.
svn co https://svn.sourceforge.net/svnroot/felix/felix/trunk felix

dick.dunbar

unread,
May 2, 2007, 1:12:24 AM5/2/07
to
On May 1, 4:46 pm, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> dick.dunbar wrote:
>
> A comment could do that.
> char _M_tag; // assumes values of enum _Tag.

Of course.

> > There's a compiler option "-qenum=1", but obviously we can't adopt
> > that for our whole application ... and then there are these other
> > platforms to worry about. _Tag T:8 is the clearest, most uniform
> > way to express the intent.
>
> I disagree - a char + comment expresses the intent just as well.

Well, yes ... but that's only when you know the answer to the problem.
I was giving credit to the STLport author for using the
most obvious construct for the task, which previously worked fine.

I just hate coding to the personality of some Frankenstein compiler.
In this case, it was pretty easy to "just give up" because it also
produced a huge performance boost.

> Knowing what you know now you're going to have to put an essay sized comment
> in anyway, so why not use a byte and remove the possibility once and for all
> of anything like this happening ever again? I may seem over-conservative,

That's exactly what we've chosen to do; use char.
I closed my open issue with IBM today.

> And it needs to be cleansed. With fire.

I think the exorcism you seek is for the IBM compiler.


dick.dunbar

unread,
May 2, 2007, 1:31:33 AM5/2/07
to
On May 1, 4:22 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "dick.dunbar" <dick.dun...@gmail.com> wrote in message

> ------------------------------------
> Does any thread (e.g., ThreadFunc) in your application have the ability to
> increment the reference count of any shared instance of NQNString (e.g.,
> 'gs' in your example), if it does not already own a previous reference?
> ------------------------------------

Yes. It always uses the member function to do so, which is an atomic
update.

>
> If so, you just may be experiencing a fairly nasty race-condition wrt the
> reference counting algorithm itself.

You're correct that it's a race-condition, but it was not from our
algorithm. We always update the counter atomicly; the compiler updated
the counter without our knowledge ... we were storing a constant into
a single byte in an adjacent word. Who would guess that would alter
RefCounter? The only way to discover such things is to be able to read
assembler.
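
(For readers who can't, here is the shape of what happened, in
pseudo-C++ with hypothetical names and layout -- an illustration of the
load/twiddle/store pattern described in this thread, not the compiler's
literal output.)

#include <cstddef>   // offsetof
#include <cstring>   // memcpy

struct Sample {                  // stand-in for the real class layout
    volatile int count;          // reference count, maintained with atomic ops
    char         tag;            // the byte the source actually assigns
};

// Source code says only:   s.tag = 1;
// What the 64-bit compiler effectively generated:
void set_tag_as_generated(Sample& s) {
    unsigned long long container;                       // one 64-bit register
    std::memcpy(&container, &s, sizeof container);      // load count AND tag together
    reinterpret_cast<char*>(&container)[offsetof(Sample, tag)] = 1;  // twiddle the tag byte
    std::memcpy(&s, &container, sizeof container);      // store BOTH back; an atomic
}                                                       // bump of count that landed in
                                                        // between is silently undone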

In fact, I should probably disclose that we don't use the AIX
fetch_and_add routines at all. I wrote inline assembly functions to
Increment/Decrement an atomic pointer by 1.

Consider this: You want to update a counter.
On PowerPC, that would be load/add/store instructions.
Now you want to do this atomicly as fast as possible, and it is just
two more instructions if you use assembler. Here are the functions:

/* Decoded PowerPC sequence (both functions):
     mr     r4,r3        ; copy the address argument
 L:  lwarx  r3,0,r4      ; load word and reserve
     addi   r3,r3,1      ; +1 for Inc, -1 ("3863ffff") for Dec
     stwcx. r3,0,r4      ; store conditional
     bne    L            ; retry if the reservation was lost          */
#pragma mc_func AtomicInc { "7c641b78" "7c602028" "38630001" "7c60212d" "4082fff4" }
#pragma reg_killed_by AtomicInc gr3-gr4, cr0
#pragma mc_func AtomicDec { "7c641b78" "7c602028" "3863ffff" "7c60212d" "4082fff4" }
#pragma reg_killed_by AtomicDec gr3-gr4, cr0

> If this is not applicable to your situation, please forgive me for wasting your time!

Not a waste of time. Open dialog was what I was seeking.

Markus Elfring

unread,
May 2, 2007, 5:13:39 AM5/2/07
to
> The _Tag byte is modified, without conflict among threads, as a way
> to leave state. It is fine if two threads simultaneously modify
> this field, and we don't care who wins the race.

Do you really want to ignore race conditions at this place?


> It represents work to be done.

Are you going to add any synchronisation for bigger memory regions?


> And that was the difficult thing to find ... that this object should
> be referenced by a thread _after_ it had been destroyed.
> No clear pattern of failure, since the memory once owned by that
> object could get reused.
>
> And the reason the count went to zero is the compiler modified this
> field "behind our back".

Would you like to add any checks or a guarantee to protect against unexpected
updates for several fields in the same data structure?
Where is a "mass update" of a few places (of word size) in memory quicker than
the write access for a single byte?
Are the issues between performance/efficiency and correct concurrent behaviour
resolved in your use case for all target platforms now?


> Reference Counted objects are a well known, well understood, well
> behaved design pattern in the C++ world.

It seems that a few special cases must still be considered in their implementations.

Regards,
Markus

rfis...@gmail.com

unread,
May 2, 2007, 6:18:24 AM5/2/07
to

dick.dunbar wrote:

Yay! Don't forget the big fat comment!

rfis...@gmail.com

unread,
May 2, 2007, 6:23:19 AM5/2/07
to
David Schwartz wrote:

> On May 1, 11:38 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:
> > On May 1, 11:18 am, David Schwartz <dav...@webmaster.com> wrote:
> >
> > > Your code is broken. Really.
> >
> > Our code is now un-broken on AIX (Really) with a simple declaration
> > change.
> > _Tag T:8;
> > char T;
> >
> > The code is also unbroken on every other platform: win, linux, hpux,
> > solaris, aix
>
> There you go. That's the way non-portable code works, you have to
> tweak it for each new platform.

But wait, the tweak (->char) is portable now, isn't it?
Or maybe not, we're still talking about threads without mutexes,
so "portable" probably isn't the right word to use. Could we say
that the char version presents fewer surprises?
At the very least it's luckier, right?

Chris Thomasson

unread,
May 2, 2007, 6:37:38 AM5/2/07
to
"dick.dunbar" <dick....@gmail.com> wrote in message
news:1178083893....@c35g2000hsg.googlegroups.com...

> On May 1, 4:22 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "dick.dunbar" <dick.dun...@gmail.com> wrote in message
>
>> ------------------------------------
>> Does any thread (e.g., ThreadFunc) in your application have the ability
>> to
>> increment the reference count of any shared instance of NQNString (e.g.,
>> 'gs' in your example), if it does not already own a previous reference?
>> ------------------------------------
>
> Yes. It always uses the member function to do so, which is an atomic
> update.

Well, FWIW, using fetch_and_add alone is not strong enough to implement
atomically thread-safe reference counting. You could use fetch_and_add to
create a thread-safe counter like Boost shared_ptr; however, you can't use
it as-is to create an atomically thread-safe counter...
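
A sketch of the distinction (illustrative names, with C++11 std::atomic
standing in for fetch_and_add -- not Chris's code): the increment is
only safe if the incrementing thread already owns a reference; grabbing
a reference out of a shared slot can race with the final release.

#include <atomic>

struct Node {
    std::atomic<int> refs{1};
};

Node* g_slot = new Node;          // shared slot; the reader owns no reference yet

void release(Node* n) {           // thread A drops the last reference
    if (n->refs.fetch_sub(1) == 1)
        delete n;                 // object destroyed
}

Node* acquire_unsafe() {          // thread B tries to take a new reference
    Node* n = g_slot;             // reads the slot ...
    n->refs.fetch_add(1);         // ... but n may already be deleted by the
    return n;                     // time the increment happens; fetch-and-add
}                                 // alone cannot close this window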


rfis...@gmail.com

unread,
May 2, 2007, 7:49:03 AM5/2/07
to
dick.dunbar wrote:

> > I disagree - a char + comment expresses the intent just as well.
>
> Well, yes ... but that's only when you know the answer to the problem.

Yah, sorry. 20-20 hindsight speaking here.

> I was giving credit to the STLport author, for using the
> most obvious construct for the task, which previously worked fine.

I'd've tracked the author down and asked for damages for
the several weeks of my life flushed away or at least tried
to convince him as a library writer to write more
conservative code. However, you were pushing it quite a bit
expecting the library to just happen to be threadsafe.
Or did it purport to be threadsafe?

Bit-fields are one of those things that I see, and especially
use, so rarely that I'd have to look up the notation, so to
me they're not an obvious solution. I didn't even know you
could declare an enum type as a bitfield. Is that the only
occurrence of such a thing in the code? Seeing exactly one
occurrence of something outlandish in a large code-base is
a red flag. Bad juju fer sure.

> > Knowing what you know now you're going to have to put an essay sized comment
> > in anyway, so why not use a byte and remove the possibility once and for all
> > of anything like this happening ever again? I may seem over- conservative,
>
> That's exactly what we've chosen to do; use char.
> I closed my open issue with IBM today.
>
> > And it needs to be cleansed. With fire.
>
> I think the exorcism you seek is for the IBM compiler.

Dunno, personally I'd never've run into that problem, I'm too middle
of the road.
Interesting thread though!

dick.dunbar

unread,
May 2, 2007, 12:09:17 PM5/2/07
to
On May 2, 2:13 am, Markus Elfring <Markus.Elfr...@web.de> wrote:

> It seems that a few special cases must still be considered in their implementations.


You mean, like, checking for a buggy compiler? Sure, I agree it's a
special case :-)

The focus still seems to be on our thread designs. Honestly, they work
fine. We don't need help in this area. The thrust of this exchange was
to alert others to the bizarre behaviour of a single compiler in a
special situation. This error never would have happened in a 32-bit
world, because the BITCONTAINERS and the Atomic fields are both
32 bits. No chance to destroy an adjacent word by an update.

Problem solved. Our application runs as expected, with no race
conditions and no unexpected behaviour related to threading.

dick.dunbar

unread,
May 2, 2007, 12:18:37 PM5/2/07
to
On May 2, 3:23 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:
> David Schwartz wrote:

> But wait, the tweak (->char) is portable now, isn't it?
> Or maybe not, we're still talking about threads without mutexes,
> so "portable" probably isn't the right word to use. Could we say
> that the char version presents fewer surprises?
> At the very least it's luckier, right?

It is portable, and it accepts an assignment of an enumeration, so no
code changes are required to add (cast)s.

I would say that the C/C++ syntax is "overloaded" by usage.

char c;
char c:8;

They're the same size, but they have different semantics to some
compilers. IBM's compiler thinks we want to deal with bits ... our
code only intended to set a particular size for an enumeration.

Assignment to any of these objects would have triggered the failure.

char c:8;
int i:8;
bool b:8;
_Tag t:8;
size_t s:16; // STLport also uses this form of declaration.

And, it is only a problem in a 64-bit arena.

We made the change to "trick" one compiler; thankfully,
the other compilers know how to deal with the trick (it's portable).
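
For reference, the contrast in struct form (placeholder names; the
container behaviour described is what this thread observed from 64-bit
xlC, not anything the language standard promises):

// Bit-field spelling: the 8-bit tag may be placed in the same 8-byte
// "BITCONTAINER" as the count and updated with a non-atomic
// load/insert/store of the whole container.
struct WithBitfield {
    volatile int count;
    char         tag : 8;
};

// Plain-member spelling (the change adopted above): the tag is an
// ordinary byte, written with a single stb, and the count is untouched.
struct WithPlainChar {
    volatile int count;
    char         tag;
};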


dick.dunbar

unread,
May 2, 2007, 12:25:20 PM5/2/07
to
On May 2, 3:37 am, "Chris Thomasson" <cris...@comcast.net> wrote:

> using fetch_and_add ... can't be used as-is to create an atomically thread-safe counter...

Ok, I'm intrigued ... why do you make this claim?

fetch_and_add(int* data, value)

will use these PowerPC instructions:

a: lwarx rx, data // load and reserve
add rx, value // add the value (1 or -1)
stwcx. rx, data // store and release
bne a: // loop back if reservation is lost

The loop repeats until the store-conditional succeeds - that is, until
no other thread has touched the data address between our load and our
store.

dick.dunbar

unread,
May 2, 2007, 12:44:22 PM5/2/07
to
On May 2, 4:49 am, "rfist...@gmail.com" <rfist...@gmail.com> wrote:


> > I was giving credit to the STLport author ...


>
> I'd've tracked the author down and asked for damages

Damages? On OpenSource? The damage was caused by the IBM compiler.

> library writer to write more conservative code.

I fail to see how the original author could have anticipated a bug in
a specific compiler running 64-bit.

> expecting the library to just happen to be threadsafe.

It is thread safe ... but not without a lot of work on our end
to plug some race conditions. The failure wasn't one of those.

> Or did it purport to be threadsafe?
>
> Bit-fields are one of those things that I see and especially
> use, so rarely that I'd have to look up the notation, so to
> me they're not an obvious solution.

My background lends itself to bit twiddling, but I was surprised
when we found the error related to manipulation of bits
when none was intended by the code.


> I didn't even know you could declare an enum type as a bitfield.

Neither did I. I had to look up the syntax again, because I'd never
seen it used in that context. That's what I mean about an overloaded
usage meaning ... the author intended to specify object size, not
bit semantics.

> Is that the only occurence of such a thing in the code? Seeing exactly one
> occurence of something outlandish in a large code-base is
> a red-flag. Bad juju fer sure.

This point cranks my engine. Any large code base, written by hundreds
of programmers and adopting multiple styles from Open Source, will,
without a doubt, produce anomalies.

It is _impossible_ for me to read C++ code and understand what it
does. My only recourse is to rely on compiler-generated code to see
what bizarre class warfare has prepared for me today.

When the goal is to get product (function, business value) out the
door, and make money at it, then academic quibbles over coding style
will go unheeded. They have to, or we'd never ship anything.

"If it works, ship it."
"If you claim it doesn't work, prove it with a test case."

That's the reality in a commercial software world.
We delayed shipment by more than a month because of this bug.
That's lost revenue.

The backup plan was to withhold the 64-bit AIX version.
No customer would put up with a 10-hour mean-time-to-failure.
That's lost customer confidence.

David Schwartz

unread,
May 2, 2007, 12:34:47 PM5/2/07
to
On May 1, 3:44 pm, Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl>
wrote:

> > strcpy(array, "hello there");

> It's not supposed to work because strcpy takes a non-volatile pointer.

That really doesn't matter. You can write your own 'strcpy' that takes
a 'volatile' pointer and you'll have the same problem. This would also
make casting to or from 'volatile' an undefined operation (because the
packaging would be different).

My point stands -- a volatile object cannot be packed differently from
a non-volatile object without breaking tons of code that currently
works.

A 'volatile' char can't be packed so as to prevent conflict with
adjacent volatile objects, so it cannot be required for any type to do
so.

DS

David Schwartz

unread,
May 2, 2007, 12:38:19 PM5/2/07
to
On May 2, 9:25 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:

> Ok, I'm intrigued ... why do you make this claim?
>
> fetch_and_add(int* data, value)
>
> will use these PowerPC instructions:
>
> a: lwarx rx, data // load and reserve
> add rx, value // add the value (1 or -1)
> stwcx. rx, data // store and release
> bne a: // loop back if reservation is lost

Your use of the phrase "will use" strongly suggests that you have not
learned anything from this experience and fully intend to continue to
write broken code. That is kind of sad.

DS

dick.dunbar

unread,
May 2, 2007, 1:08:37 PM5/2/07
to
On May 2, 9:38 am, David Schwartz <dav...@webmaster.com> wrote:

> That is kind of sad.

Indeed, but cheer up.
I'd rather be President; there are no consequences for being wrong.


David Hopwood

unread,
May 2, 2007, 8:49:30 PM5/2/07
to
Thomas Richter wrote:
> dick.dunbar <dick....@gmail.com> wrote:
>
>>>I don't think the C++ standard has a word to say about threads. Note
>>>that a C++ compiler is always free to operate on an "as if" basis, i.e.
>>>its implementation might do whatever it wants as long as the observed
>>>result is the same. And as C++ only addresses single-threaded programs,
>>>observed behaviour is only defined in this sense. Thus, your program
>>>has been proven to work correctly for single-threaded by the compiler,
>>>and thus the compiler is fine.
>>
>>I'm still flabbergasted that reasonable people come to this
>>conclusion.

Me too. It's as though some people think that, because multi-threaded
C++ is not defined by any Standard, we cannot say anything about what
it is *reasonable* for a compiler to do.

>>Note: The problem isn't confined to threaded programs, if you were to
>>build a 64-bit device driver that modified adapter storage in a
>>particular way, this would be completely unexpected that the compiler
>>should modify two fields when one was specified.
>
> And again, the standard says nothing about that. If you access any
> I/O devices thru pointers, you're up to yourself and what your
> vendor guarantees. The virtual machine defined by the C++ standard
> again doesn't say a word about I/O devices.

Section 7.1.5.1: "In general, the semantics of volatile are intended
to be the same in C++ as they are in C."

Everyone knows how volatile is supposed to work in C; it's supposed to
inhibit any compiler optimizations that affect how a volatile-qualified
object is accessed -- including the optimization that is at issue in this
thread. It is true that the only way you can observe such optimizations
is by non-strictly-conforming code, but so what? The whole point of
"volatile" is to constrain the *implementation* of accesses to volatile
objects, so that non-strictly-conforming programs can rely on those
constraints.

> You *may* get around
> this problem then by "hinting" the compiler by the "volatile" keyword,
> but again, you're still in "undefined" land.

No, implementation-defined. The vendor is supposed to document what
constitutes a volatile access. So if IBM have explicitly documented
somewhere that a volatile object can be written when a nearby
(but not overlapping) object is accessed, then they're following the
standard; otherwise they are violating it. But in the former case,
I think many C and C++ programmers would not be at all happy with that
implementation.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

unread,
May 2, 2007, 9:00:03 PM5/2/07
to
David Schwartz wrote:
> On May 1, 9:11 am, Peter Dimov <pdi...@gmail.com> wrote:
>
>>Theirs because of the volatile, as Gil Hamilton pointed out.
>
> Then explain to me what:
>
> volatile char foo[1024];
> foo[1]=0;
>
> is supposed to do on a platform that only has 32-bit operations.

Whatever the required platform documentation as to "what constitutes an
access to an object that has volatile-qualified type" says.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

unread,
May 3, 2007, 12:37:25 PM5/3/07
to
dick.dunbar wrote:
> On Apr 30, 6:53 pm, David Schwartz <dav...@webmaster.com> wrote:
>
>>Sorry, it's your bug. You assumed that manipulating one part of an
>>object wouldn't affect another part. In analogous cases, it is clear
>>that this will never be the case. (Consider successive character
>>variables on a platform that has no 8-bit atomic operations.)
>>
>>So you relied on something that was not guaranteed by anything.
>
> Exactly what do you think I relied on? That the compiler should
> alter the byte I specified and nothing else.

More precisely: and no other volatile-qualified objects.

I found the AIX documentation of "what constitutes an access to a
volatile-qualified object", such as it is:

<http://www-1.ibm.com/support/docview.wss?uid=swg27005417>

# Abstract
# The C/C++ Language Reference describes the syntax, semantics, and
# IBM implementation of the C and C++ programming languages.
[...]
# The volatile Type Qualifier
#
# The volatile qualifier maintains consistency of memory access to data
# objects. Volatile objects are read from memory each time their value
# is needed, and written back to memory each time they are changed. The
# volatile qualifier declares a data object that can have its value
# changed in ways outside the control or detection of the compiler (such
# as a variable updated by the system clock). The compiler is thereby
# notified not to apply certain optimizations to code referring to the
# object.
#
# Accessing any lvalue expression that is volatile-qualified produces a
# side effect. A side effect means that the state of the execution
# environment changes.

If there were any exception allowing an access to a nonvolatile object
to affect nearby volatile fields, then this documentation would have to
mention it. Since it doesn't, the behaviour is an implementation bug.

--
David Hopwood <david....@industrial-designers.co.uk>

David Hopwood

unread,
May 3, 2007, 12:45:45 PM5/3/07
to
David Schwartz wrote:
> On May 1, 10:49 am, "dick.dunbar" <dick.dun...@gmail.com> wrote:
>
>>If you were to explain the error that I have made, I'd be happy to
>>adjust.
>
> The error is really this simple -- you modify an object in one thread
> while another thread is or might be accessing that same storage object
> and you haven't protected the accesses with mutexes. This is what the
> pthreads standard requires.

True, but beside the point, since this implementation behaviour would be
a bug for single-threaded code given the 'volatile' declaration.

--
David Hopwood <david....@industrial-designers.co.uk>

dick.dunbar

unread,
May 3, 2007, 1:56:50 PM5/3/07
to
On May 3, 9:37 am, David Hopwood <david.hopw...@industrial-
designers.co.uk> wrote:

> > Exactly what do you think I relied on? That the compiler should
> > alter the byte I specified and nothing else.
>
> More precisely: and no other volatile-qualified objects.
>

> <http://www-1.ibm.com/support/docview.wss?uid=swg27005417>

> # The volatile Type Qualifier
> #
> # The volatile qualifier maintains consistency of memory access to data
> # objects. Volatile objects are read from memory each time their value
> # is needed, and written back to memory each time they are changed. The
> # volatile qualifier declares a data object that can have its value
> # changed in ways outside the control or detection of the compiler (such
> # as a variable updated by the system clock). The compiler is thereby
> # notified not to apply certain optimizations to code referring to the
> # object.

The version 8 language manual, pages 56-57, has similar (narrower)
language.
http://www.elink.ibmlink.ibm.com/publications/servlet/pbi.wss?CTY=US&FNC=SRX&PBL=SC09-8004-00#
------
The volatile qualifier declares a data object that can have its value
changed in ways outside the control or detection of the compiler (such
as a variable updated by the system clock or by another program). This
prevents the compiler from optimizing code referring to the object by
storing the object's value in a register and re-reading it from there,
rather than from memory, where it may have changed.
-------
The V7 sentence is changed:
# The compiler is thereby notified not to apply "certain optimizations"
# to code referring to the object.

Acknowledging that a volatile variable could be "changed in ways
outside the control/detection of the compiler" gives the impression
that side-effect modification of this variable using a <dubious>
optimization might be problematic.

Thanks

<aside> English is a hopeless language to describe computer
expectations. I guess that's why we have so many lawyers in the USA.
How certain are we that IBM knew what they were doing when they wrote
the V7 manual? :-)

http://tinyurl.com/268j7z

* certain(a): definite but not specified or identified; "set aside a
certain sum each week"; "to a certain degree"; "certain members have
not paid their dues"; "a certain Mrs. Jones"

* certain(p): having or feeling no doubt or uncertainty; confident and
assured; "felt certain of success"; "was sure (or certain) she had
seen it";

* certain(p): established beyond doubt or question; definitely known;
"what is certain is that every effect must have a cause"; "it is
certain that they were on the bus"; "his fate is certain"; "the date
for the invasion is certain"

* certain to occur; destined or inevitable; "he was certain to fail";
"his fate is certain";
"In this life nothing is certain but death and taxes"- Benjamin
Franklin;

* reliable in operation or effect; "a quick and certain remedy";

* exercising or taking care great enough to bring assurance; "be
certain to disconnect the iron when you are through";

rfis...@gmail.com

unread,
May 3, 2007, 4:00:47 PM5/3/07
to
On 3 May, 18:45, David Hopwood <david.hopw...@industrial-

Hey, sorry, my last post went away for some reason. I've come around
to dick's way of thinking. That compiler's just wrong.

I would say the following topics in this thread are complete red
herrings: bit-fields, the volatile specifier, C++ standards, and
atomic ops.

That leaves a compiler that writes to members when it shouldn't.
Our tacit expectation of thread "reasonableness" in C++ says that
you should be able to use a mutex to serialise access to a single
member; however, the compiler shouldn't generate code that modifies
that member when you change another unrelated one. It's like
the famous crossroads here with traffic lights in only one direction
(here=Arezzo): it's just wrong - if it were right, the implication
would be that you can only ever serialise access to the entire class
and not just a part of it.

If I'm wrong about this then I've got an awful lot of code to review.
I hope I'm not wrong.

RF