Most conforming POSIX threads implementation

Dima Volodin

Jul 17, 2001, 5:15:45 PM

[I cross-posted this to comp.std.c in an attempt to make sure that I don't miss
anything as far as C is concerned]

Kaz Kylheku wrote:

> In article <3B543B11...@dvv.org>, Dima Volodin wrote:
> >When you define a struct with two chars, you also define two chars - being a
> >member of a struct doesn't make an object less of an object.
>
> Yes it does, because the members of the struct must be allocated in a single
> larger object,

Let me rephrase it - the way memory is allocated for an object does not make
this object less of an object.

> which consists of a contiguous sequence of individually
> addressable bytes (as far as the C program can tell).

Contiguous? All the language requires is that "Each non-bit-field
member of a structure or union object is aligned in an implementation-defined
manner appropriate to its type" and "Within a structure object, the
non-bit-field members [...] have addresses that increase in the order in which
they are declared", but it doesn't require them to be contiguous (in fact, it
doesn't even define what "increase" means for addresses of objects that are
not array members), so I don't see any problem with a hypothetical POSIX
standard requiring that named objects - be they stand-alone objects or struct
or union members - be placed in separate memory granules. Also, IMHO, a
standard should additionally or alternatively introduce something like
pthread_memorygranule_t - a type that shall be used in unions to guarantee an
object's allocation in a separate memory granule (the same way unions are
used to guarantee a particular alignment for, e.g., a char array). And, of
course, it must be spelled out that no two memory areas allocated by malloc()
shall have common memory granules.
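A rough sketch of how the proposed union trick and the malloc() rule might be approximated in C11 today. Everything named here is an assumption for illustration: GRANULE_SIZE (64) is a guess at the hardware granule, memorygranule_t is a stand-in for the proposed pthread_memorygranule_t (which is not a real POSIX type), and granule_alloc() is a hypothetical allocator built on C11 aligned_alloc(), which postdates this thread.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a 64-byte granule; the real size is platform-specific. */
#define GRANULE_SIZE 64

/* Hypothetical stand-in for the proposed pthread_memorygranule_t:
   exactly one granule, aligned on a granule boundary. */
typedef struct {
    alignas(GRANULE_SIZE) unsigned char bytes[GRANULE_SIZE];
} memorygranule_t;

/* A union containing the granule type is granule-aligned and its size
   is a whole number of granules, so 'value' cannot share a granule
   with any other object. */
union isolated_char {
    memorygranule_t granule;
    char value;
};

/* Two flags that different threads could store to independently,
   provided GRANULE_SIZE really matches the hardware granule. */
union isolated_char flag_a, flag_b;

/* Hypothetical allocator: rounds the request up to whole granules and
   returns granule-aligned storage, so no two live blocks share a
   granule. Returns NULL on failure or overflow. */
void *granule_alloc(size_t n)
{
    if (n > SIZE_MAX - (GRANULE_SIZE - 1))
        return NULL;
    size_t rounded = (n + GRANULE_SIZE - 1) / GRANULE_SIZE * GRANULE_SIZE;
    if (rounded == 0)
        rounded = GRANULE_SIZE;  /* give even a 0-byte request a granule */
    /* C11: the size passed to aligned_alloc is a multiple of the alignment. */
    return aligned_alloc(GRANULE_SIZE, rounded);
}
```

Blocks from granule_alloc() are released with the ordinary free().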

> There is a C language level difference, because the language allows the
> ptrdiff_t pointer displacement between two members of the same aggregate
> to be computed, and pointers to members of the same aggregate can be
> compared using the relational operators:
>
> int x;
> int y;
>
> int a[2];
>
> if (&a[0] < &a[1]) { /* Correct */
> /* Will execute this */
> }
>
> if (&x < &y) /* Undefined behavior */
> {
> }
>
> This allows objects which are created by separate declarations, or
> separate requests to the storage allocator, to be placed into distinct
> units of memory, such as memory segments on a segmented architecture.
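Kaz's point can be seen in a small C sketch: for two members of one structure object, pointer subtraction and relational comparison are defined, and both members lie, in declaration order, inside the object's sizeof bytes at the positions offsetof reports. struct pair and member_in_object() are made-up names for illustration; the same comparisons applied to separately declared objects such as x and y would be undefined.

```c
#include <stddef.h>

/* Illustrative structure with two char members. */
struct pair {
    char a;
    char b;
};

struct pair p;

/* Nonzero if the len bytes at mp lie entirely within the objsize bytes
   at obj. Only meaningful (and only well defined) when mp points into
   the same object as obj, as it does for a member of that object. */
int member_in_object(const void *obj, size_t objsize,
                     const void *mp, size_t len)
{
    const unsigned char *base = obj;
    const unsigned char *m = mp;
    return m >= base && (size_t)(m - base) + len <= objsize;
}
```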

The language doesn't know anything about "memory granules", so there's nothing
that would prevent an implementation from placing x and y into the same memory
granule. AFAIK, the current POSIX standard doesn't require any kind of placement
either (besides the one that is dictated by C, of course) and it is, IMHO, a
defect.

> This is the area of the language where it would be easy to introduce
> a POSIX requirement that distinct primary objects (my term) do not
> share memory granules, and so may be concurrently accessed without
> interference. Since there is already some kind of difference,
> introducing another one isn't a big deal.

See my proposals above.

Dima

James Kuyper Jr.

Jul 17, 2001, 8:14:40 PM

Dima Volodin wrote:
>
> [I cross-posted this to comp.std.c in an attempt to make sure that I don't
> miss anything as far as C is concerned]
>
> Kaz Kylheku wrote:

...

> > which consists of a contiguous sequence of individually
> > addressable bytes (as far as the C program can tell).
>
> Contiguous? All the language requires is that "Each non-bit-field
> member of a structure or union object is aligned in an
> implementation-defined manner appropriate to its type" and "Within a
> structure object, the non-bit-field members [...] have addresses that
> increase in the order in which they are declared", but it doesn't
> require them to be contiguous (in fact, it

6.2.5p20: "A structure type describes a sequentially allocated nonempty
set of member objects"

"Sequentially allocated" would in itself seem to be sufficient to
establish this point, but there's also:

6.2.6.1p2: "... objects are composed of contiguous sequences of one or
more bytes ..."

Structures are composed of objects, but they are also objects in their
own right, and hence must occupy contiguous bytes. I can't find any
place where that's directly stated in the standard, but it is indirectly
implied in literally dozens of places. The clearest statement I've been
able to find in support of that concept is the following:

6.7.2.1p12: "Each non-bit-field member of a structure or union object
..."

Note the reference to a "structure ... object". The phrase "structure
object" appears in a half dozen other places as well, and I see no way
to interpret it at any of those places as meaning anything other than
"an object of structure type".

Dima Volodin

Jul 18, 2001, 8:52:24 AM

"James Kuyper Jr." wrote:

>
> Dima Volodin wrote:
> > Contiguous? All the language requires is that "Each non-bit-field
> > member of a structure or union object is aligned in an
> > implementation-defined manner appropriate to its type" and "Within a
> > structure object, the non-bit-field members [...] have addresses that
> > increase in the order in which they are declared", but it doesn't
> > require them to be contiguous (in fact, it
>
> 6.2.5p20: "A structure type describes a sequentially allocated nonempty
> set of member objects"
>
> "Sequentially allocated" would in itself seem to be sufficient to
> establish this point, but there's also:

Of course, a struct object shouldn't be scattered all across memory. What I'm
talking about is padding between struct members, and the language readily
allows for that. As I have already quoted, "Each non-bit-field member of a
structure or union object is aligned in an implementation-defined manner
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
appropriate to its type".
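The padding Dima refers to can be observed with offsetof: the implementation may insert unnamed bytes after a char member so that a following double is suitably aligned, and how many is implementation-defined. struct padded, padding_after_c() and show_layout() are illustrative names only; the printed offsets differ between platforms.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative structure: a char followed by a more strictly aligned type. */
struct padded {
    char c;    /* the first member is always at offset 0 */
    double d;  /* placed at an implementation-defined, suitably aligned offset */
};

/* Number of unnamed padding bytes between c and d on this implementation. */
size_t padding_after_c(void)
{
    return offsetof(struct padded, d) - offsetof(struct padded, c) - sizeof(char);
}

/* Prints the layout; the numbers vary between platforms (e.g. 0 and 8 on
   a typical x86-64 system). */
void show_layout(void)
{
    printf("c at %zu, d at %zu, %zu padding byte(s)\n",
           offsetof(struct padded, c), offsetof(struct padded, d),
           padding_after_c());
}
```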

Dima

James Kuyper

Jul 18, 2001, 12:03:27 PM

Then I'm confused. I traced your discussion back before sending that
message, and came away with the impression that you were arguing for
different members of an array to be stored in different blocks of
memory.

Alexander Terekhov

Jul 18, 2001, 2:58:42 PM

James Kuyper wrote:

[...]

> Then I'm confused. I traced your discussion back before sending that
> message, and came away with the impression that you were arguing for
> different members of an array to be stored in different blocks of
> memory.

it seems that there is no portable way to fight the word-tearing race
condition... how about yet another 'granularizer' ;-) qualifier:

/* distinct */ char byte1;                      // should be word tearing safe
/* distinct */ char byte2;                      // should be word tearing safe
distinct char byteArr[] = { 'a','b' };          // should be word tearing safe
distinct char* bytePtr = byteArr;               // should be word tearing safe
struct { distinct char a,b; } ab = { 'a','b' }; // should be word tearing safe
char _byteArr[] = { 'a','b' };                  // could be word tearing unsafe
char* _bytePtr = _byteArr;                      // could be word tearing unsafe
bytePtr = _byteArr;                             // COMPILE ERROR!!
_bytePtr = byteArr;                             // COMPILE ERROR!!
bytePtr = _bytePtr;                             // COMPILE ERROR!!
_bytePtr = bytePtr;                             // COMPILE ERROR!!
// sizeof( byteArr ) >= sizeof( _byteArr )      // extra space could be added!

btw, that is actually an 'existing practice' already.
well, sort of..

Compaq uses the 'volatile' qualifier to ensure word-tearing-safe
programming (basically switching over to single-byte granularity,
which could require software emulation on older processors):

http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0007.HTM#gran_sec
http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0008.HTM

"(On OpenVMS Alpha or OpenVMS VAX) Compile all application
modules for byte actual granularity. Doing so automatically
prevents word-tearing race conditions for structure or union
members and array elements of size byte or larger that are
accessed concurrently by different threads. No other program
modification is required. This may have a performance penalty
on Alpha EV4 and EV5 processors.
Or,
(On Tru64 UNIX systems) For arrays, add the C language
volatile storage qualifier to the definition of the entire
array; for structures, add volatile to the declaration of
only those members that share the pertinent memory granule.
You must also compile the application's modules using the
Compaq C or Compaq C++ compiler's -strong-volatile switch.
Doing so causes the compiler to produce code that forces
all accesses to those members to occur as atomic operations.
See the description of the -strong-volatile switch in the
Compaq C or Compaq C++ documentation and on the cc reference
page. This may also have a severe performance penalty. "
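To make "word tearing" concrete: on a processor without byte stores (such as the Alpha EV4 and EV5 mentioned above), storing one char means loading the whole containing word, changing one byte in a register and writing the word back. The deterministic, single-threaded sketch below simulates that read-modify-write on a hypothetical 32-bit granule and replays one bad interleaving of two "threads"; granule_t, with_byte(), get_byte() and torn_update() are made-up names, and this is a simulation, not real concurrent code.

```c
#include <stdint.h>

/* A hypothetical 32-bit memory granule holding four adjacent chars. */
typedef uint32_t granule_t;

/* Return a copy of word w with byte number idx (0..3) replaced by v. */
static granule_t with_byte(granule_t w, unsigned idx, uint8_t v)
{
    granule_t mask = (granule_t)0xFFu << (8 * idx);
    return (w & ~mask) | ((granule_t)v << (8 * idx));
}

/* Extract byte number idx (0..3) from word w. */
uint8_t get_byte(granule_t w, unsigned idx)
{
    return (uint8_t)(w >> (8 * idx));
}

/* Replays this interleaving of two non-atomic byte stores:
     A: load word       B: load word
     A: modify byte 0   B: modify byte 1
     A: store word      B: store word  (writes back a stale byte 0)
   Returns the final contents of the granule. */
granule_t torn_update(granule_t memory, uint8_t a_val, uint8_t b_val)
{
    granule_t reg_a = memory;            /* thread A loads the word */
    granule_t reg_b = memory;            /* thread B loads the same word */
    reg_a = with_byte(reg_a, 0, a_val);  /* A changes its byte in a register */
    reg_b = with_byte(reg_b, 1, b_val);  /* B changes its byte in a register */
    memory = reg_a;                      /* A writes the whole word back */
    memory = reg_b;                      /* B overwrites it, losing A's byte */
    return memory;
}
```

Byte 1 ends up holding B's value, while A's store to byte 0 is silently lost, even though the two threads never touched the same char.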

next step... :) 'very distinct' for fighting cache thrashing :) :)

regards,
alexander.

Kaz Kylheku

Jul 18, 2001, 3:17:11 PM

In article <3B55DC62...@web.de>, Alexander Terekhov wrote:
>James Kuyper wrote:
>
>[...]
>> Then I'm confused. I traced your discussion back before sending that
>> message, and came away with the impression that you were arguing for
>> different members of an array to be stored in different blocks of
>> memory.
>
>it seems that there is no portable way to fight word tearing race
>condition.. how about yet another 'granularizer' ;-) qualifier:
>
>/* distinct */ char byte1; // should be word tearing safe
>/* distinct */ char byte2; // should be word tearing safe

The problem is getting all the compiler vendors to put this in.

And some compilers, like GCC, already have features that can do the job,
although not through special type specifiers. Their developers would
rightfully complain.

In GCC, you can enforce alignment like this:

char byte1 __attribute__ ((aligned (32)));
char byte2 __attribute__ ((aligned (32)));

So now byte1 is placed at the start of a 32 byte block, and
byte2 is placed at the start of the next one. So if the granule
size is 32, everything is cool.
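Kaz's two declarations can be checked directly. The sketch below needs GCC or Clang, since __attribute__ ((aligned)) is a GNU extension, and block_of() is a made-up helper. The 32-byte figure comes from the post and only helps if the granule really is 32 bytes; the compiler is also not obliged to put byte2 in the very next block, only in a different 32-byte-aligned one.

```c
#include <stdint.h>

/* Each char starts its own 32-byte-aligned block (GNU extension). */
char byte1 __attribute__ ((aligned (32)));
char byte2 __attribute__ ((aligned (32)));

/* Index of the 32-byte block containing address p (flat address space assumed). */
uintptr_t block_of(const void *p)
{
    return (uintptr_t)p / 32;
}
```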

All you need is some macro which adds this to your declaration

#define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))

so you can write your declaration:

GRANULARIZE(char byte1);

This macro can be implemented using the GCC mechanism, or the type specifier
mechanism.
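One possible way to spell GRANULARIZE so that it uses the GCC attribute where available and falls back to the C11 _Alignas specifier elsewhere (_Alignas did not exist in 2001). GRANULE_SIZE is assumed to be 32, as in the post. Leaving the semicolon out of the macro lets the caller's own `;` end the declaration, which avoids the stray empty declaration that ISO C does not allow at file scope.

```c
#include <stdint.h>

#ifndef GRANULE_SIZE
#define GRANULE_SIZE 32  /* assumption: granule size taken from the post */
#endif

#if defined(__GNUC__)
/* GNU attribute placed after the declarator. */
#define GRANULARIZE(decl) decl __attribute__ ((aligned (GRANULE_SIZE)))
#else
/* C11 alignment specifier placed in front of the declaration. */
#define GRANULARIZE(decl) _Alignas(GRANULE_SIZE) decl
#endif

GRANULARIZE(char byte1);
GRANULARIZE(char byte2);
```

The macro takes a single declarator; a declaration such as `char a, b` would need to be split into separate GRANULARIZE() lines.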

Alexander Terekhov

Jul 19, 2001, 5:56:12 AM

Kaz Kylheku wrote:

[...]

> In GCC, you can enforce alignment like this:
>
> char byte1 __attribute__ ((aligned (32)));
> char byte2 __attribute__ ((aligned (32)));
>
> So now byte1 is placed at the start of a 32 byte block, and
> byte2 is placed at the start of the next one. So if the granule
> size is 32, everything is cool.
>
> All you need is some macro which adds this to your declaration
>
> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))

hmm.. your macro controls alignment, fine.. but how about padding?

GRANULARIZE(char byte1); // shared scope 1; OK
GRANULARIZE(char byte2); // shared scope 2; OK??
.
.
.
char byte3;

could easily 'break' byte2 (and byte3) !

regards,
alexander.

Kaz Kylheku

Jul 19, 2001, 12:03:29 PM

In article <3B56AEBC...@web.de>, Alexander Terekhov wrote:
>Kaz Kylheku wrote:
>
>[...]
>> In GCC, you can enforce alignment like this:
>>
>> char byte1 __attribute__ ((aligned (32)));
>> char byte2 __attribute__ ((aligned (32)));
>>
>> So now byte1 is placed at the start of a 32 byte block, and
>> byte2 is placed at the start of the next one. So if the granule
>> size is 32, everything is cool.
>>
>> All you need is some macro which adds this to your declaration
>>
>> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))
>
>hmm.. your macro controls alignment, fine.. but how about padding?
>
>GRANULARIZE(char byte1); // shared scope 1; OK
>GRANULARIZE(char byte2); // shared scope 2; OK??

Since both bytes are aligned to a granule, they can't share any granules.

So you need either alignment or padding, but not necessarily both.

>.
>.
>char byte3;
>
>could easily 'break' byte2 (and byte3) !

That's right, but we could attribute the cause of that breakage to
that byte not being wrapped in GRANULARIZE(), rather than to the lack
of padding in the previous wrapped object.

Alexander Terekhov

Jul 20, 2001, 3:38:06 AM

Kaz Kylheku wrote:

> In article <3B56AEBC...@web.de>, Alexander Terekhov wrote:
> >Kaz Kylheku wrote:
> >
> >[...]
> >> In GCC, you can enforce alignment like this:
> >>
> >> char byte1 __attribute__ ((aligned (32)));
> >> char byte2 __attribute__ ((aligned (32)));
> >>
> >> So now byte1 is placed at the start of a 32 byte block, and
> >> byte2 is placed at the start of the next one. So if the granule
> >> size is 32, everything is cool.
> >>
> >> All you need is some macro which adds this to your declaration
> >>
> >> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))
> >
> >hmm.. your macro controls alignment, fine.. but how about padding?
> >
> >GRANULARIZE(char byte1); // shared scope 1; OK
> >GRANULARIZE(char byte2); // shared scope 2; OK??
>
> Since both bytes are aligned to a granule, they can't share any granules.

right, they do not. by "shared scope" I meant the following:

b11,b12,b13 // shared scope 1 = threads A & B (using some lock L1)
b21,b22,b23 // shared scope 2 = threads C & D (using some lock L2)
b31,b32,b33 // non-shared scope 3 = thread X
b41,b42,b43 // non-shared scope 4 = thread Y
            // "special" scope 5 = any thread & signal handler - covered by
            // static volatile sig_atomic_t

clearly, as far as memory accesses within each serial scope (shared or
non-shared) are concerned - the bN? "in the middle" - we do not have any
word-tearing problems; we just need to isolate the data accessed within
each serial scope from the data accessed in all other serial scopes. it
is on the borders (bN1 and bN3) that we potentially have a word-tearing
problem (correctness) and a cache-thrashing problem on multiprocessors
(performance). for sequentially allocated bXYs, it is sufficient to align
bX1 and to pad bX3.
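One way to get "align bX1, pad bX3" from the compiler rather than by hand, assuming C11 and that the bytes of the scopes can be gathered into one structure: align the first byte of each scope, which pushes it onto a fresh granule and so pads out the previous scope, and rely on the rule that a structure's size is a multiple of its alignment to pad the last scope too. GRANULE_SIZE (64), struct shared_data and granule_of() are assumptions for illustration.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte granules (also a common cache-line size). */
#define GRANULE_SIZE 64

/* Two serial scopes in one sequentially allocated object. Aligning the
   first byte of each scope starts it on a new granule; the structure's
   size is rounded up to a multiple of its alignment, padding the tail. */
struct shared_data {
    alignas(GRANULE_SIZE) char b11;  /* scope 1: threads A & B, lock L1 */
    char b12, b13;
    alignas(GRANULE_SIZE) char b21;  /* scope 2: threads C & D, lock L2 */
    char b22, b23;
};

struct shared_data data;

/* Granule number of an address (assumes a flat address space). */
uintptr_t granule_of(const void *p)
{
    return (uintptr_t)p / GRANULE_SIZE;
}
```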

> So you need either alignment or padding, but not necessarily both.

hmm.. in order to isolate all our bytes from "default" non-GRANULARIZEd
data, the first one needs to be aligned and the last one needs to be padded;
the second, third, ... could be either aligned or padded -- the result is the
same (but only if the order of actual allocation matches the order of
declaration, so that there is no risk of aligned-or-padded data being mixed
with non-aligned, non-padded "default" data). it is just much more robust
to align & pad, IMHO.
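A minimal sketch of "align & pad" for a single object, assuming C11: wrapping the object in a structure whose only member is granule-aligned makes the wrapper start on a granule boundary and occupy whole granules, so neighbouring "default" data cannot land in its granule whatever order the compiler allocates things in. ISOLATED, GRANULE_SIZE (64) and granule_of() are hypothetical names.

```c
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte granules. */
#define GRANULE_SIZE 64

/* Hypothetical macro: declares 'name' as a one-member structure whose
   member is granule-aligned. The structure is therefore granule-aligned
   (align) and its size is rounded up to whole granules (pad), so no
   other object can share its granules. Access the data as name.value.
   'type' must be a simple type name such as char or int. */
#define ISOLATED(type, name) \
    struct { alignas(GRANULE_SIZE) type value; } name

ISOLATED(char, byte1);  /* shared scope 1 */
ISOLATED(char, byte2);  /* shared scope 2 */
char byte3;             /* ordinary "default" data, wherever it lands */

/* Granule number of an address (assumes a flat address space). */
uintptr_t granule_of(const void *p)
{
    return (uintptr_t)p / GRANULE_SIZE;
}
```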

regards,
alexander.