Most conforming POSIX threads implementation

Timur Aydin

Jul 10, 2001, 12:38:43 PM

Hi,

Which OS has the most conforming POSIX threads implementation?

Timur.

David Butenhof

Jul 11, 2001, 8:10:12 AM

"Timur Aydin" <tayd at bicom-inc dot com> wrote:

> Which OS has the most conforming POSIX threads implementation?

Mine, of course, on Tru64 UNIX V5.1A. (Oh yeah, but it hasn't actually
been released yet... ;-) )

Seriously, though, any "fully conforming POSIX threads implementation",
right now, is broken and shouldn't be used. There are several serious bugs
in POSIX 1003.1-1996 that should not be implemented. (These have been fixed
for 1003.1-2001, but that's still in balloting and thus isn't yet really a
standard.)

So what you really want is an implementation that's "sufficiently
conforming" without being "fully conforming". So do you want one that does
everything in the standard that SHOULD be done and nothing that SHOULDN'T
be done? Is it a firm requirement that this implementation have no bugs,
known or unknown? Yeah, that's a grin, but it's also serious, since an
implementation with conformance bugs can't really be said to conform, at
least by any abstract and objective definition of "conform". Pragmatically,
the best objective use of the term would be to claim the UNIX 98 brand,
proof of having passed the VSTH test suite; but that suite isn't perfect.

Once you loosen the bounds of "100% strict conformance", we get to the
important issue... which is deciding what meets your actual needs. The
current LinuxThreads implementation falls well short of full conformance;
but while it fails to implement many features of the standard, it also so
far as I know fails to implement any of the standard's bugs. For most
applications, that implementation is going to be quite sufficient.

IBM is working on NGPT ("Next Generation" POSIX threads), which they
claim will relieve most if not all of the conformance bugs in LinuxThreads.
However, as far as I can tell (as it appears to require no substantial
kernel changes) it will inevitably add a set of bugs that the developers
apparently like (or at least accept), and will share many of the
"weaknesses" (some of which many consider actual conformance bugs) of the
Solaris and AIX two-level scheduler implementations. They appear to be
doing this principally because current limitations of Java encourage a
"thread per client" design pattern, and "one to one" kernel thread
implementations such as LinuxThreads tend to perform poorly with
unreasonably large numbers of threads. They will give up a lot to gain
support for "thousands of threads" servers that violate many principles of
good threaded design and probably won't work well anyway.

So... what particular conformance do you want? ;-)

Or, to put it another way... choose your own most personally useful
definition of "conformance", and look for the implementation that most
closely implements it.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Timur Aydin

Jul 11, 2001, 12:07:06 PM

"David Butenhof" <David.B...@compaq.com> wrote in message

Thanks for the info...

I was thinking of using a system with that OS as a "reference system".
Currently, the API libraries that I am working on are supported for Windows
NT and Linux. I was going to port the libraries under this OS with a highly
conforming PTHREADS implementation and use it as an additional pre-release
test. This way, if the library code accidentally relies on a bug or
something else that is not conformant to the standard, it would show up on
the reference system.

> IBM is working on NGPT ("Next Generation" POSIX threads), which they
> claim will relieve most if not all of the conformance bugs in
> LinuxThreads.

Yes, I have read about this. But what I found strange is that they have a
web based defect tracking system where only the developers can add new
defects. Everybody else can only add comments to existing defects. It looks
like this is not a collaborative development effort similar to the linux
kernel or mozilla.

Timur.

Alexander Terekhov

Jul 11, 2001, 1:35:49 PM

Timur Aydin wrote:

[...]

> > IBM is working on NGPT ("Next Generation" POSIX threads), which they
> > claim will relieve most if not all of the conformance bugs in
> > LinuxThreads.
>
> Yes, I have read about this. But what I found strange is that they have a
> web based defect tracking system where only the developers can add new
> defects. Everybody else can only add comments to existing defects. It looks
> like this is not a collaborative development effort similar to the linux
> kernel or mozilla.

really?

http://oss.software.ibm.com/developerworks/bugs/?func=addbug&group_id=25

"Registered developers should log in to submit a bug.

Users: Please enter your e-mail address and then fill
in the form below to submit your bug. "

i am not "registered developer" and was able to submit a bug:

Project: NGPT - Next Generation POSIX Threading
Summary: pthread_once lacks thread/mem (e.g. DCL with membars) synch

Bug #1784 has been published.

http://oss.software.ibm.com/developerworks/bugs/?group_id=25

Bug ID Summary Date Assigned to Submitted by

1784 pthread_once lacks thread/mem (e.g. DCL with membars) synch
07/11/01 12:44 None tere...@web.de

regards,
alexander.

Norman Black

Jul 11, 2001, 4:15:33 PM

> Or, to put it another way... choose your own most personally useful
> definition of "conformance",

My favorite definition is, "does it work as documented".

Dealing with the rest is easy for me since I never call APIs directly. I use
encapsulations, mostly 'thin' encapsulations.

--
Norman Black
Stony Brook Software
the reply, fubar => ix.netcom

"David Butenhof" <David.B...@compaq.com> wrote in message

news:FmX27.537$rc5....@news.cpqcorp.net...

Timur Aydin

Jul 11, 2001, 5:19:12 PM

"Alexander Terekhov" <tere...@web.de> wrote in message

> really?
>
> http://oss.software.ibm.com/developerworks/bugs/?func=addbug&group_id=25
>
> "Registered developers should log in to submit a bug.
>
> Users: Please enter your e-mail address and then fill
> in the form below to submit your bug. "
>
> i am not "registered developer" and was able to submit a bug:

I had checked that website about a month ago and it didn't allow users to
add a bug. But I understand now it does allow this.

Timur.

David Butenhof

Jul 12, 2001, 1:29:33 PM

"Timur Aydin" <tayd at bicom-inc dot com> wrote:

>
> "David Butenhof" <David.B...@compaq.com> wrote in message
>
> Thanks for the info...
>
> I was thinking of using a system with that OS as a "reference system".
> Currently, the API libraries that I am working on are supported for
> Windows NT and Linux. I was going to port the libraries under this OS with
> a highly conforming PTHREADS implementation and use it as an additional
> pre-release test. This way, if the library code accidentally relies on a
> bug or something else that is not conformant to the standard, it would
> show up on the reference system.

This is a little different. What you want is a "fully limited" system that
does nothing that's not absolutely required by the standard. (Therefore,
one might suppose that anything that works there will work elsewhere.) I
doubt there is any such implementation. Certainly my implementation
provides a lot of features (both implicit and explicit) that are not
required by the standard: therefore, any application dependent on those
extensions (or options) wouldn't necessarily be guaranteed to run on any
other "fully conforming" implementation.

This condition is similar to, but more strict than, the definition of a
"strictly conforming" implementation. Even the definition of "strictly
conforming" allows for some variation, where behavior is undefined or
unspecified.

You should be careful to write a strictly conforming APPLICATION, that does
not presume any behavior where the specification says "undefined" or
"unspecified", or "implementation specified", and does not use any
extensions or options. That application (assuming you've done everything
correctly) should behave about the same on any CORRECT implementation. But
there's no simple way to VALIDATE that you've written a strictly conforming
application.
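
One way to avoid silently depending on an option is to test for it before use. A minimal sketch, assuming the option in question is thread priority scheduling; the function name priority_scheduling_available is made up for illustration, while _POSIX_THREAD_PRIORITY_SCHEDULING and sysconf() come from <unistd.h>:

```c
#define _POSIX_C_SOURCE 200112L
#include <unistd.h>

/*
 * Returns 1 if the thread priority scheduling option is advertised both at
 * compile time and at run time, 0 otherwise.  (Hypothetical helper name.)
 */
int priority_scheduling_available(void)
{
/* An empty or zero definition means "ask at run time"; -1 means "absent". */
#if defined(_POSIX_THREAD_PRIORITY_SCHEDULING) && \
    (_POSIX_THREAD_PRIORITY_SCHEDULING + 0) >= 0
    return sysconf(_SC_THREAD_PRIORITY_SCHEDULING) > 0;
#else
    return 0;
#endif
}
```

A strictly conforming application would simply not use the option at all; a check like this only matters for code that wants the option where it exists and a fallback elsewhere.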

Dima Volodin

Jul 13, 2001, 8:23:24 AM

David Butenhof wrote:
>
> You should be careful to write a strictly conforming APPLICATION, that does
> not presume any behavior where the specification says "undefined" or
> "unspecified", or "implementation specified", and does not use any
> extensions or options. That application (assuming you've done everything
> correctly) should behave about the same on any CORRECT implementation. But
> there's no simple way to VALIDATE that you've written a strictly conforming
> application.

Here's an application in which one thread modifies one memory location, and
another thread modifies a different memory location:

#include <stdio.h>
#include <pthread.h>

char twoBytes [2];

void*
run (void* v)
{
    if ((char*)v == &twoBytes[0]) {
        twoBytes [0] = 'a';
    } else {
        twoBytes [1] = 'b';
    }

    return 0;
}

int
main ()
{
    pthread_t t1, t2;
    void* ret;

    if (pthread_create (&t1, NULL, &run, &twoBytes [0]) ||
        pthread_create (&t2, NULL, &run, &twoBytes [1])) {
        printf ("failed to create a thread\n");
        return 1;
    }

    pthread_join (t1, &ret);
    pthread_join (t2, &ret);

    printf ("twoBytes = %.2s\n", twoBytes);

    return 0;
}

/**************************************/

What can you tell about the level of conformity of this application? Why?

> /------------------[ David.B...@compaq.com ]------------------\

Thanks!

Dima

Timur Aydin

Jul 13, 2001, 10:30:53 AM

"Dima Volodin" <d...@dvv.org> wrote in message

> What can you tell about the level of conformity of this application? Why?

Well, aside from syntactical problems (you must pass "run" to
pthread_create, not "&run"), I didn't see any nonconforming usage...

Timur.

Kaz Kylheku

Jul 13, 2001, 1:04:43 PM

In article <3B4EE84F...@dvv.org>, Dima Volodin wrote:
>David Butenhof wrote:
>>
>> You should be careful to write a strictly conforming APPLICATION, that does
>> not presume any behavior where the specification says "undefined" or
>> "unspecified", or "implementation specified", and does not use any
>> extensions or options. That application (assuming you've done everything
>> correctly) should behave about the same on any CORRECT implementation. But
>> there's no simple way to VALIDATE that you've written a strictly conforming
>> application.
>
>Here's an application in which one thread modifies one memory location, and
>another thread modifies a different memory location:

>#include <stdio.h>
>#include <pthread.h>
>
>char twoBytes [2];

Note that in the abstract C semantics, twoBytes is considered a single
object, an array of 2 char.

To be safe, you have to ensure mutual exclusion on the entire C object,
not portions thereof. That applies to arrays and structs.

Of course, if you understand how objects are allocated, and if you
understand what the smallest atomic data size is on the target machine,
you can get away with more. But then you are making hardware-dependent
assumptions.
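
A minimal sketch of that conservative rule applied to the twoBytes program: one mutex covers the whole array, and each thread takes it before storing to its own element. The names two_bytes_lock, store_one and run_locked_writers are illustrative and do not appear in the original program.

```c
#include <pthread.h>
#include <stddef.h>

static char two_bytes[2];
/* One lock for the entire array object, not one per element. */
static pthread_mutex_t two_bytes_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: store 'a' or 'b' into the element passed in, under the lock. */
static void *store_one(void *arg)
{
    char *slot = arg;

    pthread_mutex_lock(&two_bytes_lock);
    *slot = (slot == &two_bytes[0]) ? 'a' : 'b';
    pthread_mutex_unlock(&two_bytes_lock);
    return NULL;
}

/* Runs both writers and copies the result to out; returns 0 on success. */
int run_locked_writers(char out[2])
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, store_one, &two_bytes[0]) != 0)
        return -1;
    if (pthread_create(&t2, NULL, store_one, &two_bytes[1]) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    out[0] = two_bytes[0];
    out[1] = two_bytes[1];
    return 0;
}
```

The lock serializes two stores that never touch the same element, which is exactly the cost the rest of the thread argues about.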

Dima Volodin

Jul 13, 2001, 12:31:35 PM

Timur Aydin wrote:

> "Dima Volodin" <d...@dvv.org> wrote in message
> > What can you tell about the level of conformity of this application? Why?
>
> Well, aside from syntactical problems (you must pass "run" to
> pthread_create, not "&run"),

I mustn't, see [6.3.3.2] and other places.

> I didn't see any nonconforming usage...

> Timur.

Dima

Kaz Kylheku

Jul 13, 2001, 1:15:54 PM

In article <3B4F2277...@dvv.org>, Dima Volodin wrote:
>Timur Aydin wrote:
>
>> "Dima Volodin" <d...@dvv.org> wrote in message
>> > What can you tell about the level of conformity of this application? Why?
>>
>> Well, aside from syntactical problems (you must pass "run" to
>> pthread_create, not "&run"),
>
>I mustn't, see [6.3.3.2] and other places.

What? &run and run mean the same thing. A function designator expression
yields a pointer to the function, unless it is the subject of the &
operator or sizeof operator. If it's the operand of &, then that operator
yields a pointer to the function. So, same thing! (The sizeof case
is a constraint violation, since sizeof ``shall not be applied
to an expression that has function type ...'').

You can dereference the pointer-to-function to recover a function
designator, which again yields the pointer. So these calls
are all identical:

puts("hello");
(&puts)("hello");
(*puts)("hello");
(**puts)("hello");
(***puts)("hello");
(****puts)("hello");
(****&puts)("hello");
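
The same equivalence can be checked on the pointer values themselves, which is what pthread_create actually receives. A small sketch; the typedef start_routine and the function designator_forms_agree are illustrative names:

```c
#include <stddef.h>

/* Same signature as a pthread start routine. */
static void *run(void *v)
{
    return v;
}

typedef void *(*start_routine)(void *);

/* Returns 1 if run, &run and *run all convert to the same function pointer. */
int designator_forms_agree(void)
{
    start_routine plain = run;    /* designator decays to a pointer */
    start_routine taken = &run;   /* explicit address-of */
    start_routine deref = *run;   /* dereference, then decay again */

    return plain == taken && taken == deref;
}
```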

Dima Volodin

Jul 13, 2001, 1:29:04 PM

Kaz Kylheku wrote:

> In article <3B4EE84F...@dvv.org>, Dima Volodin wrote:
> >David Butenhof wrote:
> >>
> >> You should be careful to write a strictly conforming APPLICATION, that does
> >> not presume any behavior where the specification says "undefined" or
> >> "unspecified", or "implementation specified", and does not use any
> >> extensions or options. That application (assuming you've done everything
> >> correctly) should behave about the same on any CORRECT implementation. But
> >> there's no simple way to VALIDATE that you've written a strictly conforming
> >> application.
> >
> >Here's an application in which one thread modifies one memory location, and
> >another thread modifies a different memory location:
>
> >#include <stdio.h>
> >#include <pthread.h>
> >
> >char twoBytes [2];
>
> Note that in the abstract C semantics, twoBytes is considered a single
> object, an array of 2 char.

What part of the C standard describes that?

> To be safe, you have to ensure mutual exclusion on the entire C object,
> not portions thereof. That applies to arrays and structs.

Does POSIX define its memory model in terms of C objects? Could you quote the
appropriate parts of the standard, please? Anyway, here's another app which
differs somewhat from the original one:

#include <stdio.h>
#include <pthread.h>

int twoInts [2];

void*
run (void* v)
{
    if ((int*)v == &twoInts[0]) {
        twoInts [0] = 'a';
    } else {
        twoInts [1] = 'b';
    }

    return 0;
}

int
main ()
{
    pthread_t t1, t2;
    void* ret;

    if (pthread_create (&t1, NULL, &run, &twoInts [0]) ||
        pthread_create (&t2, NULL, &run, &twoInts [1])) {
        printf ("failed to create a thread\n");
        return 1;
    }

    pthread_join (t1, &ret);
    pthread_join (t2, &ret);

    printf ("twoInts = %c%c\n", twoInts [0], twoInts [1]);

    return 0;
}

Note that we are still dealing with arrays here. Is this variant POSIX-compliant?

> Of course, if you understand how objects are allocated, and if you
> understand what the smallest atomic data size is on the target machine,
> you can get away with more. But then you are making hardware-dependent
> assumptions.

I don't want to make hardware assumptions or get away with more, I just want to
stay strictly inside POSIX boundaries.

Dima

Alexander Terekhov

Jul 13, 2001, 1:48:46 PM

Kaz Kylheku wrote:

[...]

> >#include <stdio.h>
> >#include <pthread.h>
> >
> >char twoBytes [2];
>
> Note that in the abstract C semantics, twoBytes is considered a single
> object, an array of 2 char.

ok. lets make two "objects":

char byte1;
char byte2;

is it safe now? i do not think so.

> To be safe, you have to ensure mutual exclusion on the entire C object,
> not portions thereof. That applies to arrays and structs.

to be safe i have to ensure mutual exclusion on every "memory location"..
the problem is that no standard defines it :(

and it could be really ugly.. have a look at:

http://www.openvms.compaq.com:8000/72final/6493/6101pro_007.html#index_x_370

...volatiles ! ;-)

"(On DIGITAL UNIX systems) For arrays, add the C language volatile
storage qualifier to the definition of the entire array; for structures,
add volatile to the declaration of only those members that share the
pertinent memory granule. You must also compile the application's
modules using the DEC C or DEC C++ compiler's -strong-volatile switch.
Doing so causes the compiler to produce code that forces all accesses
to those members to occur as atomic operations. See the description of
the -strong-volatile switch in the DEC C or DEC C++ documentation and
on the cc reference page. "

regards,
alexander.

David Schwartz

Jul 13, 2001, 1:46:03 PM

Dima Volodin wrote:

> Here's an application in which one thread modifies one memory location, and
> another thread modifies a different memory location:

Another memory location != another object.

DS

Dima Volodin

Jul 13, 2001, 2:09:36 PM

Kaz Kylheku wrote:

> In article <3B4F2277...@dvv.org>, Dima Volodin wrote:
> >Timur Aydin wrote:
> >
> >> "Dima Volodin" <d...@dvv.org> wrote in message
> >> > What can you tell about the level of conformity of this application? Why?
> >>
> >> Well, aside from syntactical problems (you must pass "run" to
> >> pthread_create, not "&run"),
> >
> >I mustn't, see [6.3.3.2] and other places.
>
> What? &run and run mean the same thing.

Absolutely. English as a Second Language strikes again. Anyway, I prefer to use
the "&run" notation to keep it uniform with the way you take addresses of
C++ object and class members.

Dima

Dima Volodin

Jul 13, 2001, 2:11:35 PM

David Schwartz wrote:

Go on, please.

> DS

Dima

Alexander Terekhov

Jul 13, 2001, 2:33:31 PM

David Schwartz wrote:

another memory address != another memory location (memory granule).

regards,
alexander.

Kaz Kylheku

Jul 13, 2001, 2:49:23 PM

In article <3B4F347E...@web.de>, Alexander Terekhov wrote:
>Kaz Kylheku wrote:
>
>[...]
>> >#include <stdio.h>
>> >#include <pthread.h>
>> >
>> >char twoBytes [2];
>>
>> Note that in the abstract C semantics, twoBytes is considered a single
>> object, an array of 2 char.
>
>ok. lets make two "objects":
>
>char byte1;
>char byte2;
>
>is it safe now? i do not think so.

Yes.

>> To be safe, you have to ensure mutual exclusion on the entire C object,
>> not portions thereof. That applies to arrays and structs.
>
>to be safe i have to ensure mutual exclusion on every "memory location"..
>
>the problem is that no standard defines it :(

If no standard defines it, then where did you get the requirement?

This is the compiler's problem. You have here two separately declared
data objects. If two threads cannot access these without interference,
the language implementation has broken support for multithreading,
because it's violating a fundamental principle! How can you write
multithreaded programs at the higher language level if you cannot be
sure that two distinct variables do not interfere?

A C compiler is free to insert space between byte1 and byte2, to ensure
that they are in different memory cells, so there is no excuse for this
breakage. But it's not free to insert space between two adjacent elements
of an array object. So if concurrent access to array elements is allowed,
it could only be supported by making *no* type smaller than the size of
the memory granule. So, e.g. on a machine with 64 bit granules, chars
would have to be made 64 bits wide to support the kind of MT programming
exemplified by Dima's twoBytes program. But byte1 and byte2
can be 8 bits wide, yet placed in separate 64 bit granules.
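
A sketch of the padding idea, assuming a 64-bit (8-byte) granule and a C11 compiler. `_Alignas` postdates this thread, and ASSUMED_GRANULE is a guess about the hardware, not something POSIX or C defines. Each array element here is a struct that carries its own padding, so the 8-bit values end up one granule apart even though they live in a single array object.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_GRANULE 8   /* assumption: 64-bit memory granule */

/* An 8-bit value aligned (and therefore padded out) to one full granule. */
struct padded_byte {
    _Alignas(ASSUMED_GRANULE) char value;
};

static struct padded_byte cells[2];

/* Address of the i-th value, e.g. to hand to a thread. */
char *cell_address(int i)
{
    return &cells[i].value;
}

/*
 * Returns 1 if the two values fall in the same assumed granule, 0 otherwise.
 * Treating addresses as integers is itself an assumption about the platform.
 */
int cells_share_granule(void)
{
    uintptr_t a = (uintptr_t)cell_address(0) / ASSUMED_GRANULE;
    uintptr_t b = (uintptr_t)cell_address(1) / ASSUMED_GRANULE;

    return a == b;
}
```

This trades memory for isolation and still depends on knowing the granule size, which is the hardware-dependent assumption discussed earlier in the thread.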

>and it could be really ugly.. have a look at:
>
>http://www.openvms.compaq.com:8000/72final/6493/6101pro_007.html#index_x_370

Here, it did not take me long to find this text: ``The only data
objects that are candidates for participating in a word-tearing
race condition are members of composite data objects---that is,
C language structures, unions and arrays''.
(Points off for not using the correct C terminology: aggregate types).

>...volatiles ! ;-)
>
>"(On DIGITAL UNIX systems) For arrays, add the C language volatile
> storage qualifier to the definition of the entire array; for structures,
> add volatile to the declaration of only those members that share the
> pertinent memory granule. You must also compile the application's
> modules using the DEC C or DEC C++ compiler's -strong-volatile switch.
> Doing so causes the compiler to produce code that forces all accesses
> to those members to occur as atomic operations. See the description of
> the -strong-volatile switch in the DEC C or DEC C++ documentation and
> on the cc reference page. "

Again, what this is saying is that programs that treat portions of arrays
as separate objects and use them concurrently are broken. Because they
are broken, you have to invoke a cascade of platform-specific workarounds:
modifying the program with volatile (something that POSIX does not
require!) and using special compiler options to adjust the semantics
of volatile.

Dima Volodin

Jul 13, 2001, 3:02:44 PM

Kaz Kylheku wrote:

> >http://www.openvms.compaq.com:8000/72final/6493/6101pro_007.html#index_x_370
>
> Here, it did not take me long to find this text: ``The only data
> objects that are candidates for participating in a word-tearing
> race condition are members of composite data objects---that is,
> C language structures, unions and arrays''.

Is it a POSIX feature or a mis-feature (i.e. a bug) of this particular
implementation? If the former, what part of the standard allows or mandates this
behavior?

Dima

Timur Aydin

Jul 13, 2001, 4:15:06 PM

"Alexander Terekhov" <tere...@web.de> wrote in message

> ok. lets make two "objects":
>
> char byte1;
> char byte2;
>
> is it safe now? i do not think so.
>

Can you please elaborate on why this is not safe? The purpose of the program
is to write values to two separate char's, using two separate threads.
Assuming that the sole purpose of the program is this, isn't it a waste to
use mutual exclusion here?

What type of hardware architecture or configuration would cause a problem
here?

Timur.

Alexander Terekhov

Jul 13, 2001, 4:13:31 PM

Kaz Kylheku wrote:

[...]

> >> To be safe, you have to ensure mutual exclusion on the entire C object,
> >> not portions thereof. That applies to arrays and structs.
> >
> >to be safe i have to ensure mutual exclusion on every "memory location"..
> >
> >the problem is that no standard defines it :(
>
> If no standard defines it, then where did you get the requirement?

IEEE P1003.1, Draft 7, June 2001/ Open Group Technical Standard, Issue 6
Memory Synchronization General Concepts
3111 4.10 Memory Synchronization
3112 Applications shall ensure that access to any memory location by more than one thread of control
3113 (threads or processes) is restricted such that no thread of control can read or modify a memory
3114 location while another thread of control may be modifying it. Such access is restricted using
3115 functions that synchronize thread execution and also synchronize memory with respect to other
3116 threads. The following functions synchronize memory with respect to other threads: ...

fyi..
http://groups.google.com/groups?as_umsgid=3B0CEA34...@compaq.com

Dave Butenhof wrote:

:> > > PORTABILITY doesn't mean that you can do any bloody thing you want and
:> > > have it work everywhere. Rather, it means that the standard has
:> > > specified a range of uses that ARE portable. If you don't do things
:> > > beyond the scope of the standard, or things precluded by the standard,
:> > > you're fine.
:> >
:> > what exactly is "beyond the scope of the standard"
:> > and/or "precluded by the standard" with the following:
:> >
:> > char charForThreadA; // r/w thread A _only_
:> > char charForThreadB; // r/w thread B _only_
:>
:> POSIX says you cannot have multiple threads using "a memory location"
:> without explicit synchronization. POSIX does not claim to know, nor try to
:> specify, what constitutes "a memory location" or access to it, across all
:> possible system architectures.
[...]

> This is the compiler's problem. You have here two separately declared
> data objects. If two threads cannot access these without interference,
> the language implementation has broken support for multithreading,
> because it's violating a fundamental principle! How can you write
> multithreaded programs at the higher language level if you cannot be
> sure that two distinct variables do not interfere?

you cannot; with respect to "small" variables.
and that is a real problem, IMHO.

JAVA explicitly says:

"A variable refers to a static variable of a loaded
class, a field of an allocated object, or element of
an allocated array. The system must maintain the
following properties with regards to variables and the
memory manager:
...
The fact that two variables may be stored in adjacent
bytes (e.g., in a byte array) is immaterial.
Two variables can be simultaneously updated by
different threads without needing to use synchronization
to account for the fact that they are ``adjacent''. "

POSIX does not provide such strong guarantees and
AFAIK neither POSIX nor C prohibit sharing of
single memory granule for two or more variables.

> A C compiler is free to insert space between byte1 and byte2, to ensure
> that they are in different memory cells, so there is no excuse for this
> breakage

which breakage? is C compiler _required_ to insert space between
byte1 and byte2 ??

> >and it could be really ugly.. have a look at:
> >
> >http://www.openvms.compaq.com:8000/72final/6493/6101pro_007.html#index_x_370
>
> Here, it did not take me long to find this text: ``The only data
> objects that are candidates for participating in a word-tearing
> race condition are members of composite data objects---that is,
> C language structures, unions and arrays''.

but how about: "For instance, given a multithreaded program that has
been compiled to have longword actual granularity, if any two of the
program's threads can concurrently update different bytes or words
in the same longword, then that program is, in theory, at risk for
encountering a word-tearing race condition" ??

well, you are probably suggesting that this problem is pure
"quality of implementation" issue.. i do not think so.

regards,
alexander.

Alexander Terekhov

Jul 13, 2001, 5:19:23 PM

Timur Aydin wrote:

http://groups.google.com/groups?as_umsgid=34FD6950...@40zko.dec.com

regards,
alexander.

ps. fixed size "small" object allocator for MT program...
any ideas how it could be done _portably_?

Alexander Terekhov

Jul 13, 2001, 5:24:54 PM

Alexander Terekhov wrote:

> http://groups.google.com/groups?as_umsgid=34FD6950...@40zko.dec.com

err..

http://groups.google.com/groups?as_umsgid=34FD6950...@zko.dec.com

regards,
alexander.

Ken Whaley

Jul 13, 2001, 7:04:20 PM

Memory granules aside, what does POSIX say about the memory visibility
of objects modified by a thread when that thread exits/returns or is
joined? Since there is no mutex used to access the memory locations,
if running on a MP system, does pthread_join guarantee (according to
the standard) that the memory stores in the writing thread will be
coherent with subsequent reads in the main thread?

Ken

"Dima Volodin" <d...@dvv.org> wrote in message

news:3B4EE84F...@dvv.org...

Alexander Terekhov

Jul 14, 2001, 9:10:08 AM

Ken Whaley wrote:

> Memory granules aside, what does POSIX say about the memory visibility
> of objects modified by a thread when that thread exits/returns or is
> joined? Since there is no mutex used to access the memory locations,
> if running on a MP system, does pthread_join guarantee (according to
> the standard) that the memory stores in the writing thread will be
> coherent with subsequent reads in the main thread?

yes, pthread_join provides execution _and_memory_ synchronization.

3111 4.10 Memory Synchronization
3112 Applications shall ensure that access to any memory location by more than one thread of control
3113 (threads or processes) is restricted such that no thread of control can read or modify a memory
3114 location while another thread of control may be modifying it. Such access is restricted using
3115 functions that synchronize thread execution and also synchronize memory with respect to other
3116 threads. The following functions synchronize memory with respect to other threads:

3117 fork ()
3118 pthread_barrier_wait()
3119 pthread_cond_broadcast()
3120 pthread_cond_signal ()
3121 pthread_cond_timedwait()
3122 pthread_cond_wait()
3123 pthread_create()

3124 pthread_join ()
^^^^^^^^^^^^

3125 pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_trywait()
sem_wait()
wait()
waitpid ()
3126 Unless explicitly stated otherwise, if one of the above functions returns an error, it is unspecified
3127 whether the invocation causes memory to be synchronized.
3128 Applications may allow more than one thread of control to read a memory location
3129 simultaneously.
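
A minimal sketch of relying on that guarantee: the worker stores its result with no mutex, and the creating thread reads it only after pthread_join() has returned successfully. The names doubled, worker and compute_via_join are illustrative.

```c
#include <pthread.h>
#include <stddef.h>

/* Written by the worker, read by the joiner only after pthread_join(). */
static int doubled;

static void *worker(void *arg)
{
    const int *input = arg;

    doubled = *input * 2;   /* plain store, no lock held */
    return NULL;
}

/* Returns 0 and sets *out to 2 * input on success, -1 on any failure. */
int compute_via_join(int input, int *out)
{
    pthread_t t;

    /* pthread_create() is on the list too, so the worker sees 'input'. */
    if (pthread_create(&t, NULL, worker, &input) != 0)
        return -1;

    /* Per the quoted text, an error return leaves synchronization
       unspecified, so the result is read only after a successful join. */
    if (pthread_join(t, NULL) != 0)
        return -1;

    *out = doubled;
    return 0;
}
```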

regards,
alexander.

Joe Seigh

Jul 16, 2001, 5:50:21 AM

Alexander Terekhov wrote:
>
> ps. fixed size "small" object allocator for MT program...
> any ideas how it could be done _portably_?

malloc() by virtue of it claiming to be "thread safe". POSIX
doesn't address or support the issue of granularity directly.

Joe Seigh

Jul 16, 2001, 6:31:08 AM

David Butenhof wrote:
>
...

> This condition is similar to, but more strict than, the definition of a
> "strictly conforming" implementation. Even the definition of "strictly
> conforming" allows for some variation, where behavior is undefined or
> unspecified.
>
> You should be careful to write a strictly conforming APPLICATION, that does
> not presume any behavior where the specification says "undefined" or
> "unspecified", or "implementation specified", and does not use any
> extensions or options. That application (assuming you've done everything
> correctly) should behave about the same on any CORRECT implementation. But
> there's no simple way to VALIDATE that you've written a strictly conforming
> application.
>

You should be able to formally prove certain program properties by using POSIX semantics.
Not simple but doable. Those properties should then hold on any conforming POSIX
implementation. You might have to make certain assumptions about stuff not covered
or addressed by POSIX such as word tearing or forward progress, but at least you'd
be aware of those issues and presumably would deal with them by other means.

Proving threaded program correctness should be encouraged more considering how
difficult it is to demonstrate threaded program correctness by runtime testing.
A "most conforming POSIX threads implementation" isn't going to help here.
Though a "pathological" implementation would be really nice, one designed to catch
out all those invalid assumptions such as FIFO locks, forward progress, really
aggressive memory architecture and granularity, etc... That would be useful.
Virtual machine architectures did this in the area of timing assumptions.
If your virtual processors were emulated using time slicing, those time slice
ends could end up in "critical" sections of code that were making invalid
assumptions and expose race conditions where the exposure window was normally
too small to detect through normal testing.

Joe Seigh

Alexander Terekhov

Jul 16, 2001, 8:21:37 AM

Joe Seigh wrote:

> Alexander Terekhov wrote:
> >
> > ps. fixed size "small" object allocator for MT program...
> > any ideas how it could be done _portably_?
>
> malloc() by virtue of it claiming to be "thread safe".

i know. thread safety with respect to execution synchronization
is not an issue here; it is well defined. that is not the case
with respect to memory synchronization...

> POSIX doesn't address or support the issue of granularity directly.

IMHO that makes it practically impossible not to break memory
synchronization rules/restrictions (4.10); at least when programming
something like fixed size "small" object allocators and things
using shared memory to store multiple asynch. modifiable "small"
objects/variables per allocated block of shared memory, etc..
the standard does define the rule for "memory locations" and at
the same time "memory location" is left undefined !? i simply
do not get it. how could i follow the rule which is practically
undefined??

regards,
alexander.

Dima Volodin

Jul 16, 2001, 9:05:42 AM

Joe Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > ps. fixed size "small" object allocator for MT program...
> > any ideas how it could be done _portably_?
>
> malloc() by virtue of it claiming to be "thread safe".

That is, all memory has to be allocated via malloc(), and you cannot
really use an allocator that allots memory from its own arena, right?

> POSIX
> doesn't address or support the issue of granularity directly.

Does it address or support the issue indirectly? Could you quote the wording,
please?

> Joe Seigh

Dima

David Schwartz

Jul 14, 2001, 1:33:06 AM

Timur Aydin wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> > ok. lets make two "objects":

> > char byte1;
> > char byte2;

> > is it safe now? i do not think so.

> Can you please elaborate on why this is not safe? The purpose of the program
> is to write values to two separate char's, using two separate threads.
> Assuming that the sole purpose of the program is this, isn't it a waste to
> use mutual exclusion here?

You can't tell if it's safe or not because you can't tell if they
really are "two separate chars" or not. For example:

int main(void)
{
    char a;
    char b;
    ...
}

Here you really do have two separate chars. But:

struct foo
{
    char a;
    char b;
};

Here you have one structure. Yes, two chars, but not two separate
chars.

> What type of hardware architecture or configuration would cause a problem
> here?

We're talking about coding to a standard here, so hardware architecture
or configuration is not relevant.

DS

Joe Seigh

Jul 16, 2001, 12:54:23 PM

Alexander Terekhov wrote:
>
> Joe Seigh wrote:
>
> > Alexander Terekhov wrote:
> > >
> > > ps. fixed size "small" object allocator for MT program...
> > > any ideas how it could be done _portably_?
> >
> > malloc() by virtue of it claiming to be "thread safe".
>
> i know. thread safety with respect to execution synchronization
> is not an issue here; it is well defined. that is not the case
> with respect to memory synchronization...

No, malloc would have to be thread safe with respect to word tearing
or granularity. If it weren't, it would be an absolute nightmare
to support because it would have an open bug list miles long. It wouldn't
matter that most of the bugs were "spurious". Dealing with that
list of bugs would kill you.

I suppose you could have a malloc implementation that kept only its internal
allocation structures safe but that would be a strange attitude for an
implementation to take. You wouldn't just be exposing the application
in question but all the "thread safe" libraries as well.

>
> > POSIX doesn't address or support the issue of granularity directly.
>
> IMHO that makes it practically impossible not to break memory
> synchronization rules/restrictions (4.10); at least when programming
> something like fixed size "small" object allocators and things
> using shared memory to store multiple asynch. modifiable "small"
> objects/variables per allocated block of shared memory, etc..
> the standard does define the rule for "memory locations" and at
> the same time "memory location" is left undefined !? i simply
> do not get it. how could i follow the rule which is practically
> undefined??

Apparently the issue does not exist as far as POSIX is concerned.
In other words, POSIX only partially addresses all of the issues
in multi-threaded programming. I'm not defending it. That's
just the reality of it. POSIX suffers because it's addressing
the problem of multithreading programming via a library mechanism
and not so much through a language and other mechanisms. Contrast
that with Java, which has a definition cleanly integrated with
respect to library or class structure, language, and os/architecture.

Joe Seigh

Dima Volodin

Jul 16, 2001, 12:49:43 PM

David Schwartz wrote:
> You can't tell if it's safe or not because you can't tell if they
> really are "two separate chars" or not. For example:
>
> int main(void)
> {
> char a;
> char b;
> ...
> }
>
> Here you really do have two separate chars. But:
>
> struct foo
> {
> char a;
> char b;
> };
>
> Here you have one structure. Yes, two chars, but not two separate
> chars.

Either you are mixing structs and unions or your definition of "separate" is
very different from mine. Could you elaborate here, please?

> > What type of hardware architecture or configuration would cause a problem
> > here?
>
> We're talking about coding to a standard here, so hardware architecture
> or configuration is not relevant.

And what does the standard say about stand-alone chars vs. chars as members of
a struct? I've been trying in vain to find out for quite some time now.

> DS

Dima

Alexander Terekhov

Jul 16, 2001, 2:13:39 PM

Joe Seigh wrote:

[...]

> > > malloc() by virtue of it claiming to be "thread safe".
> >
> > i know. thread safety with respect to execution synchronization
> > is not an issue here; it is well defined. that is not the case
> > with respect to memory synchronization...
>
> No, malloc would have to be thread safe with respect to word tearing
> or granularity.

malloc, mmap, shm_xxxx, ... themselves are 100% thread safe; hopefully :)
however, since memory granularity is not defined in some portable way,
it is simply impossible to portably use the allocated block of memory
for multiple asynch. modifiable "small" objects without invoking the
risk of word tearing race condition -- violation of posix mem.synch
rule (4.10)

regards,
alexander.

Torsten Robitzki

Jul 16, 2001, 2:14:11 PM

Hello David,

David Schwartz wrote:
>
> Timur Aydin wrote:
> >
> > "Alexander Terekhov" <tere...@web.de> wrote in message
> > > ok. lets make two "objects":
>
> > > char byte1;
> > > char byte2;
>
> > > is it safe now? i do not think so.
>
> > Can you please elaborate on why this is not safe? The purpose of the program
> > is to write values to two separate char's, using two separate threads.
> > Assuming that the sole purpose of the program is this, isn't it a waste to
> > use mutual exclusion here?
>
> You can't tell if it's safe or not because you can't tell if they
> really are "two separate chars" or not. For example:
>
> int main(void)
> {
> char a;
> char b;
> ...
> }
>
> Here you really do have two separate chars. But:
>
> struct foo
> {
> char a;
> char b;
> };
>
> Here you have one structure. Yes, two chars, but not two separate
> chars.
>

Would it make a difference if one wrapped the chars in your second example
in structs? For example:

struct foo
{
    struct { char a; } a;
    struct { char b; } b;
};

So chars a and b are part of different objects. Hm, but they are also part
of the outer struct.

If this would also be subject to word-tearing race conditions, what would
be a good approach to accessing different parts of an object that are
protected by different mutexes (and thus would probably be accessed by
different threads)?

I can only think of using (never changing) pointers to heap allocated
memory instead of real members. For example:

struct foo
{
    char* const a;
    char* const b;
};

best regards
Torsten

Norman Black

Jul 16, 2001, 3:31:59 PM

Generally speaking, if a processor supports a data type directly in
load/store instructions then that instruction will operate atomically with
respect to other processors.

The SPARC, MIPS, PowerPC/POWER, IA-64 architectures all support 8 and 16-bit
writes. This is NOT true for the Alpha however.

--
Norman Black
Stony Brook Software
the reply, fubar => ix.netcom

"Alexander Terekhov" <tere...@web.de> wrote in message

news:3B4F6726...@web.de...

Kaz Kylheku

Jul 16, 2001, 3:50:07 PM

In article <3B52DC51...@web.de>, Alexander Terekhov wrote:
>Joe Seigh wrote:
>
>> Alexander Terekhov wrote:
>> >
>> > ps. fixed size "small" object allocator for MT program...
>> > any ideas how it could be done _portably_?
>>
>> malloc() by virtue of it claiming to be "thread safe".
>
>i know. thread safety with respect to execution synchronization
>is not an issue here; it is well defined. that is not the case
>with respect to memory synchronization...
>
>> POSIX doesn't address or support the issue of granularity directly.
>
>IMHO that makes it practically impossible not to break memory
>synchronization rules/restrictions (4.10); at least when programming
>something like fixed size "small" object allocators and things
>using shared memory to store multiple asynch. modifiable "small"
>objects/variables per allocated block of shared memory, etc..

Easy: in your allocator implementation, you can simply assume that there
is a memory granule size. Since you don't know what that is, you make
it a compile-time parameter. For each supported platform, you should
be able to find out what that granule size is and configure it accordingly.

Allocators can't be written 100% portably anyway because they need to
be able to return memory that is suitably aligned for any type,
without the knowledge of what types the memory will be used for.
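A rough sketch of such an allocator, not from the original post. MEM_GRANULE is a hypothetical compile-time parameter (64 is only a placeholder, and it is assumed to be a power of two no smaller than the strictest alignment on the platform). The pool_* names are invented for illustration. Every slot is rounded up to a whole number of granules and the first slot is aligned to a granule boundary, so no two live objects share a granule.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef MEM_GRANULE
#define MEM_GRANULE 64u   /* assumed per-platform value; must be a power of two */
#endif

struct pool {
    pthread_mutex_t lock;  /* protects free_list only */
    void *raw;             /* pointer returned by malloc(), kept for free() */
    void *free_list;       /* singly linked list threaded through free slots */
    size_t slot_size;      /* object size rounded up to whole granules */
};

int pool_init(struct pool *p, size_t object_size, size_t count)
{
    if (object_size < sizeof(void *))
        object_size = sizeof(void *);   /* room for the free-list link */
    p->slot_size = (object_size + MEM_GRANULE - 1u) & ~(size_t)(MEM_GRANULE - 1u);
    p->raw = malloc(p->slot_size * count + MEM_GRANULE - 1u);
    if (p->raw == NULL)
        return -1;

    /* Round the start address up to a granule boundary. */
    uintptr_t start = ((uintptr_t)p->raw + MEM_GRANULE - 1u) & ~(uintptr_t)(MEM_GRANULE - 1u);
    unsigned char *base = (unsigned char *)start;

    p->free_list = NULL;
    for (size_t i = count; i-- > 0; ) {   /* push slots so slot 0 ends up first */
        void *slot = base + i * p->slot_size;
        *(void **)slot = p->free_list;
        p->free_list = slot;
    }
    if (pthread_mutex_init(&p->lock, NULL) != 0) {
        free(p->raw);
        return -1;
    }
    return 0;
}

/* Returns a granule-aligned slot, or NULL when the pool is exhausted. */
void *pool_alloc(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    void *slot = p->free_list;
    if (slot != NULL)
        p->free_list = *(void **)slot;
    pthread_mutex_unlock(&p->lock);
    return slot;
}

void pool_free(struct pool *p, void *slot)
{
    assert(slot != NULL);
    pthread_mutex_lock(&p->lock);
    *(void **)slot = p->free_list;
    p->free_list = slot;
    pthread_mutex_unlock(&p->lock);
}

void pool_destroy(struct pool *p)
{
    pthread_mutex_destroy(&p->lock);
    free(p->raw);
}
```

The price is memory: a one-byte object still costs a full granule. The pointer rounding through uintptr_t is itself an example of the non-portable step mentioned above.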

>the standard does define the rule for "memory locations" and at
>the same time "memory location" is left undefined !? i simply

A document which uses a term that is not defined is defective. That needs
to be reported so that a future version of the document will be fixed.

>do not get it. how could i follow the rule which is practically
>undefined??

By making the weakest possible assumption that seems reasonable, and
having a workaround strategy.

The weakest assumption is that ``word tearing'' can happen between any
two objects. But that is not reasonable, because then sane multithreading
is not possible.

The only dilemma is whether separately declared small objects whose
storage is allocated side by side are subject to this problem. A way
out of the dilemma is to avoid programs which associate a separate mutex
with each such object.

It's not commonplace to see chars or shorts as lone variables at file
scope, but objects of type int or long are common. It's even less
commonplace to see such individual small objects assigned to different
mutexes.

Implementation ideas that are far removed from what is common are
likely not to be validated by the test suite that vendors use for their
multithreading compiler and library, and so may break regardless of what
the rules say or do not say.

Dima Volodin

Jul 16, 2001, 5:27:42 PM

Kaz Kylheku wrote:

> In article <3B52DC51...@web.de>, Alexander Terekhov wrote:
> >IMHO that makes it practically impossible not to break memory
> >synchronization rules/restrictions (4.10); at least when programming
> >something like fixed size "small" object allocators and things
> >using shared memory to store multiple asynch. modifiable "small"
> >objects/variables per allocated block of shared memory, etc..
>
> Easy: in your allocator implementation, you can simply assume that there
> is a memory granule size. Since you don't know what that is, you make
> it a compile time parameter. For each supported platform, you should
> be able to find out what that granule size is and configure it accordingly.

> Allocators can't be written 100% portably anyway because they need to
> be able to return memory that is suitably aligned for any type,
> without the knowledge of what types the memory will be used for.

Using #ifdefs that are not defined by a standard is not good practice when
you are trying to remain inside the standard's boundaries. Besides, it's really
easy to make a portable memory allocator that takes care of proper alignment for
allocated chunks without resorting to a priori unknown architecture details.

> >the standard does define the rule for "memory locations" and at
> >the same time "memory location" is left undefined !? i simply
>
> A document which uses a term that is not defined is defective. That needs
> to be reported so that a future version of the document will be fixed.

Are you talking about POSIX 1003.1c here?

> >do not get it. how could i follow the rule which is practically
> >undefined??
>
> By making the weakest possible assumption that seems reasonable, and
> having a workaround strategy.
>
> The weakest assumption is that ``word tearing'' can happen between any
> two objects. But that is not reasonable, because then sane multithreading
> is not possible.

We have a lot of examples in comp.programming.threads where "reasonable
assumptions" turn out to be not so reasonable.

Dima

David Schwartz

Jul 16, 2001, 6:34:05 PM

Dima Volodin wrote:

> > int main(void)
> > {
> > char a;
> > char b;
> > ...
> > }
> >
> > Here you really do have two separate chars. But:
> >
> > struct foo
> > {
> > char a;
> > char b;
> > };
> >
> > Here you have one structure. Yes, two chars, but not two separate
> > chars.

> Either you are mixing structs and unions or your definition of "separate" is
> very different from mine. Could you elaborate here, please?

If you define a struct, you have defined one struct. If, at function
level, you define two char's, you have defined two char's.

> And what does the standard says about stand-alone chars vs. chars as members of
> a struct? I've been trying to find it out in vain for quite some time now.

The standard says that a struct is considered a single object for
purposes of memory visibility.

DS

Joe Seigh

Jul 17, 2001, 6:26:09 AM

If malloc'd memory was not thread safe, i.e. safe from word tearing,
then any library which malloc'd memory would not be thread safe
unless it had a safety margin on both ends of the malloc'd buffer.
Is anyone aware of any thread safe libraries on any platform that
actually do this?
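To make "safety margin" concrete, here is a hypothetical wrapper, not something any library mentioned in this thread is known to do. It assumes a granule size MEM_GRANULE (placeholder value 64) and pads one full granule on each side of the caller's bytes, so that every granule touching those bytes lies wholly inside the same malloc'd block. margin_malloc and margin_free are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef MEM_GRANULE
#define MEM_GRANULE 64u   /* assumed granule size in bytes */
#endif

/* Allocate size bytes with one granule of unused padding before and after. */
void *margin_malloc(size_t size)
{
    /* The front margin must not disturb the alignment malloc() provides. */
    assert(MEM_GRANULE % sizeof(long double) == 0);

    if (size > SIZE_MAX - 2u * MEM_GRANULE)
        return NULL;                       /* request too large to pad */
    unsigned char *raw = malloc(size + 2u * MEM_GRANULE);
    return raw != NULL ? raw + MEM_GRANULE : NULL;
}

/* Release a block obtained from margin_malloc(). */
void margin_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - MEM_GRANULE);
}
```

The wrapper only helps if the guessed granule is at least as large as the real one, which is exactly the information POSIX does not give you.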

Joe Seigh

David Butenhof

Jul 17, 2001, 7:02:46 AM

Norman Black wrote:

> Generally speaking, if a processor supports a data type directly in
> load/store instructions then that instruction will operate atomically with
> respect to other processors.
>
> The SPARC, MIPS, PowerPC/POWER, IA-64 architectures all support 8 and
> 16-bit writes. This is NOT true for the Alpha however.

That hasn't been true since the EV56, which is now pretty old. However, the
compiler will still generally try to write code that can be executed by any
old Alpha unless you use the -arch switch. (Ironically, the addition of
byte and word instructions was mostly a concession to the economic
realities of porting NT device drivers... which may have already become
irrelevant by the time EV56 shipped, but certainly soon after.)

Still, more importantly, the C and C++ languages don't guarantee what
operations will be used to read or write your data, even when the hardware
happens to support an operation that would do the job atomically. So
whether the hardware can do it is really irrelevant unless you're writing
in assembler.
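To illustrate why the choice of instructions matters, here is a single-threaded replay of what can happen when a byte store is compiled as "load the containing word, merge, store the word". It is a sketch under assumptions: splice_byte and torn_update_demo are invented names, and the interleaving of the two "threads" is written out by hand rather than produced by real concurrency.

```c
#include <assert.h>
#include <stdint.h>

/* Merge one byte into a 32-bit word, as code generated for a machine
 * without byte stores might do before writing the whole word back. */
uint32_t splice_byte(uint32_t word, unsigned index, uint8_t value)
{
    assert(index < 4u);
    unsigned shift = index * 8u;
    return (word & ~((uint32_t)0xFFu << shift)) | ((uint32_t)value << shift);
}

/* Replay the bad interleaving: both writers load the word before either
 * stores it, so the second store wipes out the first writer's byte. */
uint32_t torn_update_demo(void)
{
    uint32_t word = 0;                          /* holds byte 0 and byte 1 */
    uint32_t a = splice_byte(word, 0u, 0xAAu);  /* writer A: load and merge */
    uint32_t b = splice_byte(word, 1u, 0xBBu);  /* writer B: load and merge */

    word = a;                                   /* writer A stores the word */
    word = b;                                   /* writer B stores, byte 0 reverts to 0 */
    return word;
}
```

torn_update_demo() returns 0x0000BB00 instead of 0x0000BBAA: writer A's byte is lost even though the two writers never touched the same char.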

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Dima Volodin

Jul 17, 2001, 9:17:51 AM

David Schwartz wrote:
>
> Dima Volodin wrote:
>
> > > int main(void)
> > > {
> > > char a;
> > > char b;
> > > ...
> > > }
> > >
> > > Here you really do have two separate chars. But:
> > >
> > > struct foo
> > > {
> > > char a;
> > > char b;
> > > };
> > >
> > > Here you have one structure. Yes, two chars, but not two separate
> > > chars.
>
> > Either you are mixing structs and unions or your definition of "separate" is
> > very different from mine. Could you elaborate here, please?
>
> If you define a struct, you have defined one struct. If, at function
> level, you define two char's, you have defined two char's.

When you define a struct with two chars, you also define two chars - being a
member of a struct doesn't make an object less of an object.

Now if we get back to our bytes: here's a little snippet in C:

#include <stdio.h>

int
main (void)
{
    char a, b;
    struct {char a, b;} s;

    /* %p expects a void *, hence the casts */
    printf ("a %p\n", (void *) &a);
    printf ("b %p\n", (void *) &b);
    printf ("s.a %p\n", (void *) &s.a);
    printf ("s.b %p\n", (void *) &s.b);

    return 0;
}

Now contemplate its output on my sparc box:

a effffc4f
b effffc4e
s.a effffc40
s.b effffc41

> > And what does the standard says about stand-alone chars vs. chars as members of
> > a struct? I've been trying to find it out in vain for quite some time now.
>
> The standard says that a struct is considered a single object for
> purposes of memory visibility.

What clause is it, please?

> DS

Dima

Kaz Kylheku

Jul 17, 2001, 3:10:56 PM

In article <3B543B11...@dvv.org>, Dima Volodin wrote:
>When you define a struct with two chars, you also define two chars - being a
>member of a struct doesn't make an object less of an object.

Yes it does, because the members of the struct must be allocated in a single
larger object, which consists of a contiguous sequence of individually
addressable bytes (as far as the C program can tell).

There is a C language level difference, because the language allows the
ptrdiff_t pointer displacement between two members of the same aggregate
to be computed, and pointers to members of the same aggregate can be
compared using the relational operators:

int x;
int y;

int a[2];

if (&a[0] < &a[1]) { /* Correct */
    /* Will execute this */
}

if (&x < &y) /* Undefined behavior */
{
}

This allows objects which are created by separate declarations, or
separate requests to the storage allocator, to be placed into distinct
units of memory, such as memory segments on a segmented architecture.

This is the area of the language where it would be easy to introduce
a POSIX requirement that distinct primary objects (my term) do not
share memory granules, and so may be concurrently accessed without
interference. Since there is already some kind of difference,
introducing another one isn't a big deal.

Norman Black

Jul 17, 2001, 4:28:59 PM

> > The SPARC, MIPS, PowerPC/POWER, IA-64 architectures all support 8 and
> > 16-bit writes. This is NOT true for the Alpha however.
>
> That hasn't been true since the EV56, which is now pretty old. However,
> the

The only book I have is the original Alpha architecture manual. We never
ported our compilers to the processor. I still remember quotes when the
processor came out that the Alpha architecture supported 32-bit load/store
because of "good business reasons". In other words the Alpha architects did
not want any shifters (proper term?) in the memory access path. They really
tried to strip things down to the bone, too much I think with regards to
supported data types. Problem was, IMO, that most existing code used a
multitude of data sizes and not just the "native" size. A lot of code used
smaller data types to save memory/disk space. The processor did not support
those smaller types efficiently. Yes I know the Alpha had some instructions
to allow writing kick ass string functions, but that is good for support
routines and not general purpose code.

> Still, more importantly, the C and C++ languages don't guarantee what
> operations will be used to read or write your data, even when the hardware
> happens to support an operation that would do the job atomically. So
> whether the hardware can do it is really irrelevant unless you're writing
> in assembler.

As a general statement that is true. I think it is too easy to say that you
should not expect anything to be efficient. I would have to strongly
disagree with the last sentence. Language definitions exist in a
hypothetical world, whereas we work in the real world. There are certain
things you can expect to be efficient in hardware, and therefore demand be
efficient in a compiler for said hardware. The original Alpha did break an
expectation (efficient access to 8, 16-bit data), but it stood alone in this
regard.

--
Norman Black
Stony Brook Software
the reply, fubar => ix.netcom

"David Butenhof" <David.B...@compaq.com> wrote in message
news:qXU47.813$rc5....@news.cpqcorp.net...

Kaz Kylheku

Jul 17, 2001, 4:38:59 PM

In article <9j26vq$i0s$1...@slb6.atl.mindspring.net>, Norman Black wrote:
>> > The SPARC, MIPS, PowerPC/POWER, IA-64 architectures all support 8 and
>> > 16-bit writes. This is NOT true for the Alpha however.
>>
>> That hasn't been true since the EV56, which is now pretty old. However,
>> the
>
>The only book I have is the original Alpha architecture manual. We never
>ported our compilers to the processor. I still remember quotes when the
>processor came out that the Alpha architecture supported 32-bit load/store
>because of "good business reasons". In other words the Alpha architects did
>not want any shifters (proper term?) in the memory access path.

The proper term is ``alignment network'' or something like that. :)

Dima Volodin

Jul 17, 2001, 5:15:45 PM

[I cross-posted this to comp.std.c in an attempt to make sure that I don't miss
anything as far as C is concerned]

Kaz Kylheku wrote:

> In article <3B543B11...@dvv.org>, Dima Volodin wrote:
> >When you define a struct with two chars, you also define two chars - being a
> >member of a struct doesn't make an object less of an object.
>
> Yes it does, because the members of the struct must be allocated in a single
> larger object,

Let me rephrase it - the way memory is allocated for an object does not make
this object less of an object.

> which consists of a contiguous sequence of individually
> addressable bytes (as far as the C program can tell).

Contiguous? All the language requires is that "Each non-bit-field
member of a structure or union object is aligned in an implementation-defined
manner appropriate to its type" and "Within a structure object, the
non-bit-field members [...] have addresses that increase in the order in which
they are declared", but it doesn't require them to be contiguous (in fact, it
doesn't even define what "increase" means for addresses of objects that are
not array members), so I don't see any problem with a hypothetical POSIX
standard requiring that named objects - be they stand-alone objects or struct
or union members - be placed in separate memory granules. Also, MHO is that
additionally or alternatively, a standard should introduce something like
pthread_memorygranule_t - a type that shall be used in unions to guarantee an
object's allocation in a separate memory granule (the same way unions shall be
used to guarantee a particular alignment for, e.g., a char array). And, of
course, it must be spelled out that no two memory areas allocated by malloc()
shall have common memory granules.
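For illustration only, this is roughly how the union idiom could look. No pthread_memorygranule_t exists in any standard; granule_pad below is a hypothetical stand-in built from an assumed MEM_GRANULE of 64 bytes. It fixes the size of each union but cannot force granule alignment, which only a type supplied by the implementation could do.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MEM_GRANULE
#define MEM_GRANULE 64u   /* assumed granule size in bytes */
#endif

/* Hypothetical stand-in for the proposed pthread_memorygranule_t. */
typedef struct { unsigned char pad[MEM_GRANULE]; } granule_pad;

/* A char that occupies at least one whole granule's worth of storage. */
union isolated_char {
    char value;
    granule_pad unused;   /* never accessed; only sizes the union */
};

struct foo {
    union isolated_char a;   /* e.g. protected by one mutex */
    union isolated_char b;   /* e.g. protected by another */
};

/* Distance in bytes between the two payload chars. */
size_t payload_distance(void)
{
    return offsetof(struct foo, b) - offsetof(struct foo, a);
}

/* Sanity check: the payloads are at least one granule apart. */
void check_layout(void)
{
    assert(payload_distance() >= MEM_GRANULE);
}
```

Because a.value and b.value are a full granule apart, they cannot fall into the same granule wherever the struct starts. Either one may still share a granule with a neighbouring object outside the struct, which is where the alignment guarantee of a real standard type would be needed.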

> There is a C language level difference, because the language allows the
> ptrdiff_t pointer displacement between two members of the same aggregate
> to be computed, and pointers to members of the same aggregate can be
> compared using the relational operators:
>
> int x;
> int y;
>
> int a[2];
>
> if (&a[0] < &a[1]) { /* Correct */
> /* Will execute this */
> }
>
> if (&x < &y) /* Undefined behavior */
> {
> }
>
> This allows objects which are created by separate declarations, or
> separate requests to the storage allocator, to be placed into distinct
> units of memory, such as memory segments on a segmented architecture.

The language doesn't know anything about "memory granules", so there's nothing
that would prevent an implementation from placing x and y into the same memory
granule. AFAIK, the current POSIX standard doesn't require any kind of placement
either (besides the one that is dictated by C, of course) and it is, IMHO, a
defect.

> This is the area of the language where it would be easy to introduce
> a POSIX requirement that distinct primary objects (my term) do not
> share memory granules, and so may be concurrently accessed without
> interference. Since there is already some kind of difference,
> introducing another one isn't a big deal.

See my proposals above.

Dima

Dima Volodin

Jul 17, 2001, 5:50:03 PM

[I cross-posted this to comp.std.c in an attempt to make sure that I don't miss
anything as far as C is concerned]

Kaz Kylheku wrote:

> In article <3B543B11...@dvv.org>, Dima Volodin wrote:
> >When you define a struct with two chars, you also define two chars - being a
> >member of a struct doesn't make an object less of an object.
>
> Yes it does, because the members of the struct must be allocated in a single
> larger object,

Let me rephrase it - the way memory is allocated for an object does not make
this object less of an object.

> which consists of a contiguous sequence of individually
> addressable bytes (as far as the C program can tell).

Contiguous? All the language requires is that "Each non-bit-field member of a
structure or union object is aligned in an implementation-defined manner
appropriate to its type" and "Within a structure object, the non-bit-field
members [...] have addresses that increase in the order in which they are
declared", but it doesn't require them to be contiguous (in fact, it doesn't
even define what "increase" means for addresses of objects that are not array
members), so I don't see any problem with a hypothetical POSIX standard
requiring that named objects - be they stand-alone objects or struct or union
members - be placed in separate memory granules. Also, MHO is that
additionally or alternatively, a standard should introduce something like
pthread_memorygranule_t - a type that shall be used in unions to guarantee an
object's allocation in a separate memory granule (the same way unions shall be
used to guarantee a particular alignment for, e.g., a char array). And, of
course, it must be spelled out that no two memory areas allocated by malloc()
shall have common memory granules.
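
A minimal sketch of how such a union might look, assuming a C11 compiler. Nothing here is part of POSIX: memorygranule_t is a hypothetical stand-in for the proposed pthread_memorygranule_t, GRANULE_SIZE of 64 is an assumed value, and the helper names are made up for illustration. Converting pointers to uintptr_t to inspect addresses is implementation-defined, although it behaves as expected on common flat-address platforms.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 64 bytes stands in for the platform's memory granule size. */
#define GRANULE_SIZE 64

/* Hypothetical stand-in for the proposed pthread_memorygranule_t:
 * a type whose size and alignment both equal one granule. */
typedef struct {
    _Alignas(GRANULE_SIZE) unsigned char bytes[GRANULE_SIZE];
} memorygranule_t;

/* Putting the granule type in a union forces the whole union to be
 * granule-aligned and to occupy a whole number of granules. */
typedef union {
    char value;
    memorygranule_t granule;
} isolated_char;

static isolated_char flag_a;
static isolated_char flag_b;

/* Index of the granule that an address falls into. */
static uintptr_t granule_of(const void *p)
{
    return (uintptr_t)p / GRANULE_SIZE;
}

/* Returns 1 if the two addresses lie in different granules. */
static int in_distinct_granules(const void *a, const void *b)
{
    return granule_of(a) != granule_of(b);
}

/* Sanity check: the two flags never share a granule. */
static void check_isolation(void)
{
    assert(sizeof(isolated_char) % GRANULE_SIZE == 0);
    assert(in_distinct_granules(&flag_a.value, &flag_b.value));
}
```

Threads would then read and write flag_a.value and flag_b.value; the granule member exists only to reserve space and is never accessed.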

> There is a C language level difference, because the language allows the
> ptrdiff_t pointer displacement between two members of the same aggregate
> to be computed, and pointers to members of the same aggregate can be
> compared using the relational operators:
>
> int x;
> int y;
>
> int a[2];
>
> if (&a[0] < &a[1]) {   /* Correct */
>     /* Will execute this */
> }
>
> if (&x < &y) {         /* Undefined behavior */
> }
>
> This allows objects which are created by separate declarations, or
> separate requests to the storage allocator, to be placed into distinct
> units of memory, such as memory segments on a segmented architecture.

The language doesn't know anything about "memory granules", so there's nothing
that would prevent an implementation from placing x and y into the same memory
granule. AFAIK, the current POSIX standard doesn't require any kind of placement
either (besides the one that is dictated by C, of course) and it is, IMHO, a
defect.

> This is the area of the language where it would be easy to introduce
> a POSIX requirement that distinct primary objects (my term) do not
> share memory granules, and so may be concurrently accessed without
> interference. Since there is already some kind of difference,
> introducing another one isn't a big deal.

See my proposals above.

Dima

James Kuyper Jr.

Jul 17, 2001, 8:14:40 PM

Dima Volodin wrote:
>
> [I cross-posted it to comp.std.c in an attempt to make sure that I don't
> miss anything as far as C is concerned]
>
> Kaz Kylheku wrote:

...

> > which consists of a contiguous sequence of individually
> > addressable bytes (as far as the C program can tell).
>
> Contiguous? All the language requires is that "Each non-bit-field
> member of a structure or union object is aligned in an
> implementation-defined manner appropriate to its type" and "Within a
> structure object, the non-bit-field members [...] have addresses that
> increase in the order in which they are declared", but it doesn't
> require them to be contiguous (in fact, it

6.2.5p20: "A structure type describes a sequentially allocated nonempty
set of member objects"

"Sequentially allocated" would in itself seem to be sufficient to
establish this point, but there's also:

6.2.6.1p2: "... objects are composed of contiguous sequences of one or
more bytes ..."

Structures are composed of objects, but they are also objects in their
own right, and hence must occupy contiguous bytes. I can't find any
place where that's directly stated in the standard, but it is indirectly
implied in literally dozens of places. The clearest statement I've been
able to find in support of that concept is the following:

6.7.2.1p12: "Each non-bit-field member of a structure or union object
..."

Note the reference to a "structure ... object". The phrase "structure
object" appears in a half dozen other places as well, and I see no way
to interpret it at any of those places as meaning anything other than
"an object of structure type".

Dima Volodin

Jul 18, 2001, 8:52:24 AM

"James Kuyper Jr." wrote:

>
> Dima Volodin wrote:
> > Contiguous? All the language requires is that "Each non-bit-field
> > member of a structure or union object is aligned in an
> > implementation-defined manner appropriate to its type" and "Within a
> > structure object, the non-bit-field members [...] have addresses that
> > increase in the order in which they are declared", but it doesn't
> > require them to be contiguous (in fact, it
>
> 6.2.5p20: "A structure type describes a sequentially allocated nonempty
> set of member objects"
>
> "Sequentially allocated" would in itself seem to be sufficient to
> establish this point, but there's also:

Of course, a struct object shouldn't be scattered all across memory. What I'm
talking about is padding between struct members, and the language readily
allows for that. As I have already quoted, "Each non-bit-field member of a
structure or union object is aligned in an implementation-defined manner
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
appropriate to its type".
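
A small sketch of what that padding looks like from inside a program, using the standard offsetof macro. The struct and the helper names are made up for illustration, and the actual amount of padding is implementation-defined, so the code measures it rather than assuming a value.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative struct: an int sandwiched between two chars. */
struct sample {
    char c;
    int  i;   /* commonly aligned to sizeof(int), so padding may follow c */
    char d;
};

/* Number of padding bytes between the end of c and the start of i. */
static size_t padding_after_c(void)
{
    return offsetof(struct sample, i)
         - (offsetof(struct sample, c) + sizeof(char));
}

/* Member offsets increase in declaration order; gaps are permitted. */
static int members_in_order(void)
{
    return offsetof(struct sample, c) < offsetof(struct sample, i)
        && offsetof(struct sample, i) < offsetof(struct sample, d);
}

/* The struct is one contiguous object that may be larger than the
 * sum of its members because of internal and trailing padding. */
static void check_layout(void)
{
    assert(members_in_order());
    assert(sizeof(struct sample) >= 2 * sizeof(char) + sizeof(int));
}
```

On a typical implementation with a 4-byte int, padding_after_c() would be 3, but the standard only requires that the members appear in order within one contiguous object.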

Dima

James Kuyper

Jul 18, 2001, 12:03:27 PM

Then I'm confused. I traced your discussion back before sending that
message, and came away with the impression that you were arguing for
different members of an array to be stored in different blocks of
memory.

Alexander Terekhov

Jul 18, 2001, 2:58:42 PM

James Kuyper wrote:

[...]

> Then I'm confused. I traced your discussion back before sending that
> message, and came away with the impression that you were arguing for
> different members of an array to be stored in different blocks of
> memory.

it seems that there is no portable way to fight the word tearing race
condition.. how about yet another 'granularizer' ;-) qualifier:

/* distinct */ char byte1;                       // should be word tearing safe
/* distinct */ char byte2;                       // should be word tearing safe
distinct char byteArr[] = { 'a','b' };           // should be word tearing safe
distinct char* bytePtr = byteArr;                // should be word tearing safe
struct { distinct char a,b; } ab = { 'a','b' };  // should be word tearing safe
char _byteArr[] = { 'a','b' };                   // could be word tearing unsafe
char* _bytePtr = _byteArr;                       // could be word tearing unsafe
bytePtr = _byteArr;                              // COMPILE ERROR!!
_bytePtr = byteArr;                              // COMPILE ERROR!!
bytePtr = _bytePtr;                              // COMPILE ERROR!!
_bytePtr = bytePtr;                              // COMPILE ERROR!!
// sizeof( byteArr ) >= sizeof( _byteArr )       // extra space could be added!

btw, that is actually an 'existing practice' already.
well, sort of..

Compaq uses 'volatile' qualifier to ensure word tearing
safe programming (basically switching over to single
byte granularity which could require software emulation
on older processors):

http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0007.HTM#gran_sec
http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0008.HTM

"(On OpenVMS Alpha or OpenVMS VAX) Compile all application
modules for byte actual granularity. Doing so automatically
prevents word-tearing race conditions for structure or union
members and array elements of size byte or larger that are
accessed concurrently by different threads. No other program
modification is required. This may have a performance penalty
on Alpha EV4 and EV5 processors.
Or,
(On Tru64 UNIX systems) For arrays, add the C language
volatile storage qualifier to the definition of the entire
array; for structures, add volatile to the declaration of
only those members that share the pertinent memory granule.
You must also compile the application's modules using the
Compaq C or Compaq C++ compiler's -strong-volatile switch.
Doing so causes the compiler to produce code that forces
all accesses to those members to occur as atomic operations.
See the description of the -strong-volatile switch in the
Compaq C or Compaq C++ documentation and on the cc reference
page. This may also have a severe performance penalty. "

next step... :) 'very distinct' for fighting cache thrashing :) :)

regards,
alexander.

Kaz Kylheku

Jul 18, 2001, 3:17:11 PM

In article <3B55DC62...@web.de>, Alexander Terekhov wrote:
>James Kuyper wrote:
>
>[...]
>> Then I'm confused. I traced your discussion back before sending that
>> message, and came away with the impression that you were arguing for
>> different members of an array to be stored in different blocks of
>> memory.
>
>it seems that there is no portable way to fight word tearing race
>condition.. how about yet another 'granularizer' ;-) qualifier:
>
>/* distinct */ char byte1; // should be word tearing safe
>/* distinct */ char byte2; // should be word tearing safe

The problem is getting all the compiler vendors to put this in.

And some compilers, like GCC, already have features that can do the job,
although not through special type specifiers. Their developers would
rightfully complain.

In GCC, you can enforce alignment like this:

char byte1 __attribute__ ((aligned (32)));
char byte2 __attribute__ ((aligned (32)));

So now byte1 is placed at the start of a 32 byte block, and
byte2 is placed at the start of the next one. So if the granule
size is 32, everything is cool.

All you need is some macro which adds this to your declaration

#define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))

so you can write your declaration:

GRANULARIZE(char byte1);

This macro can be implemented using the GCC mechanism, or the type specifier
mechanism.
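
A self-contained sketch of the GCC variant, assuming a compiler that accepts GCC-style __attribute__ syntax and a granule size of 32 bytes as in the example above. This version of the macro leaves out the trailing semicolon so that each declaration supplies its own; granule_index, share_granule and check_granularized are hypothetical helpers added only to check the layout.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the granule size is 32 bytes, as in the example above. */
#define GRANULE_SIZE 32

/* GCC-specific: request granule alignment for the declared object. */
#define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)))

GRANULARIZE(char byte1);
GRANULARIZE(char byte2);

/* Index of the granule that an address falls into. */
static uintptr_t granule_index(const void *p)
{
    return (uintptr_t)p / GRANULE_SIZE;
}

/* Returns 1 if both addresses fall into the same granule. */
static int share_granule(const void *a, const void *b)
{
    return granule_index(a) == granule_index(b);
}

/* Both objects start on a granule boundary, so they cannot share one. */
static void check_granularized(void)
{
    assert((uintptr_t)&byte1 % GRANULE_SIZE == 0);
    assert((uintptr_t)&byte2 % GRANULE_SIZE == 0);
    assert(!share_granule(&byte1, &byte2));
}
```

Note that this only separates wrapped objects from each other; an ordinary object declared without the macro may still be placed in the unused tail of one of these granules.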

Alexander Terekhov

Jul 19, 2001, 5:56:12 AM

Kaz Kylheku wrote:

[...]

> In GCC, you can enforce alignment like this:
>
> char byte1 __attribute__ ((aligned (32)));
> char byte2 __attribute__ ((aligned (32)));
>
> So now byte1 is placed at the start of a 32 byte block, and
> byte2 is placed at the start of the next one. So if the granule
> size is 32, everything is cool.
>
> All you need is some macro which adds this to your declaration
>
> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)));

hmm.. your macro controls alignment, fine.. but how about padding?

GRANULARIZE(char byte1); // shared scope 1; OK
GRANULARIZE(char byte2); // shared scope 2; OK??
.
.
.
char byte3;

could easily 'break' byte2 (and byte3) !

regards,
alexander.

Kaz Kylheku

Jul 19, 2001, 12:03:29 PM

In article <3B56AEBC...@web.de>, Alexander Terekhov wrote:
>Kaz Kylheku wrote:
>
>[...]
>> In GCC, you can enforce alignment like this:
>>
>> char byte1 __attribute__ ((aligned (32)));
>> char byte2 __attribute__ ((aligned (32)));
>>
>> So now byte1 is placed at the start of a 32 byte block, and
>> byte2 is placed at the start of the next one. So if the granule
>> size is 32, everything is cool.
>>
>> All you need is some macro which adds this to your declaration
>>
>> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)));
>
>hmm.. your macro controls alignment, fine.. but how about padding?
>
>GRANULARIZE(char byte1); // shared scope 1; OK
>GRANULARIZE(char byte2); // shared scope 2; OK??

Since both bytes are aligned to a granule, they can't share any granules.

So you need either alignment or padding, but not necessarily both.

>.
>.
>char byte3;
>
>could easily 'break' byte2 (and byte3) !

That's right, but we could attribute the cause of that breakage to
that byte not being wrapped in GRANULARIZE(), rather than to the lack
of padding in the previous wrapped object.

Alexander Terekhov

Jul 20, 2001, 3:38:06 AM

Kaz Kylheku wrote:

> In article <3B56AEBC...@web.de>, Alexander Terekhov wrote:
> >Kaz Kylheku wrote:
> >
> >[...]
> >> In GCC, you can enforce alignment like this:
> >>
> >> char byte1 __attribute__ ((aligned (32)));
> >> char byte2 __attribute__ ((aligned (32)));
> >>
> >> So now byte1 is placed at the start of a 32 byte block, and
> >> byte2 is placed at the start of the next one. So if the granule
> >> size is 32, everything is cool.
> >>
> >> All you need is some macro which adds this to your declaration
> >>
> >> #define GRANULARIZE(X) X __attribute__ ((aligned (GRANULE_SIZE)));
> >
> >hmm.. your macro controls alignment, fine.. but how about padding?
> >
> >GRANULARIZE(char byte1); // shared scope 1; OK
> >GRANULARIZE(char byte2); // shared scope 2; OK??
>
> Since both bytes are aligned to a granule, they can't share any granules.

right, they do not. with "shared scope" i meant the following:

b11,b12,b13 // shared scope 1 = threads A & B (using some lock L1)
b21,b22,b23 // shared scope 2 = threads C & D (using some lock L2)
b31,b32,b33 // non-shared scope 3 = thread X
b41,b42,b43 // non-shared scope 4 = thread Y
            // "special" scope 5 = any thread & signal handler - covered by
            // static volatile sig_atomic_t

clearly, with respect to memory access within each serial scope
(shared or non-shared), i.e. for the bN? "in the middle", we do not have
any problems with word tearing; we just need to isolate the
data accessed within each serial scope from the data accessed in
all other serial scopes. at the borders (bN1..?, ?..bN?)
we potentially have a problem of word tearing (correctness) and
a problem of cache thrashing on multiprocessors (performance).
for sequentially allocated bXYs it is sufficient to align bX1 and
to pad bX3..

> So you need either alignment or padding, but not necessarily both.

hmm.. in order to isolate all our bytes from "default" non-GRANULARIZEd
data, the first one needs to be aligned, the last one needs to be padded,
and the second, third, ... could be aligned or padded -- the result is the
same (but only if the order of actual allocation matches the
order of declaration, so that there is no risk of aligned_or_padded data
being mixed with non-aligned and non-padded "default" data). it is just
much more robust to align & pad, IMHO.

regards,
alexander.