
Is this code correct?


Dmitri Lenev
Jan 10, 2002, 10:39:51 AM
Hi!

After I read the "Multi-Processor Concurrency Problem" and "Is this
code as thread safe as it can be?" threads, I became curious.
The question that bothers me is: what happens if two different
threads access two different bytes which are situated, say, in
one 32-bit word?
In other words, how correct is this code?

#include <pthread.h>

char spinner[2];

void* work_thread(void *par)
{
    int i = *((int *)par);

    while (1)
        ++spinner[i];
}

int thrid[2] = {0,1};

int
main(int argc, char** argv)
{
    pthread_t thr;
    pthread_create(&thr, NULL, work_thread, thrid+0);
    pthread_create(&thr, NULL, work_thread, thrid+1);
    pthread_exit(NULL);
}

In my opinion it is quite reasonable...

Dmitri Lenev

Simon Turner
Jan 11, 2002, 7:04:36 AM
Dmitri Lenev wrote:

> Hi!
>
> After I have read "Multi-Processor Concurrency Problem" and "Is
> this code as thread safe as it can be?" threads i became curious.
> The question that disturbs me is "What happens if two different
> threads are accessing two different bytes which are situated say in
> one 32bit word ?"

Depending on the platform (and possibly the compiler), I'd guess that
you'd stand a good chance of getting a memory conflict (as described
in Programming with POSIX Threads, by David Butenhof).

> In other words - how correct is this code ?

Since it doesn't really do anything, it can't really be wrong :-)

>
> #include <pthread.h>
>
> char spinner[2];
>
> void* work_thread(void *par)
> {
> int i = *((int *)par);
>
> while(1)
> ++spinner[i];
> }

If, for example,
thread 1 reads the whole 16-bit value of spinner into a register, and
increments the first 8 bits of it,
while thread 2 reads the whole 16-bit value of spinner into a
register, and increments the second 8 bits of it simultaneously,
then whichever thread happens to write the register back to memory
_last_ will overwrite the other's change.

>
> int thrid[2] = {0,1};
>
> int
> main(int argc, char** argv)
> {
> pthread_t thr;
> pthread_create(&thr,NULL,work_thread,thrid+0);
> pthread_create(&thr,NULL,work_thread,thrid+1);
> pthread_exit(NULL);
> }
>
> In my opinion it is quite reasonable...

... but whether or not it works will depend on the memory alignment
used, the bus width, etc., etc.

>
> Dmitri Lenev

--
Simon Turner
(doesn't speak for) RiverSoft

David Butenhof
Jan 11, 2002, 8:30:19 AM
Dmitri Lenev wrote:

> After I have read "Multi-Processor Concurrency Problem" and "Is this
> code as thread safe as it can be?" threads i became curious.
> The question that disturbs me is "What happens if two different
> threads are accessing two different bytes which are situated say in
> one 32bit word ?"
> In other words - how correct is this code?

"Correct" by what definition? Here are a couple of possible answers:

1. It is correct ANSI C.

2. It's very bad FORTRAN. (Probably won't even compile. ;-) )

3. It's correct POSIX thread SYNTAX. (It'll compile with full prototype
checking, without errors. It'll link. The program will activate.)

4. It's illegal DCE thread syntax. (And UI threads, too.)

5. It won't work on any machine that doesn't do byte access, or with any
compiler that doesn't choose to generate byte instructions for your
operations.

In summary, it's not portable code. On many systems (e.g., on any early
generation Alpha), there may be no byte instructions. For such systems,
your "char spinner[2];" might as well be "short spinner;" and (presuming
little endian ordering, a characteristic as we all know of all rational
hardware ;-) ;-) ), "++spinner[0];" might as well be "short t1 = spinner;
char t2 = spinner & 0xff; spinner = (t1 & ~0xff) | (++t2);". Obviously if
one thread does this for spinner[0] while the other's doing it
simultaneously for spinner[1], they'll "disagree" on the proper results.

(And this is not as simple as "bad hardware". ANSI C does not require that
a compiler generate byte instructions, even when they're available, to
implement access to "char" variables. For many reasons, it might decide to
optimize your program using other instructions. This is particularly likely
if the vector was larger and each thread dealt with a SET of elements in
that vector rather than a single element.)

This is an easily misunderstood aspect of the POSIX memory model. POSIX
being a source standard, it's hard to talk about such stuff concisely or in
detail. (And the fact that the working group roundly rejected a rigorous
symbolic model in favor of a short English description doesn't help any.)

The model requires that "Applications shall ensure that access to any
memory location by more than one thread of control (threads or processes)
shall be restricted such that no thread of control can read or modify a
memory location while another thread of control may be modifying it."

That's all fine, and you might easily be excused for believing that it
means POSIX says writing to spinner[0] and spinner[1] simultaneously, and
without synchronization, is just fine. But that's NOT what it says; because
without byte instruction access, both variables may require writes to the
same set of memory locations. (Depending, of course, on the alignment of
spinner. Because bytes don't require special alignment on most machines,
the alignment is arbitrary. As a standalone declaration, it's likely to be
aligned to the hardware word size -- 4-byte aligned on a 32-bit machine or
8-byte aligned on a 64-bit machine. In a structure, though, I'd expect it
to start at the next byte after the previous field, without any automatic
alignment padding.)

If two adjacent variables may involve the same "memory location", then they
must be synchronized using a (and the same) mutex. That's easy in this
case; it just means treating "spinner" as a single variable. If you had
made two separate declarations (char spinner1, spinner2;) the relationship
would be less obvious, but equally important. That's just life, and you've
got to be aware of it. The problem goes deeper, too, if you care about the
PERFORMANCE of your two threads. Even if they've been separated (say, by
making them 'long'), they likely share a cache line. Two threads writing to
the same cache line (on separate processors) will cause ownership of that
cache line to "bounce" back and forth between them, with a lot of expensive
communications traffic and cache updates. Putting each counter in a
separate cache line could make the threads run a hundred times faster on
some architectures. If the loop is critical to your application
performance, that can be pretty important.

Important rule of threaded programming: Memory is your enemy. You can't
avoid it entirely, but the less you use it the better you'll perform. We
used to estimate performance by counting instructions -- but with modern
architectures that's completely irrelevant. You count memory accesses, each
of which counts for tens to hundreds of processor cycles. In this
particular case, if spinner[0] is really private to one thread, and
spinner[1] is really private to the other, why the heck are you
incrementing the counters in memory anyway? Making the counter a local
variable in each thread would allow the compiler to keep everything inside
the processor... and that's even better than avoiding cache thrashing.

> #include <pthread.h>
>
> char spinner[2];
>
> void* work_thread(void *par)
> {
> int i = *((int *)par);
>
> while(1)
> ++spinner[i];
> }
>
> int thrid[2] = {0,1};
>
> int
> main(int argc, char** argv)
> {
> pthread_t thr;
> pthread_create(&thr,NULL,work_thread,thrid+0);
> pthread_create(&thr,NULL,work_thread,thrid+1);

OK, I just gotta say it. While this is perfectly legal C, &thrid[0] and
&thrid[1] would be a heck of a lot easier to read than "thrid+0" and
"thrid+1".

> pthread_exit(NULL);
> }
>
> In my opinion it is quite reasonable...

Sure, as long as you're willing to bet your life on always having a machine
with byte access instructions and a compiler that will always choose to use
them in this situation. Or, of course, if you don't care about the results.
And, as someone else pointed out, in this program it doesn't matter a bit
since there's absolutely no way to determine what happens unless you step
through with a debugger. If a bit falls in the middle of a thread and
there's nobody there to examine it...

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Daniel Hams
Jan 11, 2002, 8:56:20 AM
> 2. It's very bad FORTRAN. (Probably won't even compile. ;-) )
>

Spits coffee,

Chris Smith
Jan 11, 2002, 10:18:24 AM
David Butenhof wrote ...

> That's all fine, and you might easily be excused for believing that it
> means POSIX says writing to spinner[0] and spinner[1] simultaneously, and
> without synchronization, is just fine. But that's NOT what it says; because
> without byte instruction access, both variables may require writes to the
> same set of memory locations. (Depending, of course, on the alignment of
> spinner. Because bytes don't require special alignment on most machines,
> the alignment is arbitrary. As a standalone declaration, it's likely to be
> aligned to the hardware word size -- 4-byte aligned on a 32-bit machine or
> 8-byte aligned on a 64-bit machine. In a structure, though, I'd expect it
> to start at the next byte after the previous field, without any automatic
> alignment padding.)

Okay, I'm going to prove my ignorance for a second... please excuse me.

Is there ever any *real* guarantee in POSIX that any two memory locations
of global variables are far enough apart to be safely modified at the
same time without locking? As I understand it, a C/C++ compiler would be
free to reorder the locations of global variables in memory however it
likes, even if it were to move two variables, perhaps in different files,
into the same machine word.

Granted, it would take a braindead compiler to do such a thing... but am
I wrong that it's legal?

My concern is that, with a structure or array, it's simple to say "lock
before accessing any of it", but if we're talking about, say, statically
scoped variables in different source files, that is a lot more difficult
to do.

Chris Smith

Alexander Terekhov
Jan 11, 2002, 11:01:59 AM

Dmitri Lenev wrote:
[...]

> #include <pthread.h>
>
> char spinner[2];

Do a search on "word tearing" and "false sharing".
The first time it may even be funny.

regards,
alexander.

David Schwartz
Jan 11, 2002, 4:45:10 PM
Chris Smith wrote:

> Is there ever any *real* guarantee in POSIX that any two memory locations
> of global variables are far enough apart to be safely modified at the
> same time without locking? As I understand it, a C/C++ compiler would be
> free to reorder the locations of global variables in memory however it
> likes, even if it were to move two variables, perhaps in different files,
> into the same machine word.

The answer to your question is, yes and no.

Yes, in the sense that POSIX guarantees freedom from problems like word
tearing between distinct objects. So, for example, if you have:

int i, j;

You are assured that 'i' and 'j' will not interfere with each other.
However, if you do:

struct foo
{
    int i;
    int j;
};

You are *not* guaranteed that 'i' and 'j' will not interfere with each
other. If you substitute 'char' for 'int', you'll see why this is
obviously so. Fortunately, this guarantee is easy to meet on most
platforms because their padding and packing rules already do this.

The answer is no in the sense that you still must strictly follow all
locking rules when accessing either of the two objects. In other words,
you only get this guarantee if you strictly follow the locking
requirements in the POSIX standard. In other words, even in the 'int i,
j;' case, failure to properly lock accesses to 'i' could, in principle,
corrupt 'j'.

DS

Chris Smith
Jan 11, 2002, 8:57:44 PM
Perhaps it's just my brain being cloudy, but I'm going to ask again to
verify that I understood you correctly...

David Schwartz wrote ...


> The answer to your question is, yes and no.
>
> Yes, in the sense that POSIX guarantees freedom from problems like word
> tearing between distinct objects. So, for example, if you have:
>
> int i, j;
>
> You are assured that 'i' and 'j' will not interfere with each other.

Okay. Is this true for all types? That is, if "int" were "char", would
the same guarantees be made?

> However, if you do:
>
> struct foo
> {
> int i;
> int j;
> };

This, I realize, is a problem; but it's one that it would take dumb code
to run into.

> The answer is no in the sense that you still must strictly follow all
> locking rules when accessing either of the two objects. In other words,
> you only get this guarantee if you strictly follow the locking
> requirements in the POSIX standard. In other words, even in the 'int i,
> j;' case, failure to properly lock accesses to 'i' could, in principle,
> corrupt 'j'.

Okay, but if 'i' and 'j' were declared at global or static scope, and I
did properly synchronize access to each with a mutex, then I *could* use
different mutexes for each one, correct?

Chris Smith

David Schwartz
Jan 11, 2002, 9:48:07 PM
Chris Smith wrote:

> > Yes, in the sense that POSIX guarantees freedom from problems like word
> > tearing between distinct objects. So, for example, if you have:
> >
> > int i, j;
> >
> > You are assured that 'i' and 'j' will not interfere with each other.
>
> Okay. Is this true for all types? That is, if "int" were "char", would
> the same guarantees be made?

I thought so, but it doesn't seem to be so on some platforms I checked.
Maybe that rule is just for structures? Anyone know?



> > However, if you do:
> >
> > struct foo
> > {
> > int i;
> > int j;
> > };
>
> This, I realize is a problem; but it's one that it would take dumb code
> to run into.

Not really, the difference is subtle.



> Okay, but if 'i' and 'j' were declared at global or static scope, and I
> did properly synchronize access to each with a mutex, then I *could* use
> different mutexes for each one, correct?

Correct. And you wouldn't have to use a mutex for one of them (or
either of them) if you met the POSIX memory visibility rules another
way. For example, if you set 'i' to a particular value before creating
any threads that accessed it and only changed the value after those
threads terminated, no locking around 'i' would be needed at all.

DS

Arnold Meijster
Jan 12, 2002, 5:26:42 AM

Hi, a variation on this theme:

int a[10000];

We want to compute the sum of all elements.

I would like to use two threads doing Sum(0, 5000) and Sum(5000, 10000)
and sum the results, where

int Sum (int lwb, int upb)
{
    int i, s = 0;
    for (i = lwb; i < upb; i++)
        s += a[i];
    return s;
}

According to your reply this should be

int Sum (int lwb, int upb)
{
    int i, s = 0;
    for (i = lwb; i < upb; i++)
    {
        Lock(m);
        s += a[i];
        Unlock(m);
    }
    return s;
}


This seems hardly reasonable to me. I do a lot of image processing
and have many single-point operators, like thresholding, where you
don't want to lock each pixel. How should this be done?
At this point I simply ignore the tearing effect by choosing ints for
my pixels, and on a Linux PC you're fine (actually also on the Suns
and SGIs I'm working on), although I realize it's not standard.


Cheers,

Arnold


Dmitri Lenev
Jan 12, 2002, 8:41:27 AM
Thanks, David!

But now I have a whole bunch of questions...

>
> "Correct" by what definition? Here are a couple of possible answers:
>

[skip]

Generally I was thinking of portability and correctness in the sense of
POSIX semantics...

[skip]

> In summary, it's not portable code. On many systems (e.g., on any early
> generation Alpha), there may be no byte instructions. For such systems,
> your "char spinner[2];" might as well be "short spinner;" and (presuming
> little endian ordering, a characteristic as we all know of all rational
> hardware ;-) ;-) ), "++spinner[0];" might as well be "short t1 = spinner;
> char t2 = spinner & 0xff; spinner = (t1 & ~0xff) | (++t2);". Obviously if
> one thread does this for spinner[0] while the other's doing it
> simultaneously for spinner[1], they'll "disagree" on the proper results.

[skip]

Now I understand the problem with my code (by the way, it was
written only as an illustration for my question)...

[skip]

> The model requires that "Applications shall ensure that access to any
> memory location by more than one thread of control (threads or processes)
> shall be restricted such that no thread of control can read or modify a
> memory location while another thread of control may be modifying it."

[skip]

>
> If two adjacent variables may involve the same "memory location", then they
> must be synchronized using a (and the same) mutex. That's easy in this
> case; it just means treating "spinner" as a single variable. If you had
> made two separate declarations (char spinner1, spinner2;) the relationship
> would be less obvious, but equally important. That's just life, and you've
> got to be aware of it.

So POSIX says that we must protect accesses to the "same memory
location", but leaves the definition of this term to the compiler and
system architecture?

But then which constructs can portably be treated as different memory
locations? In other words, how can we then write portable programs?

There are some obvious cases. For example, two chunks of memory
allocated through different malloc() calls are probably different
locations...

But what about the next construction, which was already mentioned in
this thread:

X i; Y j;

where both i and j are protected by their own mutexes and we have two
threads which modify both variables from time to time (but not always
both at the same time). For which X and Y is this construction safe
enough?
(I guess for long, maybe for int, but what about structures?)

What about the X arr[N]; construction? When can array members be
treated as distinct memory locations?

How does it all depend on context (is it true only for global objects,
or for structure members too)?

> The problem goes deeper, too, if you care about the
> PERFORMANCE of your two threads. Even if they've been separated (say, by
> making them 'long'), they likely share a cache line. Two threads writing to
> the same cache line (on separate processors) will cause ownership of that
> cache line to "bounce" back and forth between them, with a lot of expensive
> communications traffic and cache updates. Putting each counter in a
> separate cache line could make the threads run a hundred times faster on
> some architectures. If the loop is critical to your application
> performance, that can be pretty important.

As I've mentioned before, the whole question was of a theoretical sort,
so it has no connection to my real-life programs. But obviously we
could produce some pretty reasonable piece of code that wouldn't suffer
from this drawback (say, if accesses to spinner were rare) but would
still demonstrate the same problem. Anyway, it is always good to be
reminded of such things, so thank you again...



> Important rule of threaded programming: Memory is your enemy. You can't
> avoid it entirely, but the less you use it the better you'll perform. We
> used to estimate performance by counting instructions -- but with modern
> architectures that's completely irrelevant. You count memory accesses, each
> of which counts for tens to hundreds of processor cycles. In this
> particular case, if spinner[0] is really private to one thread, and
> spinner[1] is really private to the other, why the heck are you
> incrementing the counters in memory anyway? Making the counter a local
> variable in each thread would allow the compiler to keep everything inside
> the processor... and that's even better than avoiding cache thrashing.

Again I should underline that this code was intended solely for
demonstration. Of course your suggestion will suit many cases, but in
my opinion not all.

Dmitri Lenev

Alexander Terekhov
Jan 14, 2002, 2:41:52 AM

Dmitri Lenev wrote:
>
> Thanks, David!
>
> But now I have the whole bunch of questions...

:) :) :)

[...]


> So POSIX says that we must protect accesses to the "same memory
> location", but leaves definition of this term to compiler and system
> architecture?

Yeah... you 'just' have to check your pthreads implementation's
documentation and maybe even add some non-portable stuff to your
sources.

> But then what constructs could be portably treated as different memory
> locations ? In other words how then could we write portable programs ?

Hey, Dmitri, I've told you that this might be even
funny... Personally, I think that having just
PORTABILITY (things defined in the standard) in
mind, this whole topic could be characterized along
the lines of the remark the author of the article
you have replied to once wrote about recursive
trylock()ing (on recursive mutex):

"In a way, the defined semantics are as undefined
as the undefined semantics. That always makes for
a fun game."

For example, you might want to have a look at
"distinct"/"#define GRANULARIZE(X)" sub-thread:

http://groups.google.com/groups?as_umsgid=3B55DC62.2CCE3B26%40web.de

The other extreme (IMHO) is the JVM's:

http://www.cs.umd.edu/~pugh/java/memoryModel/semantics.pdf

"The fact that two variables may be stored in adjacent bytes
(e.g., in a byte array) is immaterial. Two variables can be
simultaneously updated by different threads without needing
to use synchronization to account for the fact that they are
'adjacent'. Any word-tearing must be invisible to the
programmer."

And even they (VM folks) do NOT solve the problem
of "false sharing", AFAIK.

regards,
alexander.
