
Changing single bytes without a mutex


Felix Morley Finch

Feb 23, 1998, 3:00:00 AM

I've read the FAQ and the online book (which I've ordered). I've got
10-15 years of various OS work, real time diagnostics, and related
programming, assembler and C, to understand why changing shared
multibyte data is a real no-no. But I don't understand why this
applies to single byte data. I have just started learning about
threads, so I am not even dangerous yet - give me the chance :-)

My current need for a thread is a subroutine which goes away for a
second or two to calculate a 16 bit number (CRC a bunch of files). It
seems like a perfect opportunity for a thread - long calculation, no
interaction with the main task. One problem is that the main task may
be given a new seed and told to do it again while still waiting for
the current CRC calculation to finish. It is supposed to discard the
current operation and start the new one.

Is it safe to simply set a single byte flag, which the CRC thread
periodically polls, and exits if set? Is a mutex really necessary
around this single byte write? It's not the time or code I'm worried
about, it's understanding how this particular case could go wrong.

A bit more complex variation: Supposing this single byte operation is
safe. Can I use the single byte flag to indicate that the seed has
changed and the calculation should restart, rather than simply exit
prematurely? This CRC calculation uses the seed at the beginning
only. Suppose I write a new seed, then write the single byte flag -
is there any chance that various different hardware implementations
would write the data asynchronously and not in the same order, and
this is why such operations need mutex protection?

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / fe...@crowfix.com
PGP = 91 B3 94 7C E9 E8 76 2D E1 63 51 AA A0 48 89 2F
I've found a solution to Fermat's Last Theorem but I see I've run out of room o

ruddock

Feb 23, 1998, 3:00:00 AM
to Felix Morley Finch

In article <slrn6f3gn1...@crowfix.com>,
fe...@crowfix.com (Felix Morley Finch) writes:
.....

|> My current need for a thread is a subroutine which goes away for a
|> second or two to calculate a 16 bit number (CRC a bunch of files). It
|> seems like a perfect opportunity for a thread - long calculation, no
|> interaction with the main task. One problem is that the main task may
|> be given a new seed and told to do it again while still waiting for
|> the current CRC calculation to finish. It is supposed to discard the
|> current operation and start the new one.
|>
|> Is it safe to simply set a single byte flag, which the CRC thread
|> periodically polls, and exits if set? Is a mutex really necessary
|> around this single byte write? It's not the time or code I'm worried
|> about, it's understanding how this particular case could go wrong.

You should use the cancellation routines in POSIX threads (assuming that is
your platform). Using these routines, one thread may ask another to stop
with pthread_cancel(), and the target thread can use pthread_testcancel()
to terminate itself once it has been cancelled (or it will terminate at
some other defined cancellation point; RTM for a list of these. 8^)
Of course you will need to enable cancellation on the worker thread.
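A minimal sketch of that pattern (the worker name and the CRC processing
placeholder are mine, not from this thread; deferred cancellation is assumed):

#include <pthread.h>

void *crc_worker(void *arg)
{
    int old;

    /* Deferred cancellation is the default, but make the intent explicit. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old);

    for (;;) {
        /* ... process the next chunk of the files being CRC'd ... */

        pthread_testcancel();   /* cancellation point: exit here if cancelled */
    }
    return arg;
}

/* In the main task, to abandon the current calculation:
 *     pthread_cancel(worker_tid);
 *     pthread_join(worker_tid, NULL);
 *     ... then start a new worker with the new seed ...
 */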

David Ruddock --- 732.699.3597
druddock @ notes.cc.bellcore.com

Bil Lewis

Feb 23, 1998, 3:00:00 AM

Felix,

Several parts to your question...

1: Writing a byte on most modern machines is non-atomic. It goes:
read word, mask out byte, add in new byte, write word.

> My current need for a thread is a subroutine which goes away for a
> second or two to calculate a 16 bit number (CRC a bunch of files). It
> seems like a perfect opportunity for a thread - long calculation, no
> interaction with the main task.

Perfect!


> Is it safe to simply set a single byte flag, which the CRC thread
> periodically polls, and exits if set? Is a mutex really necessary
> around this single byte write? It's not the time or code I'm worried
> about, it's understanding how this particular case could go wrong.

Maybe... but... You set the flag to EXIT in CPU1. But CPU1 doesn't
write that word out to main memory right away. It could take many uSecs.
Is that important? (Probably not.)

The CRC thread sees the flag and exits. That's cool. But now you
want to set the flag back to DONT_EXIT and start again. How do you know
that the CRC threads have all exited? (If you set it back too soon...)

So you wait for the threads to exit. You're cool again. But now you
have to coordinate the setter thread and the waiter thread. What if you
want them to be the same?


> A bit more complex variation: Supposing this single byte operation is
> safe. Can I use the single byte flag to indicate that the seed has
> changed and the calculation should restart, rather than simply exit
> prematurely?

No. This time you've gone too far. (See DONT_EXIT above.)

> This CRC calculation uses the seed at the beginning
> only. Suppose I write a new seed, then write the single byte flag -
> is there any chance that various different hardware implementations
> would write the data asynchrounously and not in the same order, and
> this is why such operations need mutex protection?

Yup. As a matter of fact I think all the modern RISCs will do this.
Certainly SPARC, Alpha, and MIPS will reorder writes. For this part
you absolutely require a mutex.

The kind of trick you can get away with is like this:

if (restart)              // This is fast (unlocked peek)
  {lock(M);
   if (restart)           // This is correct (re-checked under the lock)
     {seed = new_seed;
      restart = FALSE;
      unlock(M);
      goto restarter;
     }
   else
     unlock(M);           // Ignore and continue
  }
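Filled out as compilable POSIX-threads C, the sketch might look like the
following (the variable and function names are illustrative, not from the
original program; the unlocked peek at the flag carries the caveats discussed
elsewhere in this thread):

#include <pthread.h>

static pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;
static int restart = 0;             /* set by the main task, polled here */
static unsigned long new_seed;      /* protected by M */

void crc_loop(void)
{
    unsigned long seed;

restarter:
    pthread_mutex_lock(&M);
    seed = new_seed;
    pthread_mutex_unlock(&M);

    for (;;) {
        /* ... CRC the next chunk of the files using 'seed' ... */

        if (restart) {                      /* the fast, unlocked peek */
            pthread_mutex_lock(&M);
            if (restart) {                  /* the correct, locked re-check */
                restart = 0;
                pthread_mutex_unlock(&M);
                goto restarter;             /* pick up the new seed */
            }
            pthread_mutex_unlock(&M);       /* false alarm; carry on */
        }
    }
}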


Be careful. Be very, very careful! Maybe there's a better way to do this?
Remember: Mutexes are your friends!

Good luck!

-Bil
--
================
B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (650) 328-8952

Tim Beckmann

Feb 24, 1998, 3:00:00 AM

Actually, Bil has a couple of errors in his reply...

Bil Lewis wrote:
>
> Felix,
>
> Several parts to your question...
>
> 1: Writing a byte on most modern machines is non-atomic. It goes:
> read word, mask out byte, add in new byte, write word.

Even though I don't believe this is correct, it doesn't matter with
a single thread writing the byte and a single thread reading the byte.

My reasoning for thinking this statement just isn't correct is the
following: if 4 bytes are packed into a 32 bit word, with one byte
that is shared by a pair of threads for the situation described and
3 bytes that are used by just one of the threads, the program would
function randomly. If thread 1 modified the "shared" byte, with or
without locking a mutex, the other 3 bytes could be modified by
thread 2 and have their values wiped out by the read-modify-write
sequence Bil describes, since thread 2 won't lock the mutex for the
3 bytes it "owns". The only way this sequence could work is for
(1) the bus to lock out accesses by other processors during this
cycle, which would be bad for multiprocessing efficiency, or (2) the
compiler to not pack bytes in this fashion, which would increase
memory use drastically.

Even though I haven't dug into RISC processor architecture enough to
say this for sure, I'd guess that the on-board cache has the ability
to perform any size of write from a single byte to a whole
machine-sized word, and all writes from the cache occur as a machine
word.

> > Is it safe to simply set a single byte flag, which the CRC thread
> > periodically polls, and exits if set? Is a mutex really necessary
> > around this single byte write? It's not the time or code I'm worried
> > about, it's understanding how this particular case could go wrong.
>
> Maybe... but... You set the flag to EXIT in CPU1. But CPU1 doesn't
> write that word out to main memory right away. It could take many uSecs.
> Is that important? (Probably not.)

Whether a mutex is used or not, this is the behavior anyway, depending
of course on the caching algorithm used by the processor.


Later,
Tim

Boris Goldberg

Feb 24, 1998, 3:00:00 AM

Felix Morley Finch wrote:
>
> I've read the FAQ and the online book (which I've ordered). I've got
> 10-15 years of various OS work, real time diagnostics, and related
> programming, assembler and C, to understand why changing shared
> multibyte data is a real no-no. But I don't understand why this
> applies to single byte data. I have just started learning about
> threads, so I am not even dangerous yet - give me the chance :-)
>
> My current need for a thread is a subroutine which goes away for a
> second or two to calculate a 16 bit number (CRC a bunch of files). It
> seems like a perfect opportunity for a thread - long calculation, no
> interaction with the main task. One problem is that the main task may
> be given a new seed and told to do it again while still waiting for
> the current CRC calculation to finish. It is supposed to discard the
> current operation and start the new one.
>
> Is it safe to simply set a single byte flag, which the CRC thread
> periodically polls, and exits if set? Is a mutex really necessary
> around this single byte write? It's not the time or code I'm worried
> about, it's understanding how this particular case could go wrong.
>
> A bit more complex variation: Supposing this single byte operation is
> safe. Can I use the single byte flag to indicate that the seed has
> changed and the calculation should restart, rather than simply exit
> prematurely? This CRC calculation uses the seed at the beginning

> only. Suppose I write a new seed, then write the single byte flag -
> is there any chance that various different hardware implementations
> would write the data asynchronously and not in the same order, and
> this is why such operations need mutex protection?
>
> --
> ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.


I heard that changing one byte is safe w/out mutex.

Achim Gratz

Feb 24, 1998, 3:00:00 AM

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> writes:
[how he believes memory works in an MP system]

You need a mutex. Even if the write and read were both atomic,
there's no guarantee about who can and cannot see the changed value
until you force the change to be visible to other processors (i.e.
order the reads and writes) through the use of a mutex. If you know
that a more restricted memory model is implemented by a particular
system, you might get away with it, but it's inherently unportable to
rely on such behaviours.


Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW: http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gr...@ite.inf.tu-dresden.de
Phone: +49 351 463 - 8325

Patrick TJ McPhee

Feb 25, 1998, 3:00:00 AM

In article <34F2D1...@edcmail.cr.usgs.gov>,
Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote:
% Actually, Bil has a couple of errors in his reply...
Don't we all...

%
% Bil Lewis wrote:

% > Maybe... but... You set the flag to EXIT in CPU1. But CPU1 doesn't
% > write that word out to main memory right away. It could take many uSecs.
% > Is that important? (Probably not.)
%
% Whether a mutex is used or not, this is the behavior anyway. Depending
% on
% the caching algorithm used by the processor of course.

POSIX guarantees that it will be written out if you use a mutex, but not
if you don't. You could have an architecture which works like this:
set flag in cpu1 -- stays in processor cache
read flag from cpu2's processor cache -- it isn't set
read flag from cpu2's processor cache -- it isn't set
...
two days pass
read flag from cpu2's processor cache -- it isn't set

It's never safe not to use mutexes because only when you use mutexes
are you guaranteed that the program will operate correctly.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Tim Beckmann

Feb 25, 1998, 3:00:00 AM

Patrick TJ McPhee wrote:
>
> In article <34F2D1...@edcmail.cr.usgs.gov>,

>
> POSIX guarantees that it will be written out if you use a mutex, but not
> if you don't. You could have an architecture which works like this:
> set flag in cpu1 -- stays in processor cache
> read flag from cpu2's processor cache -- it isn't set
> read flag from cpu2's processor cache -- it isn't set
> ...
> two days pass
> read flag from cpu2's processor cache -- it isn't set

Not true. Have you ever heard of cache coherency? It invalidates the
copy of the flag in cpu2's cache when a write to the flag occurs in
cpu1's cache. If cpu2 reads the flag again it needs to read the copy
from cpu1's cache (probably implemented as cpu1 being requested to
write the data to memory).

However, if you aren't careful with how you declare the flag, this could
happen with some of the optimizing compilers. Some compilers will copy
the flag to an internal register and keep its value there and not write
it back to the cache until a later time. So to avoid this, the flag
must be declared to be "volatile" (in C anyway) so the compiler knows
not to assume that something else won't access the flag without its
knowledge.

A mutex may appear to solve this problem in some cases because
optimizing compilers don't typically keep data in an internal register
across function calls... so unlocking the mutex may appear to "fix"
this, but it isn't guaranteed to.

> It's never safe not to use mutexes because only when you use mutexes
> are you guaranteed that the program will operate correctly.

I still don't believe this is true. Although, if you have any doubt you
should use a mutex.

Tim

Tim Beckmann

Feb 25, 1998, 3:00:00 AM

Achim Gratz wrote:
>
> You need a mutex. Even if the write and read were both atomic,
> there's no guarantee who can and can not see the changed value until
> you force the change to be visible (i.e. order the reads and writes)
> by other processors through the use of a mutex. If you know that a
> more restricted memory model is implemented by a particular system,
> you might get away with it, but it's inherently unportable to rely on
> such behaviours.
>

Not true. See my reply to Patrick TJ McPhee.

Achim Gratz

Feb 25, 1998, 3:00:00 AM

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> writes:

Of course you can believe whatever you want. I suggest you read about
memory models, e.g. the SPARC Architecture Manual, Version 9,
Appendices D and J. Systems which exhibit the behavior that you think
is impossible already exist, so the discussion is not at all
theoretical. And no, cache coherency doesn't help you with the
problem of memory order and visibility [hint: think about pipelines
and load/store buffers].

Your reply to P. TJ McPhee relies on a few more unsupported
assumptions that I'll leave to others to take apart.

Patrick TJ McPhee

Feb 26, 1998, 3:00:00 AM

In article <34F41F...@edcmail.cr.usgs.gov>,
Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote:

% However, if you aren't careful with how you declare the flag this could
% happen with some of the optimizing compilers. Some compilers will copy
% the flag to an internal register and keep its value there and not write
% it back to the cache until a later time.
[...]

% A mutex may appear to solve this problem in some cases because optimizing
% compilers don't typically keep data in an internal register across
% function calls...so unlocking the mutex may appear to "fix" this but it
% isn't guaranteed to.

If the mutex doesn't solve this, then the implementation does not conform
to POSIX. (I'm assuming we're talking about POSIX threads here).

Tim Beckmann

Feb 26, 1998, 3:00:00 AM

Patrick TJ McPhee wrote:
>
>
> If the mutex doesn't solve this, then the implementation does not conform
> to POSIX. (I'm assuming we're talking about POSIX threads here).
>

I haven't read the full POSIX spec (even though the sources I've read
say nothing about a mutex guaranteeing this behavior), but yes, we're
talking about POSIX threads (or any other type of multiprocessing).
But I think you missed my point, which is that you don't need the mutex
to guarantee this behavior if you use your compiler correctly.

Tim

Tim Beckmann

Feb 26, 1998, 3:00:00 AM

Achim Gratz wrote:
>
> Of course you can believe whatever you want. I suggest you read about
> memory models, e.g. the SPARC Architecture Manual, Version 9,
> Appendices D and J. Systems which exhibit the behavior that you think
> is impossible already exist, so the discussion is not at all
> theoretical. And no, cache coherency doesn't help you with the
> problem of memory order and visibility [hint: think about pipelines
> and load/store buffers].

Everyone can believe what they want :) I have read a lot about memory
models - mainly Intel and MIPS processors. They both behave exactly as
I described. I haven't worked on SPARC Architectures however so maybe
they use a method I haven't run across yet. Do you have a web link
for the SPARC Architecture Manual? I'd like to look at it. (and I
have thought about pipelines and load/store buffers)

> Your reply to P. TJ McPhee relies on a few more unsupported
> assumptions that I'll leave to others to take apart.

Please enlighten me as to what my unsupported assumptions are. It's
the only way I'll learn the error I'm making here :) (which of
course I still don't believe I am ;)

Tim

Joe Seigh

Feb 26, 1998, 3:00:00 AM

In article <34F22F...@LambdaCS.com>, Bil Lewis <B...@LambdaCS.com> writes:
|> Felix,
|>
|> Several parts to your question...
|>
|> 1: Writing a byte on most modern machines is non-atomic. It goes:
|> read word, mask out byte, add in new byte, write word.
|>

Good point. This is news to me and is contrary to what I thought was
the general trend of all the various loads and stores being atomic.

This is also contrary to what the Java architects believed when they
specified that accesses of operands up to a 32 bit wordsize were
atomic. This might be a good reason to use Java. You can make
certain assumptions since it's architected. It's the problem of the
implementer of the java virtual machine as how to deal with
non-atomic accesses. So, for example, on architectures with
non-atomic byte access, a java byte array is either going to have
sucky performance or be space inefficient because the bytes are
spaced out far enough so as to not interfere with one another. But,
hey, your program will work.

Maybe Microsoft can convince Intel to not have atomic anything? :-)

Speaking of which, how far apart is safe? If we can't assume byte
accesses are atomic, we cannot assume word accesses are atomic for
the same reason. For malloc'ed storage, malloc returns storage
"suitably aligned for any use", so we know we're safe as long as we
control all access to that piece of storage with a single mutex. But
for things like shared memory where explicit allocation by program
control is generally the rule, no guidelines appear to exist. Is it
necessary to control all access to a shared memory region with a single
mutex if you want your program to be portable in the strict sense?

Joe Seigh

Bryan O'Sullivan

Feb 26, 1998, 3:00:00 AM

t> Have you ever heard of cache coherency? It invalidates the copy of
t> the flag in cpu2's cache when a write to the flag occurs in cpu1's
t> cache.

There are a number of different cache coherency protocols in use on
modern systems, and if you believe that many or all have the property
of immediately invalidating both in-cache copies of a line and
flushing in-progress loads when a write is detected, you are mistaken.

t> If cpu2 reads the flag again it needs to read the copy from cpu1's
t> cache

Not necessarily true.

t> Some compilers will copy the flag to an internal register and keep
t> its value there and not write it back to the cache until a later
t> time.

All compilers do this, all the time.

t> So to avoid this, the flag must be declared to be "volatile" (in C
t> anyway) so the compiler knows not to assume that something else
t> won't access the flag without its knowledge.

Declaring a variable as volatile doesn't help in any useful way,
because the semantics of the volatile keyword only apply to local
memory, and say nothing about multiprocessing environments.

t> A mutex may appear to solve this problem in some cases [...]

A mutex solves this problem in all cases.

<b

--
Let us pray:
What a Great System. b...@eng.sun.com
Please Do Not Crash. b...@serpentine.com
^G^IP@P6 http://www.serpentine.com/~bos

David Holmes

Feb 27, 1998, 3:00:00 AM

Bil Lewis <B...@LambdaCS.com> wrote in article
<34F22F...@LambdaCS.com>...
> Felix,

> 1: Writing a byte on most modern machines is non-atomic. It goes:
> read word, mask out byte, add in new byte, write word.

Most modern machines? Surely many modern machines are byte addressable and
thus the individual byte can be written independent of the rest of the
word?

David

Felix Morley Finch

Feb 27, 1998, 3:00:00 AM

On 26 Feb 1998 15:26:06 GMT, Joe Seigh <se...@bose.com> scrawled:

>In article <34F22F...@LambdaCS.com>, Bil Lewis <B...@LambdaCS.com> writes:
>|> Felix,
>|>
>|> Several parts to your question...
>|>
>|> 1: Writing a byte on most modern machines is non-atomic. It goes:
>|> read word, mask out byte, add in new byte, write word.
>|>

I've got a lousy news feed and didn't see the post Joe referenced. I
feel like Homer Simpson on the non-atomic byte writes. Doh! Makes me
think that even a "word size" object might not be atomic; Pentiums
have a 32 bit word but 64 bit data bus, right? Don't know how they do
32 bit writes, but it doesn't matter, others could screw it up.

Anyway, now I'm convinced. My brain was thinking along the lines of
new fangled cache writes being out of order, but the catch is much
simpler.

Thanks.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.

David Holmes

Feb 27, 1998, 3:00:00 AM

Joe Seigh <se...@bose.com> wrote in article <1998Feb2...@bose.com>...

> In article <34F22F...@LambdaCS.com>, Bil Lewis <B...@LambdaCS.com>
writes:
> |> 1: Writing a byte on most modern machines is non-atomic. It goes:
> |> read word, mask out byte, add in new byte, write word.
> |>
>
> Good point. This is news to me and is contrary to what I thought was
> the general trend of all the various loads and stores being atomic.
>
> This is also contrary to what the Java architects believed when they
> specified that accesses of operands up to a 32 bit wordsize were
> atomic.

I think there are two different issues in this discussion.

Firstly is writing a byte atomic?

The answer to that I firmly believe is generally yes - on machines that are
byte addressable. Only if the machine is not byte addressable would the
hardware need to read/mask/write as Bil suggested.

Note that this involves simply writing the byte - it doesn't mean updating
the value based on its current value ie.
b = 1; // atomic
b++; // non-atomic

The second part is whether the writing of that byte will necessarily be
visible to other threads reading that byte. The answer to that is generally
no. Although some hardware systems may enforce a cache coherency protocol
that makes this true, there are many that will require memory barrier
instructions to be executed. This is where posix mutexes come into play
because they are required to issue the necessary memory barrier
instructions.

The bottom line is to explicitly synchronise using the mutex.
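To make the b++ point concrete, here is a small self-contained demonstration
(N and all the names are mine, purely illustrative): two threads increment a
shared counter; with the mutex the result is always 2*N, and if you remove
the lock/unlock calls you will usually see a smaller number, because the
concurrent read-modify-write sequences overwrite each other.

#include <pthread.h>
#include <stdio.h>

#define N 1000000L

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *bump(void *arg)
{
    long i;
    for (i = 0; i < N; i++) {
        pthread_mutex_lock(&lock);
        counter++;                  /* the read-modify-write is now atomic */
        pthread_mutex_unlock(&lock);
    }
    return arg;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", counter);       /* 2000000 with the mutex; usually less
                                       without it */
    return 0;
}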

David

Tim Beckmann

Feb 27, 1998, 3:00:00 AM

David Holmes wrote:
> I think there are two different issues in this discussion.
>
> Firstly is writing a byte atomic?
>
> The answer to that I firmly believe is generally yes - on machines that are
> byte addressable. Only if the machine is not byte addressable would the
> hardware need to read/mask/write as Bil suggested.
>
> Note that this involves simply writing the byte - it doesn't mean updating
> the value based on its current value ie.
> b = 1; // atomic
> b++; // non-atomic
>

David,

Well stated! I agree with you fully on the first part. And I think
that even on machines that are not byte addressable, the compilers
won't combine bytes declared as single bytes into the same word as
other single byte flags. But if a byte array is declared, it will
combine the bytes into a single word.

For example:
unsigned char flag1;
unsigned char flag2;

will actually result in the compiler putting each of these flags in
their own 32 bit word (assuming the machine has 32 bit words :)

however, declaring
unsigned char flag[2];
will most likely result in two bytes packed into the same word.

Of course this is guesswork on my part, but it seems the only
logical way to do it for C code to be portable between byte addressable
and non-byte addressable machines.
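One way to check what a particular compiler actually does, rather than
guessing, is simply to print the addresses (this little test program is
mine, not part of the original post):

#include <stdio.h>

unsigned char flag1;
unsigned char flag2;
unsigned char flag[2];

int main(void)
{
    /* How these end up laid out is entirely up to the compiler and ABI;
       printing the addresses shows what your particular compiler does. */
    printf("flag1:   %p\n", (void *)&flag1);
    printf("flag2:   %p\n", (void *)&flag2);
    printf("flag[0]: %p\n", (void *)&flag[0]);
    printf("flag[1]: %p\n", (void *)&flag[1]);
    return 0;
}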

> The second part is whether the writing of that byte will necessarily be
> visible to other threads reading that byte. The answer to that is generally
> no. Although some hardware systems may enforce a cache coherency protocol
> that makes this true, there are many that will require memory barrier
> instructions to be executed. This is where posix mutexes come into play
> because they are required to issue the necessary memory barrier
> instructions.

This also makes sense. The memory barrier concept is one I'm not very
familiar with. How does the memory barrier keep track of what changed
in the cache in order to maintain coherency? The programmer knows what
memory is being locked by the mutex, but the machine doesn't really
know. Does the machine maintain a list of changed lines in the cache
that are flushed when the barrier is released?

I wonder if you know of any modern general purpose machines that do
this? Or are they all highly specialized machines that you generally
need not worry about porting code to?

>
> The bottom line is to explicitly synchronise using the mutex.
>

When in doubt, use the mutex :)

Tim

Tim Beckmann

Feb 27, 1998, 3:00:00 AM

Bryan O'Sullivan wrote:
>
> t> Have you ever heard of cache coherency? It invalidates the copy of
> t> the flag in cpu2's cache when a write to the flag occurs in cpu1's
> t> cache.
>
> There are a number of different cache coherency protocols in use on
> modern systems, and if you believe that many or all have the property
> of immediately invalidating both in-cache copies of a line and
> flushing in-progress loads when a write is detected, you are mistaken.
>
> t> If cpu2 reads the flag again it needs to read the copy from cpu1's
> t> cache
>
> Not necessarily true.

Bryan,

I've thought this through a little and realize you're right. There
probably are machines that do not automatically enforce cache
coherency. Do you know of any modern, non-specialized systems that
don't? It would be helpful to know if assuming cache coherency is
present is generally safe to do for the machines my software may
run on.

>
> t> Some compilers will copy the flag to an internal register and keep
> t> its value there and not write it back to the cache until a later
> t> time.
>
> All compilers do this, all the time.

No, all compilers don't do this all the time. Look at some assembly
code generated by compilers for Intel processors (with their woefully
inadequate number of registers :).

> t> So to avoid this, the flag must be declared to be "volatile" (in C
> t> anyway) so the compiler knows not to assume that something else
> t> won't access the flag without its knowledge.
>
> Declaring a variable as volatile doesn't help in any useful way,
> because the semantics of the volatile keyword only apply to local
> memory, and say nothing about multiprocessing environments.

Using volatile works fine in the multiprocessing systems I've worked on.
In fact, just recently I had a little glitch that was fixed by using
volatile on a variable. The code looked like the following:

if (variable < threshold)  /* check whether the variable has crossed a
                              threshold that is seldom crossed */
{
    /* the threshold was crossed, so lock the mutex
       so the condition can be dealt with */
    lock mutex

    /* make sure the condition still exists, since it may
       have been dealt with by another thread between the
       first check and the mutex lock */
    if (variable < threshold)
    {
        /* do stuff */
    }
    release mutex
}

This code would occasionally glitch because the compiler had loaded
the value of the variable into a register and did not reload it after
the mutex was locked. Using volatile is the way this was fixed. Of
course, I could have always locked the mutex before checking the
variable, but the performance penalty would have been huge.

Seems "volatile" was used in a very useful way here!

Later,
Tim

Joe Seigh

Feb 27, 1998, 3:00:00 AM

In article <01bd433c$cec72ba0$1bf56f89@dholmes>, "David Holmes" <dho...@mri.mq.edu.au> writes:
|> Joe Seigh <se...@bose.com> wrote in article <1998Feb2...@bose.com>...
|> > In article <34F22F...@LambdaCS.com>, Bil Lewis <B...@LambdaCS.com>
|> writes:
|> > |> 1: Writing a byte on most modern machines is non-atomic. It goes:
|> > |> read word, mask out byte, add in new byte, write word.
|> > |>
|> >
|> > Good point. This is news to me and is contrary to what I thought was
|> > the general trend of all the various loads and stores being atomic.
|> >
|> > This is also contrary to what the Java architects believed when they
|> > specified that accesses of operands up to a 32 bit wordsize were
|> > atomic.
|>
|> I think there are two different issues in this discussion.
|>
|> Firstly is writing a byte atomic?
|>
|> The answer to that I firmly believe is generally yes - on machines that are
|> byte addressable. Only if the machine is not byte addressable would the
|> hardware need to read/mask/write as Bil suggested.
|>
|> Note that this involves simply writing the byte - it doesn't mean updating
|> the value based on its current value ie.
|> b = 1; // atomic
|> b++; // non-atomic
|>
|> The second part is whether the writing of that byte will necessarily be
|> visible to other threads reading that byte. The answer to that is generally
|> no. Although some hardware systems may enforce a cache coherency protocol
|> that makes this true, there are many that will require memory barrier
|> instructions to be executed. This is where posix mutexes come into play
|> because they are required to issue the necessary memory barrier
|> instructions.
|>
|> The bottom line is to explicitly synchronise using the mutex.
|>
|> David


Actually, I think the issue Bil raised isn't about atomicity.
Atomic access means that either the old value or the new value is
seen if a store occurs "concurrently" with a fetch from that
location and nothing else. For a byte, that seems to be true,
whether or not adjacent memory locations are inadvertently
changed.

The issue seems to be, what granularity of locality is safe for
concurrent storage updates.

So if we have two adjacent byte locations, each one "protected"
by a different mutex, there is no way that we can guarantee safe
concurrent updates of those two locations. POSIX threads don't
even address this issue.

Joe Seigh

Bryan O'Sullivan

Feb 27, 1998, 3:00:00 AM

t> There probably are machines that do not automatically enforce cache
t> coherency.

I think you may have misunderstood my meaning. Of the cache coherency
protocols currently in use, none that I know of requires a write to
memory by one processor to be immediately visible to other processors.

t> Some compilers will copy the flag to an internal register and keep
t> its value there and not write it back to the cache until a later
t> time.

b> All compilers do this, all the time.

t> No, all compilers don't do this all the time.

Yes they do. The fact that, say, an increment of a value stored in a
memory location on an Intel processor can be expressed as a single
instruction does not mean that the value isn't first loaded by that
processor, then incremented, then written back. It just happens to be
an atomic bus transaction.

g> Declaring a variable as volatile doesn't help [...]

t> Using volatile works fine in the multiprocessing systems I've
t> worked on.

It "works" because it's a legal storage attribute in ANSI C, but that
doesn't mean that it "works" in the way you think.

t> In fact, just recently I had a little glitch that was fixed by
t> using volatile on a variable.

The glitch you fixed, if your pseudocode is anything to go by, was
caused by a bug in your compiler. You shouldn't have needed to use
the volatile keyword as you did, at least if you were writing on a
system with POSIX threads.

David Holmes

unread,
Mar 2, 1998, 3:00:00 AM3/2/98
to

Joe Seigh <se...@bose.com> wrote in article <1998Feb2...@bose.com>...
> The issue seems to be, what granularity of locality is safe for
> concurrent storage updates.
>
> So if we have two adjacent byte locations, each one "protected"
> by a different mutex, there is no way that we can guarantee safe
> concurrent updates of those two locations. POSIX threads don't
> even address this issue.

I thought about this after posting. An architecture such as Bil describes
which requires non-atomic read/mask/write sequences to update variables of
a smaller size than the natural word size, would be a multi-threading
nightmare. As you note above two adjacent byte values would need a common
mutex to protect access to them and this applies even if they were each
used by only a single thread! On such a system I'd only want to program
with a thread-aware language/compiler/run-time.

David

David Holmes

Mar 2, 1998, 3:00:00 AM

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote in article
<34F6C4...@edcmail.cr.usgs.gov>...

> Well stated! I agree with you fully on the first part. And I think
> that even on machines that are not byte addressable, the compilers won't
> combine bytes declared as single bytes into the same word as other
> single byte flags.

Why not? If the compiler is for a non-concurrent language then it shouldn't
have to worry about such things and would just define variables and types
in a 'natural' and efficient way. However a compiler/run-time for a
concurrent language, such as Java, would have to take account of this
architecture.

> Of course this is guesswork on my part, but it seems the only
> logical way to do it for C code to be portable between byte addressable
> and non-byte addressable machines.

I don't see portability as an issue. The compiler for the non-byte
addressable machine is responsible for enforcing the rules of 'C'. If
there is a way to accommodate taking the address of char types and of
performing pointer arithmetic to work through char arrays etc, even with
multiple chars 'packed' into each word - then the compiler can do that. Of
course the compiler can also decide that a char is a machine word. Because
'C' is not thread-aware the compiler would not have to consider atomicity
issues.

As I mention in another post this architecture would require synchronized
access even to unshared adjacent bytes which would be a nightmare for the
threads programmer.

> This also makes sense. The memory barrier concept is one I'm not very
> familiar with. How does the memory barrier keep track of what changed in
> the cache in order to maintain coherency?

I'm no expert on this so hopefully an expert will respond. I don't know the
details but basically the memory barrier instruction causes the cache to be
flushed so that pending writes go to main memory and subsequent reads come
from main memory. I don't know the level of granularity at which this is
achieved. One of the books for finding out about these sorts of issues on
MP systems is "UNIX Systems for Modern Architectures: Symmetric
Multiprocessing and Caching for Kernel Programmers" by Curt Schimmel,
Addison-Wesley, ISBN 0-201-63338-8 (which I have only recently started
reading).

I do think, however, that these cache issues need to be considered on
most mainstream SMP systems.

David

David Holmes

Mar 2, 1998, 3:00:00 AM

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote in article
<34F6C9...@edcmail.cr.usgs.gov>...

> Using volatile works fine in the multiprocessing systems I've worked on.
> In fact, just recently I had a little glitch that was fixed by using
> volatile on a variable. The code looked like the following:

Someone can correct me if I'm wrong here, but 'volatile' in 'C' or 'C++'
is not really defined other than that it prevents the compiler from
performing certain optimisations on the variable (e.g. keeping it in a
register). Now C/C++ are not thread-aware, so although a compiler could
extend the notion of what volatile means to cover caching issues, it is
certainly not obliged to do so, and thus simply using volatile may not
work in MP environments.

On the other hand Java's volatile keyword is defined in terms of atomicity
of access and memory visibility and so will work in MP environments.

In your case using volatile happened to fix a problem but it is not
something you can rely on across different compilers and systems.

David

David Holmes

Mar 2, 1998, 3:00:00 AM

Bryan O'Sullivan <b...@serpentine.com> wrote in article
<87afbct...@serpentine.com>...

> The glitch you fixed, if your pseudocode is anything to go by, was
> caused by a bug in your compiler. You shouldn't have needed to use
> the volatile keyword as you did, at least if you were writing on a
> system with POSIX threads.

POSIX requires the mutex to provide memory barrier instructions to ensure
that shared variables are written/read to/from memory. BUT the compiler
need know nothing about POSIX so what rules are we relying on that prevent
the compiler from optimising access to a variable in this way? Is it simply
the case that because of the total lack of access control in 'C' a compiler
should never assume anything about the value of variables across function
calls?

David

Casper H.S. Dik - Network Security Engineer

Mar 2, 1998, 3:00:00 AM

[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

"David Holmes" <dho...@mri.mq.edu.au> writes:

About global variables, yes. It cannot assume that global variables
(including anything that has a reference to it) can be cached across
function calls.

It can only cache the stack data; it can't even cache any "static data",
because of other function invocations, except in certain cases.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Tim Beckmann

Mar 2, 1998, 3:00:00 AM

Bryan O'Sullivan wrote:
>
> t> No, all compilers don't do this all the time.
>
> Yes they do. The fact that, say, an increment of a value stored in a
> memory location on an Intel processor does not mean that the value
> isn't first loaded from that processor, then incremented, then written
> back. This just happens to be an atomic bus transaction.

If you read closely, I said "load it into a register". Obviously, what
you state happens all the time, but the compiler generates assembly
instructions that look like an "in memory" increment. Yes, it loads the
value from memory, runs it through the integer pipeline, and stores the
value, but it doesn't get loaded into a register the compiler knows
about.

>
> g> Declaring a variable as volatile doesn't help [...]
>
> t> Using volatile works fine in the multiprocessing systems I've
> t> worked on.
>
> It "works" because it's a legal storage attribute in ANSI C, but that
> doesn't mean that it "works" in the way you think.

Actually, it works just the way I think :)

>
> t> In fact, just recently I had a little glitch that was fixed by
> t> using volatile on a variable.


>
> The glitch you fixed, if your pseudocode is anything to go by, was
> caused by a bug in your compiler. You shouldn't have needed to use
> the volatile keyword as you did, at least if you were writing on a
> system with POSIX threads.
>

I don't know if that is true or not. I haven't had the opportunity
to read the full POSIX threads spec. I don't view it as a bug in the
compiler at all. But then again, I'm used to working in embedded
systems software where volatile was used all the time. So I expect
to have to do this :)

Tim

Tim Beckmann

Mar 2, 1998, 3:00:00 AM

David Holmes wrote:
>
> I thought about this after posting. An architecture such as Bil describes
> which requires non-atomic read/mask/write sequences to update variables of
> a smaller size than the natural word size, would be a multi-threading
> nightmare. As you note above two adjacent byte values would need a common
> mutex to protect access to them and this applies even if they were each
> used by only a single thread! On such a system I'd only want to program
> with a thread-aware language/compiler/run-time.
>
> David

David,

My thoughts exactly!

Does anyone know of a mainstream architecture that does this sort of
thing?

Tim

Joe Seigh

Mar 2, 1998, 3:00:00 AM

In article <01bd457a$e903a9e0$1bf56f89@dholmes>, "David Holmes" <dho...@mri.mq.edu.au> writes:
|> Joe Seigh <se...@bose.com> wrote in article <1998Feb2...@bose.com>...
|> > The issue seems to be, what granularity of locality is safe for
|> > concurrent storage updates.
|> >
|> > So if we have two adjacent byte locations, each one "protected"
|> > by a different mutex, there is no way that we can guarantee safe
|> > concurrent updates of those two locations. POSIX threads don't
|> > even address this issue.
|>
|> I thought about this after posting. An architecture such as Bil describes
|> which requires non-atomic read/mask/write sequences to update variables of
|> a smaller size than the natural word size, would be a multi-threading
|> nightmare. As you note above two adjacent byte values would need a common
|> mutex to protect access to them and this applies even if they were each
|> used by only a single thread! On such a system I'd only want to program
|> with a thread-aware language/compiler/run-time.
|>
|> David


Actually, I'm beginning to think Bil misread that particular
architecture, which is not uncommon in this type of business
considering how most architectures are written. It's certainly
happened to me enough times.

If I remember correctly, IBM's S390 has or had an 8 byte store buffer
which was the actual size of data that the cpu talked to storage
with. It was an engineering tradeoff allowing simpler and thus
faster storage or cache most likely. Even though a store instruction
could cause multiple "accesses" to storage, the update to storage was
atomic w.r.t other processors.

Joe Seigh

David Holmes

Mar 3, 1998, 3:00:00 AM

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd457e$c00daf00$1bf56f89@dholmes>...

> On the other hand Java's volatile keyword is defined in terms of
atomicity
> of access and memory visibility and so will work in MP environments.

Oops - slip of the keyboard. Delete *atomicity* in the above statement.
volatile has nothing to do with atomicity only memory visibility.

David

Dave Butenhof

Mar 3, 1998, 3:00:00 AM

David Holmes wrote:

I really wasn't going to step into this one, because I've gone through this
many times before. Unfortunately, the discussion has deteriorated, and I feel
an obligation. (Oddly, someone dropped by my office to ask about memory
barriers while I was writing this.)

> Bil Lewis <B...@LambdaCS.com> wrote in article
> <34F22F...@LambdaCS.com>...
> > Felix,

> > 1: Writing a byte on most modern machines is non-atomic. It goes:
> > read word, mask out byte, add in new byte, write word.
>

> Most modern machines? Surely many modern machines are byte addressable and
> thus the individual byte can be written independent of the rest of the
> word?

Yes, "most" "modern" machines are byte-addressable. (Both words in quotes
because the concepts are ill-defined and probably meaningless, but I'm not in
a mood to worry about that now.)

However, the previous posting claiming that byte-addressable machines
necessarily provide atomic byte access was simply in error. "Byte
addressable" means that each byte has an address. That doesn't help. SOME
machines may have atomic byte access, but this has nothing to do with whether
the byte is addressable. For example, a machine where each address was
attached to a 48 byte word could provide an instruction to atomically update
an arbitrary bit-field within that word -- while a byte-addressable machine
need not necessarily provide atomic access.

There are (depending on how one breaks these things down) at least three
independent issues here:

1. Atomic access granularity
2. Read/write ordering
3. Memory coherency

This discussion has been trying to address the three as if they were the
same. They're not even connected. (Or, at best, only very loosely connected.)

Most modern machines (yes, I should probably use quotes again) are designed
for fast and efficient multiprocessing. The memory interconnect is the big
bottleneck, and a lot of work has gone into streamlining the memory access
and cache protocols. Memory accesses are usually made in "chunks" not
necessarily related to the size of the data type. In general, a machine has a
few "memory access types", very likely a smaller set than the set of
instruction set types. Usually, only "memory access types" are atomic. On an
Alpha EV4 and EV5, for example, the memory access types are 32-bit "words"
and 64-bit "longwords". Smaller accesses require loading and masking a word
or longword. While some machines might choose to hide the distinction between
"memory types" and "instruction types", Alpha doesn't. To read a byte, you
must load the enclosing word/longword, (I'll use "word" for convenience from
now on), and use special instructions to mask/shift into the desired
position.

That's all fine, except when you get into concurrently executing entities
(threads, or processes using shared memory), and one tries to STORE into the
non-atomic field. To store such a field, you fetch the current word, shift &
mask (field insert) your new data, and then write the word back. If another
entity is simultaneously setting a different field in the same word, only ONE
of the two will supply a new value for the entire word, including the other
thread's field. Worse, a non-atomic field (e.g., a 16-bit short rather than a
byte) might be split between two words. If it is, you can get into trouble
even reading, because you have to read both words and combine them to create
the short value you want. Some other thread might have changed one of the
words between your two reads. That's "word tearing", and it means you've
gotten the wrong data. Of course word tearing can happen on writes, too, in
addition to the normal field access problems. Messy!
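Written out as plain C rather than any particular instruction set, the store
sequence being described looks roughly like this (a sketch, not real compiler
output):

unsigned int load_word(const unsigned int *p)    { return *p; }
void store_word(unsigned int *p, unsigned int v) { *p = v; }

/* What storing one byte turns into when the memory system only does
   word-sized accesses: */
void store_byte(unsigned int *word, unsigned int byte_index, unsigned char value)
{
    unsigned int w = load_word(word);                 /* fetch the whole word  */
    w &= ~(0xFFu << (byte_index * 8));                /* mask out the old byte */
    w |= (unsigned int)value << (byte_index * 8);     /* insert the new byte   */
    store_word(word, w);                              /* write the whole word  */
    /* If another thread stores a different byte of the same word between the
       fetch and the write, one of the two updates is silently lost. */
}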

Attempting to share information between concurrently executing instruction
streams (that may be on separate processors) also requires read/write
ordering. That is, if you set a flag to signal some change in state (e.g.,
adding an item to a queue), you must be able to know that seeing the change
in the flag means you can see the new queue item. Modern SMP memory systems
frequently allow reordering of operations in the CPU to memory controller
pipeline, for all sorts of reasons (including cache synchronization issues,
speculative execution, etc.) So you may queue an item and then set a flag (or
fill in the fields of a structure and then queue it), but have the data
become visible to another processor in a different order. Unless you're
communicating between concurrently executing entities, reordering doesn't
affect you -- so it's a great performance tradeoff. But it means that when
you require ordering, you need to do something extra. One common way to force
ordering is a "memory barrier" instruction. On Alpha, for example, MB
prevents memory operation reordering across the instruction. (One could
consider a stream of memory requests in a pipe between the CPU and memory,
which can be arbitrarily reordered for implementation convenience; but the
reordering agent can't move anything past an "MB token".)

And then we've got memory coherency. A "write-through" cache may invalidate
other caches, and update main memory. But a "write-back" cache may not write
into main memory for some time. Even if other caches are invalidated, the
processors won't see the new value until it's written. That's OK, though, as
long as both processors make proper use of memory barriers. The writer puts a
memory barrier between writing the data and writing the flag (or pointer),
and the reader puts a memory barrier between reading the flag/pointer and
reading the data. Now, whenever the flag/pointer appears in memory, you know
that the data to which it points is valid -- because you can't have read it
before the flag, and the writer can't have written it after the flag.
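As a sketch of that writer/reader protocol (memory_barrier(), compute_value()
and consume_value() are placeholders; the barrier stands in for a
platform-specific instruction such as Alpha's MB, not a portable call):

extern int  compute_value(void);     /* placeholder: produce the data        */
extern void consume_value(int);      /* placeholder: use the data            */
extern void memory_barrier(void);    /* stand-in for a platform barrier      */

static int data;
static volatile int flag = 0;

void writer(void)
{
    data = compute_value();
    memory_barrier();        /* make sure 'data' reaches memory before 'flag' */
    flag = 1;
}

void reader(void)
{
    while (flag == 0)
        ;                    /* wait for the flag to appear                   */
    memory_barrier();        /* don't use a 'data' fetched before the flag    */
    consume_value(data);
}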

For more information without getting too deep into processor implementation
details, see the section "Memory visibility between threads" in my book
(Programming with POSIX Threads, web link in my .sig). Curt Schimmel's
UNIX Systems for Modern Architectures (Addison-Wesley) has a section called
"Other Memory Models" that describes the SPARC architecture's "Partial Store
Ordering" (loose read/write ordering with memory barriers), though it doesn't
address word tearing.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/


David Holmes

Mar 4, 1998, 3:00:00 AM

Dave Butenhof <bute...@zko.dec.com> wrote in article
<34FC0286...@zko.dec.com>...

> I really wasn't going to step into this one, because I've gone through
> this many times before. Unfortunately, the discussion has deteriorated,
> and I feel an obligation.

Thanks for doing so. Although there was a general consensus to "use the
mutex" some time back, the architecture discussions have been
thought-provoking if not accurate. ;-)

> However, the previous posting claiming that byte-addressable machines
> necessarily provide atomic byte access was simply in error.

Well it was a question rather than a claim. I am now older and wiser.

> For more information without getting too deep into processor
> implementation details, see the section "Memory visibility between
> threads" in my book (Programming with POSIX Threads, web link in my
> .sig). Curt Schimmel's UNIX Systems for Modern Architectures
> (Addison-Wesley) has a section called "Other Memory Models" that
> describes the SPARC architecture's "Partial Store Ordering" (loose
> read/write ordering with memory barriers), though it doesn't
> address word tearing.

A slightly deeper, but very interesting, read is the first part of Chapter
5 from the Alpha Architecture Handbook V3. It discusses many of these
issues, talks about atomic access (the Alpha 21164 can optionally support
atomic byte access), cache issues, ordering issues and memory barrier
instructions.
http://ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html

Cheers,
David

Dave Butenhof

Mar 4, 1998, 3:00:00 AM

Tim Beckmann wrote:

> David Holmes wrote:
> >
> > I thought about this after posting. An architecture such as Bil describes
> > which requires non-atomic read/mask/write sequences to update variables of
> > a smaller size than the natural word size, would be a multi-threading
> > nightmare. As you note above two adjacent byte values would need a common
> > mutex to protect access to them and this applies even if they were each
> > used by only a single thread! On such a system I'd only want to program
> > with a thread-aware language/compiler/run-time.
> >
> > David
>

> David,
>
> My thoughts exactly!
>
> Does anyone know of a mainstream architecture that does this sort of
> thing?

Oh, absolutely. SPARC, MIPS, and Alpha, for starters. I'll bet most other RISC
systems do it, too, because it substantially simplifies the memory subsystem
logic. And, after all, the whole point of RISC is that simplicity means speed.

If you stick to int or long, you'll probably be safe. If you use anything
smaller, be sure they're not allocated next to each other unless they're under
the same lock.

I wrote a long post on most of the issues brought up in this thread, which
appears somewhere down the list due to the whims of news feeds, but I got
interrupted and forgot to address this issue.

If you've got

pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;

char dataA;
char dataB;

And one thread locks mutexA and writes dataA while another locks mutexB and
writes dataB, you risk word tearing, and incorrect results. That's a "platform
issue", that, as someone else commented, POSIX doesn't (and can't) address.

What do you do? I always advise that you keep a mutex and the data it protects
closely associated. As well as making the code easier to understand, it also
addresses problems like this. If the declarations were:

typedef struct dataTag {
    pthread_mutex_t mutex;
    char            data;
} data_t;

data_t dataA = {PTHREAD_MUTEX_INITIALIZER, 0};
data_t dataB = {PTHREAD_MUTEX_INITIALIZER, 1};

You can now pretty much count on having the two data elements allocated in
separate "memory access chunks". Not an absolute guarantee, since a
pthread_mutex_t might be a char as well, and some C compilers might not align
structures on natural memory boundaries. But most compilers on machines that
care WILL align/pad structures to fit the natural data size, unless you
override it with a switch or pragma (which is generally a bad idea even when
it's possible). And, additionally, a pthread_mutex_t is unlikely to be less
than an int, and is likely at least a couple of longs. (On Digital UNIX, for
example, a pthread_mutex_t is 48 bytes, and on Solaris it's 24 bytes.)

There are, of course, no absolute guarantees. If you want to be safe and
portable, you might do well to have a config header that typedefs
"smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then
it's just a quick trip to the hardware reference manual when you start a port.
On a CISC, you can probably use "char". On most RISC systems, you should use
"int" or "long".

Yes, this is one more complication to the process of threading old code. But
then, it's nothing compared to figuring out which data is shared and which is
private, and then getting the locking protocols right.

mma...@dazel.com

Mar 4, 1998, 3:00:00 AM

Dave Butenhof <bute...@zko.dec.com> wrote:
> There are, of course, no absolute guarantees. If you want to be safe and
> portable, you might do well to have a config header that typedefs
> "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then
> it's just a quick trip to the hardware reference manual when you start a
> port.
> On a CISC, you can probably use "char". On most RISC systems, you should use
> "int" or "long".

If I'm not mistaken, isn't that spelled:

#include <signal.h>

typedef sig_atomic_t smallest_safe_data_unit_t;

?
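One caveat worth adding here (mine, not from the original exchange):
sig_atomic_t addresses a different problem. It promises that a read or write
won't be torn by an asynchronous signal handler running in the same thread;
it says nothing about ordering or visibility between processors, so it does
not by itself replace the mutex:

#include <signal.h>

static volatile sig_atomic_t quit_flag = 0;   /* safe w.r.t. signal handlers,
                                                 not w.r.t. other processors */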


Regards,

Mike Martin
mma...@dazel.com

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/ Now offering spam-free web-based newsreading

Tim Beckmann

Mar 4, 1998, 3:00:00 AM

Dave Butenhof wrote:
> > David,
> >
> > My thoughts exactly!
> >
> > Does anyone know of a mainstream architecture that does this sort of
> > thing?
>
> Oh, absolutely. SPARC, MIPS, and Alpha, for starters. I'll bet most other RISC
> systems do it, too, because it substantially simplifies the memory subsystem
> logic. And, after all, the whole point of RISC is that simplicity means speed.

MIPS I know :) The latest MIPS processors R10K and R5K are byte addressable.
The whole point of RISC is simplicity of hardware, but if it makes the software
more complex it isn't worth it :)

> If you stick to int or long, you'll probably be safe. If you use anything
> smaller, be sure they're not allocated next to each other unless they're under
> the same lock.

Actually, you can be pretty sure that a compiler will split two declarations
like:
char dataA;
char dataB;
to be in two separate natural machine words. It is much faster and easier for
those RISC processors to digest. However, if you declare something as:
char data[2]; /* or more than 2 */
you have to be VERY concerned with the effects of word tearing, since the
compiler will certainly pack them into a single word.

> I wrote a long post on most of the issues brought up in this thread, which
> appears somewhere down the list due to the whims of news feeds, but I got
> interrupted and forgot to address this issue.

Yep, I saw it. It was helpful. So was the later post by someone else who
included a link to a DEC alpha document that explained what a memory barrier
was in this context. I've seen three different definitions over the years.
The definition you described in your previous post agreed with the DEC alpha
description... That a memory barrier basically doesn't allow out of order
memory accesses to cross the barrier. A very important issue if you are
implementing mutexes or semaphores :)

This probably improves your chances when porting to other machines, although I
think just as many compilers will keep dataA and dataB in separate natural
machine words. Collecting the mutex and data into a structure is a good idea
just to make it clear what the mutex applies to.

However, I really believe that dataA and dataB should both be declared as
"volatile" to prevent the compiler from being too aggressive on it's
optimization. The mutex still doesn't guarantee that the compiler hasn't
cached the data in an internal register across a function call. My memory
isn't perfect, but I do think this bit me on IRIX.

> There are, of course, no absolute guarantees. If you want to be safe and
> portable, you might do well to have a config header that typedefs
> "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then
> it's just a quick trip to the hardware reference manual when you start a port.
> On a CISC, you can probably use "char". On most RISC systems, you should use
> "int" or "long".

There never are guarantees, are there? :)

> Yes, this is one more complication to the process of threading old code. But
> then, it's nothing compared to figuring out which data is shared and which is
> private, and then getting the locking protocols right.

But what fun would it be if it wasn't a challenge :)

However, I would like to revisit the original topic of whether it is "safe" to
change a single byte without a mutex. Although, instead of "byte" I'd like to
say "natural machine word" to eliminate the word tearing and non-atomic memory
access concerns. I'm not sure it's safe to go back to the original topic, but
what the heck ;)

If you stick to a "natural machine word" that is declared as "volatile",
you do not absolutely need a mutex (in fact I've done it). Of course, there are
only certain cases where this works and shouldn't be done unless you really know
your hardware architecture and what you're doing! If you have a machine with a
lot of processors, unnecessarily locking mutexes can really kill parallelism.

I'll give one example where this might be used:

volatile int stop_flag = 0; /* assuming an int is atomic */

void thread_1(void)
{
    /* bunch of code */

    /* if some condition exists such that we wish to stop thread_2 ... */
    stop_flag = 1;

    /* more code - or not :) */
}


void thread_2(void)
{
    while (1)
    {
        /* check if thread should stop */
        if (stop_flag)
            break;

        /* do whatever is going on in this loop */
    }
}


Of course, this assumes the hardware has some sort of cache coherency
mechanism. But I don't believe POSIX mutexes or memory barriers (as
defined for the DEC alpha) have any impact on cache coherency.

The example is simplistic, but it should work on a vast majority of
systems. In fact the stop_flag could just as easily be a counter
of some sort as long as only one thread is modifying the counter...

Later,
Tim

P.S. Dave, you certainly know what you're doing. I've been looking
for a GOOD book on POSIX threads. I may have to check yours out :)
So far I've been making do with a pretty lame book, info I pulled from
the SGI web site, system header files....and 9 years of experience
rolling my own embedded system OSes (the most valuable of all).

David Holmes

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<34FD6950...@zko.dec.com>...

> If you've got
>
> pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
> pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;
>
> char dataA;
> char dataB;
>
> And one thread locks mutexA and writes dataA while another locks mutexB and
> writes dataB, you risk word tearing, and incorrect results. That's a "platform
> issue", that, as someone else commented, POSIX doesn't (and can't) address.

That's a pretty serious impediment to threaded programming. How do you know
how things have been allocated? Isn't the compiler free to re-arrange data
declarations to 'optimise' alignment etc? Putting the data and mutex
together in a struct might work for C, but what about C++ where the data
and mutex are already within the object? Do you need to put structs within
the class as well?

When your average programmer sits down at their SparcWorkstation to write a
neat little POSIX pthreads program, how on earth are they supposed to know
about this? This isn't mentioned in your book Dave, nor in others that
discuss using threads on Solaris.

???
David

Achim Gratz

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

Tim Beckmann <tbec...@sd.cybernex.net> writes:

> However, I would like to revist the original topic of whether it is "safe" to
> change a single byte without a mutex. Although, instead of "byte" I'd like to
> say "natural machine word" to eliminate the word tearing and non-atomic memory
> access concerns. I'm not sure it's safe to go back to the original topic, but
> what the heck ;)
>
> If you stick to a "natural machine word" that is declared as "volatile",
> you do not absolutely need a mutex (in fact I've done it). Of course, there are
> only certain cases where this works and shouldn't be done unless you really know
> your hardware architecture and what you're doing! If you have a machine with a
> lot of processors, unnecessarily locking mutexes can really kill parallelism.

Sigh. The volatile keyword as defined in the C and upcoming C++
standard is specified to prevent certain optimizations that would
invalidate the intended semantics of single threaded programs. If the
compiler doesn't employ these optimizations, it is allowed to ignore
volatile altogether. There are no guarantees with respect to
visibility, concurrency (not surprising since that is outside the
scope of the language) and data layout. If I'm not mistaken,
atomicity is also not specified by volatile. If a particular compiler
gives you some, fine. But you've just written an unportable program,
period.

This is different in Java where volatile is more tightly specified,
mainly because the concept of threads is part of the language.

[...]


> Of course, this assumes the hardware has some sort of cache coherency
> mechanism. But I don't believe POSIX mutex's or memory barriers (as
> defined for the DEC alpha) have any impact on cache coherency.

As has already been explained, cache coherency, atomicity and memory
order are mostly orthogonal when designing memory architectures. For
example, in an SMP without any cache, you don't have to deal with cache
coherency, yet you're still left with the other two issues.

> The example is simplistic, but it should work on a vast majority of
> systems. In fact the stop_flag could just as easily be a counter
> of some sort as long as only one thread is modifying the counter...

Depends on your definition of "vast majority" and the expected program
lifetime. I'm fairly sure it works with many, if not all compilers
targeted for embedded systems, as these tend to be more aware of such
issues (extending the specifications of the C standard and sometimes
even slightly altering them). I wouldn't bet a dime if you tried this
on, say, an Origin2000.

If your example was valid (that is you could hold the mutex
indefinitely and still guarantee visibility of _every_ change in the
right order), I would expect that a good quality implementation
optimized mutex locking in a way that release&relock by the same
thread would essentially become a NOP. Thus your performance concerns
become moot without giving up correctness and portability.

> P.S. Dave, you certainly know what you're doing. I've been looking
> for a GOOD book on Posix threads. I may have to check your's out :)
> So far I've been making due on a pretty lame book, info I pulled from
> the SGI web site, system header files....and 9 years experience of
> rolling my own embedded system OS's (the most valuable of all).

I heartily recommend Dave's book. It's one of the books that not only
combines a comprehensive introduction and reference, but is also fun
to read. BTW, Dave, who's been doing the illustrations? I really
liked them.


Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW: http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gr...@ite.inf.tu-dresden.de
Phone: +49 351 463 - 8325

Bryan O'Sullivan

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

t> Actually, you can be pretty sure that a compiler will split two
t> declarations like:
t> char dataA;
t> char dataB;
t> to be in two separate natural machine words.

I'm getting ticked off at the repeated bogus assertions you make that
you obviously haven't bothered to either think about or check.

Precisely all you can be sure of, in the case above, is that some
compilers will do as you say, and some won't; this isn't a very
helpful conclusion.

t> The mutex still doesn't guarantee that the compiler hasn't cached
t> the data in an internal register across a function call.

While this is trivially true, any systems that actually do this may
reasonably be considered broken for multithreaded programming,
because they are defeating the use of mutexes. In particular, it is
usually the case that only local variables are kept in registers
across function calls, because keeping non-local variables in
registers isn't safe unless you have inter-module optimisation enabled
across most or all modules (not many compilers even let you do this).

t> If you stick to a "natural machine word" that is declared as
t> "volatile", you do not absolutely need a mutex (in fact I've done
t> it).

t> I'll give one example where this might be used: [example deleted]

Both the advice and the example you give here are bad, insofar as:

- You can't know what size a "natural machine word" is without
resorting to some kind of magic. If you think this isn't a problem,
consider all the software that has been moved from 32-bit to 64-bit
platforms during the past four years or so, and all that will be
moved during the next decade or two.

- The "volatile" keyword is not your friend when you are using
threads, no matter what you may think. It's relatively safe for
twiddling memory-mapped device registers, but don't expect it to do
what you want on a multiprocessor.

- The only reason your example works is that you only have a single
writer to the stop_flag variable, and it very likely doesn't matter
if it takes a few extra tens or hundreds of microseconds for the
reader to notice that the flag has been set.

If you introduce a case that is even the tiniest bit more complex,
such as having multiple writers trying to do anything other than set
a flag that will remain set in perpetuity, or perhaps a situation in
which it is important that the setting of the flag be detected ASAP,
you leave yourself screwed, especially if you expect your software
to live long enough that someone who doesn't know what you were
thinking ends up maintaining your code.

t> The example is simplistic, [...]

Which is part of why it is misleading and dangerous, because it
doesn't generalise in any particularly useful way. Real-world
software tends to have a bit more meat on its bones, and working from
simplistic axioms gets you a nice steaming cup of mocha, if and only
if you happen to also have two dollars to spare.

<b

--
Let us pray:
What a Great System.

Please Do Not Crash.

Bryan O'Sullivan

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

I wrote:

b> - The only reason your example works is [...]

I forgot to mention that, of course, the example isn't even guaranteed
to work. It will work on most current architectures that I know of,
and probably the majority of same for the foreseeable future, but if
you use some kind of wacky distributed shared memory scheme (as at
least one vendor currently does), all bets are off.

Casper H.S. Dik - Network Security Engineer

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

Tim Beckmann <tbec...@sd.cybernex.net> writes:

>Actually, you can be pretty sure that a compiler will split two declarations
>like:
> char dataA;
> char dataB;
>to be in two separate natural machine words. It is much faster and easier for
>those RISC processors to digest. However if you declare something as:


Are you sure? When people use "natural" alignment, often these
two will end up on adjacent byte addresses.

(E.g., the SPARC ABI requires that behaviour)

And while you think that these two are more easily digestible for
RISC processors, being able to load both with one instruction and do
some intra-instruction parallelism is also a thing to consider.

Bitfields are also a problem, of course.

Tim Beckmann

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

Achim Gratz wrote:
>
> Sigh. The volatile keyword as defined in the C and upcoming C++
> standard is specified to prevent certain optimizations that would
> invalidate the intended semantics of single threaded programs. If the
> compiler doesn't employ these optimizations, it is allowed to ignore
> volatile altogether. There are no guarantees with respect to
> visibility, concurrency (not surprising since that is outside the
> scope of the language) and data layout. If I'm not mistaken,
> atomicity is also not specified by volatile. If a particular compiler
> gives you some, fine. But you've just written an unportable program,
> period.

Sigh right back. One of those optimizations is that it prevents the
compiler from caching a variable in an internal register, something
that can easily break threaded code.

> [...]


> > Of course, this assumes the hardware has some sort of cache coherency
> > mechanism. But I don't believe POSIX mutex's or memory barriers (as
> > defined for the DEC alpha) have any impact on cache coherency.
>

> As has already been explained, cache coherency, atomicity and memory
> order are mostly orthogonal when designing memory architectures. For
> example in an SMP without any cache, you don't have to deal with cache
> coherency, yet you're still left with the other two issues.

And your point is??? I can only respond with a "no duh".

> > The example is simplistic, but it should work on a vast majority of
> > systems. In fact the stop_flag could just as easily be a counter
> > of some sort as long as only one thread is modifying the counter...
>

> Depends on your definition of "vast majority" and the expected program
> lifetime. I'm fairly sure it works with many, if not all compilers
> targeted for embedded systems, as these tend to be more aware of such
> issues (extending the specifications of the C standard and sometimes
> even slightly altering them). I wouldn't bet a dime if you tried this
> on, say, an Origin2000.

I'd bet as much as you care to. My primary target is Origin 2000s.
This has worked without a flaw on Origin 2000s with 28 processors
working on the same job.

> If your example was valid (that is you could hold the mutex
> indefinitely and still guarantee visibility of _every_ change in the
> right order), I would expect that a good quality implementation
> optimized mutex locking in a way that release&relock by the same
> thread would essentially become a NOP. Thus your performance concerns
> become moot without giving up correctness and portability.

Huh? I said nothing about holding a mutex indefinitely. I just checked
the assembly code generated for a pthread mutex lock/unlock on an O2000.
It generates a function call, which is certainly not a NOP.

> I heartily recommend Dave's book. It's one of the books that not only
> combines a comprehensive introduction and reference, but is also fun
> to read. BTW, Dave, who's been doing the illustrations? I really
> liked them.

I will.

Tim

Tim Beckmann

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

Bryan O'Sullivan wrote:
>
> t> Actually, you can be pretty sure that a compiler will split two
> t> declarations like:
> t> char dataA;
> t> char dataB;
> t> to be in two separate natural machine words.
>
> I'm getting ticked off at the repeated bogus assertions you make that
> you obviously haven't bothered to either think about or check.
>
> Precisely all you can be sure of, in the case above, is that some
> compilers will do as you say, and some won't; this isn't a very
> helpful conclusion.

That's why I said "you can be pretty sure". You need to check the code
generated by your compiler if you don't know how it will act.

> t> The mutex still doesn't guarantee that the compiler hasn't cached
> t> the data in an internal register across a function call.
>
> While this is trivially true, any systems that actually do this may
> reasonably considered to be broken for multithreaded programming,
> because they are defeating the use of mutexes. In particular, it is
> usually the case that only local variables are kept in registers
> across function calls, because keeping non-local variables in
> registers isn't safe unless you have inter-module optimisation enabled
> across most or all modules (not many compilers even let you do this).

I agree, but you're a bit of a hypocrite here. You
make the assumption that a compiler won't cache global variables
across function calls and "it is usually the case"... and you get ticked
when I say "you can be pretty sure".... Whatever...

> t> If you stick to a "natural machine word" that is declared as
> t> "volatile", you do not absolutely need a mutex (in fact I've done
> t> it).
>
> t> I'll give one example where this might be used: [example deleted]
>
> Both the advice and the example you give here are bad, insofar as:
>
> - You can't know what size a "natural machine word" is without
> resorting to some kind of magic. If you think this isn't a problem,
> consider all the software that has been moved from 32-bit to 64-bit
> platforms during the past four years or so, and all that will be
> moved during the next decade or two.

You can. As was pointed out in another post, you can use sig_atomic_t.

> - The "volatile" keyword is not your friend when you are using
> threads, no matter what you may think. It's relatively safe for
> twiddling memory-mapped device registers, but don't expect it to do
> what you want on a multiprocessor.

My experience indicates otherwise.

> - The only reason your example works is that you only have a single
> writer to the stop_flag variable, and it very likely doesn't matter
> if it takes a few extra tens or hundreds of microseconds for the
> reader to notice that the flag has been set.

As I had stated in my post, if you bothered to read it all... I quote:
"as long as only one thread is modifying". The few extra tens or
hundreds of microseconds for the reader to notice do not matter one
bit. Even with a mutex it will take at least that long, since the
reader thread is unlikely to be sitting there waiting for it. And even
if the reader thread were blocked on the mutex, it would likely take
milliseconds or longer before the thread is started again by the
scheduler.

> If you introduce a case that is even the tiniest bit more complex,
> such as having multiple writers trying to do anything other than set
> a flag that will remain set in perpetuity, or perhaps a situation in
> which it is important that the setting of the flag be detected ASAP,
> you leave yourself screwed, especially if you expect your software
> to live long enough that someone who doesn't know what you were
> thinking ends up maintaining your code.

And that's why I said: "there are only certain cases where this works
and shouldn't be done unless you really know your hardware architecture
and what you're doing!"

again if you bothered to read a bit more closely. Although I did forget
to mention that such things should be carefully commented for anyone
who has to maintain the code. But that should go without saying.

> t> The example is simplistic, [...]
>
> Which is part of why it is misleading and dangerous, because it
> doesn't generalise in any particularly useful way. Real-world
> software tends to have a bit more meat on its bones, and working from
> simplistic axioms gets you a nice steaming cup of mocha, if and only
> if you happen to also have two dollars to spare.

True. A beginner certainly shouldn't be trying to extrapolate the
simple case to a more complex case. But it was late and I needed to
go to bed so I didn't feel up to coming up with a more complex
example :)

Later,
Tim

Dave Butenhof

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

David Holmes wrote:

> A slightly deeper, but very interesting, read is the first part of Chapter
> 5 from the Alpha Architecture Handbook V3. It discusses many of these
> issues, talks about atomic access (the Alpha 21164 can optionally support
> atomic byte access), cache issues, ordering issues and memory barrier
> instructions.
> http://ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html

Absolutely. Alpha is a good place to start, because the memory architecture is
particularly "aggressively RISC". You can't take anything for granted, or
you'll get zapped. This is one of the important reasons why it's the fastest.

And, of course, to generalize David's advice, (have you ever noticed that
"David"s almost always give good advice?), if you're planning to do any form of
concurrent communication without using a mutex or semaphore, read, and be sure
you understand, the architecture book for each and every machine you plan to
support. If you can't or won't do that, use a mutex -- otherwise you might as
well be coding deliberate bugs.

Casper H.S. Dik - Network Security Engineer

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> writes:

>I agree, but you a bit of a hypocrit (spelled wrong I'm sure) here. You
>make the assumption that a compiler won't cache global variables
>across function calls and "it is usually the case"... and you get ticked

>when I say "you can be pretty sure".... Whatever...

The C standard doesn't allow caching global variables in registers
across function calls, *unless* it knows that the specific function
call won't change the global variable.

Usually, it can't know that, so it will need to do a store
before the function call and reload afterwards.
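
A small sketch of what that means in practice (the function and variable names here are illustrative, not from any particular library):

extern void helper(void);   /* defined elsewhere; opaque to this compilation unit */

int shared_count;           /* global, so helper() might read or write it */

void caller(void)
{
    shared_count = 1;       /* must be stored to memory before the call */
    helper();               /* compiler can't prove helper() leaves it alone */
    if (shared_count == 1)  /* so it must be reloaded from memory here */
        shared_count = 2;
}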

Tim Beckmann

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

Casper H.S. Dik - Network Security Engineer wrote:
> The C standard doesn't allow caching global variables in registers
> across function calls, *unless* it knows that the specific function
> call won't change the global variable.
>
> Usually, it can't know that, so it will need to do a store
> before the function call and reload afterwards.

Or if you use an intel CPU you don't keep anything in registers across
function calls since it lacks any extra registers ;)

What about non-global, non-local variables? i.e. pointers to structures
passed into a routine as follows:

void routine_foo(struct foo *param_ptr)
{
    /* do something that loads param_ptr->member */

    /* call a function */

    /* do something else that uses the same param_ptr->member */
}

Are there any rules in the C standard that apply to param_ptr->member
with respect to caching it in a register? Does the standard specify whether
param_ptr->member must be reloaded or is the compiler free to use a
cached register?

Thanks
Tim

Bryan O'Sullivan

unread,
Mar 5, 1998, 3:00:00 AM3/5/98
to

t> What about non-global, non-local variables?

All non-local variables are treated in the same way for these
purposes.

David Holmes

unread,
Mar 6, 1998, 3:00:00 AM3/6/98
to

Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote in article
<34FEAD...@edcmail.cr.usgs.gov>...

> > - The "volatile" keyword is not your friend when you are using
> > threads, no matter what you may think. It's relatively safe for
> > twiddling memory-mapped device registers, but don't expect it to do
> > what you want on a multiprocessor.
>
> My experience indicates otherwise.

Tim, don't confuse two different issues here. Firstly at the start of all
this you gave an example where you stated that without using volatile the
example was broken because the compiler cached the value of the flag in a
register. It's debatable whether the compiler was being overly aggressive
in its optimisation here, but that's not the point. You faced an aggressive
compiler and beat it using volatile - fine.

*BUT* that does not mean that volatile is any kind of useful solution for
multi-threaded programming in general. As has been pointed out all volatile
does is suggest to the compiler not to make certain assumptions about the
value of the variable - and such suggestions can be ignored. volatile has
no meaning in terms of store ordering or cache coherency and thus is not a
useful tool in general in multi-processing environments.

To go to your current argument: yes on some given machine, some given
scenario will not require explicit synchronisation. But the construct is so
fragile in general, and of limited significance performance wise, that even
if you can prove that it is 'safe' its generally not worth the effort.

David

Tim Beckmann

unread,
Mar 6, 1998, 3:00:00 AM3/6/98
to

David Holmes wrote:
>
> Tim, don't confuse two different issues here. Firstly at the start of all
> this you gave an example where you stated that without using volatile the
> example was broken because the compiler cached the value of the flag in a
> register. It's debatable whether the compiler was being overly aggressive
> in its optimisation here, but that's not the point. You faced an aggressive
> compiler and beat it using volatile - fine.
>
> *BUT* that does not mean that volatile is any kind of useful solution for
> multi-threaded programming in general. As has been pointed out all volatile
> does is suggest to the compiler not to make certain assumptions about the
> value of the variable - and such suggestions can be ignored. volatile has
> no meaning in terms of store ordering or cache coherency and thus is not a
> useful tool in general in multi-processing environments.

I never stated or meant to imply that volatile could be used as a
general solution to multi-threaded programming. Only that it could be
useful.

> To go to your current argument: yes on some given machine, some given
> scenario will not require explicit synchronisation. But the construct is so
> fragile in general, and of limited significance performance wise, that even
> if you can prove that it is 'safe' its generally not worth the effort.

That's all I've been trying to get across - you can at times "get away
with" not using explicit synchronization if you know your hardware and
compiler well enough. The original post that started this all asked
whether you could... and you can if you are careful. I do agree that in
general you shouldn't do without the explicit synchronization. However,
you can see significant performance improvements by skirting the
explicit synchronization mechanism in some cases. (Email me if you want
an example of where this has been the case for me.)

Thanks for your input! If I remember correctly you were the individual
who posted the link to the DEC alpha info on memory barriers. That was
quite helpful.

Tim

Dave Butenhof

unread,
Mar 9, 1998, 3:00:00 AM3/9/98
to

mma...@dazel.com wrote:

> Dave Butenhof <bute...@zko.dec.com> wrote:
> > There are, of course, no absolute guarantees. If you want to be safe and
> > portable, you might do well to have a config header that typedefs
> > "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then
> > it's just a quick trip to the hardware reference manual when you start a port.
> > On a CISC, you can probably use "char". On most RISC systems, you should use
> > "int" or "long".
>

> If I'm not mistaken, isn't that spelled:
>
> #include <signal.h>
>
> typedef sig_atomic_t smallest_safe_data_unit_t;

You are not mistaken, and thank you very much for pointing that out. While I'd
been aware at some point of the existence of that type, it was far from the top
of my mind.

If you have data that you intend to share without explicit synchronization, you
should be safe in using sig_atomic_t. Additionally, using sig_atomic_t will
protect you against word tearing in adjacent data protected by separate mutexes.

There are additional performance considerations, such as "false sharing" effects
in cache systems, that might dictate larger separations between two shared pieces
of data: but those won't affect program CORRECTNESS, and are therefore more a
matter of tuning for optimal performance on some particular platform.
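
As a sketch of that advice (illustrative only, not a guarantee for every platform), the earlier dataA/dataB example might then become:

#include <signal.h>
#include <pthread.h>

pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;

/* Each flag is now a type the platform promises to access atomically,
 * so a write to dataA under mutexA cannot tear a concurrent write to
 * dataB under mutexB. */
sig_atomic_t dataA;
sig_atomic_t dataB;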

Dave Butenhof

unread,
Mar 9, 1998, 3:00:00 AM3/9/98
to

David Holmes wrote:

> Dave Butenhof <bute...@zko.dec.com> wrote in article
> <34FD6950...@zko.dec.com>...
> > If you've got
> >
> > pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
> > pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;
> >
> > char dataA;
> > char dataB;
> >
> > And one thread locks mutexA and writes dataA while another locks mutexB and
> > writes dataB, you risk word tearing, and incorrect results. That's a "platform
> > issue", that, as someone else commented, POSIX doesn't (and can't) address.
>
> That's a pretty serious impediment to threaded programming. How do you know
> how things have been allocated? Isn't the compiler free to re-arrange data
> declarations to 'optimise' alignment etc? Putting the data and mutex
> together in a struct might work for C, but what about C++ where the data
> and mutex are already within the object? Do you need to put structs within
> the class as well?

You need to avoid word-tearing. You should avoid sharing "potentially
adjacent" data of a size smaller than sig_atomic_t. Is that a serious
impediment to threaded programming? Sure, if you're dependent on using
arbitrary data sizes in arbitrary arrangements, and want your code to be
portable. On the other hand, if that's your most serious concern in porting
between differing machine architectures, then your code is probably pretty
clean -- dealing with this shouldn't be a big deal, and it'll serve to improve
your portability even further. (Surely a worthwhile goal?)

> When your average programmer sits down at their SparcWorkstation to write a
> neat little POSIX pthreads program, how on earth are they supposed to know
> about this? This isn't mentioned in your book Dave, nor in others that
> discuss using threads on Solaris.

It's in my section on "Memory visibility between threads". Or at least, I
discuss what "word tearing" means, and why it's of concern. You are, however,
correct in pointing out that I failed to warn the reader that it could be a
problem even WITH mutexes, if the separate data that share "memory access
logic" are protected by separate mutexes. Well, sorry. Even I'm not perfect
all the time! (As if anyone really needed proof of that statement.) I'll make
a note of this for the next time I update the text. (I'll also add a mention
of Mike Martin's fine advice to make use of the ANSI C "atomic data type",
sig_atomic_t, which will avoid word tearing.)

Dave Butenhof

unread,
Mar 9, 1998, 3:00:00 AM3/9/98
to

> I heartily recommend Dave's book. It's one of the books that not only
> combines a comprehensive introduction and reference, but is also fun
> to read. BTW, Dave, who's been doing the illustrations? I really
> liked them.
>
Thanks for the recommendation. Especially the "fun to read" part,
because that was an important goal!

Unfortunately, I don't know who did the illustrations. I never even
directly corresponded with the person responsible, nor got a name
through Addison-Wesley's art department. (It probably would have been
easier if we could have talked directly -- it took a few iterations
through the intermediaries to get drawings that showed what I wanted to
show.)

Dave Butenhof

unread,
Mar 9, 1998, 3:00:00 AM3/9/98
to

Tim Beckmann wrote:

> Dave Butenhof wrote:
> > > David,
> > >
> > > My thoughts exactly!
> > >
> > > Does anyone know of a mainstream architecture that does this sort of
> > > thing?
> >
> > Oh, absolutely. SPARC, MIPS, and Alpha, for starters. I'll bet most other RISC
> > systems do it, too, because it substantially simplifies the memory subsystem
> > logic. And, after all, the whole point of RISC is that simplicity means speed.
>
> MIPS I know :) The latest MIPS processors R10K and R5K are byte addressable.
> The whole point of RISC is simplicity of hardware, but if it makes the software
> more complex it isn't worth it :)

The whole idea of RISC is *exactly* to make software more complex. That is, by
simplifying the hardware, hardware designers can produce more stable designs that
can be produced more quickly and with more advanced technology to result in faster
hardware. The cost of this is more complicated software. Most of the complexity is
hidden by the compiler -- but you can't necessarily hide everything. Remember that
POSIX took advantage of some loopholes in the ANSI C specification around external
calls to proclaim that you can do threaded programming in C without requiring
expensive and awkward hacks like "volatile". Still, the interpretation of ANSI C
semantics is stretched to the limit. The situation would be far better if a future
version of ANSI C (and C++) *did* explicitly recognize the requirements of threaded
programming.

> > If you stick to int or long, you'll probably be safe. If you use anything
> > smaller, be sure they're not allocated next to each other unless they're under
> > the same lock.
>
> Actually, you can be pretty sure that a compiler will split two declarations
> like:
> char dataA;
> char dataB;
> to be in two separate natural machine words. It is much faster and easier for
> those RISC processors to digest. However if you declare something as:

While that's certainly possible, that's just a compiler optimization strategy. You
shouldn't rely on it unless you know FOR SURE that YOUR compiler does this.

> char data[2]; /* or more than 2 */
> you have to be VERY concerned with the effects of word tearing since the
> compiler will certainly pack them into a single word.

Yes, this packing is required. You've declared an array of "char" sized data, so
each array element had better be allocated exactly 1 char.

> > I wrote a long post on most of the issues brought up in this thread, which
> > appears somewhere down the list due to the whims of news feeds, but I got
> > interrupted and forgot to address this issue.
>
> Yep, I saw it. It was helpful. So was the later post by someone else who
> included a link to a DEC alpha document that explained what a memory barrier
> was in this context. I've seen three different definitions over the years.
> The definition you described in your previous post agreed with the DEC alpha
> description... That a memory barrier basically doesn't allow out of order
> memory accesses to cross the barrier. A very important issue if you are

> implementing mutexes or semaphores :)[...]


>
> However, I really believe that dataA and dataB should both be declared as
> "volatile" to prevent the compiler from being too aggressive on it's
> optimization. The mutex still doesn't guarantee that the compiler hasn't
> cached the data in an internal register across a function call. My memory
> isn't perfect, but I do think this bit me on IRIX.

The existence of the mutex doesn't require this, but the semantics of POSIX and
ANSI C do require it. Remember that you lock a mutex by calling a function, passing
an address. While an extraordinarily aggressive C compiler with a global analyzer
might be able to determine reliably that there's no way that call could access the
data you're trying to protect, such a compiler is unlikely -- and, if it existed,
it would simply violate POSIX 1003.1-1996, failing to support threads.

You do NOT need volatile for threaded programming. You do need it when you share
data between "main code" and signal handlers, or when sharing hardware registers
with a device. In certain restricted situations, it MIGHT help when sharing
unsynchronized data between threads (but don't count on it -- the semantics of
"volatile" are too fuzzy). If you need volatile to share data, protected by POSIX
synchronization objects, between threads, then your implementation is busted.

> > There are, of course, no absolute guarantees. If you want to be safe and
> > portable, you might do well to have a config header that typedefs
> > "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then
> > it's just a quick trip to the hardware reference manual when you start a port.
> > On a CISC, you can probably use "char". On most RISC systems, you should use
> > "int" or "long".
>
> There never are guarantees are there :)

To reiterate again one more time, ( ;-) ), the correct (ANSI C) portable type for
atomic access is sig_atomic_t.

> > Yes, this is one more complication to the process of threading old code. But
> > then, it's nothing compared to figuring out which data is shared and which is
> > private, and then getting the locking protocols right.
>
> But what fun would it be if it wasn't a challenge :)

Well, yeah. That's my definition of "fun". But not everyone's. Sometimes "boring
and predictable" can be quite comforting.

> However, I would like to revist the original topic of whether it is "safe" to
> change a single byte without a mutex. Although, instead of "byte" I'd like to
> say "natural machine word" to eliminate the word tearing and non-atomic memory
> access concerns. I'm not sure it's safe to go back to the original topic, but
> what the heck ;)

sig_atomic_t.

> If you stick to a "natural machine word" that is declared as "volatile",
> you do not absolutely need a mutex (in fact I've done it). Of course, there are
> only certain cases where this works and shouldn't be done unless you really know
> your hardware architecture and what you're doing! If you have a machine with a
> lot of processors, unnecessarily locking mutexes can really kill parallelism.
>
> I'll give one example where this might be used:
>
> volatile int stop_flag = 0; /* assuming an int is atomic */
>
> thread_1
> {
> /* bunch of code */
>
> if some condition exists such that we wish to stop thread_2
> stop_flag = 1;
>
> /* more code - or not :) */
> }
>
> thread_2
> {
> while(1)
> {
> /* check if thread should stop */
> if (stop_flag)
> break;
>
> /* do whatever is going on in this loop */
> }
> }
>
> Of course, this assumes the hardware has some sort of cache coherency
> mechanism. But I don't believe POSIX mutex's or memory barriers (as
> defined for the DEC alpha) have any impact on cache coherency.

If a machine has a cache, and has no mechanism for cache coherency, then it can't
work as a multiprocessor.

> The example is simplistic, but it should work on a vast majority of
> systems. In fact the stop_flag could just as easily be a counter
> of some sort as long as only one thread is modifying the counter...

In some cases, yes, you can do this. But, especially with your "stop_flag",
remember that, if you fail to use a mutex (or other POSIX-guaranteed memory
coherence operation), a thread seeing stop_flag set CANNOT assume anything about
other program state. Nor can you ensure that any thread will see the changed value
of stop_flag in any particular bounded time -- because you've done nothing to
ensure memory ordering, or coherency.

And remember very carefully that bit about "as long as only one thread is
modifying". You cannot assume that "volatile" will ever help you if two threads
might modify the counter at the same time. On a RISC machine, "modify" still means
load, modify, and store, and that's not atomic. You need special instructions to
protect atomicity across that sequence (e.g., load-lock/store-conditional, or
compare-and-swap).
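
To make that concrete, here is a minimal sketch (the names are mine, purely illustrative) of how an unsynchronized increment loses an update when the load/modify/store sequences of two threads interleave:

int counter = 0;    /* shared, unsynchronized; "volatile" cannot rescue this */

void unsafe_increment(void)
{
    int tmp = counter;   /* load                                          */
                         /* <-- another thread can load the same value here */
    tmp = tmp + 1;       /* modify                                        */
    counter = tmp;       /* store: silently overwrites the other update   */
}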

Am I trying to scare you? Yeah, sure, why not? If you really feel the need to do
something like this, do yourself (and your project) the courtesy of being EXTREMELY
frightened about it. Document it in extreme and deadly detail, and write that
documentation as if you were competing with Stephen King for "best horror story of
the year". I mean to the point that if someone takes over the project from you, and
doesn't COMPLETELY understand the implications, they'll be so terrified of the risk
that they'll rip out your optimizations and use real synchronization. Because this
is just too dangerous to use without full understanding.

There are ways to ensure memory ordering and coherency without using any POSIX
synchronization mechanisms, on any machine that's capable of supporting POSIX
semantics. It's just that you need to be really, really careful, and you need to be
aware that you're writing extremely machine-specific (and therefore inherently
non-portable) code. Some of this is "more portable" than others, but even the
"fairly portable" variants (like your stop_flag) are subject to a wide range of
risks. You need to be aware of them, and willing to accept them. Those who aren't
willing to accept those risks, or don't feel inclined to study and fully understand
the implications of each new platform to which they might wish to port, should
stick with mutexes.

David Holmes

unread,
Mar 10, 1998, 3:00:00 AM3/10/98
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<3503E333...@zko.dec.com>...

> It's in my section on "Memory visibility between threads". Or at least, I
> discuss what "word tearing" means, and why it's of concern. You are,
however,
> correct in pointing out that I failed to warn the reader that it could be
a
> problem even WITH mutexes, if the separate data that share "memory access
> logic" are protected by separate mutexes.

I didn't mean to pick on your book Dave - I enjoyed your book very much and
found it very informative. You tend to cover more on these issues than
others. However, though you discuss what word-tearing means you don't
discuss that it can bite you even if you use mutexes *nor* if you have
adjacent data items that are not shared between threads. That last part is
important. It's bad enough that someone doing the right thing and using
mutexes to protect shared data can be bitten by this, but it's even worse
that corruption can occur in data that is not shared between threads.

Consider an example where someone creates a Thread manager object. This
object is responsible for managing a group of worker threads, each of which
waits for work to appear on a work queue. To gather some statistics about
the work distribution each thread has a counter it updates whenever it gets
a new job. Those counters are maintained by the thread manager as an array
of bytes (anything less than the atomic access unit), where each thread
knows its own index into the array. No data is shared between threads, yet
on a system that allows word-tearing, this setup can fail. How does the
poor old programmer debug this? They've just learnt about threads from the
latest "intro to threads" book and are now completely flummoxed. They seem
to have followed all the rules and yet it didn't work (they even try
sticking in a few mutexes but that doesn't help). Tough luck ??
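
To make the scenario concrete (all names here are illustrative), the fragile version and the repair suggested by the sig_atomic_t advice given earlier in this thread might look like this:

#include <signal.h>

#define NUM_WORKERS 8

char jobs_done_fragile[NUM_WORKERS];        /* adjacent bytes: word-tearing risk   */
sig_atomic_t jobs_done_safe[NUM_WORKERS];   /* one atomically accessible unit each */

void record_job(int my_index)               /* called only by worker my_index */
{
    jobs_done_safe[my_index] = jobs_done_safe[my_index] + 1;
}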

Maybe it's just me but I find this whole situation rather odd. There is
apparently a real possibility of word-tearing on these systems, yet no one
seems to be complaining that they've run into this problem, nor does it
seem to be sufficiently a problem that it's been well documented. Is it
really no big deal and if not, why not?

As an educator I am quite puzzled.

I'd also be interested in any more information on sig_atomic_t. I have
never heard of this as part of ANSI-C, yet there it is in signal.h. If
anyone could point me to information on how this type was defined, its
purpose and why it is in signal.h, I'd appreciate it. Thanks.

David

Mike Martin

unread,
Mar 10, 1998, 3:00:00 AM3/10/98
to

David Holmes wrote in message <01bd4c8e$82d9c9e0$1bf56f89@dholmes>...

>David Holmes <dho...@mri.mq.edu.au> wrote in article
><01bd4c74$d576ec60$1bf56f89@dholmes>...

>> I'd also be interested in any more information on sig_atomic_t. I have
>> never heard of this as part of ANSI-C, yet there it is in signal.h. If
>> anyone could point me to information on how this type was defined, its
>> purpose and why it is in signal.h, I'd appreciate it. Thanks.
>
>To answer my own question here's an extract from the ANSI 'C' rationale:
>(http://www.lysator.liu.se/c/rat/title.html)
>
>2.2.3 Signals and interrupts
>
>Signals are difficult to specify in a system-independent way. The
>Committee concluded that about the only thing a strictly conforming program
>can do in a signal handler is to assign a value to a volatile static
>variable which can be written uninterruptedly and promptly return. (The
>header <signal.h> specifies a type sig_atomic_t which can be so written.)
>It is further guaranteed that a signal handler will not corrupt the
>automatic storage of an instantiation of any executing function, even if
>that function is called within the signal handler.

It should be noted, however, that the rationale is not officially part of
the standard and the standard is in fact vague on the guarantees made by
sig_atomic_t. The main body of the standard merely says that such a value
can be "accessed" atomically.

In practice, "access" means that you can atomically write a value to the
object by assignment (as the rationale states), and atomically read the
value through direct reference, but nothing else. Most notably, you
can *not* use any of the increment or decrement operators to update the
value, nor is there any other way (safely) to update the value.
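
In other words, the only portable pattern is something like this sketch (the flag and handler names are illustrative):

#include <signal.h>

volatile sig_atomic_t got_signal = 0;

void handler(int sig)
{
    (void)sig;
    got_signal = 1;     /* plain assignment: the one access the standard blesses */
    /* got_signal++;       a read-modify-write; NOT guaranteed to be atomic      */
}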


Regards,

Mike Martin
mma...@dazel.com


David Holmes

unread,
Mar 11, 1998, 3:00:00 AM3/11/98
to

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd4c74$d576ec60$1bf56f89@dholmes>...
> I'd also be interested in any more information on sig_atomic_t. I have
> never heard of this as part of ANSI-C, yet there it is in signal.h. If
> anyone could point me to information on how this type was defined, its
> purpose and why it is in signal.h, I'd appreciate it. Thanks.

To answer my own question here's an extract from the ANSI 'C' rationale:
(http://www.lysator.liu.se/c/rat/title.html)

2.2.3  Signals and interrupts

Signals are difficult to specify in a system-independent way.  The
Committee concluded that about the only thing a strictly conforming program
can do in a signal handler is to assign a value to a volatile static
variable which can be written uninterruptedly and promptly return.  (The
header <signal.h> specifies a type sig_atomic_t which can be so written.) 
It is further guaranteed that a signal handler will not corrupt the
automatic storage of an instantiation of any executing function, even if
that function is called within the signal handler. 

David

Joe Seigh

unread,
Mar 11, 1998, 3:00:00 AM3/11/98
to

Strictly speaking, this doesn't solve the granularity of updates at
all. sig_atomic_t is only defined with respect to signals and not
with respect to threads or multi-processing. We try to be very
careful about what we say here, about what is guaranteed and
precisely under what conditions those guarantees hold. I suspect
that you'll find that there is nothing that states that a signal
implementation has to handle signals in a thread concurrent manner,
though it may have to be thread safe. So, it possible that for some
implementations, sig_atomic_t could be safe with respect to signals
but not with respect to threads.

Joe Seigh

mma...@dazel.com

unread,
Mar 11, 1998, 3:00:00 AM3/11/98
to

se...@bose.com (Joe Seigh) wrote:
> Strictly speaking, this doesn't solve the granularity of updates at
> all. sig_atomic_t is only defined with respect to signals and not
> with respect to threads or multi-processing. We try to be very
> careful about what we say here, about what is guaranteed and
> precisely under what conditions those guarantees hold. I suspect
> that you'll find that there is nothing that states that a signal
> implementation has to handle signals in a thread concurrent manner,
> though it may have to be thread safe. So, it possible that for some
> implementations, sig_atomic_t could be safe with respect to signals
> but not with respect to threads.

Possible, but very unlikely. The language of the ANSI standard
states that sig_atomic_t:

"... is the integral type of an object that can be accessed
as an atomic entity, even in the presence of asynchronous
interrupts." (4.7:5)

There are no other statements tying it explicitly to the case of
signals. True, it lives under the umbrella of signals, since
they are the only source of asynchronous interrupt defined within
the standard. But "interrupts" and "signals" *are* discussed as
independent concepts.

Conforming implementations are constrained to produce code in
the "execution environment" that reads and writes a sig_atomic_t
atomically (uninterruptedly), and interruption is assumed to be
possible at any point, including between "sequence points" (2.1.2.3).
So access to a sig_atomic_t must be accomplished atomically, even
within that interim.

Therefore, such access must reduce to a single, uninterruptible
"instruction" to the target machine - where an instruction is the
quantum for interruption.

The two very unlikely scenarios where sig_atomic_t could be unsafe
would be:

1. The machine's interruption quanta for signals and threads are
different.

2. The compiler produces code which accesses sig_atomic_t objects
with multiple instructions, but protects all such code with:

<disable signals>
<access sig_atomic_t object>
<reenable signals>

I would argue that both scenarios would be contrary to the "spirit"
of the C standard.

Dave Butenhof

unread,
Mar 12, 1998, 3:00:00 AM3/12/98
to

David Holmes wrote:

> Dave Butenhof <bute...@zko.dec.com> wrote in article
> <3503E333...@zko.dec.com>...
> > It's in my section on "Memory visibility between threads". Or at least, I
> > discuss what "word tearing" means, and why it's of concern. You are, however,
> > correct in pointing out that I failed to warn the reader that it could be a
> > problem even WITH mutexes, if the separate data that share "memory access
> > logic" are protected by separate mutexes.
>
> I didn't mean to pick on your book Dave - I enjoyed your book very much and
> found it very informative. You tend to cover more on these issues than
> others. However, though you discuss what word-tearing means you don't
> discuss that it can bite you even if you use mutexes *nor* if you have
> adjacent data items that are not shared between threads. That last part is
> important. It's bad enough that someone doing the right thing and using
> mutexes to protect shared data can be bitten by this, but its even worse
> that corruption can occur of data that is not shared between threads.

Hey -- stop "not picking" on my book! Every time you do that, you point out
another omission. ;-)

Yes, you're right -- while this is NOT something that's directly related to
"threads", and your new point is even less direct than the first, I should
have thought to mention it. Sorry. You've now got two entries in my "Errata"
mail folder, OK? Satisfied? ;-)

> Maybe it's just me but I find this whole situation rather odd. There is
> apparently a real possibility of word-tearing on these systems, yet no one
> seems to be complaining that they've run into this problem, nor does it
> seem to be sufficiently a problem that it's been well documented. Is it
> really no big deal and if not, why not?

Speculation: it probably just doesn't happen that often.

> As an educator I am quite puzzled.

Good. We need more puzzled educators, who stop to think about these things.
(And take the time to explain their puzzlement.) I've never trusted educators
who accept too much without question. (And I hope you expect the same of your
students.)

> I'd also be interested in any more information on sig_atomic_t. I have
> never heard of this as part of ANSI-C, yet there it is in signal.h. If
> anyone could point me to information on how this type was defined, its
> purpose and why it is in signal.h, I'd appreciate it. Thanks.

The type exists, and it (and the issues it represents) are often overlooked in
discussing "thread issues", because the problem you've pointed out isn't
unique to threads. It occurs any time you may have asynchronous (not merely
parallel) access to memory. You can easily get that with shared memory between
processes, of course, but it may be less obvious that it happens with
asynchronous signals, too.

If your asynchronous signal handler writes data to memory, and the signal
occurred between the fetch and store of a memory update, to another variable
overlapping the same "memory unit", you've got... word tearing. Look, ma, no
threads. The standard says that, if you may access (and, in particular, write)
memory asynchronously, you must use this special type to avoid that sort of
problem.
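
A sketch of that scenario (the names are illustrative): on a machine with no byte-sized stores, writing one char means load the whole word, merge in the new byte, and store the word back; if the handler fires between the load and the store, its update is overwritten -- word tearing with no threads in sight.

#include <signal.h>

char from_handler;   /* written only by the signal handler */
char from_main;      /* written only by the mainline code  */

void handler(int sig)
{
    (void)sig;
    from_handler = 1;    /* may land inside the mainline's update of the word */
}

void mainline_update(void)
{
    from_main = 1;       /* may expand to load/merge/store of the shared word */
}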

Furthermore, even without word tearing, you should try to ensure that unshared
data (in threaded programs) doesn't get placed "close together". The cache
behavior known as "false sharing" can have a dramatic impact on the
performance of threaded code, and it's hard to isolate and measure. (You can
get some information with on-chip counters, but to really figure out what's
going on you need a good hardware-level simulator.) False sharing means that
you've written to a "cache line" (say, within the same 64-byte [or whatever]
chunk of address space) that's also held by another processor. This results in
cross-processor traffic to invalidate the other processor's cache, to prevent
it from reading obsolete data from its cache. Except that you haven't actually
touched data that's being USED by the other processor -- just unrelated data
that happens to be on the same cache line.

How does one avoid false sharing? You tune. And tune. And tune some more. The
cache behavior might vary even across members of a processor family --
specifically if the cache line size changes. You need to pad static data to
avoid the conflicts, and you just need to know by how much to pad them.
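
A minimal sketch of that padding, assuming (purely for illustration) a 64-byte cache line:

#define CACHE_LINE_SIZE 64   /* assumed; the real value comes from the hardware manual */

typedef struct padded_counter_tag {
    long count;
    char pad[CACHE_LINE_SIZE - sizeof(long)];
} padded_counter_t;

/* one slot per thread: each count now sits on its own cache line */
padded_counter_t per_thread_count[16];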

Joe Seigh

unread,
Mar 13, 1998, 3:00:00 AM3/13/98
to

I agree, possible but very unlikely. However, I did explicitly state
I was discussing this in the strict sense of things. In fact I
thought I had overstated this point. But yes, given the constraints
of the language spec, having sig_atomic_t be atomic with respect
to threads would be the safest and most practical way to go.
Note, that sig_atomic_t wouldn't necessarily be the smallest
unit of atomicity for a particular platform as that might limit
the amount of information a signal handler could store. It
also might not be the largest if any consideration was given
to compatibility with other implementations.

But remember, when you say "possible but very unlikely", aren't you
really saying you don't know of or can't think of any counterexamples?
That's not proof. And isn't that the reason that this issue of side
effects of non-atomic stores arose in the first place, namely
we didn't think that storing a byte could be thread unsafe because
we never imagined that there were computer architectures where
it wasn't safe?

Joe Seigh


Gilbert W Pilz, Jr.

unread,
Mar 13, 1998, 3:00:00 AM3/13/98
to

On Mon, 09 Mar 1998 08:14:24 -0500, Dave Butenhof
<bute...@zko.dec.com> wrote:


>You do NOT need volatile for threaded programming. You do need it when you share
>data between "main code" and signal handlers, or when sharing hardware registers
>with a device. In certain restricted situations, it MIGHT help when sharing
>unsynchronized data between threads (but don't count on it -- the semantics of
>"volatile" are too fuzzy). If you need volatile to share data, protected by POSIX
>synchronization objects, between threads, then your implementation is busted.

What about exception handlers? I've always thought that when you had
code like:

int i;

TRY
{
    <foodle with i>
    . . .
    proc();
}
CATCH_ALL
{
    if (i > 0)
    {
        . . .
    }
    . . .
}

that you needed to declare "i" to be volatile, lest the code in the
catch block assume that "i" was stored in some register whose contents
were overwritten by the call to "proc" (and not restored by
whatever mechanism was used to throw the exception).

Have I been wrong all this time?

mma...@dazel.com

unread,
Mar 13, 1998, 3:00:00 AM3/13/98
to

se...@bose.com (Joe Seigh) wrote:
> I agree, possible but very unlikely. However I did explicity state
> I was discussing this in the strict sense of things. In fact I
> though I had overstated this point. But yes, given the constraints
> of the language spec, having sig_atomic_t be atomic with respect
> to threads would be the safest and most practical way to go.
> Note, that sig_atomic_t wouldn't necessarily be the smallest
> unit of atomicity for a particular platform as that might limit
> the amount of information a signal handler could store. It
> also might not be the largest if any consideration was given
> to compatiblity with other implementations.

Absolutely. sig_atomic_t is merely one such atomic type; not
necessarily the smallest or the largest on the platform.

> But remember, when you say "possible but very unlikely", aren't you
> really saying you don't know of or can't think of any counter
> examples. That's not proof.

Not really. I think what I'm saying is that, according to my
interpretation of the C standard, *any* counterexample would be
a violation of the intent, if not the language, of that standard.

It's what we over in the C language newsgroups often refer to as
a "quality of implementation" issue. Although it might arguably
conform, such an implementation would be universally perceived as
"broken".

Is it proof? No. But there is a preponderance of evidence.

Dave Butenhof

unread,
Mar 16, 1998, 3:00:00 AM3/16/98
to

Gilbert W Pilz, Jr. wrote:

> What about exception handlers ? I've always thought that when you had
> code like:
>
> int i;
>
> TRY
> {
> <foodle with i>
> . . .
> proc();
> }
> CATCH_ALL
> {
> if (i > 0)
> {
> . . .
> }
> . . .
> }
>
> that you needed to declare "i" to be volatile least the code in the
> catch block assume that "i" was stored in some register the contents
> of which were overwritten by the call to "proc" (and not restored by
> whatever mechanism was used to throw the exception).

Since neither ANSI C nor POSIX has any concept remotely resembling "exceptions", this
is all rather moot in the context of our general discussion, isn't it? I mean, it's
got nothing to do with sharing data between threads -- and that's what I thought we
were talking about. But sure, OK, let's digress.

Since there's no standard covering the behavior of anything that uses exceptions, (at
least, not if you use them from C, or even if you use the DCE exception syntax you've
chosen from C++), there's no portable behavior. Your fate is in the hands of the
whimsical (and hypothetically malicious ;-) ) implementation. This situation might
lead a cautious programmer to be unusually careful when treading in these waters, and
to wear boots with thick soles. (One might also say that it could depend on exactly
HOW you "foodle with i", but I'll choose to disregard an entire spectrum of mostly
amusing digressions down that fork.)

Should you use volatile in this case? Sure, why not? It might not be necessary on
many platforms. It might destroy your performance on any platform. And, where it is
necessary, it might not do what you want. But yeah, what the heck -- use it anyway.
It's more likely (by some small margin) to save you than kill you.

Or, even better... don't code cases like this!
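
For what it's worth, the nearest thing ANSI C does pin down is
setjmp/longjmp -- which is what exception macros like these are often built
on -- and there the rule is explicit: an automatic variable modified after
setjmp() and examined after longjmp() must be volatile, or its value is
indeterminate. A minimal sketch of that guaranteed case (proc() and the
values are invented):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void proc(void)
{
    longjmp(env, 1);   /* stand-in for whatever raises the exception */
}

int main(void)
{
    /* Without volatile, "i" may live in a register that longjmp()
       does not restore, and its value below would be indeterminate. */
    volatile int i = 0;

    if (setjmp(env) == 0) {   /* the "TRY" arm */
        i = 42;               /* foodle with i */
        proc();
    } else {                  /* the "CATCH_ALL" arm */
        if (i > 0)
            printf("i survived the jump: %d\n", i);
    }
    return 0;
}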

David Holmes

unread,
Mar 17, 1998, 3:00:00 AM3/17/98
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<3507F7A3...@zko.dec.com>...

> Hey -- stop "not picking" on my book! Every time you do that, you point out
> another omission. ;-)

Sorry. :-)



> Speculation: it probably just doesn't happen that often.

An unsettling answer, though I can only speculate the same. An investigation
of the actual behaviour of different platforms would be interesting - but not
right now. :)

As an aside. For those who thought Java puts the solution to this problem
onto the VM writers - it may not. The Java atomicity guarantees can be
satisfied without dealing with word tearing at all. As someone else wrote
earlier those guarantees only ensure that one thread will only ever read a
value written by another thread, not some partial mix of a number of
values. This doesn't preclude reading an old value. If synchronisation is
used then the visibility guarantees should preclude word-tearing as a lost
update of main memory is not allowed. However this doesn't address the case
of adjacent values used by different threads.
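
To spell the distinction out, here's a contrived single-threaded sketch in C --
the hypothetical layout is a 64-bit value kept as two 32-bit halves, and the
first printf stands in for a reader that happens to run between the two
half-word stores:

#include <stdio.h>

/* A notional 64-bit value that the implementation keeps as two
   32-bit halves (what a VM might be stuck with on a 32-bit machine). */
typedef struct {
    unsigned int lo;
    unsigned int hi;
} split64;

static split64 shared = { 0x11111111u, 0x11111111u };

int main(void)
{
    /* The writer intends to publish 0x22222222 22222222 ... */
    shared.lo = 0x22222222u;

    /* ... but a reader scheduled here sees a value nobody ever wrote:
       the new low half mixed with the old high half. */
    printf("torn view:  0x%08x%08x\n", shared.hi, shared.lo);

    shared.hi = 0x22222222u;
    printf("final view: 0x%08x%08x\n", shared.hi, shared.lo);
    return 0;
}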

David

Douglas C. Schmidt

unread,
Mar 17, 1998, 3:00:00 AM3/17/98
to

++ Tim Beckmann <tbec...@edcmail.cr.usgs.gov> wrote in article
++ <34FEAD...@edcmail.cr.usgs.gov>...
++ > - The "volatile" keyword is not your friend when you are using
++ > threads, no matter what you may think. It's relatively safe for
++ > twiddling memory-mapped device registers, but don't expect it to do
++ > what you want on a multiprocessor.
++
++ My experience indicates otherwise.

Your experience has been very lucky thus far. But your luck will run
out if you fail to appreciate the semantics of volatile.

Doug
--
Dr. Douglas C. Schmidt (sch...@cs.wustl.edu)
Department of Computer Science, Washington University
St. Louis, MO 63130. Work #: (314) 935-4215; FAX #: (314) 935-7302
http://www.cs.wustl.edu/~schmidt/

Joe Seigh

unread,
Mar 18, 1998, 3:00:00 AM3/18/98
to

In article <01bd5150$6ac1c800$1bf56f89@dholmes>, "David Holmes" <dho...@mri.mq.edu.au> writes:
...

|> As an aside. For those who thought Java puts the solution to this problem
|> onto the VM writers - it may not. The Java atomicity guarantees can be
|> satisfied without dealing with word tearing at all. As someone else wrote
|> earlier those guarantees only ensure that one thread will only ever read a
|> value written by another thread, not some partial mix of a number of
|> values. This doesn't preclude reading an old value. If synchronisation is
|> used then the visibility guarantees should preclude word-tearing as a lost
|> update of main memory is not allowed. However this doesn't address the case
|> of adjacent values used by different threads.
|>
|> David

The java virtual machine implementations would have to ensure that
data was laid out in a manner safe for multi-threaded access. So
in the case of a byte or byte array on a machine with a word as
the smallest atomic store access, each byte would have to be in a
separate word, or some serialization mechanism would have to be
provided for safe multi-threaded access. You have a design trade-off
decision to make: space efficiency versus time efficiency.
This is what you have to do for java booleans, which could be
implemented as bits. No architecture I know of provides thread-safe
access to bits.
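
Concretely, the hazard being designed around is that a byte store on such a
machine degenerates into a read-modify-write of the containing word. A sketch
of what the generated code amounts to (the 32-bit word size and the byte
numbering are assumptions):

#include <stddef.h>
#include <stdio.h>

typedef unsigned int word_t;   /* assume a 32-bit, word-addressed machine */

/* "Store one byte" when the hardware can only store whole words:
   fetch the word, splice in the byte, store the word back. If another
   thread stores to a neighbouring byte between the fetch and the store,
   its update is silently overwritten -- word tearing. */
static void store_byte(word_t *words, size_t index, unsigned char value)
{
    word_t *wp = &words[index / sizeof(word_t)];
    size_t shift = (index % sizeof(word_t)) * 8;
    word_t w;

    w = *wp;                             /* fetch                            */
    w &= ~((word_t)0xff << shift);       /* clear the old byte               */
    w |= (word_t)value << shift;         /* splice in the new byte           */
    *wp = w;                             /* store -- may clobber a neighbour */
}

int main(void)
{
    word_t buf[2] = { 0 };

    store_byte(buf, 1, 0xAB);   /* set byte 1 of the array */
    printf("word 0 = 0x%08x\n", buf[0]);
    return 0;
}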

As an aside, there's this note in the java.util.BitSet docs which
indicates that they're aware of these issues, with regard to bits
at least: "A BitSet is not synchronized against simultaneous
access by several threads. If one thread is accessing the BitSet
while another thread is modifying it, mixed results may occur.
The alternative is to clone the BitSet and operate on the clone.
A client may do that itself if it wants, but the basic data type
should be fast." They're allowing themselves complete latitude
on the BitSet implementation.

Joe Seigh

David Holmes

unread,
Mar 19, 1998, 3:00:00 AM3/19/98
to

Joe Seigh <se...@bose.com> wrote in article <1998Mar1...@bose.com>...

> The java virtual machine implmentations would have to ensure that
> data was laid out in manner safe for multi-threaded access.

Well, that's what would be needed. The question is: does the Java Language
Specification actually require that this be done? I don't think it
addresses this issue. Even my earlier comment about synchronisation
probably making things work right is wrong; the visibility requirements of
synchronisation would be satisfied even in the presence of word tearing.

> No architecture I know of provides thread safe access to bits.

The MCS-51. Lots of nice little bit-addressable microcontrollers out there.
Running threads can be a bit of a pain though. :-)



> As as aside, there's this note the in java.util.BitSet docs which
> indicates that they're aware of these issues with regard to bits
> at least

I guess in one sense bit problems are more obvious when dealing with
byte-oriented machines and languages.

David

Joe Seigh

unread,
Mar 20, 1998, 3:00:00 AM3/20/98
to

In article <01bd52e0$ab71b630$1bf56f89@dholmes>, "David Holmes" <dho...@mri.mq.edu.au> writes:
|> Joe Seigh <se...@bose.com> wrote in article <1998Mar1...@bose.com>...
|> > The java virtual machine implmentations would have to ensure that
|> > data was laid out in manner safe for multi-threaded access.
|>
|> Well that's what would be needed. The question is does the Java Language
|> Specification actually require that this be done? I don't think it
|> addresses this issue. Even my earlier comment about synchronisation
|> probably making things work right, is wrong; the visibility requirements of
|> synchronisation would be satisfied even in the presence of word tearing.

I looked in the jvm spec on this issue. There's no explicit mention
of this requirement, maybe because it was considered so elementary;
the requirement being that the value read from a variable is the same as
the value last written to that variable. The spec uses the terms
"use", "assign", "load", "store", "read", and "write". The latter
two terms refer to actions on actual memory. Note that I used the
term variable and not the term memory in that requirement statement.
This means that side effects would be forbidden. For instance, if
a write to a variable also wrote into the memory of another variable,
it would be forbidden unless the other variable was just being
rewritten with its present value, or that value was restored before
the next read from that other variable.

It would be up to the jvm implementation to ensure that this requirement
was met, no matter what hardware architecture it was done on.

Of course, java rather consciously avoids stating any requirement of
forward progress either. Most of us would consider that rather
important. It's definitely possible to write a java virtual machine
that does nothing useful and have it be totally compliant.

Joe Seigh
