I am writing in Visual C 6.0, on an Intel processor.
The variables I am dealing with are declared “volatile”.
The compiler does all common simple math and assignments with a single
machine instruction. Even when doing multiple operations on a variable, the
memory that a variable resides in is updated in a single atomic operation,
so the variable is always ‘valid’; that is, one byte is not changed at a
different time than any other. Any constructs that deliberately violate
this, such as multiple separate bitwise operations, are avoided. (In
reality, I am only doing increments (val++) and direct assignments
(val = variable).)
I am using two or more threads, only one is ever allowed to write or
modify the variable (writer_thread), the other(s) can only read it
(reader_thread).
The state of the variable is non-critical to timing in the
reader_thread. That is, the value matters, but if the value is updated by
the writer_thread immediately after the reader_thread uses the variable,
this condition does not matter to the reader_thread. It is also true that
the variable might not have changed or has changed several times since the
last time the reader_thread has sampled it. The variable is asynchronous
with all inter-thread synchronizing and messaging.
If the reader_thread needs to use the variable in multiple lines and
guarantee the value does not change, it will transfer the value into a local
variable and act on that value, i.e.:

LocalVal = Val;
if ((LocalVal > 0) && (LocalVal < 5))
    something_else = LocalVal;
Given these conditions, my use of the variable is thread-safe, and I can
avoid the overhead of CRITICAL_SECTIONs or mutexes.
Corey Cooper
Cor...@InnovativeDesign.com
Note that there is usually more than a single machine instruction for each of
your operations. For increment, usually it's copy memory to register,
increment, copy register to memory, I think. For an assignment it's usually
copy memory to register, copy register to memory. AFAICT, this observation
doesn't actually break you, however, because only one thread is allowed to
write.
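To make that concrete, here is a rough C++ sketch (my own illustration, with
made-up names) of what val++ on a volatile int effectively expands to; the
local temporary plays the role of the register:

volatile int val = 0;

void writer_increment()
{
    // What 'val++' boils down to, conceptually:
    int tmp = val;   // copy memory to register
    tmp = tmp + 1;   // increment in the register
    val = tmp;       // copy register back to memory
    // A reader that samples 'val' between the load and the store just sees
    // the old value, which is harmless when there is only one writer.
}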
Also, the type of your variable, the specific Intel processor (or at least
its word size), the compiler flags, and the target operating system are all
critically important and were not mentioned. If you're using a processor
with a 16-bit word and you use a variable that happens to map to 32 bits for
your target with your flags, you're broken, as the copy-from-x-to-y will take
more than one machine instruction, which could end up with an invalid number
in memory. If you're sure this can't happen, I think you're barely OK.
As far as the C++ standard is concerned, neither assignment nor increment is
guaranteed to be atomic, even on a volatile variable.
Personally, I'd almost always recommend taking the overhead hit for a very
brief lock. One day somebody's going to make some completely
innocent-looking change that nobody will notice, and you'll start getting
invalid values once every x hours or so, where x is the exact number that
means it's both too unstable to release and impossible to debug.
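For what it's worth, on the platform mentioned (Win32/VC6) the brief lock can
be a CRITICAL_SECTION. A minimal sketch, with the names purely illustrative:

#include <windows.h>

CRITICAL_SECTION g_lock;        // initialize once before the threads start
volatile long    g_count = 0;

void init()    { InitializeCriticalSection(&g_lock); }
void cleanup() { DeleteCriticalSection(&g_lock); }

void writer_increment()
{
    EnterCriticalSection(&g_lock);
    g_count++;                  // protected read-modify-write
    LeaveCriticalSection(&g_lock);
}

long reader_sample()
{
    EnterCriticalSection(&g_lock);
    long local = g_count;       // consistent snapshot
    LeaveCriticalSection(&g_lock);
    return local;
}

An uncontended EnterCriticalSection is cheap (roughly one interlocked
instruction), which is why the overhead is usually tolerable.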
Yeah, you should be fine!
Since your variable is of sig_atomic_t (the biggest data type that is
always modified with one instruction) it is always consistent.
var++ might be a problem in the usual cases, but since you state that it
doesn't matter if it is updated again in the meantime, again no problem;
var++ is a read-process-write cycle. It depends on the compiler whether it
gets it right or not. I'd use ++var.
Regards
Mario
AFAIK this should be thread safe, but may not work as expected on a
multi-processor system. That's because there's nothing which ensures
that the value in the processor's cache is flushed back to main memory
before the other processor attempts to use it.
If this is a problem, take a look at InterlockedIncrement and the
associated functions.
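A minimal sketch of that approach, assuming the shared variable is a LONG
(the names here are only illustrative):

#include <windows.h>

volatile LONG g_count = 0;

void writer_increment()
{
    // Atomic read-modify-write; on x86 this uses a locked instruction.
    InterlockedIncrement((LONG*)&g_count);
}

LONG reader_sample()
{
    // Adding zero is a common way to get an atomic read that also acts
    // as a memory barrier on multi-processor machines.
    return InterlockedExchangeAdd((LONG*)&g_count, 0);
}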
PeteK
I had the same questions quite a while ago when I implemented a similar
scenario on a Solaris box. With Solaris it is the same: fetching/writing
from memory to the CPU is atomic for built-in types with up to 4 bytes of
memory footprint.
Hope this helps.
"Corey Cooper" <corey...@mindspring.com> wrote in message
news:9ib18d$ne8$1...@slb7.atl.mindspring.net...
That's what cache coherency protocols are for. In an SMP system, the CPUs
inform each other about the states of their caches; e.g, using the MESI
(Modified/Exclusive/Shared/Invalid) protocol.
--
mailto:j...@acm.org phone:+49-7031-464-7698 (TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-464-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]
An issue that's not been mentioned by the other contributors is
word-tearing. Supposing you have a 32-bit processor and a 32-bit variable,
then AFAIK the update operation will only be atomic if the variable is
aligned on a 4-byte boundary. This will be true for most variables, but may
not be true for variables in packed structures. In the latter case it's
possible that, say, the lower 16 bits are updated by the writer thread, the
reader thread reads the full 32 bits, then the writer thread updates the
high 16 bits. This would cause the reader thread to read a wrong/invalid
value.
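A contrived illustration of a layout that can trigger this, using MSVC's
packing pragma (the struct and names are made up for the example):

#pragma pack(push, 1)
struct PackedRecord
{
    char          flag;     // 1 byte
    volatile long count;    // 4 bytes, now at offset 1: misaligned
};
#pragma pack(pop)

PackedRecord g_rec;

void writer_increment()
{
    // A misaligned 32-bit store still works on x86, but it is no longer
    // guaranteed to be a single indivisible bus transaction, so a reader
    // on another CPU could observe half-updated bytes (word tearing).
    g_rec.count++;
}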
I've also been warned that modern processors break up and reshuffle the
order of instructions so that what was atomic on, say, a 386, is no longer
an atomic instruction today. That is of course a very vague statement to
make and my instinct tells me that a scenario with just 1 writer thread
should still be fine. But if I could afford the time penalty, I'd still go
for the critical section or one of the Interlocked... () functions.
Another problem area already mentioned is the multi-processor system. If I
had to hedge a bet I would guess that you could run into similar effects to
the word-tearing problem described above, but basically I don't know :-). I
have a vague memory that there is such a thing as a bus lock instruction,
which will prevent the other processor(s) from accessing the memory bus
while an atomic operation is in progress. I believe this is the basis for
the Interlocked...() functions. I read an article once which presented its
own better functions using inline assembly for the bus locks and, of course,
inline functions instead of function calls. This article seems to have
vanished from the web, but the source code and some documentation is still
available at John M. Dlugosz' personal web-site here:
http://www.dlugosz.com/Repertoire/refman/Classics/atomic_counter.html
The timing differences from the article were quite interesting:
Comparison times when incrementing a counter:

    regular memory variable:   31.25 nanoseconds
    atomic counter:           178.1  nanoseconds
    critical section:         504.6  nanoseconds
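Those atomic-counter functions ultimately come down to a locked instruction.
A rough sketch (MSVC-style inline assembly for 32-bit x86; my own
illustration, not code from the article) of an atomic increment built on the
LOCK prefix:

// Atomically increments *p and returns the new value.
long AtomicIncrement(volatile long* p)
{
    __asm
    {
        mov  ecx, p             // load the address of the counter
        mov  eax, 1             // amount to add
        lock xadd [ecx], eax    // locked exchange-and-add; eax gets the old value
        inc  eax                // old value + 1 == the new value
    }
    // MSVC convention: the value left in EAX is the function's return value.
}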
Regards,
Max Busch.
There is no such guarantee in the C++ standard. Variables of type
volatile sig_atomic_t will have defined values when "the processing of
the abstract machine is interrupted by receipt of a signal" (1.9/9).
This has nothing to do with threads. From a standard point of view,
threads don't exist.
That implementors are very likely to fulfill the above requirement by
defining sig_atomic_t as a native data type that can be modified
atomically is an altogether different story. But you cannot portably
rely on it.
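To illustrate the distinction, the guarantee the standard does give is about
signals, not threads. A minimal sketch (the flag name is arbitrary):

#include <signal.h>

// Well-defined even if the handler interrupts the main flow mid-statement,
// because sig_atomic_t is required to be readable and writable atomically
// with respect to signal delivery.
volatile sig_atomic_t got_signal = 0;

void handler(int)
{
    got_signal = 1;
}

int main()
{
    signal(SIGINT, handler);
    while (!got_signal)
    {
        // ... do work, eventually noticing the flag ...
    }
    return 0;
}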
Gerhard Menzl
[description of non-portable code snipped]
> Given these conditions, my use of the variable is thread-safe, and I
> can avoid the overhead of CRITICAL_SECTIONs or mutexes.
Not portably. It may work on your particular platform with your
particular compiler, but the next service pack may apply modifications
to the code generation engine that render your assumptions invalid. The
only reliable way to access data concurrently from more than one thread
is to use the documented locking mechanisms for the platform involved.
Note that Standard C++ does not offer any guarantees regarding
thread-safety. It is wrong and dangerous to assume that the volatile
qualifier makes your program thread-safe.
Gerhard Menzl
volatile UINT64 value = 0;   // 8 bytes, not atomic

void thread1()   // writer (e.g. an interrupt)
{
    value++;
}

void thread2()   // reader
{
    UINT64 copyOfValue;
    do
    {
        copyOfValue = value;
    }
    while (copyOfValue != value);   // re-read until two reads agree
}
Because the variable is volatile, the assignment and the test must be two
separate reads of value; if they are the same, then value was definitely not
updated during the read, so the value is safe.
Matthew
I think you will be very hard pressed to find an implementation that
does not satisfy this requirement. Every C compiler I've used (on several
quite different processors) has defaulted to (or required) long word
alignment (word for 16 bit implementations) at least. I would consider it
bad design to do otherwise because it is so advantageous to use this
technique.
--
-GJC
-gcha...@shore.net
-Abolish Public Schools.
"Gary Chanson" <gcha...@no.spam.shore.net> wrote in message
news:fta37.327$F6.2...@news.shore.net...
> I think you will be very hard pressed to find an implementation that
> does not satisfy this requirement. Every C compiler I've used (on several
> quite different processors) has defaulted to (or required) long word
> alignment (word for 16 bit implementations) at least. I would consider it
> bad design to do otherwise because it is so advantageous to use this
> technique.
>
Yeah, but by this point we're down to arguing the difference
between 'guaranteed to work' and 'seems like it works in
practical cases'.
Me: I have more sympathy with playing by the rules, especially
since I don't know what we're trying to solve. It seems silly
to be debating dodges to improve performance, without knowing
what the performance problem actually is!
If you can live with making the correctness of your program dependent on
mere statistical evidence, go ahead. But it is important to point out that
such a program is inherently unportable because it implicitly relies on
features, documented or undocumented, of the specific platform it runs
on.
Besides, there are more dangers to unguarded concurrent data access than
the possibility of a non-atomic read or write. Multiprocessor issues and
instruction reordering at hardware level introduce pitfalls that most
programmers wouldn't think about in their wildest dreams.
A few months ago there was a long discussion in comp.lang.c++.moderated/
comp.programming.threads about taking a similar synchronization shortcut
in the implementation of the Singleton pattern. Eventually it became
clear that it is impossible to bypass regular locking without invoking
the risk of race conditions.
Gerhard Menzl
I completely agree. On the other hand, there are times when performance
is a significant enough requirement that 'seems like it works in practical
cases' is good enough. I do real time embedded systems so I often run into
this.
--
-GJC
-gcha...@shore.net
-Abolish Public Schools.
[...]
> A few months ago there was a long discussion in comp.lang.c++.moderated/
> comp.programming.threads about taking a similar synchronization shortcut
> in the implementation of the Singleton pattern. Eventually it became
> clear that it is impossible to bypass regular locking without invoking
> the risk of race conditions.
eventually it became clear that it is POSSIBLE to bypass regular locking
(without invoking the risk of race conditions) USING THREAD LOCALS.
regards,
alexander.
I do not recall a solution involving thread locals, but my point was
that you cannot rely on volatile. Thread local storage (which is what I
presume you mean) is just another platform-specific feature.
Gerhard Menzl
> Alexander Terekhov wrote:
>
> > > A few months ago there was a long discussion in
> > > comp.lang.c++.moderated/comp.programming.threads about taking a
> > > similar synchronization shortcut in the implementation of the
> > > Singleton pattern. Eventually it became clear that it is impossible
> > > to bypass regular locking without invoking the risk of race
> > > conditions.
> >
> > eventually it became clear that it is POSSIBLE to bypass regular
> > locking (without invoking the risk of race conditions) USING THREAD
> > LOCALS.
>
> I do not recall a solution involving thread locals,
here is a compact example in Java (in C++ you would need to have
a "portable" class similar to ThreadLocal or just stick to C/POSIX
pthreads for TSD/TLS):
class Singleton {
    private static Singleton theInstance;

    private static final ThreadLocal perThreadInstance =
        new ThreadLocal() {
            public Object initialValue() { return createInstance(); }
        };

    public static Singleton getInstance() {
        return (Singleton)perThreadInstance.get();
    }

    private static synchronized Singleton createInstance() {
        if (theInstance == null)
            theInstance = new Singleton();
        return theInstance;
    }
}
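For comparison, a rough C++/POSIX sketch of the same per-thread caching idea
(my own illustration, not code from that discussion; the names are arbitrary):

#include <pthread.h>

class Singleton { /* ... */ };

static Singleton*      theInstance  = 0;
static pthread_mutex_t instanceLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t   tsdKey;
static pthread_once_t  keyOnce      = PTHREAD_ONCE_INIT;

extern "C" void makeKey(void)
{
    pthread_key_create(&tsdKey, 0);      // no destructor; it lives forever
}

static Singleton* createInstance()
{
    // The mutex is taken at most once per thread, the first time that
    // thread asks for the singleton.
    pthread_mutex_lock(&instanceLock);
    if (theInstance == 0)
        theInstance = new Singleton();
    Singleton* result = theInstance;
    pthread_mutex_unlock(&instanceLock);
    return result;
}

Singleton* getInstance()
{
    pthread_once(&keyOnce, makeKey);
    Singleton* cached =
        static_cast<Singleton*>(pthread_getspecific(tsdKey));
    if (cached == 0)                     // first call in this thread
    {
        cached = createInstance();
        pthread_setspecific(tsdKey, cached);
    }
    return cached;                       // later calls: no lock at all
}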
> but my point was that you cannot rely on volatile.
yup; you cannot rely on C volatiles. correct programs should
not violate memory synchronization rules:
POSIX
"Applications shall ensure that access to any memory
location by more than one thread of control (threads
or processes) is restricted such that no thread of
control can read or modify a memory location while
another thread of control may be modifying it. Such
access is restricted using functions that synchronize
thread execution and also synchronize memory with
respect to other threads."
JAVA
"If two threads access a normal variable, and one
of those accesses is a write, then the program should
be synchronized so that the first access is visible to
the second access. When a thread T1 acquires a lock
on/enters a monitor m that was previously held by
another thread T2, all actions that were visible to T2
at the time it released the lock on m become visible
to T1"
> Thread local storage (which is what I presume you mean)
> is just another platform-specific feature.
it is quite a portable feature.
regards,
alexander.