I am writing in Visual C 6.0, on an Intel processor.
The variables I am dealing with are declared “volatile”.
The compiler does all common simple math and assignments with a single
machine instruction. Even when doing multiple operations on a variable, the
memory that a variable resides in is updated in a single atomic operation,
so the variable is always ‘valid’; that is, one byte is not changed at a
different time than any other. Any constructs that deliberately violate
this, such as multiple separate bitwise operations, are avoided. (In
reality, I am only doing increments (val++) and direct assignments
(val = variable).)
I am using two or more threads, only one is ever allowed to write or
modify the variable (writer_thread), the other(s) can only read it
(reader_thread).
The state of the variable is non-critical to timing in the
reader_thread. That is, the value matters, but if the value is updated by
the writer_thread immediately after the reader_thread uses the variable,
this condition does not matter to the reader_thread. It is also true that
the variable might not have changed or has changed several times since the
last time the reader_thread has sampled it. The variable is asynchronous
with all inter-thread synchronizing and messaging.
If the reader_thread needs to use the variable in multiple lines and
guarantee the value does not change, it will transfer the value into a local
variable and act on that value, i.e.:

LocalVal = Val;
if ((LocalVal > 0) && (LocalVal < 5))
    something_else = LocalVal;
Given these conditions, my use of the variable is thread-safe, and I can
avoid the overhead of CRITICAL_SECTIONs or mutexes.
Corey Cooper
Cor...@InnovativeDesign.com
Note that there is usually more than a single machine instruction for each of
your operations. For increment, usually it's copy memory to register,
increment, copy register to memory, I think. For an assignment it's usually
copy memory to register, copy register to memory. AFAICT, this observation
doesn't actually break you, however, because only one thread is allowed to
write.
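To make that concrete, here is a rough C++ sketch (my own illustration, with
made-up names) of what val++ on a volatile int effectively expands to; the
local temporary plays the role of the register:

volatile int val = 0;

void writer_increment()
{
    // What 'val++' boils down to, conceptually:
    int tmp = val;   // copy memory to register
    tmp = tmp + 1;   // increment in the register
    val = tmp;       // copy register back to memory
    // A reader that samples 'val' between the load and the store just sees
    // the old value, which is harmless when there is only one writer.
}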
Also, the type of your variable, the specific Intel processor (or at least
its word size), the compiler flags, and the target operating system are all
critically important and were not mentioned. If you're using a processor
with a 16-bit word and you use a variable that happens to map to 32 bits for
your target with your flags, you're broken, as the copy-from-x-to-y will take
more than one machine instruction, which could end up with an invalid number
in memory. If you're sure this can't happen, I think you're barely OK.
As far as the C++ standard is concerned, neither assignment nor increment is
guaranteed to be atomic, even on a volatile variable.
Personally, I'd almost always recommend taking the overhead hit for a very
brief lock. One day somebody's going to make some completely
innocent-looking change that nobody will notice, and you'll start getting
invalid values once every x hours or so, where x is the exact number that
means it's both too unstable to release and impossible to debug.
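For what it's worth, on the platform mentioned (Win32/VC6) the brief lock can
be a CRITICAL_SECTION. A minimal sketch, with the names purely illustrative:

#include <windows.h>

CRITICAL_SECTION g_lock;        // initialize once before the threads start
volatile long    g_count = 0;

void init()    { InitializeCriticalSection(&g_lock); }
void cleanup() { DeleteCriticalSection(&g_lock); }

void writer_increment()
{
    EnterCriticalSection(&g_lock);
    g_count++;                  // protected read-modify-write
    LeaveCriticalSection(&g_lock);
}

long reader_sample()
{
    EnterCriticalSection(&g_lock);
    long local = g_count;       // consistent snapshot
    LeaveCriticalSection(&g_lock);
    return local;
}

An uncontended EnterCriticalSection is cheap (roughly one interlocked
instruction), which is why the overhead is usually tolerable.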
Yeah, you should be fine!
Since your variable is of sig_atomic_t (the biggest data type that is
always modified with one instruction) it is always consistent.
var++ might be a problem in the usual cases, but since you state that it
doesn't matter if it is updated again in the meantime, again no problem;
var++ is a read-process-write cycle. It depends on the compiler whether it
gets it right or not. I'd use ++var.
Regards
Mario
AFAIK this should be thread safe, but may not work as expected on a
multi-processor system. That's because there's nothing which ensures
that the value in the processor's cache is flushed back to main memory
before the other processor attempts to use it.
If this is a problem, take a look at InterlockedIncrement and the
associated functions.
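A minimal sketch of that approach, assuming the shared variable is a LONG
(the names here are only illustrative):

#include <windows.h>

volatile LONG g_count = 0;

void writer_increment()
{
    // Atomic read-modify-write; on x86 this uses a locked instruction.
    InterlockedIncrement((LONG*)&g_count);
}

LONG reader_sample()
{
    // Adding zero is a common way to get an atomic read that also acts
    // as a memory barrier on multi-processor machines.
    return InterlockedExchangeAdd((LONG*)&g_count, 0);
}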
PeteK
I had the same questions quite a while ago when I implemented a similar
scenario on a Solaris box. With Solaris it is the same: fetching/writing
from memory to the CPU is atomic for built-in types with up to 4 bytes of
memory footprint.
Hope this helps.
"Corey Cooper" <corey...@mindspring.com> wrote in message
news:9ib18d$ne8$1...@slb7.atl.mindspring.net...
That's what cache coherency protocols are for. In an SMP system, the CPUs
inform each other about the states of their caches; e.g, using the MESI
(Modified/Exclusive/Shared/Invalid) protocol.
--
mailto:j...@acm.org phone:+49-7031-464-7698 (TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-464-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]
An issue that's not been mentioned by the other contributors is
word-tearing. Supposing you have a 32-bit processor and a 32-bit variable,
then AFAIK the update operation will only be atomic if the variable is
aligned on a 4-byte boundary. This will be true for most variables, but may
not be true for variables in packed structures. In the latter case it's
possible that, say, the lower 16 bits are updated by the writer thread, the
reader thread reads the full 32 bits, then the writer thread updates the
high 16 bits. This would cause the reader thread to read a wrong/invalid
value.
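A contrived illustration of a layout that can trigger this, using MSVC's
packing pragma (the struct and names are made up for the example):

#pragma pack(push, 1)
struct PackedRecord
{
    char          flag;     // 1 byte
    volatile long count;    // 4 bytes, now at offset 1: misaligned
};
#pragma pack(pop)

PackedRecord g_rec;

void writer_increment()
{
    // A misaligned 32-bit store still works on x86, but it is no longer
    // guaranteed to be a single indivisible bus transaction, so a reader
    // on another CPU could observe half-updated bytes (word tearing).
    g_rec.count++;
}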
I've also been warned that modern processors break up and reshuffle the
order of instructions so that what was atomic on, say, a 386, is no longer
an atomic instruction today. That is of course a very vague statement to
make and my instinct tells me that a scenario with just 1 writer thread
should still be fine. But if I could afford the time penalty, I'd still go
for the critical section or one of the Interlocked... () functions.
Another problem area already mentioned is the multi-processor system. If I
had to hedge a bet I would guess that you could run into similar effects to
the word-tearing problem described above, but basically I don't know :-). I
have a vague memory that there is such a thing as a bus lock instruction,
which will prevent the other processor(s) from accessing the memory bus
while an atomic operation is in progress. I believe this is the basis for
the Interlocked...() functions. I read an article once which presented its
own better functions using inline assembly for the bus locks and, of course,
inline functions instead of function calls. This article seems to have
vanished from the web, but the source code and some documentation is still
available at John M. Dlugosz' personal web-site here:
http://www.dlugosz.com/Repertoire/refman/Classics/atomic_counter.html
The timing differences from the article were quite interesting:
Comparison times when incrementing a counter:

    regular memory variable:   31.25 nanoseconds
    atomic counter:           178.1  nanoseconds
    critical section:         504.6  nanoseconds
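Those atomic-counter functions ultimately come down to a locked instruction.
A rough sketch (MSVC-style inline assembly for 32-bit x86; my own
illustration, not code from the article) of an atomic increment built on the
LOCK prefix:

// Atomically increments *p and returns the new value.
long AtomicIncrement(volatile long* p)
{
    __asm
    {
        mov  ecx, p             // load the address of the counter
        mov  eax, 1             // amount to add
        lock xadd [ecx], eax    // locked exchange-and-add; eax gets the old value
        inc  eax                // old value + 1 == the new value
    }
    // MSVC convention: the value left in EAX is the function's return value.
}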
Regards,
Max Busch.
There is no such guarantee in the C++ standard. Variables of type
volatile sig_atomic_t will have defined values when "the processing of
the abstract machine is interrupted by receipt of a signal" (1.9/9).
This has nothing to do with threads. From a standard point of view,
threads don't exist.
That implementors are very likely to fulfill the above requirement by
defining sig_atomic_t as a native data type that can be modified
atomically is an altogether different story. But you cannot portably
rely on it.
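To illustrate the distinction, the guarantee the standard does give is about
signals, not threads. A minimal sketch (the flag name is arbitrary):

#include <signal.h>

// Well-defined even if the handler interrupts the main flow mid-statement,
// because sig_atomic_t is required to be readable and writable atomically
// with respect to signal delivery.
volatile sig_atomic_t got_signal = 0;

void handler(int)
{
    got_signal = 1;
}

int main()
{
    signal(SIGINT, handler);
    while (!got_signal)
    {
        // ... do work, eventually noticing the flag ...
    }
    return 0;
}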
Gerhard Menzl
[description of non-portable code snipped]
> Given these conditions, my use of the variable is thread-safe, and I
> can avoid the overhead of CRITICAL_SECTIONs or mutexes.
Not portably. It may work on your particular platform with your
particular compiler, but the next service pack may apply modifications
to the code generation engine that render your assumptions invalid. The
only reliable way to access data concurrently from more than one thread
is to use the documented locking mechanisms for the platform involved.
Note that Standard C++ does not offer any guarantees regarding
thread-safety. It is wrong and dangerous to assume that the volatile
qualifier makes your program thread-safe.
Gerhard Menzl
volatile UINT64 value = 0;   // 8 bytes, not atomic

void thread1()   // writer (e.g. an interrupt)
{
    value++;
}

void thread2()   // reader
{
    UINT64 copyOfValue;
    do
    {
        copyOfValue = value;
    }
    while (copyOfValue != value);   // re-read until two reads agree
}
Because the variable is volatile, the assignment and the test must be two
separate reads of value; if they are the same, then value was definitely not
updated during the read, so the value is safe.
Matthew
I think you will be very hard pressed to find an implementation that
does not satisfy this requirement. Every C compiler I've used (on several
quite different processors) has defaulted to (or required) long word
alignment (word for 16 bit implementations) at least. I would consider it
bad design to do otherwise because it is so advantageous to use this
technique.
--
-GJC
-gcha...@shore.net
-Abolish Public Schools.
"Gary Chanson" <gcha...@no.spam.shore.net> wrote in message
news:fta37.327$F6.2...@news.shore.net...
> I think you will be very hard pressed to find an implementation that
> does not satisfy this requirement. Every C compiler I've used (on several
> quite different processors) has defaulted to (or required) long word
> alignment (word for 16 bit implementations) at least. I would consider it
> bad design to do otherwise because it is so advantageous to use this
> technique.
>
Yeah, but by this point we're down to arguing the difference
between 'guaranteed to work' and 'seems like it works in
practical cases'.
Me: I have more sympathy with playing by the rules, especially
since I don't know what we're trying to solve. It seems silly
to be debating dodges to improve performance, without knowing
what the performance problem actually is!
If you can live with making the correctness of your program dependent on
mere statistical evidence, go ahead. But it is important to point out that
such a program is inherently unportable because it implicitly relies on
features, documented or undocumented, of the specific platform it runs
on.
Besides, there are more dangers to unguarded concurrent data access than
the possibility of a non-atomic read or write. Multiprocessor issues and
instruction reordering at hardware level introduce pitfalls that most
programmers wouldn't think about in their wildest dreams.
A few months ago there was a long discussion in comp.lang.c++.moderated/
comp.programming.threads about taking a similar synchronization shortcut
in the implementation of the Singleton pattern. Eventually it became
clear that it is impossible to bypass regular locking without invoking
the risk of race conditions.
Gerhard Menzl
I completely agree. On the other hand, there are times when performance
is a significant enough requirement that 'seems like it works in practical
cases' is good enough. I do real time embedded systems so I often run into
this.
--
-GJC
-gcha...@shore.net
-Abolish Public Schools.
[...]
> A few months ago there was a long discussion in comp.lang.c++.moderated/
> comp.programming.threads about taking a similar synchronization shortcut
> in the implementation of the Singleton pattern. Eventually it became
> clear that it is impossible to bypass regular locking without invoking
> the risk of race conditions.
eventually it became clear that it is POSSIBLE to bypass regular locking
(without invoking the risk of race conditions) USING THREAD LOCALS.
regards,
alexander.
I do not recall a solution involving thread locals, but my point was
that you cannot rely on volatile. Thread local storage (which is what I
presume you mean) is just another platform-specific feature.
Gerhard Menzl
> Alexander Terekhov wrote:
>
> > > A few months ago there was a long discussion in
> > > comp.lang.c++.moderated/comp.programming.threads about taking a
> > > similar synchronization shortcut in the implementation of the
> > > Singleton pattern. Eventually it became clear that it is impossible
> > > to bypass regular locking without invoking the risk of race
> > > conditions.
> >
> > eventually it became clear that it is POSSIBLE to bypass regular
> > locking (without invoking the risk of race conditions) USING THREAD
> > LOCALS.
>
> I do not recall a solution involving thread locals,
here is a compact example in Java (in C++ you would need to have
a "portable" class similar to ThreadLocal or just stick to C/POSIX
pthreads for TSD/TLS):
class Singleton {
    private static Singleton theInstance;

    private static final ThreadLocal perThreadInstance =
        new ThreadLocal() {
            public Object initialValue() { return createInstance(); }
        };

    public static Singleton getInstance() {
        return (Singleton)perThreadInstance.get();
    }

    private static synchronized Singleton createInstance() {
        if (theInstance == null)
            theInstance = new Singleton();
        return theInstance;
    }
}
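For comparison, a rough C++/POSIX sketch of the same per-thread caching idea
(my own illustration, not code from that discussion; the names are arbitrary):

#include <pthread.h>

class Singleton { /* ... */ };

static Singleton*      theInstance  = 0;
static pthread_mutex_t instanceLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t   tsdKey;
static pthread_once_t  keyOnce      = PTHREAD_ONCE_INIT;

extern "C" void makeKey(void)
{
    pthread_key_create(&tsdKey, 0);      // no destructor; it lives forever
}

static Singleton* createInstance()
{
    // The mutex is taken at most once per thread, the first time that
    // thread asks for the singleton.
    pthread_mutex_lock(&instanceLock);
    if (theInstance == 0)
        theInstance = new Singleton();
    Singleton* result = theInstance;
    pthread_mutex_unlock(&instanceLock);
    return result;
}

Singleton* getInstance()
{
    pthread_once(&keyOnce, makeKey);
    Singleton* cached =
        static_cast<Singleton*>(pthread_getspecific(tsdKey));
    if (cached == 0)                     // first call in this thread
    {
        cached = createInstance();
        pthread_setspecific(tsdKey, cached);
    }
    return cached;                       // later calls: no lock at all
}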
> but my point was that you cannot rely on volatile.
yup; you cannot rely on C volatiles. correct programs should
not violate memory synchronization rules:
POSIX
"Applications shall ensure that access to any memory
location by more than one thread of control (threads
or processes) is restricted such that no thread of
control can read or modify a memory location while
another thread of control may be modifying it. Such
access is restricted using functions that synchronize
thread execution and also synchronize memory with
respect to other threads."
JAVA
"If two threads access a normal variable, and one
of those accesses is a write, then the program should
be synchronized so that the first access is visible to
the second access. When a thread T1 acquires a lock
on/enters a monitor m that was previously held by
another thread T2, all actions that were visible to T2
at the time it released the lock on m become visible
to T1"
> Thread local storage (which is what I presume you mean)
> is just another platform-specific feature.
it is quite a portable feature.
regards,
alexander.