
Object-oriented multithreading


Alan McKenney

Sep 6, 2006, 8:16:16 AM
{ if somebody has links to google groups repository or at least the
subject lines of the threads discussing this in c.p.threads, do post
them please, for the benefit of both the OP and the group, and so we
don't begin discussing what has already been discussed. thanks! -mod }

Over on comp.std.c++, there's a thread about what form
multithreading support in a future C++ language standard
would take.

One of the big issues is synchronization, especially
of memory reads and writes. For example, if we have

int shared_value = 0;
Mutex shared_value_mutex;
...

void thread_a() {
    shared_value_mutex.lock();
    shared_value += 10;
    shared_value_mutex.unlock();
}

void thread_b() {
    shared_value_mutex.lock();
    shared_value += 20;
    shared_value_mutex.unlock();
}

and we assume that thread_b() starts before thread_a()
finishes, it is conceivable that thread_a() will release
the mutex, and thread_b() will get it, before thread_a's
update of shared_value propagates to the processor running
thread_b.

(Keep in mind that in a thread-unaware system, no operation
can be assumed to be atomic.)

The only safe way to deal with this is to:

a. do all updates of shared variables via library functions,
in addition to protecting critical sections with mutexes,
or:

b. provide a language-defined way to tell the compiler that
a given synchronization variable or operation protects
a given set of variables.
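
(A minimal sketch of option (a), assuming a hypothetical library
routine atomic_add_int that the compiler must treat as an opaque
external call:)

extern "C" void atomic_add_int(volatile int* target, int value);

void thread_a() {
    shared_value_mutex.lock();
    atomic_add_int(&shared_value, 10);   // opaque library call: the
                                         // compiler can neither cache
                                         // nor reorder the update
    shared_value_mutex.unlock();
}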

In reading and thinking about this, it occurs to me
that, in my (possibly limited) experience, synchronization
always comes down to having a single thread be able to make a
set of updates to a set of related variables without another
thread updating the same variables in the meantime.

Another way to think of it is that we want some set
of operations to be "atomic" with respect to the variables
they reference/update.

If we think in object-oriented terms,

"set of related variables" = object

"set of updates" = method.

Seen this way, the logical unit of data to protect is
an object, and the logical unit of code for an atomic
operation is a method.

Following this approach, we would handle synchronization
of operations by declaring a member function "atomic",
or have some sort of lock function that would apply to
*this.

The compiler would recognize this construct as meaning
that before a thread could start the function, it would
have to wait:

a. to obtain exclusive access to *this (e.g., by locking an
instance-specific mutex), and

b. for all memory writes to *this to have completed,
or at least to have become visible to this thread.
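
(For concreteness, a minimal sketch of how this could be emulated
by hand in current C++, reusing the Mutex class from the example
above; the language-level "atomic" keyword is the hypothetical part.)

class SharedCounter {
public:
    SharedCounter() : value(0) {}
    // would be declared: atomic void add(int n)
    void add(int n) {
        self_mutex.lock();     // (a) exclusive access to *this
        value += n;            // (b) earlier writes to *this are
                               //     visible once the lock is held
        self_mutex.unlock();
    }
private:
    int   value;
    Mutex self_mutex;          // instance-specific mutex
};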


I don't recall seeing this approach mentioned in this group
or comp.std.c++; I don't follow the multithreading groups
closely enough to know if this has come up there or not.

Has this approach been considered? If so, has it been discarded
as a Bad Idea, and, if so, why?

The main objection that I can think of that I can't
easily dispose of (to my own satisfaction, at least)
is that there are forms of synchronization that
don't easily fit into an "atomic member function" model.

(Another objection would be that as presented here,
it only has lock/wait, not lock/fail, but I think that
a way could be found to express this.)

Or am I asking an FAQ?

-- Alan McKenney

[line eater fodder]


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Pete Becker

Sep 6, 2006, 10:49:45 AM
Alan McKenney wrote:
>
> One of the big issues is synchronization, especially
> of memory reads and writes. For example, if we have
>
> int shared_value = 0;
> Mutex shared_value_mutex;
> ...
>
> void thread_a() {
> shared_value_mutex.lock();
> shared_value += 10;
> shared_value_mutex.unlock();
> }
> void thread_b() {
> shared_value_mutex.lock();
> shared_value += 20;
> shared_value_mutex.unlock();
> }
>
> and we assume that thread_b() starts before thread_a()
> finishes, it is conceivable that thread_a() will release
> the mutex, and thread_b() will get it, before thread_a's
> update of shared_value propagates to the processor running
> thread_b.
>

Maybe I'm missing the point. If the mutex is properly implemented, this
isn't a problem. Unlocking does a release, and locking does an acquire.
The result is that any values written by the unlocking thread are
visible to the locking thread.
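
Schematically, for the example above (the comments state the
guarantees a conforming mutex provides):

// thread_a: shared_value += 10;
// thread_a: unlock() -- release: the write to shared_value is
//                       published before the mutex becomes free
// thread_b: lock()   -- acquire: everything published before the
//                       matching release is visible from here on
// thread_b: shared_value += 20;   // reads 10, stores 30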

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.

kanze

Sep 6, 2006, 1:50:06 PM
Alan McKenney wrote:
> Over on comp.std.c++, there's a thread about what form
> multithreading support in a future C++ language standard
> would take.

> One of the big issues is synchronization, especially
> of memory reads and writes. For example, if we have

> int shared_value = 0;
> Mutex shared_value_mutex;
> ...

> void thread_a() {
> shared_value_mutex.lock();
> shared_value += 10;
> shared_value_mutex.unlock();
> }
> void thread_b() {
> shared_value_mutex.lock();
> shared_value += 20;
> shared_value_mutex.unlock();
> }

> and we assume that thread_b() starts before thread_a()
> finishes, it is conceivable that thread_a() will release
> the mutex, and thread_b() will get it, before thread_a's
> update of shared_value propagates to the processor running
> thread_b.

What makes you assume this? It depends on the system, but Posix
guarantees full memory synchronization in both the lock and the
unlock.

> (Keep in mind that in a thread-unaware system, no operation
> can be assumed to be atomic.)

Keep in mind, too, that a thread-unaware system doesn't have
mutexes :-).

> The only safe way to deal with this is to:

> a. do all updates of shared variables via library functions,
> in addition to protecting critical sections with mutexes,
> or:

> b. provide a language-defined way to tell the compiler that
> a given synchronization variable or operation protects
> a given set of variables.

> In reading and thinking about this, it occurs to me
> that, in my (possibly limited) experience, synchronization
> always comes down to having a single thread be able to make a
> set of updates to a set of related variables without another
> thread updating the same variables in the meantime.

That's true for some uses. For others, it is a question of one
thread making two series of updates, a first, followed by a
second, and ensuring that no other thread can possibly see the
results of the second series of updates without also seeing
those of the first. In simple terms, if we start out with a and
b equal to zero, then one thread sets a to 1, then b to 1. The
goal is that another thread can never see a equal 0 and b equal
1.
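
(A sketch of that invariant, using the Mutex class from the original
example; the assert must never fire if the mutex provides the usual
guarantees.)

#include <cassert>

int a = 0;
int b = 0;
Mutex ab_mutex;

void writer() {
    ab_mutex.lock(); a = 1; ab_mutex.unlock();   // first series
    ab_mutex.lock(); b = 1; ab_mutex.unlock();   // second series
}

void reader() {
    ab_mutex.lock();
    assert(!(b == 1 && a == 0));   // must never fire
    ab_mutex.unlock();
}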

> Another way to think of it is that we want some set of
> operations to be "atomic" with respect to the variables they
> reference/update.

> If we think in object-oriented terms,

> "set of related variables" = object

> "set of updates" = method.

> Seen this way, the logical unit of data to protect is
> an object, and the logical unit of code for an atomic
> operation is a method.

Except that it often doesn't work out that way. You have two or
three objects which must remain coherent amongst themselves, so
your atomicity has to cover several objects.

> Following this approach, we would handle synchronization of
> operations by declaring a member function "atomic", or have
> some sort of lock function that would apply to *this.

> The compiler would recognize this construct as meaning that
> before a thread could start the function, it would have to
> wait:

> a. to obtain exclusive access to *this (e.g., by locking an
> instance-specific mutex), and

> b. for all memory writes to *this to have completed,
> or at least to have become visible to this thread.

> I don't recall seeing this approach mentioned in this group or
> comp.std.c++; I don't follow the multithreading groups closely
> enough to know if this has come up there or not.

> Has this approach been considered? If so, has it been
> discarded as a Bad Idea, and, if so, why?

It's one of the approaches used by Java. One which, in
practice, doesn't seem to be much used, because it is so rare
that an object and a function correspond to the desired
granularity of locking.

> The main objection that I can think of that I can't easily
> dispose of (to my own satisfaction, at least) is that there
> are forms of synchronization that don't easily fit into an
> "atomic member function" model.

It obviously can't be the only possibility. Even in Java, you
can also synchronize a block on a separate object. My
experience suggests that at times, in fact, synchronization
doesn't even respect scope (i.e. scoped_lock doesn't work).

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Lourens Veen

Sep 6, 2006, 1:52:05 PM
Alan McKenney wrote:
>
> In reading and thinking about this, it occurs to me
> that, in my (possibly limited) experience, synchronization
> always comes down to having a single thread be able to make a
> set of updates to a set of related variables without another
> thread updating the same variables in the meantime.
>
> Another way to think of it is that we want some set
> of operations to be "atomic" with respect to the variables
> they reference/update.
>
> If we think in object-oriented terms,
>
> "set of related variables" = object
>
> "set of updates" = method.
>
> Seen this way, the logical unit of data to protect is
> an object, and the logical unit of code for an atomic
> operation is a method.
>
> Following this approach, we would handle synchronization
> of operations by declaring a member function "atomic",
> or have some sort of lock function that would apply to
> *this.

That sounds like synchronized methods in Java. The problem with
applying it unchanged to C++ is, as you noted, that you lose a lot of
flexibility (I think that some additions have been made recently to
Java to supplement synchronized methods). What about non-member
functions? These don't exist in Java, but they do in C++, and it
would be nice to be able to lock them as well (but then what do you
lock?). What if you want a single operation that affects multiple
objects to execute atomically? What about including the arguments
passed to the function in the lock?

Since objects and functions are the units of data and code in C++ it
makes sense to make them units for locking protection as well. What
we need to make it powerful enough is a mechanism for combining these
units. Some way to lock multiple objects, to combine functions in a
way that allows them to use each other's locked data, and so on. But
this is hard to do.

Let's assume a "shared" type modifier, and an "atomic" attribute for
functions. When an atomic function is called, all objects of a shared
type that it accesses are locked before the function is executed. The
problem is that the atomic function may call other functions, which
access different (or the same!) data that may be shared. All of those
things have to be locked as well, _before_ the original atomic
function is executed. So the compiler has to analyse the call graph,
and lock any shared variables touched by the subgraph headed by our
atomic function.

That's hard already, and maybe impossible if you take into account
virtual calls. The second problem is, it may introduce deadlocks.
Suppose we have two shared global objects, and we want to compare
them using an atomic function cmp(), from two different threads:

shared SomeObject a;
shared SomeObject b;

atomic bool cmp(shared SomeObject const & x,
                shared SomeObject const & y) {
    return x < y;
}

void thread_a() {
    if (cmp(a, b)) {
        /* ... */
    }
}

void thread_b() {
    if (cmp(b, a)) {
        /* ... */
    }
}

Now if cmp() locks the variable referenced by x first, and then the
one referenced by y, then it can happen that thread_a locks a, after
which thread_b locks b, and then both of them wait forever to acquire
the next lock. So, this isn't deadlock-safe, and I don't see an easy
way of making it so. You could do so by ensuring that all atomic
functions always lock objects in the same global order, but due to
aliasing that seems to be impractical if not impossible.
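
(For what it's worth, a hand-written sketch of that global-order
idea, assuming each SomeObject exposes a Mutex member named m and
that x and y are distinct objects: always lock the lower address
first, so cmp(a, b) and cmp(b, a) acquire the two locks in the same
order.)

#include <functional>   // std::less

bool cmp(SomeObject& x, SomeObject& y) {
    SomeObject* first  = std::less<SomeObject*>()(&x, &y) ? &x : &y;
    SomeObject* second = (first == &x) ? &y : &x;
    first->m.lock();     // lower address always locked first,
    second->m.lock();    // so no lock-order reversal is possible
    bool result = x < y;
    second->m.unlock();
    first->m.unlock();
    return result;
}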

I suspect that the next level of abstraction up from simple locks is
full-blown transaction support. I've been wondering if you could make
something workable, using exceptions to provoke rollbacks, and clever
template metaprogramming to get the type system to help as much as
possible. I haven't had time to work on this idea unfortunately.
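
(A toy sketch of the exception-driven rollback half of that idea;
isolation between threads would still have to come from locking or
something stronger, so this shows only the undo mechanics. All names
are made up.)

class IntTransaction {
public:
    explicit IntTransaction(int& v) : var(v), saved(v) {}
    void rollback() { var = saved; }   // restore the snapshot
private:
    int& var;
    int  saved;
};

void update(int& shared) {
    IntTransaction t(shared);
    try {
        shared += 42;
        // ... further updates, any of which may throw ...
    } catch (...) {
        t.rollback();   // undo on failure, then re-throw
        throw;
    }
}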

Lourens

Alan McKenney

Sep 6, 2006, 6:56:42 PM
kanze wrote:

...


>
> > and we assume that thread_b() starts before thread_a()
> > finishes, it is conceivable that thread_a() will release
> > the mutex, and thread_b() will get it, before thread_a's
> > update of shared_value propagates to the processor running
> > thread_b.
>
> What makes you assume this? It depends on the system, but Posix
> guarantees full memory synchronization in both the lock and the
> unlock.

If by "full memory synchronization" you mean all writes
by all threads, that is one solution to the problem.

But it does not scale well to large numbers of processors,
since each mutex lock or unlock requires all CPUs to
wait until the whole memory system settles down.

If C++ writes such a requirement into a new standard, it
will make C++ multithreading uninteresting for those
who wish to make high-performance applications using
many CPUs.

>
> > (Keep in mind that in a thread-unaware system, no operation
> > can be assumed to be atomic.)
>
> Keep in mind, too, that a thread-unaware system doesn't have
> mutextes:-).

The C++ standard is currently thread-unaware, as has been
mentioned here many, many times.

The question is what form thread-awareness in a future
C++ standard would take.

My comment was intended to remind people that to
write multi-threaded code the way we are used to requires
some form of thread-awareness from the compiler. Just
having mutexes in the library is not enough.

(Though it is possible to do multithreading with no compiler
support at all; it's ugly, but it works.)


> > ... having a single thread be able to make a
> > set of updates to a set of related variables without another
> > thread updating the same variables in the meantime.
>
> That's true for some uses. For others, it is a question of one
> thread making two series of updates, a first, followed by a
> second, and ensuring that no other thread can possibly see the
> results of the second series of updates without also seeing
> those of the first.

Agreed. I think both uses are related, I just can't think
of a good way to describe the issue that includes both,
as well as others I didn't think of.


> Except that it often doesn't work out that way. You have two or
> three objects which must remain coherent amongst themselves, so
> your atomicity has to cover several objects.

I take it that putting the several objects
into a larger object is not reasonable.

> > Has this approach ["atomic" function locks *this]
> > been considered? If so, has it been
> > discarded as a Bad Idea, and, if so, why?
>
> It's one of the approaches used by Java. One which, in
> practice, doesn't seem to be much used, because it is so rare
> that an object and a function correspond to the desired
> granularity of locking.

I had in mind things like queues, or "simple" shared variables
or objects. I haven't needed atomic update of more "scattered"
sets of objects. Evidently, you have.

> > The main objection that I can think of that I can't easily
> > dispose of (to my own satisfaction, at least) is that there
> > are forms of synchronization that don't easily fit into an
> > "atomic member function" model.
>
> It obviously can't be the only possibility. Even in Java, you
> can also synchronize a block on a separate object. My
> experience suggests that at times, in fact, synchronization
> doesn't even respect scope (i.e. scoped_lock doesn't work).

It sounds like I have my answer -- it's been tried, and is
nice, but not sufficient.

I would still like to know if a more C++-centric approach to
multithreading could be found, which might make
multithreaded code cleaner and safer than the current
(C-style) approaches.

-- Alan McKenney

Christopher Merrill

Sep 6, 2006, 6:56:17 PM
>>One of the big issues is synchronization, especially
>>of memory reads and writes. For example, if we have

>>int shared_value = 0;
>>Mutex shared_value_mutex;

>> void thread_a() {
>>     shared_value_mutex.lock();
>>     shared_value += 10;
>>     shared_value_mutex.unlock();
>> }
>>
>> void thread_b() {
>>     shared_value_mutex.lock();
>>     shared_value += 20;
>>     shared_value_mutex.unlock();
>> }

Maybe I'm misunderstanding the question, but if you define shared_value as
volatile int instead of just int, doesn't that instruct the compiler to
never cache shared_value in a register? Or is there another way this simple
mutex scheme can be defeated?

The following might be of interest to the OP:

http://www.musicdsp.org/files/ATOMIC.H

Chris Thomasson

Sep 7, 2006, 9:47:43 AM
"Alan McKenney" <alan_mc...@yahoo.com> wrote in message
news:1157513749.6...@i3g2000cwc.googlegroups.com...

>{ if somebody has links to google groups repository or at least the
> subject lines of the threads discussing this in c.p.threads, do post
> them please, for the benefit of both the OP and the group, and so we
> don't begin discussing what has already been discussed. thanks! -mod }
>
> Over on comp.std.c++, there's a thread about what form
> multithreading support in a future C++ language standard
> would take.

[...]


Before I started on mutexes, I would define how I would solve compiler
reordering and hardware reordering...


http://groups.google.com/group/comp.programming.threads/msg/423df394a0370fa6


Link-time optimizations aside for a moment, a compiler treats a call to an
external unknown function in a pessimistic fashion... Code motion is usually
halted... It behaves as a compiler barrier of sorts...


So... I would declare a template

.... std::atomic<Type>


And explicitly state that the compiler shall not perform any tricky
optimizations or code motion across calls into this template. Basically,
the compiler shall treat std::atomic<anything> as if it were an externally
assembled function...


Well, that seems like it should take care of compiler reordering... The
hardware ordering comes in the form of memory barrier instructions... The
barriers could be bound to an atomic operation.


So... I would declare another template, and 4 constants:


std::storeload = 0x1
std::loadstore = 0x2
std::loadload = 0x4
std::storestore = 0x8


.... std::atomic<Type>::op<MemoryBarrier>

Now I could do something like:


static T pVal = 0;


std::atomic<T>::op<std::storeload | std::storestore>(&pVal ... );


std::atomic<T>::op<std::loadstore | std::loadload>(&pVal ... );


std::atomic<T>::op<std::loadstore | std::storestore>(&pVal ... );


std::atomic<T>::op<std::storestore>(&pVal ... );


std::atomic<T>::op<std::loadload>(&pVal ... );


// naked is default
std::atomic<T>::op<>(&pVal ... );

The compiler shall not perform any code motion or tricky optimizations
across calls to these std::atomic functions... That takes care of compiler
ordering... The memory barriers take care of the hardware ordering... I
would model this design after the SPARC instruction set, like the example
shows, because of its "fine granularity"...


Any thoughts?

Chris Thomasson

Sep 7, 2006, 9:48:50 AM
{ the poster most likely meant to use the value 0x10. -mod }

> So... I would declare another template, and 4 constants:
>
>
> std::storeload = 0x1
> std::loadstore = 0x2
> std::loadload = 0x4
> std::storestore = 0x8

make that 5 constants... add:

std::depends = 0x01


Got to remember to support loads with data-dependencies... No acquire needed
here!

:)

Joe

Sep 7, 2006, 9:44:37 AM

Christopher Merrill wrote:
>
> Maybe I'm misunderstanding the question, but if you define shared_value
> as volatile int instead of just int, doesn't that instruct the compiler
> to never cache shared_value in a register? Or is there another way this
> simple mutex scheme can be defeated?
>

It's not quite so simple as that. With multiple CPUs, each with their
own read-ahead caches, you need to instruct those CPUs that what they
thought was a valid value for shared_value no longer is. This
requires a memory barrier of some sort. Most OSes will perform this
action when a mutex is released. However, just declaring a variable
volatile doesn't really do the trick.

joe

kanze

Sep 7, 2006, 9:55:16 AM
Alan McKenney wrote:
> kanze wrote:

> ...

> > > and we assume that thread_b() starts before thread_a()
> > > finishes, it is conceivable that thread_a() will release
> > > the mutex, and thread_b() will get it, before thread_a's
> > > update of shared_value propagates to the processor running
> > > thread_b.

> > What makes you assume this? It depends on the system, but Posix
> > guarantees full memory synchronization in both the lock and the
> > unlock.

> If by "full memory synchronization" you mean all writes
> by all threads, that is one solution to the problem.

> But it does not scale well to large numbers of processors,
> since each mutex lock or unlock requires all CPUs to
> wait until the whole memory system settles down.

No, because the guarantee only affects the thread making the
lock. Roughly speaking, there is, conceptually at least, a
"global memory" which is the same for all processors. Locking
or unlocking ensures synchronization between any processor
local memory and that global memory.

If two threads are collaborating, and using common memory to
communicate, the unlock request of the first thread will ensure
that this "global" memory is fully synchronized with local
memory before unlocking, and the lock of the second thread
will ensure that its local memory is synchronized before
returning (but after having acquired the lock).

(This is, of course, a very informal description; a more formal
description would use the concepts of barriers or fences, and
would separate reading and writing more rigorously.)

> If C++ writes such a requirement into a new standard, it
> will make C++ multithreading uninteresting for those who
> wish to make high-performance applications using many
> CPUs.

It's what you currently get with Posix or Windows.

> > > (Keep in mind that in a thread-unaware system, no operation
> > > can be assumed to be atomic.)

> > Keep in mind, too, that a thread-unaware system doesn't have
> > mutexes :-).

> The C++ standard is currently thread-unaware, as has been
> mentioned here many, many times.

> The question is what form thread-awareness in a future
> C++ standard would take.

> My comment was intended to remind people that to
> write multi-threaded code the way we are used to requires
> some form of thread-awareness from the compiler. Just
> having mutexes in the library is not enough.

This is well known, and the problem of the underlying memory
model is being seriously addressed.

> (Though it is possible to do multithreading with no
> compiler support at all; it's ugly, but it works.)

Yah, you just need a bit of assembler here and there:-). Or you
count on the optimizer not being too intelligent.

> > > ... having a single thread be able to make a
> > > set of updates to a set of related variables without another
> > > thread updating the same variables in the meantime.

> > That's true for some uses. For others, it is a question of one
> > thread making two series of updates, a first, followed by a
> > second, and ensuring that no other thread can possibly see the
> > results of the second series of updates without also seeing
> > those of the first.

> Agreed. I think both uses are related, I just can't think
> of a good way to describe the issue that includes both,
> as well as others I didn't think of.

I think that it's a foregone conclusion that whatever we do,
someone will find a use case that it doesn't cover:-).

> > Except that it often doesn't work out that way. You have
> > two or three objects which must remain coherent amongst
> > themselves, so your atomicity has to cover several objects.

> I take it that putting the several objects
> into a larger object is not reasonable.

It's often possible to use the decorator pattern to create a
facade which does this. It's often the case, however, that
while possible, this isn't natural, and perverts the design.

> > > Has this approach ["atomic" function locks *this]
> > > been considered? If so, has it been
> > > discarded as a Bad Idea, and, if so, why?

> > It's one of the approaches used by Java. One which, in
> > practice, doesn't seem to be much used, because it is so rare
> > that an object and a function correspond to the desired
> > granularity of locking.

> I had in mind things like queues, or "simple" shared
> variables or objects. I haven't needed atomic update of
> more "scattered" sets of objects. Evidently, you have.

I've had the problem of atomic access being acquired by one
function, which does a look-up of some sort, then returns both a
pointer to the object and a lock. It's not a frequent scenario,
but it's not one that I would say we should reject out of hand.
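
(Roughly the shape of that scenario, with hypothetical names
throughout; the point is that the lock deliberately outlives the
function that acquired it, so scope-based locking cannot express it.)

struct Object;                        // whatever the table stores

extern Mutex table_mutex;             // protects the table
Object* lookup_unlocked(int key);     // assumes table_mutex is held

struct LockedResult {
    Object* object;
    Mutex*  lock;                     // still held; caller unlocks
};

LockedResult find(int key) {
    table_mutex.lock();
    LockedResult result;
    result.object = lookup_unlocked(key);
    result.lock   = &table_mutex;
    return result;                    // lock outlives this scope
}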

> > > The main objection that I can think of that I can't easily
> > > dispose of (to my own satisfaction, at least) is that
> > > there are forms of synchronization that don't easily fit
> > > into an "atomic member function" model.

> > It obviously can't be the only possibility. Even in Java,
> > you can also synchronize a block on a separate object. My
> > experience suggests that at times, in fact, synchronization
> > doesn't even respect scope (i.e. scoped_lock doesn't work).

> It sounds like I have my answer -- it's been tried, and is
> nice, but not sufficient.

> I would still like to know if a more C++-centric approach
> to multithreading could be found, which might make
> multithreaded code cleaner and safer than the current
> (C-style) approaches.

I'd say that RAII answers most of the problems. It doesn't work
in the (admittedly rare) cases where synchronization doesn't
respect scope, but I would expect any mutex interface in the
standard to support it, even if I don't think its use should be
mandatory.

I don't see a real need for more than RAII. In my own
experience in Java, I don't think I ever had a synchronized
function; although there were cases where I could have used one,
I felt that it was clearer to make synchronization explicit, and
to use an explicit (and thus visible) synchronization block,
rather than a synchronized function.
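
(For readers who haven't seen it, a minimal sketch of the RAII idiom
in question, over the Mutex class from the original example:)

class ScopedLock {
public:
    explicit ScopedLock(Mutex& m) : mutex(m) { mutex.lock(); }
    ~ScopedLock() { mutex.unlock(); }   // runs on every exit path
private:
    Mutex& mutex;
    ScopedLock(ScopedLock const&);              // not copyable
    ScopedLock& operator=(ScopedLock const&);
};

void thread_a() {
    ScopedLock guard(shared_value_mutex);
    shared_value += 10;
}   // unlocked here, even if the update throws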

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

Sep 7, 2006, 9:55:40 AM
Christopher Merrill wrote:
> >>One of the big issues is synchronization, especially
> >>of memory reads and writes. For example, if we have

> >>int shared_value = 0;
> >>Mutex shared_value_mutex;

> >>void thread_a() {
> >> shared_value_mutex.lock();
> >> shared_value += 10;
> >> shared_value_mutex.unlock();
> >> }

> >> void thread_b() {
> >> shared_value_mutex.lock();
> >> shared_value += 20;
> >> shared_value_mutex.unlock();
> >> }

> Maybe I'm misunderstanding the question, but if you define
> shared_value as volatile int instead of just int, doesn't that
> instruct the compiler to never cache shared_value in a
> register?

It instructs the compiler to take some implementation defined
precautions. In most of the compilers I use, it does exactly
what you say. And no more, which makes it pretty useless with
regards to thread safety (or much of anything else, for that
matter---volatile isn't sufficient even for memory mapped IO on
a Sparc, at least not as implemented by Sun CC or g++).

You shouldn't forget, either, that this is a simple example. In
real life, the shared_value might be a much more complex data
structure, and the update might involve many memory accesses.
It would be necessary for all of the accesses to be volatile
qualified. And volatile, implemented in a way that has meaning
in a multithreaded environment, has a very high cost,
multiplying access times by 5 or more; this would be
unacceptable (and unnecessary) for most applications.

> Or is there another way this simple mutex scheme can be
> defeated?

With Posix synchronization methods (and I'm pretty sure the same
holds for Windows), you don't need volatile here. Posix
synchronization methods guarantee sufficient memory
synchronization for this to work.

> The following might be of interest to the OP:

> http://www.musicdsp.org/files/ATOMIC.H

Not much help to me---they don't compile on my platform:-).

I'm not sure if the use of volatile in them is necessary; I
suspect that Microsoft would guarantee that their compiler
assumes access in embedded assembler, and so won't optimize
across it. The critical part which makes these functions work
(if they do work) is the synchronization guarantees of the lock
prefix in Intel's IA-32 architecture. On my own platform (Sun
sparc), I have one or two similar routines, which also use
special instructions (membar, on a Sparc) never generated by the
compiler. (Arguably, accessing an object through a volatile
qualifier should generate such instructions. It doesn't with
the compilers I have access to.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Hopkin

Sep 7, 2006, 9:53:59 AM
Alan McKenney wrote:
>
> If by "full memory synchronization" you mean all writes
> by all threads, that is one solution to the problem.
>
> But it does not scale well to large numbers of processors,
> since each mutex lock or unlock requires all CPUs to
> wait until the whole memory system settles down.
>

Threads don't scale well to a large number of processors.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

I'm also eager to hear more about Herb Sutter's Concur extensions for
C++. I wonder if Mr. Sutter would like to spill a few more beans about
it here, since I can't find much about it other than his talk at Palo
Alto (see http://www.gotw.ca/presentations.htm)


James

Lourens Veen

Sep 7, 2006, 7:56:59 PM
James Hopkin wrote:
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
>
> I'm also eager to hear more about Herb Sutter's Concur extensions
> for C++. I wonder if Mr. Sutter would like to spill a few more beans
> about it here, since I can't find much about it other than his talk
> at Palo Alto (see http://www.gotw.ca/presentations.htm)

Seconded. I would especially like to know how data hazards between
those atomic { /* ... */ } blocks are supposed to be dealt with.

Is the programmer responsible for making sure there is no shared data
between them that gets written to? In that case, the atomic keyword
is misleading, and returning futures would be a recipe for disaster
unless you kept track of what data they depend on very, very
carefully. That would probably require breaking encapsulation,
because internal data structures must be considered as well.
Coding standards that would make this somewhat safe probably also
rule out much of the parallelism. It's not a viable option.

If the programmer does not keep track of data hazards, then the
compiler will have to find some way to prevent problems and make sure
the blocks really execute in some serialisable manner. Executing them
serially would defeat the purpose; as I explained in my other post in
this thread, I don't see an easy way of doing it automatically
either.

Lourens

Chris Thomasson

Sep 7, 2006, 8:25:50 PM
{ please take the posting below as an invitation to continue this discussion
in comp.programming.threads. if possible, find the quoted thread (pun is
not intended) and revive it if you have something to say on the subject.
since this direction is taking us away from C++ (or even on a parallel
course), let us not continue unless there is a C++ issue to
debate. -mod }


"James Hopkin" <tasj...@gmail.com> wrote in message
news:1157621589.5...@e3g2000cwe.googlegroups.com...


> Alan McKenney wrote:
>>
>> If by "full memory synchronization" you mean all writes
>> by all threads, that is one solution to the problem.
>>
>> But it does not scale well to large numbers of processors,
>> since each mutex lock or unlock requires all CPUs to
>> wait until the whole memory system settles down.
>>
>
> Threads don't scale well to a large number of processors.
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

YIKES!


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926/5301d091247a4b16?lnk=raot&hl=en#5301d091247a4b16

http://groups.google.com/group/comp.programming.threads/msg/5d9d09da93024a1c?hl=en

Please read all!

Now... After you read through that... I would be happy to answer any
questions you have.


Herbs "futures" is a thread pool with queue and producer/consumer message
passing techniques. Read this proposal:

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/1eec07d3d865ef7b/c3788d38fcc15c21?hl=en#c3788d38fcc15c21


I have an extremely scalable implementation of futures... But, it uses
thread pooling techniques... How can you possibly say that threads don't
scale, when every existing solution has threads underneath the covers?

Joe Seigh

Sep 8, 2006, 12:20:26 PM
James Hopkin wrote:
> Alan McKenney wrote:
>
>> If by "full memory synchronization" you mean all writes
>> by all threads, that is one solution to the problem.
>>
>> But it does not scale well to large numbers of processors,
>> since each mutex lock or unlock requires all CPUs to
>> wait until the whole memory system settles down.
>>
>
>
> Threads don't scale well to a large number of processors.
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Threads don't scale well to a large number of programmers. :)

The same could be said of C++ if you don't hire programmers
who actually know how to program in C++ well. You get what
you pay for. If you throw tons of money at certain skill sets
you will eventually end up with a lot of people having that
skill set. That's why there are lots of people out there with
webby type skills. It's hard to take seriously any conclusion
based on the current de facto distribution of skills.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

Sep 8, 2006, 7:14:33 PM
{ those who encounter problems clicking on the links need to take
them into a text editor and splice them manually, most likely.
while we try not to break them intentionally, they can still end
up broken. -mod }


"Lourens Veen" <lou...@rainbowdesert.net> wrote in message
news:d21d3$44fedccc$8259a2fa$20...@news1.tudelft.nl...
> Alan McKenney wrote:
[...]

> That sounds like synchronized methods in Java.

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5522c0af29765e6d/07f62a83425e7b9f?lnk=gst&q=dave+butenhof+java+synchronized&rnum=7#07f62a83425e7b9f

[...]

> I suspect that the next level of abstraction up from simple locks is
> full-blown transaction support. I've been wondering if you could make
> something workable, using exceptions to provoke rollbacks, and clever
> template metaprogramming to get the type system to help as much as
> possible. I haven't had time to work on this idea unfortunately.

Humm...

Here are some of my thoughts on transactional memory:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40/5f4afc338f3dd221?hl=en#5f4afc338f3dd221

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64/eefe66fd067bdb67?hl=en#eefe66fd067bdb67

http://groups.google.com/group/comp.programming.threads/msg/7c4f5ba87e36fd79?hl=en

http://groups.google.com/group/comp.arch/msg/1b9e405080e93149

http://groups.google.com/group/comp.arch/msg/bbbc035cf1a8502c?hl=en

http://groups.google.com/group/comp.arch/msg/11b14c4bda2d5d82?hl=en

http://groups.google.com/group/comp.arch/msg/9b00fda2752966f9?hl=en

http://groups.google.com/group/comp.arch/msg/335aeb22fd6fe526?hl=en

http://groups.google.com/group/comp.arch/msg/1ace9400b1b16cd4

http://groups.google.com/group/comp.arch/msg/995379a16beb3b69
(simply excellent Wheeler post)

If you're interested in transactional memory, please read ALL!

After that, I would be happy to answer any questions any of you guys have.

:)

Chris Thomasson

Sep 9, 2006, 2:01:13 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:obidnSBRhvU6OmLZ...@comcast.com...

> "Alan McKenney" <alan_mc...@yahoo.com> wrote in message
> news:1157513749.6...@i3g2000cwc.googlegroups.com...

[...]

> So... I would declare another template, and 4 constants:

Template meta-programming can possibly be used to map the type and memory
barriers to the correct implementation details...

template<typename, signed> class atomic_op_detail_meta;
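
(A sketch of what that mapping could look like; the primary template
is left undefined so unsupported combinations fail to compile, and
the externally assembled routine is hypothetical:)

enum {
    storeload  = 0x1,
    loadstore  = 0x2,
    loadload   = 0x4,
    storestore = 0x8
};

// Primary template left undefined: an unsupported type/barrier
// combination fails at compile time rather than at run time.
template<typename T, signed Barriers>
struct atomic_op_detail_meta;

// Externally assembled routine (declaration only) that performs a
// fetch-and-add bracketed by the corresponding membar instructions.
extern "C" int atomic_xadd_membar_sl_ss(int volatile*, int);

template<>
struct atomic_op_detail_meta<int, storeload | storestore> {
    static int op(int volatile* p, int v) {
        return atomic_xadd_membar_sl_ss(p, v);
    }
};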

kanze

Sep 11, 2006, 9:34:06 AM
Joe Seigh wrote:
> James Hopkin wrote:
> > Alan McKenney wrote:

> >> If by "full memory synchronization" you mean all writes
> >> by all threads, that is one solution to the problem.

> >> But it does not scale well to large numbers of processors,
> >> since each mutex lock or unlock requires all CPUs to
> >> wait until the whole memory system settles down.

> > Threads don't scale well to a large number of processors.

> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

> Threads don't scale well to a large number of programmers. :)

> The same could be said of C++ if you don't hire programmers
> who actually know how to program in C++ well.

Actually, you can get pretty good results even if only a small
minority of the programmers are "C++ experts" or "threading
experts". You do need someone on the project to define the
ground rules, and create a framework, and good management to
ensure that the framework is used and the ground rules obeyed,
but I've seen one large, multithreaded project that ran
flawlessly, even though only about five of the fifty-some
programmers really understood threading.

(I wouldn't go so far as to say that this technique would work
for all applications, but it seems applicable to typical clients
and servers in the business world, where most of the programmers
are concerned with application domain business logic, and only a
very few with the framework which ensures transactional
integrity, thread safety, and the other technical services.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Chris Thomasson

Sep 12, 2006, 6:42:31 AM
"kanze" <ka...@gabi-soft.fr> wrote in message
news:1157622915.1...@i42g2000cwa.googlegroups.com...
> Christopher Merrill wrote:


[...]

>> The following might be of interest to the OP:
>>
>> http://www.musicdsp.org/files/ATOMIC.H

I believe that their atomic reference counting is busted... Seems like it's
only as strong as shared_ptr<>...


You might want to read this:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f2c94118046142e8/0b9072fea09fb64e?lnk=gst&q=lock-free+reference+patent&rnum=1#0b9072fea09fb64e
