
Good reference for meaning of acquire/release for C/C++?


Scott Meyers

Oct 18, 2004, 5:26:06 PM

Recently I got smacked around in comp.lang.c++.moderated about the
semantics of acquire and release in the release consistency memory model.
I found that none of the books I have here on concurrent programming,

Parallel and Distributed Programming in C++ by Hughes and Hughes, AW,
2003.

Multithreaded Programming with Java Technology by Lewis and Berg, PH,
2000.

Principles of Concurrent and Distributed Programming, Ben-Ari, PH, 1990.

Foundations of Multithreaded, Parallel, and Distributed Programming by
Andrews, AW, 2000.

Programming with POSIX Threads by Butenhof, AW, 1997.

describes the release consistency model and the meaning of acquire and
release, at least not if you look up "acquire" and "release" in the
indices. I have enough experience with wretched indices to realize that
just because it's not in the index doesn't mean it's not in the book, but
still, I've found it very hard to come by a decent description of what
acquire and release mean both technically and conceptually. Frankly,
understanding didn't really dawn on me until I read the original ISCA90
paper by Gharachorloo et al., and going to a primary source for what seems
like pretty basic information seems like a lot to ask of practicing
programmers.

I googled for a good online source of information on this topic, but I
didn't really come up with anything. There are various papers and
PowerPoint presentations, but all have an academic bent, and I found that I
didn't really understand them until I already understood what they were
saying. Reference-manual-type descriptions explain what acquire/release do
(e.g.,
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/Synchro_88127404-a394-403f-a289-d61c45ab81d5.xml.asp), but they
don't explain the conceptual relationship among acquire, release, and e.g.,
having a producer communicate with a consumer.

So here's my question: does anybody know of any good, readable, accessible
source of information on this topic I can refer practicing C/C++
programmers to? (For a variety of reasons, Javacentric references are not
really very good for this.) I expect to be covering this material with
increasing frequency, and as things stand now, there's not really any place
I can point people who want more information or a different treatment of
the same topic.

Thanks,

Scott

Randy

Oct 18, 2004, 6:17:25 PM

First of all, I may have misread your question, but here goes anyway...

Number 1: I'm not clear why C++ programmers need to know the details of hardware
memory consistency models...

Number 2: I also think you're going to have trouble finding definitions of
'acquire' and 'release' at a software level, since they're really meaningful
only at the hardware level. Other than some abstract hand waving, I don't know
how to tell a C++ programmer how to acquire or release a cache line. And given
all the hidden machinery at work within today's out-of-order CPUs, the
programmer simply can never know if/when a cache line has been released.

Number 3: If you're referring to implementing memory consistency entirely at the
software level, it seems to me that acquire <--> lock and release <--> unlock,
assuming that an atomic test-and-set operation is available. Of course,
condition variables (e.g. pthread_cond_*) or mutexes (e.g. pthread_mutex_*) are the
typical ways to implement locks using pthreads.

Number 4: If all you want is a _description_ of the release consistency model,
there are several of those...

One of the more authoritative sources of a variety of consistency models is
David Culler's textbook "Parallel Computer Architecture: A Hardware/Software
Approach":
http://www.cs.berkeley.edu/%7Eculler/book.alpha/index.html

Secondly, what about the following?

http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html
"
Release Consistency

Having the single synchronization access type requires that, when a
synchronization occurs, we need to globally update memory - our local changes
need to be propagated to all the other processors with copies of the shared
variable, and we need to obtain their changes. Release consistency considers
locks on areas of memory, and propagates only the locked memory as needed. It's
defined as follows:

1. Before an ordinary access to a shared variable is performed, all
previous acquires done by the process must have completed
successfully.
2. Before a release is allowed to be performed, all previous reads and
writes done by the process must have completed.
3. The acquire and release accesses must be sequentially consistent.
"

There's more at:

http://www.cs.panam.edu/~meng/Course/CS6334/Note/master/node87.html
http://www.ipsj.or.jp/members/Trans/Eng/01/1999/4009/article003.html
http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps
http://www-ece.rice.edu/~vijaypai/ISCA97/node4.html

Randy

--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu

Alexander Terekhov

Oct 18, 2004, 7:05:12 PM

Randy wrote:
[...]

> Number 1: I'm not clear why C++ programmers need to know the details of hardware
> memory consistency models...

Even C++-castrated-edition programmers need to know acquire/release. Go
read the recent comp.std.c's

http://groups.google.com/groups?threadm=41714B17.F88F58AB%40web.de
(Subject: Re: volatile problem)

thread (I mean the entire thread... including the embedded links ;-) ).

regards,
alexander.

Joe Seigh

Oct 18, 2004, 8:05:07 PM

Scott Meyers wrote:
>
> Recently I got smacked around in comp.lang.c++.moderated about the
> semantics of acquire and release in the release consistency memory model.
> I found that none of the books I have here on concurrent programming,
>

[...]


>
> describes the release consistency model and the meaning of acquire and
> release, at least not if you look up "acquire" and "release" in the
> indices. I have enough experience with wretched indices to realize that
> just because it's not in the index doesn't mean it's not in the book, but
> still, I've found it very hard to come by a decent description of what
> acquire and release mean both technically and conceptually. Frankly,
> understanding didn't really dawn on me until I read the original ISCA90
> paper by Gharachorloo et al., and going to a primary source for what seems
> like pretty basic information seems like a lot to ask of practicing
> programmers.
>

[...]


>
> So here's my question: does anybody know of any good, readable, accessible
> source of information on this topic I can refer practicing C/C++
> programmers to? (For a variety of reasons, Javacentric references are not
> really very good for this.) I expect to be covering this material with
> increasing frequency, and as things stand now, there's not really any place
> I can point people who want more information or a different treatment of
> the same topic.
>

Unfortunately no. This comment from the Single Unix specification

Formal definitions of the memory model were rejected as unreadable by the vast
majority of programmers. In addition, most of the formal work in the literature has
concentrated on the memory as provided by the hardware as opposed to the application
programmer through the compiler and runtime system. It was believed that a simple
statement intuitive to most programmers would be most effective.
IEEE Std 1003.1-2001 defines functions that can be used to synchronize access to
memory, but it leaves open exactly how one relates those functions to the semantics of
each function as specified elsewhere in IEEE Std 1003.1-2001. IEEE Std 1003.1-2001
also does not make a formal specification of the partial ordering in time that the
functions can impose, as that is implied in the description of the semantics of each
function. It simply states that the programmer has to ensure that modifications do not
occur "simultaneously" with other access to a memory location.

sort of explains it. If you read between the lines, you can take it as: nobody could
figure out how to do it, so they didn't attempt to. I understand the problem with
memory models, as they tend to be tied to a particular hardware definition, but I don't
believe you need a memory model to define semantics for various forms of synchronization.
I tend to lean towards Guttag's style of algebraic specification, which lets you put things
in terms of strictly program-observable effects. There seemed to be a bit of antipathy towards
this approach, so I haven't really done too much with it.

The informal meaning of acquire and release is the effect on memory visibility of acquiring
and releasing a lock, which presupposes you already know what those effects are; that
doesn't do much good for someone who doesn't.

There is a problem, however, in that acquire and release semantics don't necessarily translate
to other synchronization primitives besides locks. The semantics can be different, sometimes
subtly so, and you can get into trouble either in the implementation or in the application.
So you really need more than just acquire and release definitions.

The synchronization constructs I have or would have definitions for are

thread creation and termination
mutexes
signaling (events, semaphores, and condition variables)
smart pointers (atomic_ptr, atomic<T>, atomic<T*>, etc...)
generic memory barriers (platform independent)

The major work is just defining visibility itself without resorting to a
memory model.

Joe Seigh

Alexander Terekhov

Oct 18, 2004, 10:25:04 PM

Joe Seigh wrote:
[...]

> The informal meaning of acquire and release is the effect on memory visibility of acquiring
> and releasing a lock, which presupposes you already know what those effects are; that
> doesn't do much good for someone who doesn't.
>
> There is a problem, however, in that acquire and release semantics don't necessarily translate
> to other synchronization primitives besides locks. The semantics can be different, sometimes
> subtly so, and you can get into trouble either in the implementation or in the application.
> So you really need more than just acquire and release definitions.

Yeah.

msync::none // nothing (e.g. for refcount<T, basic>::increment)
msync::fence // classic fence (acq+rel -- see below)
msync::acq // classic acquire (hlb+hsb -- see below)
msync::ddacq // acquire via data dependency
msync::hlb // hoist-load barrier -- acquire not affecting stores
msync::ddhlb // ...
msync::hsb // hoist-store barrier -- acquire not affecting loads
msync::ddhsb // ...
msync::rel // classic release (slb+ssb -- see below)
msync::slb // sink-load barrier -- release not affecting stores
msync::ssb // sink-store barrier -- release not affecting loads
msync::slfence // store-load fence (ssb+hlb -- see above)
msync::sfence // store-fence (ssb+hsb -- see above)
msync::lfence // load-fence (slb+hlb -- see above)

Note that unidirectional stuff can be used only in conjunction with
certain atomic<> accesses to "label" them. I mean:

atomic<int> X;

/* ... */
int x = X.load(msync::acq);

/* ... */
X.store(x, msync::rel);

Compare it to use of bidirectional fences... something like

atomic<int> X;
atomic<int> Y;

/* ... */
X.store(0, msync::rel);
barrier(msync::slfence);
int y = Y.load(msync::acq);

In a way, barrier() simply translates to a "NOP-access" with a
bidirectional barrier label on it.

It's quite simple.

regards,
alexander.

Scott Meyers

Oct 18, 2004, 11:41:19 PM

On Mon, 18 Oct 2004 17:17:25 -0500, Randy wrote:
> Number 1: I'm not clear why C++ programmers need to know the details of hardware
> memory consistency models...

Andrei Alexandrescu and I recently wrote an article in DDJ explaining why
double-checked locking isn't reliable in C++. This is becoming old news to
Java programmers and very old news to readers of this newsgroup, I think,
but it still surprises the heck out of many C++ programmers. You can tell
them, "just use your threading library's function calls when accessing
shared state, and everything will be fine," and they'll nod and smile and
then try to outsmart their compiler. They don't realize that hardware
memory models exist, much less why they matter, and when they do find
out about them, they want to deal with things at the lowest level possible.
These are, after all, C/C++ programmers. Which means they need to know
about acquire and release and what they do and how to use them. As things
stand now, I can do my best to explain what they are and how they work, but
I'm far from an expert, and anyway, it'd be nice to have a place to point
them for an explanation different from mine.

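For reference, the pattern in question looks roughly like this. It's a
sketch, not the code from the article; assume Lock is a hypothetical RAII
mutex guard and pInstance is a static class member:

class Singleton {
public:
  static Singleton* instance();
private:
  static Singleton* pInstance;
};

Singleton* Singleton::instance()
{
  if (pInstance == 0) {            // first check, no lock held
    Lock lock;                     // hypothetical RAII mutex guard
    if (pInstance == 0)            // second check, under the lock
      pInstance = new Singleton;   // compiler/CPU may publish the pointer
  }                                // before the object is constructed
  return pInstance;
}

The second check makes it look safe, but nothing constrains the unlocked
first check: it can observe a non-null pointer to a not-yet-constructed
object.
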
> One of the more authoritative sources of a variety of consistency models is
> David Culler's textbook "Parallel Computer Architecture: A Hardware/Software
> Approach":
> http://www.cs.berkeley.edu/%7Eculler/book.alpha/index.html

I don't have a copy of this, but I have to say that my reaction to anything
1100 pages long is that it's, if nothing else, intimidating. Does it have a
good description of acquire/release from the point of view of somebody who
just wants to make their program work correctly?

> Secondly, what about the following?
>
> http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html
> "
> Release Consistency
>
> Having the single synchronization access type requires that, when a
> synchronization occurs, we need to globally update memory - our local changes
> need to be propagated to all the other processors with copies of the shared
> variable, and we need to obtain their changes. Release consistency considers
> locks on areas of memory, and propagates only the locked memory as needed. It's
> defined as follows:
>
> 1. Before an ordinary access to a shared variable is performed, all
> previous acquires done by the process must have completed
> successfully.
> 2. Before a release is allowed to be performed, all previous reads and
> writes done by the process must have completed.
> 3. The acquire and release accesses must be sequentially consistent.

Based on a quick perusal, two things, in my view. First, this says what
the SYSTEM must do, but not what programmers must do. For example, bullet
1 says nothing about when a programmer would want to label a read as an
acquire, and bullet 2 says nothing about when a programmer would want to
label a write as a release. Second, I may be naive here, but I'd like to
think that programmers could be trained about how to use acquire/release
pairs without having to learn about the various kinds of consistency. Even
now, my head starts to swoon when confronted with sequential consistency,
processor consistency, weak consistency, and release consistency. (Based
again on only a quick perusal, it looked like the other links you posted
had similar problems. One described *nine* different consistency models.)

I think of things this way:

- When you want to read the value of a variable giving you permission to
access shared state, label the read as an acquire.

- When you are done accessing shared state, label the write of the
permission variable as a release.

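Concretely, the two bullets amount to something like this, borrowing the
hypothetical atomic<>/msync notation from Alexander's post upthread:

atomic<int> ready;  // the "permission" variable
int data;           // the shared state it guards

// producer: write the shared state, then release the permission variable
data = 42;
ready.store(1, msync::rel);  // release: prior writes become visible first

// consumer: acquire the permission variable, then read the shared state
while (ready.load(msync::acq) == 0)
  ;                 // spin until the producer releases
int x = data;       // guaranteed 42: the acquire paired with the release
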
If this is correct, I don't see the need to burden programmers with a
detailed understanding of the various memory models. If this is not
correct, please tell me why, because I don't want to propagate incorrect
information.

I realize that I may be naively hoping that programmers can be shielded
from the details of memory models. If they can't be, feel free to burst
my bubble.

Scott


SenderX

Oct 19, 2004, 12:31:03 AM

> There is a problem, however, in that acquire and release semantics don't
> necessarily translate to other synchronization primitives besides locks.

I think acquire/release/full barriers would basically cover everything. They
would not be the most efficient barriers for some sync primitives, but they
would get the job done.

acquire could be thought of as a "consumer" of shared memory, and release
would be a "producer".


So, applying those rules to win32 primitives:

EnterCriticalSection would use acquire because it's a consumer of shared
memory.

LeaveCriticalSection would use release because it's a producer of shared
memory.

ReleaseSemaphore, ReleaseMutex, and SetEvent would use release because
they're producers.

The WaitForXXX APIs would use acquire; they're consumers.

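Applied to a simple producer/consumer pair, with a real critical section
supplying the barriers, it would look like this (a minimal sketch; the
names are mine, and cs must be initialized elsewhere with
InitializeCriticalSection):

#include <windows.h>

static CRITICAL_SECTION cs;
static int shared_value;
static int shared_ready;

DWORD WINAPI producer_thread(LPVOID arg)
{
  EnterCriticalSection(&cs);   // acquire
  shared_value = 42;
  shared_ready = 1;
  LeaveCriticalSection(&cs);   // release: publishes both writes
  return 0;
}

DWORD WINAPI consumer_thread(LPVOID arg)
{
  for (;;) {
    EnterCriticalSection(&cs);   // acquire: sees the producer's writes
    int ready = shared_ready;
    int value = shared_value;
    LeaveCriticalSection(&cs);   // release
    if (ready)
      return (DWORD)value;
    Sleep(0);                    // yield and retry
  }
}
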

producer and consumer are terms everybody knows?

;)


Jonathan Adams

Oct 19, 2004, 1:27:50 AM

In article <aM0dd.151536$He1.80998@attbi_s01>, "SenderX" <x...@xxx.com>
wrote:

> > There is a problem, however, in that acquire and release semantics don't
> > necessarily translate to other synchronization primitives besides locks.
>
> I think acquire/release/full barriers would basically cover everything. They
> would not be the most efficient barriers for some sync primitives, but they
> would get the job done.
>
> acquire could be thought of as a "consumer" of shared memory, and release
> would be a "producer".

As an aside, the Solaris kernel uses for its barriers:

membar_producer(); /* store/store barrier */
membar_consumer(); /* load/load barrier */

for acquire/release, it's enter/exit:

membar_enter(); /* store/load+store/store */
membar_exit(); /* load/store+store/store */

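A typical use of the producer/consumer pair might look like this (my
sketch, not Solaris code; data and flag are illustrative names, and
volatile here only restrains the compiler):

#include <atomic.h>   /* Solaris: membar_producer(), membar_consumer() */

static volatile int data;
static volatile int flag;

void publish(int v)
{
  data = v;
  membar_producer();  /* store/store: data reaches memory before flag */
  flag = 1;
}

int consume(void)
{
  while (flag == 0)
    ;                 /* spin until published */
  membar_consumer();  /* load/load: flag is read before data */
  return data;
}
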
I find these pretty intuitive, but my introduction to memory models was
the sparcv9 architecture book, so I may be biased.

> producer and consumer are terms everybody knows?

I find them much easier to follow than "hoist/sink", which I'm guessing
is from compiler jargon -- certainly most people doing MT work will have
done producer/consumer problems, so the terminology is more relevant.
The "acquire/release" terminology is so tied to the implementation of
locks that it feels more obscure than "producer/consumer" to me (but
less so than "hoist/sink").

Cheers,
- jonathan

Alexander Terekhov

Oct 19, 2004, 1:41:13 AM

SenderX wrote:
[...]

> producer and consumer are terms everybody knows?

Sure. I, for one, sorta know that producers probably need not restrict
loads and consumers probably need not restrict stores. Acquire/release
is way too restrictive... as is the bidirectional load and store bars
(I mean SUN's stuff; see http://tinyurl.com/43tth). Producers need
just sink-store and consumers need just hoist-load (most likely/quite
often just the "ddhlb" label imposed on some fetch{-and-whatever}).

regards,
alexander.

Eyal C

Oct 19, 2004, 6:16:41 AM

Unfortunately, there isn't any API standard covering memory consistency.
Even the Single Unix Specification leaves this area unhandled.
The usual practice on many platforms can be stated like this: every access
to a shared variable must be protected by your thread library's
synchronization objects. The stress here is on *every*, of course, even
single variables (used as flags).

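For example, even a lone flag variable gets the full treatment. A pthreads
sketch (the names are illustrative):

#include <pthread.h>

static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static int done_flag = 0;

void set_done(void)
{
  pthread_mutex_lock(&flag_lock);
  done_flag = 1;       /* every write goes under the lock */
  pthread_mutex_unlock(&flag_lock);
}

int is_done(void)
{
  int d;
  pthread_mutex_lock(&flag_lock);
  d = done_flag;       /* and every read, too */
  pthread_mutex_unlock(&flag_lock);
  return d;
}
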
Now, since this is only a practice, I am not sure which platforms it
applies to. It's true for Linux, Solaris, Win32, and the Java runtime. But
what about .NET, for instance, or other flavors of UNIX, or even VMS? So I
think that if you want to give your readers a general rule of thumb, you
would have to restrict it to specific platforms and warn the reader to
explore the issue for other platforms. I guess that a good source of
information would be cross-platform thread libraries, like ACE.

In addition, if some reader would like to avoid using the platform
synchronization objects and use some more fine-grained memory APIs, you
should warn him that this is highly platform-dependent and, worse, very
error-prone, both at the "memory-consistency" level and at the algorithmic
level of the program. For instance, Win32 exposes the InterlockedXxx calls.
But it's not obvious from the documentation whether they do the entire job:
they do use memory barriers to prevent compiler/processor instruction
re-ordering, but nothing is said about flushing the memory unit of the
processor on a multi-processor machine (so threads running on other
processors can see the modified values). I guess that the InterlockedXxx
calls do that also, but the point here is that if a reader wants to mess
with things, he is entering a very problematic area.

One final note - the implementation of the "volatile" keyword is also
highly platform-dependent, so one should trust only one's thread library
for synchronization. For instance, the Microsoft Win32 compiler treats a
volatile variable by ensuring that the compiler doesn't re-order the
variable's access instructions. However, it doesn't prevent the processor
from re-ordering them, which the new Intel/AMD processors do even on
single-processor machines.

So to sum up, when it comes to correct synchronization, the best advice is
to keep it simple. *Always* use your thread library's synchronization
objects in a simple manner (enclosing shared-resource code with lock/unlock
APIs). It works, and the time penalty is negligible compared to using APIs
that mess with "memory consistency" directly. If you want to improve on
that, improve your algorithm to either create a "lock manager" or implement
a "lock-free" style of algorithm (this is of course a whole subject in
itself). In either case, use your thread library's synchronization objects
as building blocks.

Eyal C


Joe Seigh

Oct 19, 2004, 7:24:58 AM

Eyal C wrote:
>
[...]


> In addition, if some reader would like to avoid using the platform
> synchronization objects and use some more fine-grained memory APIs, you
> should warn him that this is highly platform-dependent and, worse, very
> error-prone, both at the "memory-consistency" level and at the algorithmic
> level of the program. For instance, Win32 exposes the InterlockedXxx calls.
> But it's not obvious from the documentation whether they do the entire job:
> they do use memory barriers to prevent compiler/processor instruction
> re-ordering, but nothing is said about flushing the memory unit of the
> processor on a multi-processor machine (so threads running on other
> processors can see the modified values). I guess that the InterlockedXxx
> calls do that also, but the point here is that if a reader wants to mess
> with things, he is entering a very problematic area.

The reason nothing is said about flushing cache is that nothing has to be
done about flushing cache. Thinking cache has anything to do with the memory
model is a common misconception. Most cache is coherent cache and is transparent
by definition. In fact I don't think you can port Posix to a non-coherent cache
system, since false sharing would break even correctly written programs.

[...]


> So to sum up, when it comes to correct synchronization, the best advice is
> to keep it simple. *Always* use your thread library's synchronization
> objects in a simple manner (enclosing shared-resource code with lock/unlock
> APIs). It works, and the time penalty is negligible compared to using APIs
> that mess with "memory consistency" directly. If you want to improve on
> that, improve your algorithm to either create a "lock manager" or implement
> a "lock-free" style of algorithm (this is of course a whole subject in
> itself). In either case, use your thread library's synchronization objects
> as building blocks.
>

The problem is there are no explicit rules for using threads correctly. The
"simple" rule you've stated doesn't describe a correct implementation of DCL
where the global pointer is loaded while holding the lock. If the pointer
is not null, the lock is released and the shared singleton is accessed *without*
a lock. And it's perfectly valid. That's not described by your simple rule.

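In code, the valid variant I mean is roughly this (a pthreads sketch, not
from any particular implementation; it assumes the singleton is immutable,
or otherwise synchronized, after construction):

#include <pthread.h>

struct singleton;
struct singleton *make_singleton(void);  /* hypothetical constructor */

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static struct singleton *g_instance;     /* written only under g_lock */

struct singleton *get_instance(void)
{
  struct singleton *p;
  pthread_mutex_lock(&g_lock);    /* acquire: see earlier writes */
  p = g_instance;                 /* pointer loaded while holding the lock */
  if (p == NULL)
    p = g_instance = make_singleton();
  pthread_mutex_unlock(&g_lock);  /* release: publish our writes */
  return p;                       /* then used *without* a lock */
}

The unlock that published the fully built singleton happens before any
later lock/load of the pointer, so the unlocked use afterwards is fine.
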
Joe Seigh
