Recently I got smacked around in comp.lang.c++.moderated about the semantics of acquire and release in the release consistency memory model. I found that none of the books I have here on concurrent programming,
Parallel and Distributed Programming in C++ by Hughes and Hughes, AW, 2003.
Multithreaded Programming with Java Technology by Lewis and Berg, PH, 2000.
Principles of Concurrent and Distributed Programming, Ben-Ari, PH, 1990.
Foundations of Multithreaded, Parallel, and Distributed Programming by Andrews, AW, 2000.
Programming with POSIX Threads by Butenhof, AW, 1997.
describes the release consistency model and the meaning of acquire and release, at least not if you look up "acquire" and "release" in the indices. I have enough experience with wretched indices to realize that just because it's not in the index doesn't mean it's not in the book, but still, I've found it very hard to come by a decent description of what acquire and release mean both technically and conceptually. Frankly, understanding didn't really dawn on me until I read the original ISCA90 paper by Gharachorloo et al., and going to a primary source for what seems like pretty basic information seems like a lot to ask of practicing programmers.
I googled for a good online source of information on this topic, but I didn't really come up with anything. There are various papers and powerpoint presentations, but all have an academic bent, and I found that I didn't really understand them until I already understood what they were saying. Reference-manual-type descriptions explain what acquire/release do (e.g., http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/Synchro_88127404-a394-403f-a289-d61c45ab81d5.xml.asp), but they don't explain the conceptual relationship among acquire, release, and e.g., having a producer communicate with a consumer.
So here's my question: does anybody know of any good, readable, accessible source of information on this topic I can refer practicing C/C++ programmers to? (For a variety of reasons, Javacentric references are not really very good for this.) I expect to be covering this material with increasing frequency, and as things stand now, there's not really any place I can point people who want more information or a different treatment of the same topic.
Scott Meyers wrote:
> Recently I got smacked around in comp.lang.c++.moderated about the semantics of acquire and release in the release consistency memory model. I found that none of the books I have here on concurrent programming,
> Parallel and Distributed Programming in C++ by Hughes and Hughes, AW, 2003.
> Multithreaded Programming with Java Technology by Lewis and Berg, PH, 2000.
> Principles of Concurrent and Distributed Programming, Ben-Ari, PH, 1990.
> Foundations of Multithreaded, Parallel, and Distributed Programming by Andrews, AW, 2000.
> Programming with POSIX Threads by Butenhof, AW, 1997.
> describes the release consistency model and the meaning of acquire and release, at least not if you look up "acquire" and "release" in the indices. I have enough experience with wretched indices to realize that just because it's not in the index doesn't mean it's not in the book, but still, I've found it very hard to come by a decent description of what acquire and release mean both technically and conceptually. Frankly, understanding didn't really dawn on me until I read the original ISCA90 paper by Gharachorloo et al., and going to a primary source for what seems like pretty basic information seems like a lot to ask of practicing programmers.
> I googled for a good online source of information on this topic, but I didn't really come up with anything. There are various papers and powerpoint presentations, but all have an academic bent, and I found that I didn't really understand them until I already understood what they were saying. Reference-manual-type descriptions explain what acquire/release do (e.g., http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/Synchro_88127404-a394-403f-a289-d61c45ab81d5.xml.asp), but they don't explain the conceptual relationship among acquire, release, and e.g., having a producer communicate with a consumer.
> So here's my question: does anybody know of any good, readable, accessible source of information on this topic I can refer practicing C/C++ programmers to? (For a variety of reasons, Javacentric references are not really very good for this.) I expect to be covering this material with increasing frequency, and as things stand now, there's not really any place I can point people who want more information or a different treatment of the same topic.
> Thanks,
> Scott
First of all, I may have misread your question, but here goes anyway...
Number 1: I'm not clear why C++ programmers need to know the details of hardware memory consistency models...
Number 2: I also think you're going to have trouble finding a definition of 'acquire' and 'release' at a software level, since they're really meaningful only at the hardware level. Other than some abstract hand waving, I don't know how to tell a C++ programmer how to acquire or release a cache line. And given all the hidden machinery at work within today's out-of-order CPUs, the programmer simply can never know if/when a cache line has been released.
Number 3: If you're referring to implementing memory consistency entirely at the software level, it seems to me that acquire <--> lock and release <--> unlock, assuming that an atomic test-and-set operation is available. Of course, semaphores (e.g. sem_*) or mutexes (e.g. pthread_mutex_*) are the typical ways to implement locks using pthreads.
Number 4: If all you want is a _description_ of the release consistency model, there are several of those...
"Having the single synchronization access type requires that, when a synchronization occurs, we need to globally update memory - our local changes need to be propagated to all the other processors with copies of the shared variable, and we need to obtain their changes. Release consistency considers locks on areas of memory, and propagates only the locked memory as needed. It's defined as follows:
1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
3. The acquire and release accesses must be sequentially consistent."
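The correspondence in Number 3 (acquire <--> lock, release <--> unlock, built on an atomic test-and-set) can be sketched with std::atomic_flag from C++11, which postdates this thread. SpinLock and count_with_spinlock are hypothetical names used only for illustration; this is a minimal sketch under those assumptions, not how any particular threads library implements its locks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical spinlock: lock() is the acquire, unlock() is the release.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Atomic test-and-set. Acquire ordering keeps the critical
        // section's accesses from moving above this point.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release ordering keeps the critical section's accesses
        // from moving below this point.
        flag_.clear(std::memory_order_release);
    }
};

// Two threads increment an ordinary int, always under the lock.
int count_with_spinlock(int per_thread) {
    SpinLock lock;
    int counter = 0;  // ordinary (non-atomic) shared variable
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

Calling count_with_spinlock(n) should return 2*n, because every increment happens between an acquire and the matching release.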
> Recently I got smacked around in comp.lang.c++.moderated about the semantics of acquire and release in the release consistency memory model. I found that none of the books I have here on concurrent programming,
[...]
> describes the release consistency model and the meaning of acquire and release, at least not if you look up "acquire" and "release" in the indices. I have enough experience with wretched indices to realize that just because it's not in the index doesn't mean it's not in the book, but still, I've found it very hard to come by a decent description of what acquire and release mean both technically and conceptually. Frankly, understanding didn't really dawn on me until I read the original ISCA90 paper by Gharachorloo et al., and going to a primary source for what seems like pretty basic information seems like a lot to ask of practicing programmers.
[...]
> So here's my question: does anybody know of any good, readable, accessible source of information on this topic I can refer practicing C/C++ programmers to? (For a variety of reasons, Javacentric references are not really very good for this.) I expect to be covering this material with increasing frequency, and as things stand now, there's not really any place I can point people who want more information or a different treatment of the same topic.
Unfortunately no. This comment from the Single Unix specification
Formal definitions of the memory model were rejected as unreadable by the vast majority of programmers. In addition, most of the formal work in the literature has concentrated on the memory as provided by the hardware as opposed to the application programmer through the compiler and runtime system. It was believed that a simple statement intuitive to most programmers would be most effective. IEEE Std 1003.1-2001 defines functions that can be used to synchronize access to memory, but it leaves open exactly how one relates those functions to the semantics of each function as specified elsewhere in IEEE Std 1003.1-2001. IEEE Std 1003.1-2001 also does not make a formal specification of the partial ordering in time that the functions can impose, as that is implied in the description of the semantics of each function. It simply states that the programmer has to ensure that modifications do not occur "simultaneously" with other access to a memory location.
sort of explains it. If you read between the lines you can take it as nobody could figure out how to do it, so they didn't attempt to. I understand the problem with memory models as they tend to be tied to a particular hardware definition but I don't believe you need a memory model to define semantics for various forms of synchronization. I tend to lean towards Guttag's style of algebraic specification which lets you put things in terms of strictly program observable effects. There seemed to be a bit of antipathy towards this approach so I haven't really done too much with it.
The informal meaning of acquire and release is the effect on memory visibility of acquiring and releasing a lock which presupposes you know what those effects are in the first place, which doesn't do one much good if they don't know what those are.
There is a problem, however, in that acquire and release semantics don't necessarily translate to other synchronization primitives besides locks. The semantics can be different, sometimes subtly so, and you can get into trouble either in the implementation or in the application. So you really need more than just acquire and release definitions.
The synchronization constructs I have or would have definitions for are
> The informal meaning of acquire and release is the effect on memory visibility of acquiring and releasing a lock which presupposes you know what those effects are in the first place, which doesn't do one much good if they don't know what those are.
> There is a problem, however, in that acquire and release semantics don't necessarily translate to other synchronization primitives besides locks. The semantics can be different, sometimes subtly so, and you can get into trouble either in the implementation or in the application. So you really need more than just acquire and release definitions.
Yeah.
msync::none    // nothing (e.g. for refcount<T, basic>::increment)
msync::fence   // classic fence (acq+rel -- see below)
msync::acq     // classic acquire (hlb+hsb -- see below)
msync::ddacq   // acquire via data dependency
msync::hlb     // hoist-load barrier -- acquire not affecting stores
msync::ddhlb   // ...
msync::hsb     // hoist-store barrier -- acquire not affecting loads
msync::ddhsb   // ...
msync::rel     // classic release (slb+ssb -- see below)
msync::slb     // sink-load barrier -- release not affecting stores
msync::ssb     // sink-store barrier -- release not affecting loads
msync::slfence // store-load fence (ssb+hlb -- see above)
msync::sfence  // store-fence (ssb+hsb -- see above)
msync::lfence  // load-fence (slb+hlb -- see above)
Note that unidirectional stuff can be used only in conjunction with certain atomic<> accesses to "label" them. I mean:
atomic<int> X;
/* ... */ int x = X.load(msync::acq);
/* ... */ X.store(x, msync::rel);
Compare it to use of bidirectional fences... something like
atomic<int> X;
atomic<int> Y;
/* ... */ X.store(0, msync::rel);
barrier(msync::slfence);
int y = Y.load(msync::acq);
In a way, barrier() simply translates to a "NOP-access" with a bidirectional barrier label on it.
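For comparison, std::atomic from C++11 (which postdates this thread) attaches ordering labels to individual accesses in much the same way. The sketch below is only an approximate analogue. It assumes msync::acq and msync::rel correspond to std::memory_order_acquire and std::memory_order_release, and it stands in for the store-load fence with a sequentially consistent std::atomic_thread_fence, which is a stronger barrier than the slfence described above. The function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> X{0};
std::atomic<int> Y{0};

// Unidirectional labels attached to the accesses themselves.
int copy_through_labeled_accesses(int value) {
    X.store(value, std::memory_order_release);  // roughly X.store(value, msync::rel)
    return X.load(std::memory_order_acquire);   // roughly X.load(msync::acq)
}

// A bidirectional fence between a store and a later load of another variable.
int store_then_load_with_fence() {
    X.store(0, std::memory_order_release);
    // Assumed stand-in for barrier(msync::slfence); seq_cst is stronger.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return Y.load(std::memory_order_acquire);
}
```

The fence version is the shape that Dekker-style algorithms need: without a store-load barrier, the load of Y could be satisfied before the store to X becomes visible to other threads.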
On Mon, 18 Oct 2004 17:17:25 -0500, Randy wrote:
> Number 1: I'm not clear why C++ programmers need to know the details of hardware memory consistency models...
Andrei Alexandrescu and I recently wrote an article in DDJ explaining why double-checked locking isn't reliable in C++. This is becoming old news to Java programmers and very old news to readers of this newsgroup, I think, but it still surprises the heck out of many C++ programmers. You can tell them, "just use your threading library's function calls when accessing shared state, and everything will be fine," and they'll nod and smile and then try to outsmart their compiler. They don't realize the existence of hardware memory models, much less their importance, and when they do find out about them, they want to deal with things at the lowest level possible. These are, after all, C/C++ programmers. Which means they need to know about acquire and release and what they do and how to use them. As things stand now, I can do my best to explain what they are and how they work, but I'm far from an expert, and anyway, it'd be nice to have a place to point them for an explanation different from mine.
I don't have a copy of this, but I have to say that my reaction to anything 1100 pages long is that it's, if nothing else, intimidating. Does it have a good description of acquire/release from the point of view of somebody who just wants to make their program work correctly?
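A minimal sketch of the double-checked locking pattern mentioned above, written with C++11 atomics (which postdate both this thread and the DDJ article). Widget, get_instance, and the other names are hypothetical. The point it illustrates is that the unlocked first read has to be an acquire and the publishing write has to be a release; with a plain pointer instead of std::atomic, nothing stops the pointer from becoming visible before the object it points to is fully constructed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Widget {
    int value;
    Widget() : value(42) {}
};

std::atomic<Widget*> g_instance{nullptr};
std::mutex g_init_mutex;

Widget* get_instance() {
    // First check, without the lock. The acquire pairs with the release
    // below, so a non-null pointer implies the Widget's contents are visible.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_init_mutex);
        // Second check, under the lock: another thread may have won the race.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget;  // intentionally never deleted in this sketch
            // Publish only after construction has completed.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```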
> Having the single synchronization access type requires that, when a synchronization occurs, we need to globally update memory - our local changes need to be propagated to all the other processors with copies of the shared variable, and we need to obtain their changes. Release consistency considers locks on areas of memory, and propagates only the locked memory as needed. It's defined as follows:
> 1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
> 2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
> 3. The acquire and release accesses must be sequentially consistent.
Based on a quick perusal, two things, in my view. First, this says what the SYSTEM must do, but not what programmers must do. For example, bullet 1 says nothing about when a programmer would want to label a read as an acquire, and bullet 2 says nothing about when a programmer would want to label a write as a release. Second, I may be naive here, but I'd like to think that programmers could be trained about how to use acquire/release pairs without having to learn about the various kinds of consistency. Even now, my head starts to swoon when confronted with sequential consistency, processor consistency, weak consistency, and release consistency. (Based again on only a quick perusal, it looked like the other links you posted had similar problems. One described *nine* different consistency models.)
I think of things this way:
- When you want to read the value of a variable giving you permission to access shared state, label the read as an acquire.
- When you are done accessing shared state, label the write of the permission variable as a release.
If this is correct, I don't see the need to burden programmers with a detailed understanding of the various memory models. If this is not correct, please tell me why, because I don't want to propagate incorrect information.
I realize that I may be naively hoping that programmers can be shielded from the details of memory models. If they can't be, feel free to burst my bubble.
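The two rules above (read the permission variable as an acquire, write it as a release when done) can be sketched with C++11 atomics, which postdate this thread. Here ready plays the permission variable and payload is the shared state; those names and the function transfer are hypothetical, and this is only one way to illustrate the pairing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int transfer(int value) {
    int payload = 0;                 // ordinary shared state
    std::atomic<bool> ready{false};  // the "permission" variable
    int seen = -1;

    std::thread consumer([&] {
        // Reading the permission variable is labeled as an acquire.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the acquire that observed 'true'
    });

    std::thread producer([&] {
        payload = value;  // finish touching the shared state...
        // ...then write the permission variable, labeled as a release.
        ready.store(true, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Because the acquire load that sees true synchronizes with the release store, the consumer is guaranteed to read the value the producer wrote, so transfer(v) returns v.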
> There is a problem, however, in that acquire and release semantics don't necessarily translate to other synchronization primitives besides locks.
I think acquire/release/full barriers would basically cover everything. They would not be the most efficient barriers for some sync primitives, but they would get the job done.
acquire could be thought of as a "consumer" of shared memory, and release would be a "producer".
So, applying those rules to win32 primitives:
EnterCriticalSection would use acquire because it's a consumer of shared memory.
LeaveCriticalSection would use release because it's a producer of shared memory.
ReleaseSemaphore, ReleaseMutex, and SetEvent would use release because they're producers.
In article <aM0dd.151536$He1.80998@attbi_s01>, "SenderX" <x...@xxx.com> wrote:
> > There is a problem, however, in that acquire and release semantics don't necessarily translate to other synchronization primitives besides locks.
> I think acquire/release/full barriers would basically cover everything. They would not be the most efficient barriers for some sync primitives, but they would get the job done.
> acquire could be thought of as a "consumer" of shared memory, and release would be a "producer".
As an aside, the Solaris kernel uses for its barriers:
I find these pretty intuitive, but my introduction to memory models was the sparcv9 architecture book, so I may be biased.
> producer and consumer are terms everybody knows?
I find them much easier to follow than "hoist/sink", which I'm guessing is from compiler jargon -- certainly most people doing MT work will have done producer/consumer problems, so the terminology is more relevant. The "acquire/release" terminology is so tied to the implementation of locks that it feels more obscure than "producer/consumer" to me (but less so than "hoist/sink").
> producer and consumer are terms everybody knows?
Sure. I, for one, sorta know that producers probably need not restrict loads and consumers probably need not restrict stores. Acquire/release is way too restrictive... as is the bidirectional load and store bars (I mean SUN's stuff; see http://tinyurl.com/43tth). Producers need just sink-store and consumers need just hoist-load (most likely/quite often just "ddhld" label thing imposed on some fetch{-and-whatever}).
Unfortunately, there isn't any API standard for covering memory consistency. Even the Single Unix Specification leaves this area unhandled. The usual practice for many platforms can be stated like this: every shared-variable must be enclosed with your threads library's synchronization objects. The stress here is on *every* of course, even single variables (used as flags).
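A minimal sketch of that practice, assuming std::mutex and std::thread from C++11 as the threads library (a platform library such as pthreads would play the same role). Even the single flag is only touched while the lock is held. StopFlag and run_until_stopped are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// A single boolean flag, still wrapped in a synchronization object.
class StopFlag {
    std::mutex m_;
    bool done_ = false;
public:
    void set() {
        std::lock_guard<std::mutex> guard(m_);
        done_ = true;
    }
    bool get() {
        std::lock_guard<std::mutex> guard(m_);
        return done_;
    }
};

// The worker polls the flag until the main thread sets it.
bool run_until_stopped() {
    StopFlag stop;
    bool worker_saw_stop = false;
    std::thread worker([&] {
        while (!stop.get()) {
            std::this_thread::yield();
        }
        worker_saw_stop = true;
    });
    stop.set();
    worker.join();  // join orders the worker's write before the read below
    return worker_saw_stop;
}
```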
Now, since this is only a practice, I am not sure which platforms it applies to. It's true for Linux, Solaris, Win32, and the Java runtime. But what about .NET, for instance, or other flavors of UNIX, or even VMS? So I think that if you want to give your readers a general rule of thumb, you would have to restrict it to specific platforms, and warn the reader to explore the issue for other platforms. I guess that a good source of information would be cross-platform thread libraries, like ACE.
In addition, if some reader would like to avoid using the platform synchronization objects, and use some more fine-grained memory APIs, you should warn him that this is highly platform-dependent, and worse - it's