synchronized and ReentrantLock are REALLY the same?


Francesco Nigro

May 16, 2018, 3:32:05 AM
to mechanical-sympathy
Hi guys!

probably this should be more of a concurrency-interest question, but I'm sure it will fit most of the people around here as well :)
I was looking at how ReentrantLock and synchronized differ from a semantic point of view, and I've found (no experimental proof on my side, TBH) something interesting in http://gee.cs.oswego.edu/dl/jmm/cookbook.html
It seems to me that:

normal store;
monitorEnter;
[mutual exclusion zone]
monitorExit;
....

is rather different from a:

normal store;
(loop of...)
volatile load
volatile store (=== CAS)
---
[mutual exclusion zone]
volatile store

The former represents a synchronized block and the latter a spin-lock acquisition and release, both with a normal store on top.
For anyone coming from worlds other than the JVM, I suppose the volatile load could be translated as a load-acquire, while the volatile store is a sequentially consistent store.
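To make it concrete, here is a minimal Java sketch of the two shapes I have in mind (the class, field and method names are mine, purely for illustration):

import java.util.concurrent.locks.ReentrantLock;

class TwoShapes {
    private final ReentrantLock lock = new ReentrantLock();
    private int plain; // a normal (non-volatile) field

    void viaMonitor() {
        plain = 42;            // normal store: nothing obviously stops it from
        synchronized (this) {  // moving past the monitorEnter...
            // mutual exclusion zone
        }                      // ...monitorExit
    }

    void viaJucLock() {
        plain = 42;            // normal store
        lock.lock();           // volatile load + CAS loop, in the current impl
        try {
            // mutual exclusion zone
        } finally {
            lock.unlock();     // volatile store
        }
    }
}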

Regarding monitorEnter/monitorExit, it's more difficult to find something that fits other known memory models (e.g. C++), and that's the reason for my request :)
From the point of view of compiler guarantees, it seems that a normal store (and any store-release as well) preceding a monitorEnter could be moved inside the mutual exclusion zone (past the monitorEnter), because the compiler isn't required to enforce
any memory barrier between them, while with the current implementation of a j.u.c. lock that can't happen.
That's the biggest difference I could spot between them, but I'm struggling to find anything (besides looking at the JVM source code) that observes/triggers such compiler re-ordering.
What do you think? Am I just worrying about something that isn't actually implemented/isn't happening on any known JVM implementation?

Thanks,
Franz

Gil Tene

May 16, 2018, 11:26:59 AM
to mechanical-sympathy
Note that Doug Lea's JMM cookbook is written for implementors of JDKs and related libraries and JITs, NOT for users of those JDKs and libraries. It says so right in the title. It describes rules that would result in a sufficient implementation of the JMM, but it is not useful for deducing the required or expected behavior of all JMM implementations. Most JMM implementations go beyond the cookbook rules in at least some places and apply JMM-valid transformations that are not included in it and can be viewed as "shortcuts" that bypass some of the rules in the cookbook. There are many examples of this in practice. Lock coarsening and lock biasing optimizations are two good example sets.

This means that you need to read the cookbook very carefully, and (specifically) that you should not interpret it as a promise of what the relationships between various operations are guaranteed to be. If you use the cookbook for the latter, your code will break.


Putting aside the current under-the-hood implementations of monitor enter/exit and of ReentrantLock (which may and will change), the requirements are clear:


"A reentrant mutual exclusion Lock with the same basic behavior and semantics as the implicit monitor lock accessed using synchronized methods and statements, but with extended capabilities."


"Memory Synchronization

All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, as described in Chapter 17 of The Java™ Language Specification:

    • A successful lock operation has the same memory synchronization effects as a successful Lock action.
    • A successful unlock operation has the same memory synchronization effects as a successful Unlock action.
Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects."

So based on the spec, I'd say that you cannot make any assumptions about semantic differences between ReentrantLock and synchronized blocks (even if you find current implementation differences).

AFAIK, most JVM implementations will absolutely do this for monitors, and will commonly re-order to move stores and loads that precede a monitor enter such that they execute after the monitor enter (but before the associated monitor exit). Most forms of lock coarsening will have this effect in the actual emitted code (which any CPU will then see, regardless of architecture and CPU memory model). In addition, lock biasing optimizations (on monitors) will often result in emitted code that does not enforce a storestore or loadstore order between the monitor enter and preceding loads or stores on some architectures, allowing the processor to reorder that (biased to this thread) monitor enter operation (and potentially some operations that follow) with prior loads and stores.

The exact same optimizations would be valid for ReentrantLock (since, per the spec, it shares the same memory ordering semantics requirements), but I think that in practice there are (currently) fewer JIT optimizations applied to ReentrantLock than to monitor enter/exit.
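For illustration only (this is not a claim about what any particular JVM emits for this exact code), here is the kind of transformation coarsening is allowed to make, sketched as before/after methods:

class CoarseningSketch {
    int x;                          // a plain field
    final Object m = new Object();

    void a() {}
    void b() {}

    void asWritten() {
        x = 1;                      // plain store before the first lock
        synchronized (m) { a(); }
        synchronized (m) { b(); }
    }

    void asCoarsened() {            // what a JIT may effectively execute instead
        synchronized (m) {
            x = 1;                  // the preceding store now runs inside the region
            a();
            b();
        }
    }
}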
 


Francesco Nigro

May 18, 2018, 3:30:08 AM
to mechanical-sympathy
Thanks Gil!

I will fall back to my original (semi-practical) concern, armed with this renewed knowledge :)
Suppose that we want to perform write operations surrounding both a j.u.c. Lock and a synchronized mutual exclusion block, and we want:
  1. these write operations not to be moved inside the block, maintaining their relative positions around it
  2. the effects of the writes to be atomically readable by other threads
Given that we can't assume any semantic difference between j.u.c. Locks and intrinsic ones, and that no effects on the surrounding code are clearly spelled out anywhere (except in specific implementations/the Cookbook), how can we implement this correctly?
And... can we do it with just this knowledge?
The only solution I see is to use a volatile store for both (or at least the first) write operations, while ordered (aka lazySet) ones can't work as expected.

Cheers,
Franz

Gil Tene

May 19, 2018, 12:10:45 AM
to mechanical-sympathy


On Friday, May 18, 2018 at 2:30:08 AM UTC-5, Francesco Nigro wrote:
Thanks Gil!

I will fall back to my original (semi-practical) concern, armed with this renewed knowledge :)
Suppose that we want to perform write operations surrounding both a j.u.c. Lock and a synchronized mutual exclusion block, and we want:
  1. these write operations not to be moved inside the block, maintaining their relative positions around it
Preventing your appear-prior-to-lock-acquisition writes from "moving into the block" is subtly different from preventing their re-ordering with writes and reads that are within the block. Are you sure you want the former and not just the latter?

It is easier to see how to prevent the latter. All you have to do is order the earlier writes against those in-block writes and/or reads, ignoring the lock. This is where, for example, a lazySet on the inside-the-block writes will order them against the before-the-block writes, regardless of how the monitor enter is dealt with, which can save you from using volatiles. If there are reads within the block that you need to order against the before-the-block writes, you'd need to use volatiles (e.g. a volatile store for the last before-the-block write, AND a volatile load for the first in-the-block read).
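In sketch form (the class and field names are mine, and I'm assuming an AtomicLong for the in-block write):

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

class OrderedAgainstTheBlock {
    final ReentrantLock lock = new ReentrantLock();
    long before;                      // plain field, written before the block
    final AtomicLong inBlock = new AtomicLong();

    void writer() {
        before = 42;                  // before-the-block write
        lock.lock();
        try {
            inBlock.lazySet(1);       // ordered store: stays ordered after the
                                      // preceding plain store, regardless of how
                                      // the lock acquisition is implemented
        } finally {
            lock.unlock();
        }
    }
}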

If you actually want to prevent the former (and there are ways to observe whether or not reordering "into" the block occurs), you may need more complicated things. But do you really need that? Someone may be able to find some way to detect whether or not such reordering of the writes and the lock-enter happens [I'm actually not sure whether such reordering, without also reordering against writes and reads in the block, is detectable]. And if that detection is possible, it also means that [by definition] someone can build a concurrent algorithm that depends on the reordering behavior not occurring. But it seems [to me] like a pretty complicated and sensitive thing to build a dependency on.
  2. the effects of the writes to be atomically readable by other threads
Which sort of atomicity are you referring to here? Atomicity within a single store (i.e. no word tearing of a long or a double), or atomicity across multiple such stores?

If it's the first (no word tearing), a volatile write will ensure that (in a fairly expensive way), but a lazySet will also ensure the same at a much lower cost.

If it's the second (atomicity across multiple such stores), you need something like a synchronized block or a locked code region for the writes that you need atomicity across to live in. Such a block can be coarsened (e.g. joined with a subsequent block, or hoisted out of a loop), or it may be optimized in various other ways (e.g. biased locking), but whatever valid things happen to it, the atomicity across the writes within it will remain when seen by other threads.
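In sketch form, for the multiple-stores case (the names are mine; two longs that must be seen together):

class AtomicPair {
    private long a, b;                // must be observed together by readers

    synchronized void write(long x, long y) {
        a = x;                        // both stores live inside one locked region
        b = y;
    }

    synchronized long[] read() {      // readers take the same monitor, so they
        return new long[] { a, b };   // see the pair atomically
    }
}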

Gil Tene

May 19, 2018, 12:21:19 AM
to mechanical-sympathy


On Friday, May 18, 2018 at 11:10:45 PM UTC-5, Gil Tene wrote:


On Friday, May 18, 2018 at 2:30:08 AM UTC-5, Francesco Nigro wrote:
Thanks Gil!

I will fall back to my original (semi-practical) concern, armed with this renewed knowledge :)
Suppose that we want to perform write operations surrounding both a j.u.c. Lock and a synchronized mutual exclusion block, and we want:
  1. these write operations not to be moved inside the block, maintaining their relative positions around it
Preventing your appear-prior-to-lock-acquisition writes from "moving into the block" is subtly different from preventing their re-ordering with writes and reads that are within the block. Are you sure you want the former and not just the latter?

It is also worth noting that since any writes moving into the synchronized (or locked) block WILL appear atomically with any other reads and writes within the block to any other thread that synchronizes (or locks) on the same object (or lock), the only threads that may observe these reorder-into-the-block effects are ones that would also potentially observe the things *within* the block in non-atomic and non-synchronized ways.

So the purpose of preventing the reordering of things into the block looks suspicious to begin with. I'm not saying there is no possible reason for it, just that it seems suspect. As in "uses non-synchronized accesses to things that are elsewhere intentionally done under a lock", which spells "many bugs can probably be found here" to me.
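A minimal sketch of the kind of code I mean (hypothetical, names are mine):

class RacyObserver {
    private int state;                // intended to be guarded by 'this'

    synchronized void update() {
        state = 1;                    // written under the lock
    }

    int racyRead() {
        return state;                 // read WITHOUT the lock: the only kind of
                                      // reader that could ever tell whether a
                                      // preceding store was moved into the block
    }
}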

Francesco Nigro

May 21, 2018, 3:29:06 AM
to mechanica...@googlegroups.com
Preventing your appear-prior-to-lock-acquisition writes from "moving into the block" is subtly different from preventing their re-ordering with writes and reads that are within the block. Are you sure you want the former and not just the latter?

Yes, or at least it seems to be the only way I have found to implement a sort of naive "deadlock" (or better, "suspected slowness") detector.
Suppose we have an ordered executor API (i.e. an executor that ensures all submitted tasks are executed in order, but not necessarily by one/the same thread) and some alien client library with synchronized calls. In order to detect at runtime whether the synchronized calls are deadlocked, or simply unable to leave the synchronized block due to some suspected slowness, I was thinking of implementing a watchdog service that monitors, at intervals, each Thread used by the ordered executor, polling the time elapsed since a thread was last seen about to enter a synchronized block.
Translated into pseudo-code:

You have a:

public synchronized void foo() {
}

and some code:

...
executor.submit(()->alien.foo());
...

it should become:
...
executor.submit(watchDog -> {
    watchDog.beforeEnter();
    try {
        alien.foo();
    } finally {
        watchDog.exit();
    }
});
...
The API is not definitive at all, but the point is that watchDog.beforeEnter() should not be moved into the synchronized block, because that would make it impossible to compute the time elapsed inside the synchronized block when some other thread is preventing alien.foo() from entering it.
Probably there are better/smarter ways to implement it, but that's the reason behind my question :)
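To make it a bit more concrete, the watchDog state could be as simple as this (just a sketch, the names and API are provisional):

import java.util.concurrent.TimeUnit;

class WatchDog {
    // written by the executor thread, polled by the monitoring thread
    private volatile long beforeEnterNanos = -1;

    public void beforeEnter() {
        beforeEnterNanos = System.nanoTime(); // must not move past the monitorEnter
    }

    public void exit() {
        beforeEnterNanos = -1;                // left the synchronized block
    }

    // called at intervals by the watchdog service thread
    public boolean isSuspectedSlow(long thresholdMillis) {
        final long ts = beforeEnterNanos;
        return ts != -1
            && System.nanoTime() - ts > TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }
}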

Gil Tene

May 21, 2018, 11:24:25 AM
to mechanical-sympathy
That is indeed an interesting example, involving time-observation, where you do not want the time measurement to be delayed past a synchronization event.

While I'm not sure there is a "guaranteed to be correct" way to do this in the presence of a hyper-capable optimizer, there is a very practical way to do it in the real world: surround the prior-to-the-block time-measurement with another potentially-blocking synchronization mechanism (a lock of some sort, like a synchronized block on a different object, or a different ReentrantLock). Now you have two locks on two variables. Under "normal conditions", an optimizer will not be allowed to change the order in which the two lock operations are done, since that might cause semantically visible effects, like deadlocks. And as long as the optimizer cannot reorder the two locks, you have the order you wanted for the timing operation.

The reason I believe that this is a practical solution but not a stands-up-to-hypothetically-infinitely-smart-optimizer one is that an optimizer CAN reorder the locks if it can prove that no semantically visible effects that violate the JMM or potentially cause deadlocks will occur. A good example of an allowed optimization would be ignoring the added lock operation if it can be proven to be thread-local (e.g. via escape analysis).

You can (and should) probably attempt to prevent that to some degree by declaring the extra lock as a static variable. But that doesn't prohibit some super-smart code-and-thread analysis that would show that "under current conditions" only one thread ever actually operates on the lock, and de-optimizes all related code only once a second thread starts interacting with it. This may sound far-fetched, but it is a very possible future optimization for biased locks (if extensive profiling shows a hot lock that is constantly biased to a single thread, optimize all code that relates to that lock to ignore the lock/unlock operations, and make the de-biasing of that lock de-optimize all related code).

You can (and should) try to combat that by trying to prevent biasing (e.g. make sure at least two threads lock/unlock the lock every once in a while, performing some state change to a shared variable under the lock), but while this can help prevent lasting bias, it doesn't prevent the optimization from happening (and then going away) temporarily within a time window. For the specific time-observation purpose you have here, this (two threads locking/unlocking e.g. each second) is probably good enough (as the possible optimization will not last more than e.g. 1 second, and your detector will still work), but I wouldn't rely on such a thing if stronger semantic correctness was at stake.

Note: the reason I'd do some state-changing operation on a static variable under the lock in the "bias-preventing" threads is to make sure that optimizers cannot somehow optimize away the lock block in those threads as an effective no-op for some reason.
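Applied to your watchdog pseudo-code, the practical version would look something like this (TIMING_LOCK and the stub interfaces are mine):

import java.util.concurrent.Executor;

class TimedSubmission {
    interface Alien { void foo(); }        // stand-in for the alien library call
    interface WatchDog {                   // stand-in for your watchdog API
        void beforeEnter();
        void exit();
    }

    // static, to make it harder for escape analysis to elide the extra lock
    static final Object TIMING_LOCK = new Object();

    void submit(Executor executor, WatchDog watchDog, Alien alien) {
        executor.execute(() -> {
            synchronized (TIMING_LOCK) {   // an optimizer cannot normally reorder
                watchDog.beforeEnter();    // this lock with the monitor inside
            }                              // alien.foo() without risking deadlocks
            try {
                alien.foo();
            } finally {
                watchDog.exit();
            }
        });
    }
}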

Travis Downs

Jun 2, 2018, 6:11:37 PM
to mechanical-sympathy
I'm pretty sure that even a smart optimizer can't move the write that occurs in the watchDog.beforeEnter() call inside the lock, at least for some types of critical sections: that could definitely change the observable semantics of the program, specifically by introducing deadlocks where none existed before (if something inside the lock waits for something which in turn is waiting for a state change triggered by watchDog.beforeEnter()), or by preventing the visibility of the write indefinitely in the case that the thread blocks indefinitely on the critical section.

I think the only writes that are safe to move into the critical section are those which aren't externally visible, e.g. writes to locals. Perhaps also non-volatile variables, since the visibility guarantees there are weaker. Loads are fine to move in. In any case, if you're implementing this as a deadlock detector, it seems likely to work: the kind of non-trivial critical section that can deadlock is also the type of non-trivial lock body that the compiler won't be able to move stores into. As with all intersection-of-optimization-and-the-memory-model issues, this is very hard to reason about, so I could be totally wrong: but I'd like to see any example where a compiler moves stores into such a critical section (of course, the absence of any examples doesn't mean I'm right, either).

That leaves hardware re-ordering, and of course stores might "move into" critical sections there: probably not on x86, since the method of acquiring a lock generally involves a full barrier (ignoring biased locking), but yes on platforms that have a type of acquire-CAS or acquire-test-and-set. The key difference versus software reorderings is that such reorderings are generally transient: the store can appear out of order (later) with respect to the lock acquire or some accesses inside the lock, but it won't be indefinitely postponed: your deadlock detector will still likely see the store if the lock has been taken, at least after a short delay (e.g., 1 second should be orders of magnitude more than enough time for the store buffer to become globally visible on any architecture I'm aware of). This is different from software re-ordering, where the delay is either unbounded (the thread is waiting on the lock and the write is inside the lock) or potentially very large (the store is inside the lock and a context switch has occurred after taking the lock but before doing the store, so depending on the load and scheduling policy the write might not occur for a very long time, or ever).

The idea that hardware-level store visibility is bounded leads to some interesting ideas, such as concurrency mechanisms that wait "long enough" for the store buffer to drain on any thread. I'm not aware of any use in practice, perhaps at least partly because hardware vendors are never going to guarantee this, and also because the worst-case bounds might be quite high: consider a store buffer full of stores that exhibit worst-case behavior, such as split-page full-TLB misses: that could be 1000+ cycles per store.