deadlocks and race conditions

99 views
Skip to first unread message

Mark Volkmann

unread,
Aug 17, 2009, 1:56:23 PM8/17/09
to clo...@googlegroups.com
Just checking my understanding.

It seems to me that it is not possible for Clojure to remove the
possibility of writing code that has race conditions. I could write
code in which two threads each change the value of the same Ref inside
a transaction. Let's say the Ref starts with a value of 2, one
transaction adds 3 and the other multiplies by 4.
Depending on which thread runs first, after they both complete the
value of the Ref will either be (2 + 3) * 4 = 20 or (2 * 4) + 3 = 11.
Of course as the developer I could decide to use an Agent instead of a
Ref and then send actions to the Agent in the order in which I want
them to run to fix this problem. But Clojure itself can't prevent this
situation. The developer has to recognize the potential race condition
and write the code in a way to avoid it. Sound correct?

Does Clojure STM eliminate the possibility of deadlock? When two
transactions have a contention for the same Ref, Clojure gives
preference to the transaction that has been running the longest
(unless it has only been running for a very short amount of time - see
bargeTimeElapsed in LockingTransaction.java). Is that alone enough to
guarantee that a deadlock over Refs cannot occur? The transaction that
has been running the longest may never finish, going into an infinite
loop. Other transactions that contend for the same Refs will continue
retrying until they hit the 10,000 retry limit and then an exception
will be thrown, so I supposed technically that isn't deadlock.

--
R. Mark Volkmann
Object Computing, Inc.

CuppoJava

unread,
Aug 17, 2009, 2:12:57 PM8/17/09
to Clojure
I don't have enough knowledge to answer the deadlock question. But I
can offer you my opinion about your question on race conditions.

In your example, the result WILL depend on which transaction is run
first.
However I think practically this isn't a problem, because of the
following:

As a developer, when you split your task into multiple parallel
operations (ie. threads), you're already acknowledging that the order
in which they run shouldn't matter. If order does matter, you probably
wouldn't have written it using multiple threads.

That's my understanding anyway. Which is why haven't run into any
problems with race conditions (as of yet).
-Patrick

Michael Reid

unread,
Aug 17, 2009, 3:47:11 PM8/17/09
to clo...@googlegroups.com
Hi,

Ditto on the ordering example. Clojure can't infer which order your
code needs to run any more than it can figure out what your code is
supposed to do.

On the deadlock question, it is my understanding from a prior post by
Rich that Clojure's STM acquires locks in such a way that deadlocks
cannot occur (as I recall, the locks are sorted by numeric ID an
acquired/released in that order). If a deadlock occurs in the STM, my
understanding is that would be a bug.

So yes, the Clojure STM should guarantee that a deadlock situation
does not occur.

/mike.

Mark Volkmann

unread,
Aug 17, 2009, 4:16:15 PM8/17/09
to clo...@googlegroups.com
On Mon, Aug 17, 2009 at 2:47 PM, Michael Reid<kid....@gmail.com> wrote:
>
> Hi,
>
> Ditto on the ordering example. Clojure can't infer which order your
> code needs to run any more than it can figure out what your code is
> supposed to do.

Okay. I was pretty sure that was the case, but just wanted to verify
that there wasn't something special going on that I was missing.

> On the deadlock question, it is my understanding from a prior post by
> Rich that Clojure's STM acquires locks in such a way that deadlocks
> cannot occur (as I recall, the locks are sorted by numeric ID an
> acquired/released in that order). If a deadlock occurs in the STM, my
> understanding is that would be a bug.
>
> So yes, the Clojure STM should guarantee that a deadlock situation
> does not occur.

I think it's a common misconception that STM in general just manages
locks for you and basically uses them in the same way you would if you
were hand-coding lock-based concurrency. I can say with certainty
after studying the Clojure STM implementation for over a month that
that isn't the case.

Refs in Clojure STM each have a ReentrantReadWriteLock object. Any
number of concurrent transactions can hold the read lock for a Ref OR
a single transaction can hold the write lock for a Ref. The only time
one of these locks is held for the duration of a transaction is when
you call ensure on a Ref. In that case it will hold a read lock until
either the Ref is modified in the transaction or until the transaction
commits. There is no circumstance under which a write lock will be
held for the duration of a transaction. Write locks are obtained only
briefly when certain things occur during a transaction and then
quickly released. They are acquired again when the transaction commits
and then released when the commit completes. The way that one
transaction knows that another one has modified a Ref and hasn't yet
committed is that the tinfo field of the Ref is set to refer to an
object that describes the transaction that modified the Ref (including
the order in which it started relative to others and its current
status).

So locks are not sorted by a numeric ID. It is transaction starts,
tries and commits that are numbered that way. It's a beautiful and
complicated strategy ... but it leaves me wondering whether there is
any possibility of deadlock. I don't see how it could happen. I'm just
asking in case I overlooked something.

> On Mon, Aug 17, 2009 at 2:12 PM, CuppoJava<patrick...@hotmail.com> wrote:
>>
>> I don't have enough knowledge to answer the deadlock question. But I
>> can offer you my opinion about your question on race conditions.
>>
>> In your example, the result WILL depend on which transaction is run
>> first.
>> However I think practically this isn't a problem, because of the
>> following:
>>
>> As a developer, when you split your task into multiple parallel
>> operations (ie. threads), you're already acknowledging that the order
>> in which they run shouldn't matter. If order does matter, you probably
>> wouldn't have written it using multiple threads.
>>
>> That's my understanding anyway. Which is why haven't run into any
>> problems with race conditions (as of yet).
>>  -Patrick

--

J. McConnell

unread,
Aug 17, 2009, 11:16:05 PM8/17/09
to clo...@googlegroups.com
On Mon, Aug 17, 2009 at 1:56 PM, Mark Volkmann <r.mark....@gmail.com> wrote:

Does Clojure STM eliminate the possibility of deadlock? When two
transactions have a contention for the same Ref, Clojure gives
preference to the transaction that has been running the longest
(unless it has only been running for a very short amount of time - see
bargeTimeElapsed in LockingTransaction.java). Is that alone enough to
guarantee that a deadlock over Refs cannot occur? The transaction that
has been running the longest may never finish, going into an infinite
loop. Other transactions that contend for the same Refs will continue
retrying until they hit the 10,000 retry limit and then an exception
will be thrown, so I supposed technically that isn't deadlock.

I beileve the term for this is livelock. Deadlock is when thread A locks mutex M and waits for mutex N while thread B locks mutex N and waits for mutex M. I have not looked deeply into Clojure's STM implementation, but the ordered locking mentioned elsewhere in this thread should prevent deadlock. However, I don't believe it is possible to prevent the kind of livelock you describe. I definitely could be mistaken, but this is my understanding of things.

Regards,

- J.

Rich Hickey

unread,
Aug 19, 2009, 1:18:01 PM8/19/09
to clo...@googlegroups.com
On Mon, Aug 17, 2009 at 4:16 PM, Mark Volkmann<r.mark....@gmail.com> wrote:
>
> On Mon, Aug 17, 2009 at 2:47 PM, Michael Reid<kid....@gmail.com> wrote:
>>
>> Hi,
>>
>> Ditto on the ordering example. Clojure can't infer which order your
>> code needs to run any more than it can figure out what your code is
>> supposed to do.
>
> Okay. I was pretty sure that was the case, but just wanted to verify
> that there wasn't something special going on that I was missing.
>
>> On the deadlock question, it is my understanding from a prior post by
>> Rich that Clojure's STM acquires locks in such a way that deadlocks
>> cannot occur (as I recall, the locks are sorted by numeric ID an
>> acquired/released in that order). If a deadlock occurs in the STM, my
>> understanding is that would be a bug.
>>
>> So yes, the Clojure STM should guarantee that a deadlock situation
>> does not occur.
>
> I think it's a common misconception that STM in general just manages
> locks for you and basically uses them in the same way you would if you
> were hand-coding lock-based concurrency.

How could that possibly be the case? Lock strategies are open and
arbitrary. Also, some STMs don't use locks at all. What an STM that
uses locks should do is use them correctly. Every time you hand-code
lock-based concurrency you risk getting it wrong. If an STM
implementation gets it wrong, it gets fixed and everyone benefits from
the fix.

> I can say with certainty
> after studying the Clojure STM implementation for over a month that
> that isn't the case.
>

What exactly are you saying?

> Refs in Clojure STM each have a ReentrantReadWriteLock object. Any
> number of concurrent transactions can hold the read lock for a Ref OR
> a single transaction can hold the write lock for a Ref. The only time
> one of these locks is held for the duration of a transaction is when
> you call ensure on a Ref. In that case it will hold a read lock until
> either the Ref is modified in the transaction or until the transaction
> commits. There is no circumstance under which a write lock will be
> held for the duration of a transaction. Write locks are obtained only
> briefly when certain things occur during a transaction and then
> quickly released. They are acquired again when the transaction commits
> and then released when the commit completes. The way that one
> transaction knows that another one has modified a Ref and hasn't yet
> committed is that the tinfo field of the Ref is set to refer to an
> object that describes the transaction that modified the Ref (including
> the order in which it started relative to others and its current
> status).
>
> So locks are not sorted by a numeric ID.

This is an oversimplification and simply wrong. You'll need to spend
more time studying the implementation. In particular, commuting uses
an ordered lock strategy.

> It is transaction starts,
> tries and commits that are numbered that way. It's a beautiful and
> complicated strategy ... but it leaves me wondering whether there is
> any possibility of deadlock. I don't see how it could happen. I'm just
> asking in case I overlooked something.
>

Deadlock can be precluded by avoiding mutually incompatible multi-lock
acquisitions, and/or utilizing non-forever-blocking (timeout-based)
lock acquisition. In Clojure's STM several strategies are combined.
ensures use mutually compatible readlocks, and can thus hold them over
time. Writes use write locks, but - a) try to obtain them only with a
timeout, and b) don't hold more than one at a time other than during a
commit, c) use a tagging system during the body of the transaction.
So, they obtain a write lock temporarily, then tag as owned by this
transaction, then release. On commit, everything written to will have
been successfully tagged, and those locks can then be obtained in any
order, since any other writing transaction vying for the same refs
will have failed. Thus there can be no lock A wait for B/lock B wait
for A deadlocks. Commutes always proceed (without tagging), but need
to coordinate with other writes and commutes on commit. Thus commute
locks are obtained in order (with timeouts to avoid clashes with
writes), contrary to what you've said, based upon comparability of
refs utilizing numeric IDs. This ensures that multiple commutes
involving overlapping sets of refs use compatible lock orders.

These are well known techniques for safe manual lock based
concurrency, simply automated by Clojure's STM, so I don't know what
you are talking about above. I've done a ton of manual concurrency and
have used all of these techniques. That said, there are many ways to
combine these and others and the implementation is subject to change.
Marrying your understanding of STM to understanding this particular
implementation is tenuous. Not that there's anything wrong with trying
to understand it, but going from the specific to the general is
dangerous, and many details *are likely to change* as it gets refined.

If a transaction can't succeed in delivering the semantics, there will
be retries, which are not problems but the expected behavior. Livelock
would be possible, but is avoided by the retry limit. If you encounter
the retry limit it means you will have to re-architect your units of
work (sizes, durations, degrees of contention, commute vs set), and
would have to make similar changes with any strategy.

Rich

Mark Volkmann

unread,
Aug 19, 2009, 1:41:32 PM8/19/09
to clo...@googlegroups.com

I agree with everything you said above. I'm explaining what I have
witnessed as a misconception. For example, a C++ guy I know said that
STM implementation "use locks like crazy". I'm trying to explain that
that isn't the case in the Clojure STM implementation.

>> I can say with certainty
>> after studying the Clojure STM implementation for over a month that
>> that isn't the case.
>
> What exactly are you saying?

I'm saying that the Clojure STM implemenation doesn't "use locks like
crazy". For example, a lock is not held for every Ref that is modified
in a transaction. Instead, the Ref gets its tinfo field set to
indicate that a transaction has modified it, but hasn't yet committed
the change.

>> Refs in Clojure STM each have a ReentrantReadWriteLock object. Any
>> number of concurrent transactions can hold the read lock for a Ref OR
>> a single transaction can hold the write lock for a Ref. The only time
>> one of these locks is held for the duration of a transaction is when
>> you call ensure on a Ref. In that case it will hold a read lock until
>> either the Ref is modified in the transaction or until the transaction
>> commits. There is no circumstance under which a write lock will be
>> held for the duration of a transaction. Write locks are obtained only
>> briefly when certain things occur during a transaction and then
>> quickly released. They are acquired again when the transaction commits
>> and then released when the commit completes. The way that one
>> transaction knows that another one has modified a Ref and hasn't yet
>> committed is that the tinfo field of the Ref is set to refer to an
>> object that describes the transaction that modified the Ref (including
>> the order in which it started relative to others and its current
>> status).
>>
>> So locks are not sorted by a numeric ID.
>
> This is an oversimplification and simply wrong. You'll need to spend
> more time studying the implementation. In particular, commuting uses
> an ordered lock strategy.

Is there something specific that I said above that is wrong, because
based on my reading of the code I think it is all correct. I just
discovered what you are talking about with ordered locking for
commutes! See my next comment below.

>> It is transaction starts,
>> tries and commits that are numbered that way. It's a beautiful and
>> complicated strategy ... but it leaves me wondering whether there is
>> any possibility of deadlock. I don't see how it could happen. I'm just
>> asking in case I overlooked something.
>
> Deadlock can be precluded by avoiding mutually incompatible multi-lock
> acquisitions, and/or utilizing non-forever-blocking (timeout-based)
> lock acquisition. In Clojure's STM several strategies are combined.
> ensures use mutually compatible readlocks, and can thus hold them over
> time. Writes use write locks, but - a) try to obtain them only with a
> timeout, and b) don't hold more than one at a time other than during a
> commit, c) use a tagging system during the body of the transaction.
> So, they obtain a write lock temporarily, then tag as owned by this
> transaction, then release. On commit, everything written to will have
> been successfully tagged, and those locks can then be obtained in any
> order, since any other writing transaction vying for the same refs
> will have failed. Thus there can be no lock A wait for B/lock B wait
> for A deadlocks. Commutes always proceed (without tagging), but need
> to coordinate with other writes and commutes on commit. Thus commute
> locks are obtained in order

Ah ... I just saw a detail I had overlooked. The commutes are stored
in a TreeMap, not a HashMap. So that's where the order comes from.
They are sorted on the Ref ids which are assigned sequentially as the
Ref objects are created.

> (with timeouts to avoid clashes with
> writes), contrary to what you've said, based upon comparability of
> refs utilizing numeric IDs. This ensures that multiple commutes
> involving overlapping sets of refs use compatible lock orders.

Cool! I get that now.

> These are well known techniques for safe manual lock based
> concurrency, simply automated by Clojure's STM, so I don't know what
> you are talking about above. I've done a ton of manual concurrency and
> have used all of these techniques. That said, there are many ways to
> combine these and others and the implementation is subject to change.
> Marrying your understanding of STM to understanding this particular
> implementation is tenuous. Not that there's anything wrong with trying
> to understand it, but going from the specific to the general is
> dangerous, and many details *are likely to change* as it gets refined.
>
> If a transaction can't succeed in delivering the semantics, there will
> be retries, which are not problems but the expected behavior. Livelock
> would be possible, but is avoided by the retry limit. If you encounter
> the retry limit it means you will have to re-architect your units of
> work (sizes, durations, degrees of contention, commute vs set), and
> would have to make similar changes with any strategy.

Thanks for the feedback Rich!

Reply all
Reply to author
Forward
0 new messages