Can dosync transaction result computation be parallelized over multiple threads?

22 views
Skip to first unread message

Andy Fingerhut

unread,
Aug 13, 2009, 3:40:25 AM8/13/09
to Clojure
I know that if you have a dosync call in some function executed by a
thread, and then that function calls other functions (which might have
their own dosyncs, which get bundled together with the original
transaction), then everything is fine. It seems common that all of
that work would be done sequentially, in a single thread.

But what if that thread that started the dosync did a pmap call, say,
and the function called on each of the elements of the collection
accessed Refs, and perhaps computed new values for some of them, and
those were not the same Refs that the original thread was accessing.
Do all of those get bundled together with the original transaction,
too?

If so, did it require anything special in the implementation to make
that work?

Thanks,
Andy

Chas Emerick

unread,
Aug 13, 2009, 5:58:54 AM8/13/09
to clo...@googlegroups.com
> I know that if you have a dosync call in some function executed by a
> thread, and then that function calls other functions (which might have
> their own dosyncs, which get bundled together with the original
> transaction), then everything is fine. It seems common that all of
> that work would be done sequentially, in a single thread.
>
> But what if that thread that started the dosync did a pmap call, say,
> and the function called on each of the elements of the collection
> accessed Refs, and perhaps computed new values for some of them, and
> those were not the same Refs that the original thread was accessing.
> Do all of those get bundled together with the original transaction,
> too?

Andy,

No -- absent some var/binding gymnastics (which are currently possible
but not very straightforward to do at the moment) -- a transaction is
associated with a single thread, period.

If your pmap call happens to initialize other transactions on other
threads, those transactions will be completed completely independently
of any transaction that may be in flight that initiated the pmap call
itself (or send or send-off or future, or whatever other asynchronous
mechanism at play).

That said, I would urge caution in circumstances like what you've
described. While you say that the refs being accessed in the other
threads' transactions aren't the same as the refs being accessed in
the parent transaction, you'll be in for some unpleasantness if that
ceases to be true at some point. e.g. the "child" transactions will
see the original value of any refs changed so far in the "parent"
transaction, the parent will not see any changes made to those refs by
the child transactions, and if a parent transaction is waiting on a
value being returned by a child transaction, and they both ensure or
ref-set or alter the same ref, then you're likely in for a deadlock.

> If so, did it require anything special in the implementation to make
> that work?

Migrating transactions across thread boundaries *is* possible by
judiciously copying thread-local bindings from one thread to another.
There is a low-level API on clojure.lang.Var, etc. for doing this, but
it's by no means pretty -- which I think is probably a good sign that
doing so is fundamentally a bad idea at the moment.

If and until a better approach is available, I'd say the best approach
would be to ensure that only *values* are going into your pmap calls
(as opposed to refs), as that's the only way to ensure that you're not
going to get tripped up by the thread-local nature of transactions.
I've done just the opposite in the recent past, and the result was a
lot of confusion and frustration, which ended up being part of
learning a lesson the hard way. :-/

Cheers,

- Chas

Meikel Brandmeyer

unread,
Aug 13, 2009, 6:38:16 AM8/13/09
to Clojure
Hi,

On Aug 13, 11:58 am, Chas Emerick <cemer...@snowtide.com> wrote:

> That said, I would urge caution in circumstances like what you've  
> described.

Additionally, transactions should be kept small. IIRC,
Rich recommended that before. If possible, don't do
complicated calculations inside a transaction.

Sincerely
Meikel

Mark Volkmann

unread,
Aug 14, 2009, 5:17:31 PM8/14/09
to clo...@googlegroups.com

Above you say "if ... they both ensure or ref-set or alter the same
ref". I don't believe that can happen. Once one thread is successful
in doing an ensure, ref-set or alter to a given Ref, that Ref is
marked as having gotten a new in-transaction value in that transaction
(the tinfo field of the Ref object gets set). This happens before the
transaction commits. If another thread tries to do the same then
either it or the original transaction is going to retry. I don't
believe there is a possibility for deadlock here.

Hold on! There was a recent change in LockingTransaction.java that may
have made what I said above not exactly true! The LockingTransaction
lock method is what sets the tinfo field of a Ref. Previously the lock
method was called for an ensure, ref-set or alter. In the new version
it seems that lock is only called for ref-set and alter. Can someone
in the know confirm that this is the intended behavior? It seems that
with this change it is now possible for more than one thread to ensure
the same Ref as long as no thread has yet done a ref-set or alter on
the Ref.

--
R. Mark Volkmann
Object Computing, Inc.

Mark Volkmann

unread,
Aug 14, 2009, 8:53:23 PM8/14/09
to clo...@googlegroups.com

I just studied this further. Before the recent changes to
LockingTransaction.java, only one thread at a time could ensure a
given Ref. With the changes, any number of threads can ensure the same
Ref.

A key thing to know about this is that Ref.java uses a
ReentrantReadWriteLock. Each Ref has an associated
ReentrantReadWriteLock object. Those allow any number of threads to
simultaneously hold a read lock OR a single thread to hold a write
lock.

All the threads that have called ensure on the Ref hold a read lock
for it. If one of them later tries to write to the Ref (by calling
ref-set or alter on it), it will release the read lock it holds for
the Ref and attempt to get a write lock for it. If other threads still
hold that read lock, this attempt will fail and the transaction will
retry.

So it seems the biggest impact of the change can be summarized as
follows. In the past, once you successfully ensured a Ref, you knew
you could write to it later because no other thread could also ensure
it. Now you don't know that. You know you can stop other threads from
writing the Ref, but you won't be able to write the Ref as long as
other threads have also ensured it.

I'm not criticizing the change. There were likely good reasons for it.

Please let me know if I have interpreted anything incorrectly.

Chas Emerick

unread,
Aug 16, 2009, 10:24:59 AM8/16/09
to clo...@googlegroups.com

On Aug 14, 2009, at 8:53 PM, Mark Volkmann wrote:

> So it seems the biggest impact of the change can be summarized as
> follows. In the past, once you successfully ensured a Ref, you knew
> you could write to it later because no other thread could also ensure
> it. Now you don't know that. You know you can stop other threads from
> writing the Ref, but you won't be able to write the Ref as long as
> other threads have also ensured it.
>
> I'm not criticizing the change. There were likely good reasons for it.
>
> Please let me know if I have interpreted anything incorrectly.

Totally aside from the correctness of that analysis (which I've no
reason to doubt a.t.m.), I just want to clarify that I never actually
encountered any deadlocks (though I absolutely did burn a bunch of
cycles trying to figure out why some fn was seeing old values in
certain refs...due to the fn being involved in a pmap over a series of
objects that contained refs).

- Chas

Rich Hickey

unread,
Aug 19, 2009, 12:03:13 PM8/19/09
to clo...@googlegroups.com

While I appreciate that you are trying to understand the
implementation of the STM, understanding the semantics of the STM in
terms of its implementation is wrong-way-around.

The semantics are simpler, and the implementation is subject to
change. Retries will occur as needed to ensure the semantics, and no
one should be thinking in terms of "if I do this and another
transaction does that..." or read and write locks etc.

Rich

Mark Volkmann

unread,
Aug 19, 2009, 12:26:03 PM8/19/09
to clo...@googlegroups.com
On Wed, Aug 19, 2009 at 11:03 AM, Rich Hickey<richh...@gmail.com> wrote:
>
> While I appreciate that you are trying to understand the
> implementation of the STM, understanding the semantics of the STM in
> terms of its implementation is wrong-way-around.
>
> The semantics are simpler, and the implementation is subject to
> change. Retries will occur as needed to ensure the semantics, and no
> one should be thinking in terms of "if I do this and another
> transaction does that..." or read and write locks etc.

I agree in general, but I think there are many reasons that a person
might want to understand what is happening under the covers. Here are
some of them.

1) learn interesting things from the design of Clojure STM
2) understand it well enough to become convinced that it works as advertised
3) implement STM for some other programming language using ideas
borrowed from the Clojure implementation
4) understand it well enough to be able to suggest improvements to it
5) understand it well enough to help with adding tool support (such as
tracking the number of times a transaction retries and why it retries
in order to tune usage)

If feel like I understand commute and ensure much better after
studying the code in LockingTransaction.java.

John Harrop

unread,
Aug 19, 2009, 3:42:17 PM8/19/09
to clo...@googlegroups.com
(def *dosync-counts* (atom {})

(defn avg-in [map key val]
  (let [[avg cnt] (get map key [0 0])]
    (assoc map key [(/ (+ (* avg cnt) val) (inc cnt)) (inc cnt)])))

(defmacro logged-dosync [& body]
  `(let [count# (atom 0)]
     (dosync
       (swap! count# inc)
       ~@body)
     (swap! *dosync-counts* avg-in (quote ~body) (deref count#))))

(defn crude-print-dosync-log []
  (doseq [[code [avg cnt]] @*dosync-counts*]
    (println code)
    (println avg "retries on average over" cnt "total transaction(s)")
    (println)))

These tools should suffice to simply discover how many retries some transaction is taking on average. For more complex dosync profiling you'd probably need hooks deeper into Clojure's STM, particularly for logging what ref value changed to cause a retry or similarly.

(A cleverer macro might parse the body for ensures, commutes, and the like, and bookend the body with code to record the at-start-of-transaction values of the involved refs and code to compare these against the out-transaction values just before commit, getting the latter using (future (deref foo)) or some similar method that will see the out-transaction values, and note any that had changed. So hooks deeper into STM might not be necessary after all, at least for some simple tests of why a retry might have happened. Having code that can introspect on code and write code seems to allow for some serious kung-fu!)

Reply all
Reply to author
Forward
0 new messages