I hope I'm missing some important details and will be set straight by someone!
--
R. Mark Volkmann
Object Computing, Inc.
I'm trying to understand the degree to which Clojure's STM provides
more concurrency than Java's blocking approach. I know it's difficult
to make generalizations and that specific applications need to be
measured, but I'll give it a go anyway.
Clearly using STM (dosync with Refs) makes code easier to write than
using Java synchronization because you don't have to determine up
front which objects need to be locked. In the Clojure approach,
nothing is locked. Changes in the transaction happen to in-transaction
values and there is only a small amount of blocking that occurs at the
end of the transaction when changes are being committed. Score one for
Clojure!
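Clojure's actual STM is an MVCC design and considerably more involved, but the optimistic shape described above can be sketched in Java terms as a compare-and-set loop: work happens on a private in-transaction value, and the only synchronization is at commit time. The `OptimisticRef` class below is purely hypothetical, an illustration and not Clojure's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of optimistic update: work happens on a private
// (in-transaction) value; the only synchronization is the atomic
// compare-and-set at "commit" time. This is NOT Clojure's actual
// STM implementation, just an illustration of the idea.
class OptimisticRef<T> {
    private final AtomicReference<T> current;

    public OptimisticRef(T initial) {
        current = new AtomicReference<>(initial);
    }

    public T deref() {
        return current.get();
    }

    // Apply f to a snapshot; retry from scratch if another thread
    // committed in the meantime.
    public T alter(UnaryOperator<T> f) {
        while (true) {
            T snapshot = current.get();      // read point
            T result = f.apply(snapshot);    // in-transaction value
            if (current.compareAndSet(snapshot, result)) {
                return result;               // commit succeeded
            }
            // commit failed: discard result and start over --
            // this discarded work is the cost discussed below
        }
    }
}
```

Note that when the compare-and-set fails, the computed `result` is simply thrown away and the whole function runs again, which is exactly the retry behavior at issue in the rest of this thread.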
What concerns me, though, is how often the work done in two transactions
running in separate threads turns out to be useful work. It seems that
it will typically be the case that when two concurrent transactions
access the same Refs, one of them will commit and the other will
retry. The retry will discard the in-transaction changes that were
made to the Refs, essentially rendering the work it did of no value.
So there was increased concurrency, but not useful concurrency.
Of course there is a chance that the transaction contains some
conditional logic that makes it so the Refs to be accessed aren't
always the same, but my speculation is that that's a rare
occurrence. It's probably more typical that a transaction accesses
the same set of Refs every time it executes.
This makes it seem that Java's locking approach isn't so bad. Well,
it's bad that I have to identify the objects to lock, but it's good
that it doesn't waste cycles doing work that will just be thrown away.
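For contrast, the locking style described above can be sketched with a hypothetical `LockedCounter`: a second contending thread does no speculative work at all, it simply blocks at the synchronized boundary until the lock is free and then runs exactly once.

```java
// With explicit locking, a second thread does no speculative work:
// it blocks at the synchronized boundary until the first thread is
// done, then runs exactly once. Hypothetical class for contrast
// with the optimistic/retry approach.
class LockedCounter {
    private long value = 0;
    private final Object lock = new Object();

    public long increment() {
        synchronized (lock) {   // contending threads wait here
            value = value + 1;  // runs exactly once per call
            return value;
        }
    }

    public long get() {
        synchronized (lock) {
            return value;
        }
    }
}
```

No cycles are spent on work that gets thrown away, but the price is that someone had to decide up front what `lock` guards.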
>> Of course there is a chance that the transaction contains some
>> conditional logic that makes it so the Refs to be accessed aren't
>> always the same, but my speculation is that that's a rare
>> occurrence. It's probably more typical that a transaction always
>> accesses the same set of Refs every time it executes.
>
> Which Refs your transactions modify will depend heavily on the
> specific application you are working on. For example, in an
> application dealing with bank accounts and transferring money between
> them, the probability of two transactions concurrently hitting the
> same account is pretty low. In other applications, where a lot of
> transactions modify the same global state, the chances of conflict are
> much higher.
Agreed.
>> This makes it seem that Java's locking approach isn't so bad. Well,
>> it's bad that I have to identify the objects to lock, but it's good
>> that it doesn't waste cycles doing work that will just be thrown away.
>
> There's a reason concurrent programming is notoriously hard in most
> languages, because it takes a lot of effort and skill to get right. Between
> having to correctly identify which objects need to be locked and trying to
> avoid deadlocks dealing with explicit locks can be pretty messy and
> dangerous. That doesn't mean Java's approach is bad, after all the internals
> of Clojure are implemented using Java locks. But explicit management of
> locks is often too low level and unnecessarily complex, and Clojure provides
> a higher level way of dealing with concurrency that makes it easier and
> safer to work with most of the time.
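The deadlock hazard mentioned in the quoted paragraph is easy to reproduce with the earlier bank-account example: if one thread transfers from a to b while another transfers from b to a, and each locks its "from" account first, both can block forever. A standard fix is to acquire locks in a globally consistent order. A minimal sketch, using a hypothetical `Account` class ordered by account id:

```java
// The classic hazard with explicit locks: two opposite transfers
// deadlock if each thread grabs its "from" account first. A common
// fix is to always acquire locks in a globally consistent order
// (here: by account id). Hypothetical class for illustration.
class Account {
    final long id;
    private long balance;

    public Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public long balance() {
        synchronized (this) { return balance; }
    }

    public static void transfer(Account from, Account to, long amount) {
        // Lock the lower-id account first so every thread agrees
        // on acquisition order, preventing deadlock.
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```

This is exactly the kind of low-level discipline the quoted reply is talking about: it works, but the correctness of the whole program depends on every caller honoring the ordering convention, which `dosync` makes unnecessary.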
I agree that Clojure makes the programming much easier, but is there a
downside? Maybe the downside is performance. If I found out that a
particular transaction was commonly being retried many times, is that
a sign that I need to write the code differently? How would I find out
that was happening? I know I could insert my own code to track that,
but it seems like a tool to detect excessive conflicts/retries in
transactions may be commonly needed in Clojure. Maybe we could set
a special variable like *track-retries* that would cause Clojure to
produce a text file that describes all the transaction retries that
occurred in a particular run of an application. If such a tool isn't
needed or wouldn't be useful, I'd like to understand why.
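No such tool ships with Clojure as of this writing; the `*track-retries*` variable above is only a proposal. The kind of bookkeeping it might do can be sketched in Java terms as an optimistic loop that counts every discarded attempt, using a hypothetical `InstrumentedRef`:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of retry bookkeeping: count every discarded attempt so a
// profiling report could flag hot spots. Hypothetical design; no
// such instrument ships with Clojure's STM as of this writing.
class InstrumentedRef<T> {
    private final AtomicReference<T> current;
    private final AtomicLong retries = new AtomicLong();

    public InstrumentedRef(T initial) {
        current = new AtomicReference<>(initial);
    }

    public T alter(UnaryOperator<T> f) {
        while (true) {
            T snapshot = current.get();
            T result = f.apply(snapshot);
            if (current.compareAndSet(snapshot, result)) {
                return result;
            }
            retries.incrementAndGet();  // one unit of discarded work
        }
    }

    // A profiling report would surface this count per transaction site.
    public long retryCount() {
        return retries.get();
    }
}
```

A high `retryCount` on a particular Ref would be the signal, discussed above, that the code around it may need restructuring.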
Given what I've heard about Azul running threaded Java code (Clojure
might be different, of course!), I think there are insufficient
guarantees to make such tools useful. Running the same threaded Java
code on a different machine has a not-insignificant chance of
producing quite different scheduling.
Sincerely.
> In the case where two transactions need to modify the same Ref they
> definitely have to be serialized, either by explicitly using locks in
> Java, or by letting Clojure automatically retry one of them. In either
> case about the same thing happens. Transaction A starts and finishes,
> then transaction B starts and finishes.
I don't think the same thing happens. In the case of Clojure, both
transactions A and B start. Suppose A finishes first and commits. Then
transaction B retries, finishes, and commits. That's what I was
referring to as non-useful work. I'm not saying it's the wrong
approach, but it is different.
The fact that B is tried once concurrently with A, and is then aborted and retried, is in my opinion much the same as transaction B being stuck waiting on a lock while A is being processed. But I can see how trying B concurrently with A the first time might waste more resources, and that perhaps for certain applications locks might have better performance.
I can imagine how in certain situations a profile mode where Clojure keeps track of transaction retries, and maybe even the reasons why they happened, might be useful.
--
Luc Préfontaine
Armageddon was yesterday, today we have a real problem...
1) There are tools to measure the impact of GC on runs of an
application, but we don't yet have similar tools for TM.
2) The operation of GC can be tuned through command-line options, but
no such options are available yet for TM.
I don't know if these are really needed for TM, but I suspect they
would be useful and would at least make people feel more comfortable
about building large applications using TM.
I believe everything you say about programming with STM being much
easier than programming with locks. My concern is only about
performance measurement and tuning, either through options or code
refactoring.
--
I don't want to go out on a limb, having not looked at the Clojure STM
implementation. However, I would bet that the costs are roughly equal.
Even if Clojure was 50% slower, or 100% slower, the knowledge that you
can spin up a large number of threads and not worry about deadlocks is
ultimately more valuable.
On Mon, Mar 23, 2009 at 12:36 PM, Mark Volkmann
--
Howard M. Lewis Ship
Creator Apache Tapestry and Apache HiveMind