Long-running STM updates will never complete... is there a solution?

26 views
Skip to first unread message

Garth Sheldon-Coulson

unread,
Mar 15, 2010, 1:10:26 PM3/15/10
to clo...@googlegroups.com
Apologies in advance if this question has a simple answer that I'm overlooking!

Suppose I have a ref or atom and I periodically need to call a long-running function on it:

(def r (ref 0))
(defn f [_] (Thread/sleep 1000) -10)

Suppose also that the ref is being modified from other threads at shorter intervals than the execution time of f. I'll simulate this with:

(future (loop [] (Thread/sleep 500) (dosync (alter r inc)) (recur)))

In this situation, the following will never return:

(dosync (alter r f))

The reason is that the value of r never stays the same long enough for f to see and then update a consistent value.

Is there a way to cause r to be locked during the (dosync (alter r f)) so that all other alterations will be held until after it completes? I know this isn't normally the desired behavior, but is it possible to obtain?

Neither of the following works:
(locking r (dosync (alter r f)))
(dosync (locking r (alter r f)))

I could simulate part of the behavior I want with agents and await, but then I lose the ability to synchronize alterations.

Thanks for any insights!

Garth

Garth Sheldon-Coulson

unread,
Mar 15, 2010, 1:21:22 PM3/15/10
to clo...@googlegroups.com
Ah, maybe I should have read the thread on ensure before I posted this! =)

(dosync (ensure r) (alter r f))

works.

Meikel Brandmeyer

unread,
Mar 15, 2010, 4:08:03 PM3/15/10
to clo...@googlegroups.com
Hi,

On Mon, Mar 15, 2010 at 01:21:22PM -0400, Garth Sheldon-Coulson wrote:

> Ah, maybe I should have read the thread on ensure before I posted this! =)
>
> (dosync (ensure r) (alter r f))
>
> works.

Now I'm confused. Calling ensure on r shouldn't have an effect since we
call alter on r anyway, no?

Sincerely
Meikel

Michał Marczyk

unread,
Mar 15, 2010, 4:43:08 PM3/15/10
to clo...@googlegroups.com
On 15 March 2010 21:08, Meikel Brandmeyer <m...@kotka.de> wrote:
> Now I'm confused. Calling ensure on r shouldn't have an effect since we
> call alter on r anyway, no?

ensure "protects the ref from modification by other transactions"
(from the docs). alter does not.

Reading into the Java code, ensure puts a lock on the ref, which, once
in place, guarantees that the transaction doing the ensuring has an
exclusive right to modify the ref until it commits / retries... or
something, my Java-fu is still nothing to boast about, regrettably.

At any rate, my current understanding is that, in Garth's example, the
ensure gives (alter r f) all the time it needs to modify r's value
while putting all other transactions which attempt to modify r on
hold. alter, by itself, never interferes with background transactions;
should something disappear from under its feet, it expects to be
retried.

Ok, back to improving my Java chops in the hope of grasping all the
intricasies of Rich's code sometime... *sigh*

Sincerely,
Michał

Laurent PETIT

unread,
Mar 15, 2010, 4:53:48 PM3/15/10
to clo...@googlegroups.com
Hi,

Ok, but maybe the point of Meikel is that doing so may (?) be relying
on undocumented implementation detail ... ?

2010/3/15 Michał Marczyk <michal....@gmail.com>:

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

Michał Marczyk

unread,
Mar 15, 2010, 5:02:12 PM3/15/10
to clo...@googlegroups.com
On 15 March 2010 21:53, Laurent PETIT <lauren...@gmail.com> wrote:
> Ok, but maybe the point of Meikel is that doing so may (?) be relying
> on undocumented implementation detail ... ?

Well, I don't think so -- that passage I quoted from ensure's docs
seems to be an attempt at documenting it. In fact, as far as I can
tell, this is the single purpose of ensure.

OTOH, if my mental picture of what's going in is correct, I guess I'd
be in favour of an extra clarification effort on that docstring...
Then again I'm still in the process of thinking over what I saw today
(w.r.t. STM), so maybe it just needs some time to sink in.

Sincerely,
Michał

ataggart

unread,
Mar 15, 2010, 5:41:57 PM3/15/10
to Clojure

I'm inclined to say this is incorrect as I'm on my iphone so I can't
look at the source. The concurrency functions (e.g., ref-set, alter,
ensure) only lock their refs during the commit process. The ensure
function is provided to add a *non-changing* ref to the set of refs
that need to be locked; ref-set, etc., do this implicitly. To lock
the refs upon first use would largely obviate the point of the STM.

The issue Garth describes is a case of live-locking, an extant failure
mode of the STM. Some solutions would be to break up the work from
just a single transaction (though sacrificing consistency), or use the
locking construct:
http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/locking

Garth Sheldon-Coulson

unread,
Mar 15, 2010, 5:58:31 PM3/15/10
to clo...@googlegroups.com
Well it definitely seems that ensure has the behavior Michal
described, because the ensure code I posted works. I'm glad this
behavior is available, because I don't think there is any other way to
achieve the combination of synchronization and locking I need. (I
couldn't get locking to work on a ref; see my original msg.)

Maybe the two different behaviors call for different constructs: one
for making sure an unmodified ref hasn't changed at commit time (the
original purpose I thought ensure was supposed to serve), and one with
the behavior I need here, namely causing all other modifications to
block for the remaining life of the transaction.

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>

--
Sent from my mobile device

ataggart

unread,
Mar 15, 2010, 6:13:15 PM3/15/10
to Clojure
If using ensure solves the problem then something else is going on,
because ensure doesn't lock a ref for the life of the transaction any
more than ref-set does. As the doc for ensure notes, it allows for
*more* concurrency than ref-set, not less. From clojure.org/refs:

"All changes made to Refs during a transaction (via ref-set, alter or
commute) will appear to occur at a single point in the 'Ref world'
timeline (its 'write point')."

Ensure simply adds (after a bit of status checking) the ref to the set
of refs that need to be locked and examined when commit occurs. From
clojure.lang.LockingTransaction.doEnsure(Ref):

void doEnsure(Ref ref){
if(!info.running())
throw retryex;
if(ensures.contains(ref))
return;
ref.lock.readLock().lock();

//someone completed a write after our snapshot
if(ref.tvals != null && ref.tvals.point > readPoint) {
ref.lock.readLock().unlock();
throw retryex;
}

Info refinfo = ref.tinfo;

//writer exists
if(refinfo != null && refinfo.running())
{
ref.lock.readLock().unlock();

if(refinfo != info) //not us, ensure is doomed
{
blockAndBail(refinfo);
}
}
else
ensures.add(ref);
}

As for different constructs, they exist, one is 'ensure, the other is
'locking.


On Mar 15, 2:58 pm, Garth Sheldon-Coulson <g...@MIT.EDU> wrote:
> Well it definitely seems that ensure has the behavior Michal
> described, because the ensure code I posted works. I'm glad this
> behavior is available, because I don't think there is any other way to
> achieve the combination of synchronization and locking I need. (I
> couldn't get locking to work on a ref; see my original msg.)
>
> Maybe the two different behaviors call for different constructs: one
> for making sure an unmodified ref hasn't changed at commit time (the
> original purpose I thought ensure was supposed to serve), and one with
> the behavior I need here, namely causing all other modifications to
> block for the remaining life of the transaction.
>

> >http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co...

Christophe Grand

unread,
Mar 15, 2010, 6:17:05 PM3/15/10
to clo...@googlegroups.com
On Mon, Mar 15, 2010 at 10:41 PM, ataggart <alex.t...@gmail.com> wrote:

On Mar 15, 1:43 pm, Michał Marczyk <michal.marc...@gmail.com> wrote:
> On 15 March 2010 21:08, Meikel Brandmeyer <m...@kotka.de> wrote:
>
> > Now I'm confused. Calling ensure on r shouldn't have an effect since we
> > call alter on r anyway, no?
>
> ensure "protects the ref from modification by other transactions"
> (from the docs). alter does not.
>
> Reading into the Java code, ensure puts a lock on the ref, which, once
> in place, guarantees that the transaction doing the ensuring has an
> exclusive right to modify the ref until it commits / retries... or
> something, my Java-fu is still nothing to boast about, regrettably.
>
> At any rate, my current understanding is that, in Garth's example, the
> ensure gives (alter r f) all the time it needs to modify r's value
> while putting all other transactions which attempt to modify r on
> hold. alter, by itself, never interferes with background transactions;
> should something disappear from under its feet, it expects to be
> retried.
>
> Ok, back to improving my Java chops in the hope of grasping all the
> intricasies of Rich's code sometime... *sigh*
>
> Sincerely,
> Michał

I'm inclined to say this is incorrect as I'm on my iphone so I can't
look at the source. The concurrency functions (e.g., ref-set, alter,
ensure) only lock their refs during the commit process.  

Michal is right: ensure effectively readlocks the ref -- ref-set (and alter since it is currently implemented on top of ref-set) writelocks a ref twice: the first time to "touch" it, the second time to commit.

However I think it's trying to read too much in "Protects the ref from modification by other transactions" to conclude that ensure guarantees the ref to be locked. 

The issue Garth describes is a case of live-locking, an extant failure
mode of the STM.  Some solutions would be to break up the work from
just a single transaction (though sacrificing consistency), or use the
locking construct:
http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/locking


One should also try to put most of the calculation outside of the transaction or to make retries cheapr (eg through partial memoization).

Christophe

Chouser

unread,
Mar 15, 2010, 6:27:06 PM3/15/10
to clo...@googlegroups.com

I make no claims about the Rightness of this suggestion, but
simply offer another example of a work-around:

(dosync (alter r identity) (alter r f))

That is, a "dummy write" via alter is also sufficient to allow
the transaction to succeed, just like 'ensure'.

--Chouser

ataggart

unread,
Mar 15, 2010, 6:28:12 PM3/15/10
to Clojure
Yes, I went off the rails because I was thinking of *a lock*, instead
of *the locks*, namely the read and write locks. Since readers block
writers (but not the other way around), ensure grabbing a read lock on
the ref means anyone wanting to get a write lock will get blocked.

On Mar 15, 3:17 pm, Christophe Grand <christo...@cgrand.net> wrote:

> >http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co...

Garth Sheldon-Coulson

unread,
Mar 15, 2010, 6:24:40 PM3/15/10
to clo...@googlegroups.com
I would be happy to use locking if it's the more appropriate
construct. I tried it before ensure, in fact, because it seemed more
intuitive.

ataggart, could you take a look at my first message? The code I posted
using locking didn't work for some reason, and I imagine I was doing
something wrong.

Christophe, thanks for the partial memoization tip. I think that would
work in many cases, but for my current use case I really need to put a
lock on the ref, because the function is very expensive and other
threads will be updating it with previously unseen values so quickly
that memoization can't help. Are you saying that the fact that ensure
works in my example isn't to be counted on?


On 3/15/10, ataggart <alex.t...@gmail.com> wrote:

ataggart

unread,
Mar 15, 2010, 6:57:44 PM3/15/10
to Clojure
You'd need to use locking in both places since it's holding the
monitor of the Ref instance, not the read/write locks, e.g.:
(future (loop [] (Thread/sleep 500) (locking r (dosync (alter r inc)))
(recur)))

(locking r (dosync (alter r f)))

The point of ensure, as I see it, is to say only that the value will
be consistent *when the commit occurs*. As for the whether the
immediate read-lock-grabbing behavior of ensure should be replied
upon, I can't say, though I'm inclined to no more rely on that than I
am on the barging logic. Rich's input would be valuable here.


On Mar 15, 3:24 pm, Garth Sheldon-Coulson <g...@MIT.EDU> wrote:
> I would be happy to use locking if it's the more appropriate
> construct. I tried it before ensure, in fact, because it seemed more
> intuitive.
>
> ataggart, could you take a look at my first message? The code I posted
> using locking didn't work for some reason, and I imagine I was doing
> something wrong.
>
> Christophe, thanks for the partial memoization tip. I think that would
> work in many cases, but for my current use case I really need to put a
> lock on the ref, because the function is very expensive and other
> threads will be updating it with previously unseen values so quickly
> that memoization can't help. Are you saying that the fact that ensure
> works in my example isn't to be counted on?
>

Garth Sheldon-Coulson

unread,
Mar 15, 2010, 7:06:35 PM3/15/10
to clo...@googlegroups.com
Ah, I see! Thank you very much.

Yes, it would be helpful to get some more clarity on the
transaction-long readlocking of ensure.

On 3/15/10, ataggart <alex.t...@gmail.com> wrote:

Christophe Grand

unread,
Mar 16, 2010, 4:36:17 AM3/16/10
to clo...@googlegroups.com
On Mon, Mar 15, 2010 at 11:24 PM, Garth Sheldon-Coulson <ga...@mit.edu> wrote:
Christophe, thanks for the partial memoization tip. I think that would
work in many cases, but for my current use case I really need to put a
lock on the ref, because the function is very expensive and other
threads will be updating it with previously unseen values so quickly
that memoization can't help. Are you saying that the fact that ensure
works in my example isn't to be counted on?

Ensure seems to mitigate the problem but I think that with a different timing and bad luck (which always happen under load in concurrent systems), you may still experience a high-level of retries. (The readlock fastened by ensure is released just after the computation of f for alter to writelock it, tag it and release the lock. So there are two moments where the current transaction can be barged by another (older) transaction.)

Two questions:
Do you care about the return value of the long running computation inside the transaction?
Does this computation depends on the value of other refs?

If you anwser no to both, you should consider commute.
If you answered no only to the first, a delay may help you (by making the transaction shorter):
(dosync (alter r #(delay (f @%)))) ;; don't forget to capture all the ref values you depend on

hth,

Christophe

Christophe Grand

unread,
Mar 16, 2010, 5:25:04 AM3/16/10
to clo...@googlegroups.com
Hi Chouser,


On Mon, Mar 15, 2010 at 11:27 PM, Chouser <cho...@gmail.com> wrote:
I make no claims about the Rightness of this suggestion, but
simply offer another example of a work-around:

   (dosync (alter r identity) (alter r f))

That is, a "dummy write" via alter is also sufficient to allow
the transaction to succeed, just like 'ensure'.

This is an interesting idea which can be automated (well, to be frank, an implicit ensure could also be automated but it doesn't feel kosher) by adding a doAlter to c.l.LockingTransaction:
Object doAlter(Ref ref, IFn fn, ISeq args)){
    if(!info.running())
        throw retryex;
    if(commutes.containsKey(ref))
        throw new IllegalStateException("Can't alter after commute");
   
    Object val;
    if(!sets.contains(ref))
        {
        sets.add(ref);
        val = lock(ref);
        }
    else
        val = vals.get(ref);
    Object ret = fn.applyTo(RT.cons(val, args))
    vals.put(ref, ret);
    return ret;
}

and in c.l.Ref:
public Object alter(IFn fn, ISeq args) throws Exception{
    LockingTransaction t = LockingTransaction.getEx();
    return t.doalter(this, fn, args)));
}

What do you think?

Christophe

Chouser

unread,
Mar 16, 2010, 3:46:31 PM3/16/10
to clo...@googlegroups.com
On Tue, Mar 16, 2010 at 5:25 AM, Christophe Grand <chris...@cgrand.net> wrote:
> On Mon, Mar 15, 2010 at 11:27 PM, Chouser <cho...@gmail.com> wrote:
>>
>> I make no claims about the Rightness of this suggestion, but
>> simply offer another example of a work-around:
>>
>>    (dosync (alter r identity) (alter r f))
>>
>> That is, a "dummy write" via alter is also sufficient to allow
>> the transaction to succeed, just like 'ensure'.
>
> This is an interesting idea which can be automated (well, to be frank, an
> implicit ensure could also be automated but it doesn't feel kosher) by
> adding a doAlter to c.l.LockingTransaction:

Well, what you've got there seems much cleaner than a dummy
write, and makes 'alter' do a lot less work than it did. I won't
pretend to understand the STM well enough to know if it's all
correct though.

--Chouser
http://joyofclojure.com/

Reply all
Reply to author
Forward
0 new messages