How to encapsulate local state in closures

chris

Dec 21, 2008, 1:47:45 PM

to Clojure

I would like to be able to encapsulate local state in a closure.
Specifically, I would like a function that returns an incrementing
integer, thus:
(test_func)
1
(test_func)
2
What is the best way to go about this? Using local bindings is failing
and I can't figure out just why...

(def test_closure
  (with-local-vars [one 1]
    (fn [] (var-get one))))
#'user/test_closure
user> (test_closure)
; Evaluation aborted.
The var is null when I call the closure.

Thanks,
Chris

Parth Malwankar

Dec 21, 2008, 3:03:36 PM

to Clojure

On Dec 21, 11:47 pm, chris <cnuern...@gmail.com> wrote:
> I would like to be able to encapsulate local state in a closure.
> Specifically, I would like a function that returns an incrementing
> integer, thus:
> (test_func)
> 1
> (test_func)
> 2
> What is the best way to go about this? Using local bindings is failing
> and I can't figure out just why...
>

One way to do this would be to use atom.

(defn mk-counter [start]
  (let [n (atom start)]
    (fn [] (swap! n inc))))

(def counter (mk-counter 0))

user=> (counter)
1
user=> (counter)
2
user=> (counter)
3

Parth

Brian Doyle

Dec 21, 2008, 7:24:26 PM

to clo...@googlegroups.com

I haven't been following the new atom stuff, so I was wondering why atom would be best in this
situation, vs a ref? Thanks.

Stephen C. Gilardi

Dec 21, 2008, 7:42:45 PM

to clo...@googlegroups.com

On Dec 21, 2008, at 7:24 PM, Brian Doyle wrote:

> I haven't been following the new atom stuff, so I was wondering why atom would be best in this
> situation, vs a ref? Thanks.

The implementation of atoms is supported by the JVM typically using a processor hardware instruction that accomplishes thread safety for a single location in a very focused, fast way.

From http://clojure.org/atoms :

Atoms are an efficient way to represent some state that will never need to be coordinated with any other, and for which you wish to make synchronous changes (unlike agents, which are similarly independent but asynchronous).

--Steve

Mark Engelberg

Dec 21, 2008, 7:45:41 PM

to clo...@googlegroups.com

But if mk-counter is called twice because it's retried in part of a
transaction, then you're in big trouble when you use atom. Better to
use a ref here. atom needs to be reserved for the very few cases when
retries don't matter (like a cache).

Parth Malwankar

Dec 21, 2008, 10:31:31 PM

to Clojure

On Dec 22, 5:24 am, "Brian Doyle" <brianpdo...@gmail.com> wrote:
> I haven't been following the new atom stuff, so I was wondering why atom
> would be best in this
> situation, vs a ref? Thanks.

Rich discusses the use of atoms, refs and agents in good detail
in this thread:
http://groups.google.com/group/clojure/msg/fd0371eb7238e933

In case you don't want multiple counters but just one,
the following can also be done.

user=> (let [n (atom 0)] (defn counter [] (swap! n inc)))
#'user/counter

user=> (counter)
1
user=> (counter)
2
user=> (counter)
3

Parth


Parth Malwankar

Dec 21, 2008, 10:41:15 PM

to Clojure

If I understand it right, as long as the "counter" is used
within a single thread and not across threads there shouldn't
be any issues. Same as a cache.

If the idea is to use one counter across multiple threads
then refs can be used.

I don't think I follow why mk-counter would be retried. There
is no reason for it to fail, as it simply creates a new "counter"
and returns it and doesn't need to block or be blocked.

Parth

Mark Engelberg

Dec 22, 2008, 2:25:20 AM

to clo...@googlegroups.com

I misspoke; it's the call to counter that's the problem. Let's say
you want to use a counter to count the number of times a ref is set,
something like this:

(dosync (counter) (ref-set r 1))

If your ref-set causes the transaction to retry, an atom-based counter
will increment twice. As I understand it, atoms are one of the most
"dangerous" things in Clojure, and should be avoided unless you're
completely sure it will not change the semantics of your program if it
gets executed multiple times. Aside from the memoization example for
which it was invented, I am hard-pressed to think of a good use for
atoms. For something like a counter, you really have to use ref, and
that should remain people's default when dealing with mutability.

I haven't done much with atoms yet, so if I've misunderstood Rich's
posts, feel free to explain my mistake.

Parth Malwankar

Dec 22, 2008, 7:23:32 AM

to Clojure

On Dec 22, 12:25 pm, "Mark Engelberg" <mark.engelb...@gmail.com>
wrote:

That's a valid example. "counter" should not be used
in the dosync.

However, if the counter is meant for a single thread then one
way to use it would be:

... (counter) (dosync (ref-set r 1)) ...

Basically, as counter has side-effects it shouldn't be called
in dosync as stated in the docs ( http://clojure.org/refs )

To quote the docs:
7. I/O and other activities with side-effects should be avoided
in transactions, since transactions will be retried. The io!
macro can be used to prevent the use of an impure function
in a transaction.

End quote.
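
To illustrate the io! guard from the quoted docs, here is a minimal sketch. The names n, guarded-counter, r and rejected? are made up for this example. Wrapping the atom update in io! makes an accidental call inside dosync fail loudly rather than increment once per retry:

```clojure
;; Hypothetical names: n, guarded-counter, r and rejected? are for illustration only.
(def n (atom 0))

(defn guarded-counter []
  ;; io! throws IllegalStateException when it runs inside a transaction
  (io! (swap! n inc)))

(def r (ref 0))

;; Outside a transaction the counter works as before.
(guarded-counter)

;; Inside dosync the call is rejected before the atom is touched,
;; and the transaction never commits the ref-set.
(def rejected?
  (try
    (dosync (guarded-counter) (ref-set r 1))
    false
    (catch IllegalStateException _ true)))
```

With this guard the counter has to be called before or after the dosync, as in the single-thread usage shown above.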

The above is a general rule, so e.g. we should not
have (dosync (println "done setting") (ref-set r 1)). If my
understanding is correct, this should be:

... (println "done setting") (dosync (ref-set r 1)) ...

If we want a counter to count events across
multiple threads, then yes, it would make sense to
use refs.

If I get it right, atoms are quite useful to maintain state
in the context of a single thread with memoization and
counter (within a thread) being two examples.

There may possibly be performance implications
for refs and atoms but I haven't really benchmarked it.

Parth

J. McConnell

Dec 22, 2008, 7:33:05 AM

to clo...@googlegroups.com

On Mon, Dec 22, 2008 at 2:25 AM, Mark Engelberg
<mark.en...@gmail.com> wrote:
>
> Aside from the memoization example for
> which it was invented, I am hard-pressed to think of a good use for
> atoms.

Not having used them myself, I can't think of many good examples
either. However, one in addition to the cache example would be an
auto-incrementing identifier; something like a database sequence. Its
semantics wouldn't change if there were gaps in the produced values as
long as all IDs were unique.

- J.

Mark Engelberg

Dec 22, 2008, 7:34:36 AM

to clo...@googlegroups.com

On Mon, Dec 22, 2008 at 4:23 AM, Parth Malwankar
<parth.m...@gmail.com> wrote:
> If I get it right, atoms are quite useful to maintain state
> in the context of a single thread with memoization and
> counter (within a thread) being two examples.

No, RH said that atoms were definitely intended for multiple threads,
not just single threads. But their use is highly specific. With
memoization, it doesn't matter if things get retried, as long as
things don't get "lost". atom basically guarantees that the ref and
the set occur atomically (via swap), so you don't have to worry about
two threads losing something from the cache as follows:
Current cache {:a 1 :b 2}
One thread tries to add :c 3, and another tries to add :d 4.
Without atomic swap, one thread could try to update the cache to {:a 1
:b 2 :c 3} and the other to {:a 1 :b 2 :d 4} (because they are both
basing their updates on what they see). Whichever one wins, one of
the values will be "lost" from the cache.
So atoms make this one guarantee, allowing safe multithread
memoization, but at great risk for other types of applications,
because most "seemingly-obvious" uses for atoms would probably be
hosed by the possible retry.

I fear a lot of people are going to end up misusing atoms. I assume
they were necessary to make memoization perform better than under the
ref-with-commute approach.
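
The lost-update scenario described above can be sketched as follows. This is only an illustration under assumed names (cache, t1, t2), using plain Java threads; it shows that two concurrent swap! calls with assoc both land in the map:

```clojure
;; Illustrative names; the cache contents mirror the example in the post.
(def cache (atom {:a 1 :b 2}))

;; Each swap! applies assoc to the current map and retries if another
;; thread changed the atom in between, so neither entry can be lost.
(let [t1 (Thread. #(swap! cache assoc :c 3))
      t2 (Thread. #(swap! cache assoc :d 4))]
  (.start t1)
  (.start t2)
  (.join t1)
  (.join t2))

@cache ; contains :a, :b, :c and :d whichever thread ran first
```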

Adrian Cuthbertson

Dec 26, 2008, 11:35:36 PM

to Clojure

On Dec 22, 2:34 pm, "Mark Engelberg" <mark.engelb...@gmail.com> wrote:
> On Mon, Dec 22, 2008 at 4:23 AM, Parth Malwankar
>

> <parth.malwan...@gmail.com> wrote:
> > If I get it right, atoms are quite useful to maintain state
> > in the context of a single thread with memoization and
> > counter (within a thread) being two examples.
>
> No, RH said that atoms were definitely intended for multiple threads,
> not just single threads. But their use is highly specific. With
> memoization, it doesn't matter if things get retried, as long as

> things don't get "lost". atom basically guarantees that the ref and

> the set occur atomically (via swap), so you don't have to worry about
> two threads losing something from the cache as follows:
> Current cache {:a 1 :b 2}
> One thread tries to add :c 3, and another tries to add :d 4.
> Without atomic swap, one thread could try to update the cache to {:a 1
> :b 2 :c 3} and the other to {:a 1 :b 2 :d 4} (because they are both
> basing their updates on what they see). Whichever one wins, one of
> the values will be "lost" from the cache.
> So atoms make this one guarantee, allowing safe multithread
> memoization, but at great risk for other types of applications,
> because most "seemingly-obvious" uses for atoms would probably be
> hosed by the possible retry.
>
> I fear a lot of people are going to end up misusing atoms. I assume
> they were necessary to make memoization perform better than under the
> ref-with-commute approach.

It's important to distinguish between updating atoms within
transactions and outside transactions. In the former case, one has to
ensure the update function can be retried without ill-effects.
However, outside a transaction, atoms are just mutable values, that
can safely be shared between threads, provided that their updating
does not need to be coordinated with other updates (to other atoms,
refs or agents).

Here's an example; say we have a multi-threaded service where we wish
to share a single counter to use as say a serial id. The following
code initializes serid to 0 and provides a function incserid to
increment it.

(def serid (atom 0))
(defn incserid [] (swap! serid inc))
(defn docount [n cntr] (dotimes [ind n] (cntr)))

Now, the following uses a thread factory to instantiate (nthreads)
number of threads, each of which will execute the above incrementor
(ncount) number of times...

(import '(java.util.concurrent Executors))
(def th-factory (Executors/newSingleThreadExecutor))
(defn do-thrds [nthreads ncount]
  (dotimes [t nthreads]
    (.submit th-factory (partial docount ncount incserid))))

So, if we then initiate 100 threads to each increment the serid 10000
times...

(do-thrds 100 10000)
@serid
=>1000000

We see that the threads have happily shared the atom and updated it
correctly with no ill-effects. Note the lack of requirement for
dosync.
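
One caveat worth sketching: Executors/newSingleThreadExecutor runs all submitted tasks one after another on a single worker thread, and reading the atom right after submitting does not wait for the tasks to finish. A variant that actually runs tasks in parallel and waits for them might look like this. The name run-increments and the pool size of 8 are assumptions made for illustration, not part of the original code:

```clojure
(import '(java.util.concurrent Executors TimeUnit))

(def serid (atom 0))
(defn incserid [] (swap! serid inc))

(defn run-increments
  "Runs ntasks tasks on nworkers threads, each calling incserid ncount
  times, then blocks until all tasks have finished (or 60 seconds pass)."
  [nworkers ntasks ncount]
  (let [pool (Executors/newFixedThreadPool nworkers)]
    (dotimes [_ ntasks]
      ;; execute takes a Runnable; Clojure functions implement Runnable
      (.execute pool (fn [] (dotimes [_ ncount] (incserid)))))
    (.shutdown pool)                                ; accept no new tasks
    (.awaitTermination pool 60 TimeUnit/SECONDS)))  ; wait for completion

(run-increments 8 100 10000)
@serid
```

Because every increment goes through swap!, the final value is 1000000 regardless of how the tasks interleave.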

So in summary, atoms are great (and should be used in preference to
refs) where the shared state needs to be mutable, shared and
independent (not requiring coordinated update with other objects).

Regards, Adrian.

Mark Engelberg

Dec 27, 2008, 12:03:41 AM

to clo...@googlegroups.com

On Fri, Dec 26, 2008 at 8:35 PM, Adrian Cuthbertson
<adrian.cu...@gmail.com> wrote:
> It's important to distinguish between updating atoms within
> transactions and outside transactions. In the former case, one has to
> ensure the update function can be retried without ill-effects.
> However, outside a transaction, atoms are just mutable values, that
> can safely be shared between threads, provided that their updating
> does not need to be coordinated with other updates (to other atoms,
> refs or agents).

Yes, but when you write your atom-based code, you have no way to know
whether you or others who want to reuse it will want to use it as part
of a transaction. atom-based code is not generally safe for
transactions, which is why I suggested it should be avoided. In your
example, if incserid is used in a transaction, it is possible that
certain serids will be skipped. This may be acceptable, in which
case, go ahead and use an atom, but often programs rely on subtle
assumptions (like the serids will be issued consecutively), and your
program can become brittle if there's a chance your code won't follow
these assumptions. Probably better off not to take the chance. Stick
with something like refs which will yield more predictable behavior,
and thus be easier to test. Memoization is a very special exception,
because it really doesn't matter if something gets cached more than
once.

The whole point of Clojure seems to make as much code as possible safe
for its software transactional memory. Thus the persistent data
structures, and other design details. Although interoperating with
Java also introduces risk within transactions, generally speaking, if you
stay within the "Clojure core", you're safe for transactions. Except
atoms.

Rich Hickey

Dec 29, 2008, 11:05:56 AM

to clo...@googlegroups.com

It is certainly not the whole point of Clojure to make as much code as
possible safe for its software transactional memory. Clojure is a set
of tools. They are designed to allow for robust programs to be built,
including multithreaded programs. STM is one of those tools, but is
not a universal answer.

I think it would be wise to avoid sweeping generalizations - sweeping
generalizations are always wrong :)

There are many things that will never be safe inside transactions -
I/O in particular, and you can look at many forms of mutation (e.g.
all the Java OO mutation) as I/O of a sort. That doesn't mean that
these things aren't useful, or should be avoided. Good practice
programming with STM involves segregating the I/O portion of your code
from the transactional portion, and minimizing the footprint of your
transactions in general.

I think many people look at the facilities provided by Clojure and
hope there's also some magic recipe for good multithreaded designs.
There isn't. Even with Clojure's (or any language's) tools, you have
to think.

I don't disagree that refs should be your first choice, and that atoms
are special purpose tools for more experienced users. But they are not
just for memoization caches.

Taking the case at hand, ID generation. A modern multithreaded program
would try to avoid monotonically increasing consecutive IDs, as any
implementation of that would be a severe concurrency bottleneck. If
you were to use a ref as an ID source, every transaction would have to
line up for access to that ref. Yes, it will be predictable - with
predictably bad scalability. Using an atom for this can seriously
improve the throughput of transactions, by removing what might be the
only ref they share.
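
A rough sketch of the two ID sources being compared, with every name (ref-ids, atom-ids, orders-r, orders-a and the functions) invented for illustration:

```clojure
;; Hypothetical names throughout; a sketch of the tradeoff, not a benchmark.
(def ref-ids (ref 0))
(def atom-ids (atom 0))

(def orders-r (ref {}))
(def orders-a (ref {}))

(defn add-order-with-ref [item]
  ;; Every transaction alters ref-ids, so concurrent callers conflict on
  ;; that one ref and retry, even when their other work is unrelated.
  (dosync
    (let [id (alter ref-ids inc)]
      (alter orders-r assoc id item)
      id)))

(defn add-order-with-atom [item]
  ;; The ID is taken outside the transaction. IDs stay unique, but they
  ;; are not tied to whether the transaction below commits.
  (let [id (swap! atom-ids inc)]
    (dosync (alter orders-a assoc id item))
    id))

(add-order-with-ref :widget)
(add-order-with-atom :gadget)
```

The ref version yields consecutive IDs that match committed orders exactly; the atom version gives up that guarantee in exchange for not serializing every transaction on the ID source.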

I think everyone should try to avoid dispensing advice from
theoretical arguments. You need to understand your tools, the
tradeoffs they involve, the use case at hand, and make good decisions.

Rich

Mark Engelberg

Dec 29, 2008, 2:29:56 PM

to clo...@googlegroups.com

On Mon, Dec 29, 2008 at 8:05 AM, Rich Hickey <richh...@gmail.com> wrote:
> It is certainly not the whole point of Clojure to make as much code as
> possible safe for its software transactional memory. Clojure is a set
> of tools. They are designed to allow for robust programs to be built,
> including multithreaded programs. STM is one of those tools, but is
> not a universal answer.

It's good to hear you say this, because I agree. I was "dispensing
advice" based on my perception of your philosophy (and because I sense
from discussions that most people are assuming atoms are safe in ways
they really aren't). But in fact, I think that Clojure's tools for
mutability don't yet go far enough, and I'd prefer to have a few more
"unsafe" things in the toolbox for experienced programmers.

For example, mutability is extremely useful for
constructing/initializing complex data structures, especially ones
that have cyclic references. (Or think about how StringBuffer is used
to set up a string with mutability, and then it is delivered as an
immutable String). Also, mutability of local variables can be handy
when implementing a complex "classic" algorithm that is described as a
sequence of imperative steps, in order to keep the form of the code as
close as possible to the source.

This sort of local mutability has no impact on referential
transparency, and is really quite safe when used properly. But none
of the existing types of mutability seem like a good fit for this
need. It seems like overkill to have to use a ref or atom with their
transaction or swapping syntaxes in order to mutate something that
will never be observed as mutable by the outside world.

Rich Hickey

Dec 29, 2008, 3:40:16 PM

to Clojure

On Dec 29, 2:29 pm, "Mark Engelberg" <mark.engelb...@gmail.com> wrote:

People who know what they are doing can do these things right now with
Clojure's array support. There really isn't any more value for Clojure
to add to that, so no special primitives. I fully accept the necessity
of doing that at times, and the correctness of internally mutating,
but externally referentially transparent functions, and Clojure has
several of them. That's one of the reasons Clojure isn't 'pure'. OTOH
it's not a good argument for mutable local vars. What other "unsafe
things" are you looking for?

People could have used j.u.c.atomic also, but atom provides a unified
interface consistent with the other reference types, and using swap!
encourages a race-free discipline most people wouldn't pursue with
plain mutable locals and often get wrong with CAS.

Rich

Dave Griffith

Dec 29, 2008, 5:57:30 PM

to Clojure

It looks like the mutable locals use case is covered by the
"with-local-vars" binding form. That said, I'm not sure how useful this
would be. Even in Java 5, 95% of my local vars are immutable, i.e.,
annotated as final and never have any mutating methods called on
them. Most of the rest are simple accumulators: StringBuffers,
collections which are filled then used then tossed, or summations. In
Clojure these would be replaced by some sort of reduction, requiring
no . The only time I could see myself using with-local-vars is in
some complex graph-theoretic code, of the sort I write maybe once a .

Mark Engelberg

Dec 29, 2008, 8:08:31 PM

to clo...@googlegroups.com

On Mon, Dec 29, 2008 at 12:40 PM, Rich Hickey <richh...@gmail.com> wrote:
> People who know what they are doing can do these things right now with
> Clojure's array support. There really isn't any more value for Clojure
> to add to that, so no special primitives. I fully accept the necessity
> of doing that at times, and the correctness of internally mutating,
> but externally referentially transparent functions, and Clojure has
> several of them. That's one of the reasons Clojure isn't 'pure'. OTOH
> it's not a good argument for mutable local vars. What other "unsafe
> things" are you looking for?

After giving this some more thought, I think the absolutely simplest
way to further improve Clojure's mutability support would be to add an
atom-set function for the cases where you really want to clobber the
contents of atom and don't care what the contents already are. This
takes the "variable which needs no synchronization" thing one step
further, allowing for fastest speed in those situations.

The semantics would be like this:
(defn atom-set [a val]
  (swap! a (constantly val)))

But presumably it could be implemented in the Clojure core in a way
that is even faster than the above implementation, since it doesn't
need to do the check that the contents haven't changed before setting.
I think this is consistent with the idea of atom as a power-user's
tool for getting the best possible performance when synchronization is
not required.
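
For readers on current Clojure releases: clojure.core now includes reset!, which sets an atom to a new value without regard for the old one and returns the new value. A small sketch, with counter as an illustrative name:

```clojure
;; counter is an illustrative name.
(def counter (atom 41))

(swap! counter inc)             ; read-modify-write: new value depends on the old one
(reset! counter 0)              ; plain overwrite: the old value is ignored
(swap! counter (constantly 7))  ; has the same effect as (reset! counter 7)
@counter
```

Note that (reset! a (inc @a)) is still a race between the read and the write; swap! remains the right tool whenever the new value depends on the current one.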

Mark Engelberg

Dec 29, 2008, 8:20:25 PM

to clo...@googlegroups.com

On Mon, Dec 29, 2008 at 2:57 PM, Dave Griffith
<dave.l....@gmail.com> wrote:
>
> It looks like the mutable locals use case is covered by the
> "with-local-vars" binding form.

Not really. with-local-vars has somewhat surprising semantics.

For example, you'd expect this (contrived) function to generate an
"add 2" function:
(defn create-add-2 []
  (with-local-vars [x 1]
    (do
      (var-set x 2)
      (fn [y] (+ y (var-get x))))))

But in fact, it just generates a function which errors.

If Clojure had some sort of "mutable local" binding construct, I would
expect this to work:
(defn create-add-2 []
  (mutable [x 1]
    (do
      (set! x 2)
      (fn [y] (+ x y)))))
because you should be able to refer to a mutable local inside of a closure.

But any attempt to *set* the local in the closure would generate an
error, because you really should be using refs for something like
this:
(defn create-growing-adder []
  (mutable [x 1]
    (do
      (set! x 2)
      (fn [y] (do (set! x (inc x)) (+ x y))))))

I think if Clojure could do something like this (enforce a certain
kind of referentially transparent mutable local), that would be neat,
but just extending the interface for atoms with atom-set (as I
proposed in my previous post) is probably a perfectly fine and more
realistic solution.

Timothy Pratley

Dec 29, 2008, 9:54:34 PM

to Clojure

> I think if Clojure could do something like this (enforce a certain
> kind of referentially transparent mutable local), that would be neat,

It is possible to achieve this behavior explicitly:

(defn create-add-2 []
  (with-local-vars [x 1]
    (do
      (var-set x 2)
      (let [z @x]
        (fn [y] (+ y z))))))
(def myc (create-add-2))
user=> (myc 1)
3
user=> (myc 2)
4

A bit verbose, but also very clear about what is really mutable and
what isn't.

Rich Hickey

Dec 30, 2008, 8:53:41 AM

to Clojure

On Dec 29, 8:08 pm, "Mark Engelberg" <mark.engelb...@gmail.com> wrote:

Could you provide an example of when you would need/use that?

Rich

Mark Engelberg

Dec 30, 2008, 6:29:23 PM

to clo...@googlegroups.com

On Tue, Dec 30, 2008 at 5:53 AM, Rich Hickey <richh...@gmail.com> wrote:
> Could you provide an example of when you would need/use that?
>

Sure.

Use Case #1: Implementing classic imperative algorithms

Consider the binary gcd algorithm on page 338 of The Art of Computer
Programming, volume 2. This is a very clever implementation of gcd
using only bit shifting and parity checking. Like many classic
algorithms, it is written in a very imperative style. The following
is a direct translation to Clojure:

(defn binary-gcd [a b]
  (let [k (atom 0), u (atom a), v (atom b), t (atom 0),
        algsteps {:B1 (fn [] (if (and (even? @u) (even? @v))
                               (do (atom-set k (inc @k))
                                   (atom-set u (bit-shift-right @u 1))
                                   (atom-set v (bit-shift-right @v 1))
                                   :B1)
                               :B2)),
                  :B2 (fn [] (if (odd? @u)
                               (do (atom-set t (- @v)) :B4)
                               (do (atom-set t @u) :B3))),
                  :B3 (fn [] (atom-set t (bit-shift-right @t 1)) :B4),
                  :B4 (fn [] (if (even? @t) :B3 :B5)),
                  :B5 (fn [] (if (> @t 0)
                               (atom-set u @t)
                               (atom-set v (- @t)))
                        :B6),
                  :B6 (fn [] (atom-set t (- @u @v))
                        (if (not (zero? @t)) :B3 (bit-shift-left @u @k)))}]
    (loop [step :B1]
      (let [next ((algsteps step))]
        (if (number? next) next (recur next))))))

To test this code, I used the following implementation of atom-set:

(defn atom-set [a val]
  (swap! a (constantly val)))

Now, the code would certainly be cleaner if Clojure had tail-call
optimization and letrec, because you could set each algorithmic step
up as a mutually recursive function, rather than storing the steps in
a hash table, and setting up a driver loop to handle the state changes
as in a state machine. Or if Clojure just had letrec, this could be
expressed using the new trampoline construct.

But that's not really the point. The point here is that this code
took me no effort to translate from the Knuth book, it worked on the
first run with no debugging needed, and anyone can easily compare this
code against the Knuth pseudocode, and see at a glance that this is a
correct implementation of that algorithm. There may be ways to
convert this to a functional style (and I encourage others to try it
as an interesting exercise), but with any such transformation, it will
be significantly more difficult to confirm that the program correctly
implements the algorithm.
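
For comparison, one possible functional rendering of the same steps using loop/recur is sketched below. It is not taken from Knuth or from the attached file; the name binary-gcd-recur is made up, and like the imperative version it assumes both arguments are positive integers:

```clojure
;; A sketch only; assumes a and b are positive integers.
(defn binary-gcd-recur [a b]
  ;; B1: strip the powers of two common to both numbers, counting them in k
  (let [[k u v] (loop [k 0, u a, v b]
                  (if (and (even? u) (even? v))
                    (recur (inc k) (bit-shift-right u 1) (bit-shift-right v 1))
                    [k u v]))]
    ;; B2: start with t = -v when u is odd, otherwise t = u
    (loop [u u, v v, t (if (odd? u) (- v) u)]
      (cond
        ;; B6 exit: u equals v, so restore the common power of two
        (zero? t) (bit-shift-left u k)
        ;; B3/B4: halve t while it is even
        (even? t) (recur u v (bit-shift-right t 1))
        ;; B5/B6: replace the larger of u and v by |t|, then t = u - v
        :else (let [nu (if (pos? t) t u)
                    nv (if (pos? t) v (- t))]
                (recur nu nv (- nu nv)))))))

(binary-gcd-recur 40 24)
```

The state that the atoms held becomes the loop bindings, at the cost of the step labels no longer mapping one-to-one onto the book's presentation.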

About half of the atom-sets are updates that could use the swap!
function, but many of these are replacing the atom's contents with
something completely unrelated to its existing contents.

Use Case #2: setting up mutually referential data structures

A common way to set up mutually referential data structures is to
start them off with links that are set to nil, and then use imperative
setting to direct the links at the right things. As a quick example,
consider the cyclic-linked-list solution to the classic Josephus
problem. Here is one way to set up a cyclic ring of people:

(defn make-person [n]
  {:id n, :link (atom nil)})

(defn link-people [p1 p2]
  (atom-set (p1 :link) p2))

(defn ring-people [n]
  (let [vec-people (vec (map make-person (range n)))]
    (dotimes [i (dec n)] (link-people (vec-people i) (vec-people (inc i))))
    (link-people (vec-people (dec n)) (vec-people 0))
    (vec-people 0)))

Now depending on how this is going to be used, you could argue that
maybe you're better served by refs. But if the whole point of atoms
is to give power-users another, faster choice when no coordination is
needed, why not give the programmer a choice here? If I know for sure
that I'm just going to use this ring within a single thread, or use it
internally in a function that generates and uses the ring without ever
exposing the ring outside that function, then an atom may be the right
tool for the job.

binarygcd.clj

josephus.clj

Pinocchio

Dec 30, 2008, 10:19:44 PM

to clo...@googlegroups.com

I am trying to understand the arguments here... so here is a summary. Please let me know if I missed something:

1) It is possible to get "internally" mutable state using with-local-vars and var-set. The problem here is that the "internal" state cannot be operated on by multiple threads concurrently. Thus functions defined this way will not automatically scale on multi-cores. One may have to be careful while wrapping such state in a closure and accessing it from multiple threads. However, they *will* allow one to "naturally" write certain algorithms containing local mutation.
2) It is possible to get "internally" mutable state using clojure arrays and recur style re-binding. The problem here is that recur style rebinding may not be expressive enough to "naturally" encode certain algorithms.
3) It is possible to get "internally" mutable state using atoms (along with atom-set for better readability and convenience)... which seems like a nice suggestion. The problem here is the generic problem with atoms, which should be used carefully under dosync. Particularly, if an atom is indeed an "internal" state in a function's closure, and the closure itself is used in a transaction, then we have problems due to transaction retries. So, we either use atoms assuming that the function closures *will* be retried in transactions, which limits the situations in which atoms can be used, or we make sure we document such function closures so that they are not used in dosync (maybe Clojure can print a warning if it sees an atom being used inside a dosync).

However, this problem occurs only if there is a closure. If the function is retried and the internal state is created afresh then do we still have a problem? Is this a valid way to handle "internal" mutable state which is not encapsulated in closures?

So can we conclude that atoms with atom-set can be freely used in functions which don't return closures around them and if we really need to do that, we should use refs instead (or use one of the first two options given in the beginning)?

Pinocchio

Rich Hickey

Dec 30, 2008, 11:38:36 PM

to clo...@googlegroups.com

I don't see these as being compelling arguments for atom-set. You can
do these things with Java arrays or other Java things like
clojure.lang.Box.

OTOH, atom-set just invites the read-modify-write race conditions
swap! was meant to avoid.

There's simply no value for Clojure to add to a simple mutable box.
Clojure does provide the tools for low-level mutation - access to
Java. You can wrap that in whatever functions/macros you like.

Rich

Mark Engelberg

Dec 31, 2008, 12:30:04 AM

to clo...@googlegroups.com

On Tue, Dec 30, 2008 at 8:38 PM, Rich Hickey <richh...@gmail.com> wrote:
> There's simply no value for Clojure to add to a simple mutable box.
> Clojure does provide the tools for low-level mutation - access to
> Java. You can wrap that in whatever functions/macros you like.
>

There's no way to use the @ dereferencing syntax for a custom mutable
box, right? The sample I gave would be considerably harder to read if
you had to use (box-ref u) instead of @u everywhere.

> OTOH, atom-set just invites the read-modify-write race conditions
> swap! was meant to avoid.

Yes, but not every write to an atom is going to be a read-modify-write
kind of change, which is why it feels odd to always be limited to
swap!.

Since my examples emphasized the "simple mutable box" benefits of
having atom-set, and you didn't find that compelling, let's go back to
the discussion earlier in this thread about using an atom to maintain
a counter that increments to generate unique IDs. You've already
stated that this is one of the uses that is a good fit for atom.

It would be entirely reasonable to want, under certain conditions,
to reset the counter to 0. When you reset to 0, that's just a
straight write. Having to say (swap! counter (constantly 0)) feels
convoluted, and somewhat obscures the fact that this change is not
dependent on the previous state of the atom. Furthermore, by having
to represent this reset as a swap, the counter reset might not happen
as quickly as one would like. If there is a lot of contention and
other threads are in the process of incrementing the counter, the
reset might fail to go through for a while, because the value is
frequently changed by other threads between the unnecessary read and
the write of 0. An atom-set would not only reflect the intentions of
the code better, but would provide better performance for the times
where a read is irrelevant to the value being written.

Yes, atom-set invites a certain potential for misuse, but atoms
already require a certain degree of care.

Rich Hickey

Dec 31, 2008, 9:27:26 AM

to clo...@googlegroups.com

On Dec 31, 2008, at 12:30 AM, Mark Engelberg wrote:

>
> On Tue, Dec 30, 2008 at 8:38 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> There's simply no value for Clojure to add to a simple mutable box.
>> Clojure does provide the tools for low-level mutation - access to
>> Java. You can wrap that in whatever functions/macros you like.
>>
>
> There's no way to use the @ dereferencing syntax for a custom mutable
> box, right?

Not true. Just implement clojure.lang.IRef and deref/@ will work.
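
A minimal sketch of such a custom box follows. It assumes a current Clojure release, where the deref method lives on the clojure.lang.IDeref interface and deftype is available; the names Box, Settable and box-set! are invented for illustration. The box is deliberately unsynchronized, so it is only suitable for single-threaded, local use:

```clojure
;; Hypothetical names: Settable, box-set! and Box are for illustration only.
(defprotocol Settable
  (box-set! [this v] "Overwrites the contents of the box and returns v."))

(deftype Box [^:unsynchronized-mutable contents]
  clojure.lang.IDeref
  (deref [_] contents)            ; makes (deref box) and @box work
  Settable
  (box-set! [_ v]
    (set! contents v)             ; plain field write, no thread safety
    v))

(def u (Box. 10))
(box-set! u (bit-shift-right @u 1))
@u
```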

> The sample I gave would be considerably harder to read if
> you had to use (box-ref u) instead of @u everywhere.
>
>> OTOH, atom-set just invites the read-modify-write race conditions
>> swap! was meant to avoid.
>
> Yes, but not every write to an atom is going to be a read-modify-write
> kind of change, which is why it feels odd to always be limited to
> swap!.
>

It doesn't matter that it's not a read-modify-write change, it matters
that you think about other consumers of the atom. swap! makes you do
that.

> Since my examples emphasized the "simple mutable box" benefits of
> having atom-set, and you didn't find that compelling, let's go back to
> the discussion earlier in this thread about using an atom to maintain
> a counter that increments to generate unique IDs. You've already
> stated that this is one of the uses that is a good fit for atom.
>
> It would be entirely reasonable to want, under certain conditions,
> to reset the counter to 0. When you reset to 0, that's just a
> straight write. Having to say (swap! counter (constantly 0)) feels
> convoluted, and somewhat obscures the fact that this change is not
> dependent on the previous state of the atom. Furthermore, by having
> to represent this reset as a swap, the counter reset might not happen
> as quickly as one would like. If there is a lot of contention and
> other threads are in the process of incrementing the counter, the
> reset might fail to go through for a while, because the value is
> frequently changed by other threads between the unnecessary read and
> the write of 0. An atom-set would not only reflect the intentions of
> the code better, but would provide better performance for the times
> where a read is irrelevant to the value being written.
>

This is a theoretical argument again, and remains unconvincing. (swap!
counter (constantly 0)) may feel bad, but it should - you're trashing
a shared reference with no regard for its contents. And if you have
code where the counter is so hot a one-time swapping reset is a perf
issue, you have bigger problems.

One thing is certain, right now code like this is unlikely:

(let [val (inc @a)]
  (swap! a (constantly val)))

Whereas with atom-set in the API, code like this is likely:

(atom-set a (inc @a)) ;broken - race condition

after all, it looks just like its (correct) ref counterpart:

(ref-set r (inc @r))
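
The difference can be observed with a small experiment. This is only a sketch: it uses reset!, the plain-write function found in current clojure.core, as a stand-in for the proposed atom-set, and run-in-threads is a hypothetical helper.

```clojure
(defn run-in-threads
  "Run f on n threads and wait for all of them to finish."
  [n f]
  (let [threads (doall (repeatedly n #(Thread. ^Runnable f)))]
    (doseq [^Thread t threads] (.start t))
    (doseq [^Thread t threads] (.join t))))

(def safe (atom 0))
(def racy (atom 0))

;; swap! retries on conflict, so no increment is lost.
(run-in-threads 8 #(dotimes [_ 10000] (swap! safe inc)))

;; Read, then blind write: another thread can update between the two steps.
(run-in-threads 8 #(dotimes [_ 10000] (reset! racy (inc @racy))))

(println "swap!  total:" @safe)  ; always 80000
(println "reset! total:" @racy)  ; never more than 80000, may be less
```

How many updates the second counter loses depends on scheduling, so its total varies from run to run.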

> Yes, atom-set invites a certain potential for misuse, but atoms
> already require a certain degree of care.
>

This is not the kind of criteria I want to use when designing. With
that logic Clojure could just be a free-for-all like Java, which also
requires a 'certain degree of care'.

That said, I think a big part of the problem here lies in the name. If
it was e.g. reinitialize-atom or reset-atom I might feel differently.

I also think that your use cases for atoms for local mutation are a
mismatch. atoms are about sharing. You really want something else for
private/local mutable references, and I have some ideas for that.

Rich

Mark Engelberg

Dec 31, 2008, 1:20:40 PM

to clo...@googlegroups.com

On Wed, Dec 31, 2008 at 6:27 AM, Rich Hickey <richh...@gmail.com> wrote:
> I also think that your use cases for atoms for local mutation are a
> mismatch. atoms are about sharing. You really want something else for
> private/local mutable references, and I have some ideas for that.

You're right that I'm basically asking for something that works well
for private/local mutable references. It's possible right now to use
with-local-vars, or atoms, or refs, but none of them quite fill that
need perfectly (by design).

My impression so far of atoms is that they are "underpowered", and
suitable for a very small set of use cases, most of which you could
have already done with refs and commute (albeit with a slight
performance penalty). Since atoms are side-effecting, their
composability with other transaction-based Clojure code is limited.
Yes, my feelings about this are entirely theoretical, and maybe in
practice, as more and more Clojure code is written, I'll get to see
more good uses for atoms. That's part of the fun of working with a
new and vibrant language that is full of possibilities.

But this perception I have is why I like the idea of extending atom's
capabilities to cover the local mutable reference use cases. I like
the idea of thinking of an atom as a simple box, with a cool bonus
feature of being able to atomically swap its contents, providing a
step up in safety without going as far as a ref.

Anyway, it's great to hear you have some more ideas on the subject,
and I look forward to seeing what you come up with.

Rich Hickey

Dec 31, 2008, 1:56:58 PM

to Clojure

On Dec 31, 1:20 pm, "Mark Engelberg" <mark.engelb...@gmail.com> wrote:

> On Wed, Dec 31, 2008 at 6:27 AM, Rich Hickey <richhic...@gmail.com> wrote:
> > I also think that your use cases for atoms for local mutation are a
> > mismatch. atoms are about sharing. You really want something else for
> > private/local mutable references, and I have some ideas for that.
>

> My impression so far of atoms is that they are "underpowered", and
> suitable for a very small set of use cases, most of which you could
> have already done with refs and commute (albeit with a slight
> performance penalty). Since atoms are side-effecting, their
> composability with other transaction-based Clojure code is limited.

I wish you would stop repeating that. I've given you concrete use cases
where atoms might be essential to good scalability in a transactional
context, lest every transaction that uses a memoized function or ID
generator have to serialize on it.

Their compatibility with transactions can be completely irrelevant for
non-transactional subsystems where atoms can form a useful substrate
for lock-free synchronous state.

And, in subsystems using transactions, all effort must be applied to
keeping transactions small, and most code out of them. Using atoms can
help enable that where refs are not necessary.
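
A sketch of the ID-generator case, under assumed names (last-id, next-id, orders and add-order! are all hypothetical): the counter is an atom, so transactions that need an ID do not have to serialize on it, while the coordinated data stays in a ref.

```clojure
;; Shared ID source: an atom, so transactions do not serialize on it.
(def last-id (atom 0))
(defn next-id [] (swap! last-id inc))

;; Coordinated state lives in a ref.
(def orders (ref {}))

(defn add-order!
  "Store item under a fresh ID and return that ID."
  [item]
  (dosync
    ;; If this transaction retries, next-id runs again and an ID is skipped.
    ;; IDs stay unique, which is all an ID generator has to guarantee.
    (let [id (next-id)]
      (alter orders assoc id item)
      id)))

(add-order! :widget)
(add-order! :gadget)
@orders
```

Had the counter been a ref altered inside the same transaction, every concurrent add-order! would conflict on it and retry.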

Rich

Luc Prefontaine

Dec 31, 2008, 3:54:42 PM

to clo...@googlegroups.com

I have not yet written thousands of lines of parallel code in Clojure (I am on this learning curve right now while experimenting with Terracotta).
However, I can compare with other frameworks I've used in the past (ASTs, event flags, ... on VMS;
semaphores, condition variables, mutexes, ... in libthread on various platforms; Java with its simplistic approach; ...),
and I fully agree with Rich.

There are always "simple shareable states" in any parallel system and you need an economical way
to share these without requiring the full blown thing (taking a mutex before accessing/changing stuff, ....).
Semaphores are a good example of state that can be shared and changed atomically with simple calls.
Dig a bit and you will find them in many places.

Having worked with a lot of real time systems I can guarantee that there are many spots where such mechanisms are
a necessity.

In Java it's worse than in other languages because many of these things have been hidden behind
"nice" keywords (synchronized, wait, ...). Other features have been dropped because people were
killing their apps through lack of knowledge. Other features were not considered because they were not seen as required... yet.
The implementation details in Java are so hidden to most people that there is
seldom a discussion about the cost of using a specific feature and the impacts it has on parallelism.

Parallel code is not mainstream yet in businesses aside from Swing code (:) I hope Clojure will change that a lot),
and most coders do not deal with these issues very often, or only in a very superficial way.

I expect this will have to change when parallel designs become common.
I would rather have more options than fewer to tune up my design and increase throughput... so atoms are welcome.

Luc

From a guy who's been doing a lot of multi threaded apps for more than twenty years (for those too young to remember,
that was BEFORE libthread came to life :))) and yes I have grey hairs :)))

Timothy Pratley

Dec 31, 2008, 11:09:51 PM

to Clojure

> On Dec 30, 2008, at 6:29 PM, Mark Engelberg wrote:

> Use Case #1: Implementing classic imperative algorithm (GDC)

I replaced your (atoms) using (with-local-vars) and the function runs
perfectly fine. The local vars are not closed over, so they cannot
leak, and the code is cleaner. So this example isn't a case for atom-
set in my opinion.
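
For readers without the earlier message at hand, here is roughly what such a rewrite looks like. This is a reconstruction, not the original code: it assumes the algorithm in question is Euclid's gcd, and the function name is hypothetical.

```clojure
(defn gcd-imperative
  "Euclid's algorithm in imperative style, using local vars as mutable cells."
  [a b]
  (with-local-vars [x a, y b]
    (while (not (zero? @y))
      (let [r (rem @x @y)]
        (var-set x @y)
        (var-set y r)))
    ;; Return the value, not the var: the var is unbound once this form exits.
    @x))

(gcd-imperative 48 18) ; 6
```

The @ reader macro works on the local vars here, so the body reads much like the atom version would.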

> Use Case #2: setting up mutually referential data structures

The visible choices are ref and atom.

Atoms are non transactional/coordinated. Atoms can only be set by
providing a modifying function. Let's look at an imaginary example if
you were able to set them:
(atom-set a (+ @a (* @a @a)))
The value of @a can actually change halfway through evaluating (+ @a
(* @a @a)) giving a misleading result. Consistency is maintained but
the semantics are deceptive. The atom could be set to any combination
of old value and new value run through that function. atom-set
encourages transactional style statements without their guarantees.
Instead there is swap! which reads the current value, applies the
function to it, and will only commit the result if the atom has not
changed. If another thread has changed the value, it will retry until
successful.
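
That retry loop can be sketched in a few lines on top of compare-and-set! from clojure.core. This is an illustration of the semantics only, not the actual implementation, and swap-sketch! is a hypothetical name.

```clojure
(defn swap-sketch!
  "Apply f to the current value of atom a (plus args) and install the result,
  retrying if another thread changed a in the meantime."
  [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      ;; Succeeds only if a still holds the exact value that was read.
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(def a (atom 3))
(swap-sketch! a #(+ % (* % %))) ; 3 + 3*3
@a
```

Because f may run more than once, it should be free of side effects, just as with swap! itself.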

(swap! a + (* @a @a))
is just as deceptive. Because swap! is a function, (* @a @a) is
evaluated before swap! is called, resulting in the exact same race
condition described with atom-set. It is unnecessary to use @a with
swap! because the correct value of the atom to use is passed in to the
function you provide to swap! The correct way to write this particular
example would be:
(swap! a #(+ %1 (* %1 %1)))

It is very easy to define an efficient atom-set:
(defn atom-set [a val] (.set a val) val)
But there is no way to prevent users doing things like (atom-set a
(inc @a)) which is incorrect. Neither can swap! prevent users from
dereferencing the target atom, however there is a clear distinction in
the access method which reminds and encourages correct usage.

> It would be entirely reasonable to want to, under certain conditions,
> to reset the counter to 0. When you reset to 0, that's just a
> straight write. Having to say (swap! counter (constantly 0)) feels
> convoluted, and somewhat obscures the fact that this change is not
> dependent on the previous state of the atom. Furthermore, by having
> to represent this reset as a swap, the counter reset might not happen
> as quickly as one would like. If there is a lot of contention and
> other threads are in the process of incrementing the counter, the
> reset might fail to go through for a while, because the value is
> frequently changed by other threads between the unnecessary read and
> the write of 0. An atom-set would not only reflect the intentions of
> the code better, but would provide better performance for the times
> where a read is irrelevant to the value being written.

(.set a 0)
Provides precisely the behavior you describe.
Unfortunately it is not obvious that this is the case unless you look
up the java docs.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/AtomicReferenceFieldUpdater.html

Perhaps the swap! doc string should explicitly describe the danger:
"The function passed to swap should not enclose the current atom value
as this may create a race condition, instead it should always take the
atom value as an argument. There is no atom-set as it would likely
promote such usage. You can set atoms by calling the java .set method,
but remember that the current atom value should not be used to set a
new value."

On Dec 31 2008, 2:19 pm, Pinocchio <cchino...@gmail.com> wrote:
> 1) It is possible to get "internally" mutable state using
> with-local-vars and var-set. The problem here is that the "internal"
> state cannot be operated on by multiple threads concurrently. Thus
> functions defined this way will not automatically scale on multi-cores.
> One may have to be careful while wrapping such state in a closure and
> accessing it from multiple threads. However, they *will* allow to
> "naturally" write certain algorithms containing local mutation.

(with-local-vars) has no multi-threading issues, as they are not
shared in any way. To make this true they are not retained in
closures:
user=> (def f (with-local-vars [a 2] #(+ 1 (var-get a))))
#=(var user/f)
user=> (f)
java.lang.IllegalStateException: Var null is unbound. (NO_SOURCE_FILE:0)

> 2) It is possible to get "internally" mutable state using clojure arrays
> and recur style re-binding. The problem here is that recur style
> rebinding may not be expressive enough to "naturally" encode certain
> algorithms.

Clojure does require you to be explicit about state mutation, but in
my view that does not diminish the expressiveness. I'm yet to come
across a case where an algorithm can not be directly translated
retaining imperative style.

> 3) It is possible to get "internally" mutable state using atoms (along
> with atom-set for better readability and convenience)... which seems
> like a nice suggestion. The problem here is the generic problem with
> atoms which should be used carefully under dosync. Particularly, if an
> atom is indeed an "internal" state in a function's closure, and the
> closure itself is used in a transaction then we have problems due to
> transaction retries. So, we either use atoms assuming that the function
> closures *will* be retried in transactions which limits the situations
> in which atoms can be used or we make sure we document such function
> closures so that they are not used in dosync (maybe Clojure can print
> a warning if it sees an atom being used inside a dosync).

"problems due to transaction retries" is not atom specific, but does
indeed highlight how atoms could be used with unexpected consequences.
I would say that ref is the mutable with least surprising semantics,
so a good default choice unless one has a specific goal in mind.

> However, this problem occurs only if there is a closure. If the function
> is retried and the internal state is created afresh then do we still
> have a problem? Is this a valid way to handle "internal" mutable state
> which is not encapsulated in closures?

Correct, a mutable can only 'leak' from a function if it is returned
or captured in a return... ie: a closure formed on it. Hence for
unclosed mutation I think with-local-vars is the nicest. Why have
(binding) and (set!) when (with-local-vars) and (var-set) achieves the
same thing in a more explicit fashion? set! is still required for java
interop of course, and I guess (binding) has uses that I don't fully
appreciate yet.
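
One use of binding that with-local-vars does not cover: the rebinding is visible to functions called within its dynamic extent, since they refer to the var by its global name. A small sketch, with hypothetical names and the ^:dynamic marker that current Clojure requires:

```clojure
;; Hypothetical dynamic var; ^:dynamic is required in current Clojure.
(def ^:dynamic *indent* 0)

(defn show [s]
  ;; Reads whatever *indent* is bound to on the calling thread.
  (str (apply str (repeat *indent* " ")) s))

(defn nested []
  (binding [*indent* 2]
    (let [outer (show "a")]
      (set! *indent* 4)        ; legal only inside a binding
      [outer (show "b")])))

(nested)   ; ["  a" "    b"]
(show "c") ; "c", the root value is untouched
```

A var created by with-local-vars has no global name, so a separate function such as show could only see it if the var were passed in explicitly.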

Regards,
Tim.
