I haven't been following the new atom stuff, so I was wondering why atom would be best in this
situation, vs a ref? Thanks.
(dosync (counter) (ref-set r 1))
If your var-set causes the transaction to retry, an atom-based counter
will increment twice. As I understand it, atoms are one of the most
"dangerous" things in Clojure, and should be avoided unless you're
completely sure it will not change the semantics of your program if it
gets executed multiple times. Aside from the memoization example for
which it was invented, I am hard-pressed to think of a good use for
atoms. For something like a counter, you really have to use ref, and
that should remain people's default when dealing with mutability.
I haven't done much with atoms yet, so if I've misunderstood Rich's
posts, feel free to explain my mistake.
Not having used them myself, I can't think of many good examples
either. However, one in addition to the cache example would be an
auto-incrementing identifier; something like a database sequence. It's
semantics wouldn't change if there were gaps in the produced values as
long as all ID's were unique.
- J.
No, RH said that atoms were definitely intended for multiple threads,
not just single threads. But their use is highly specific. With
memoization, it doesn't matter if things get retried, as long as
things don't get "lost". atom basically guarantees that the ref and
the set occur atomically (via swap), so you don't have to worry about
two threads losing something from the cache as follows:
Current cache {:a 1 :b 2}
One thread tries to add :c 3, and another tries to add :d 4.
Without atomic swap, one thread could try to update the cache to {:a 1
:b 2 :c 3} and the other to {:a 1 :b 2 :d 4} (because they are both
basing their updates on what they see). Whichever one wins, one of
the values will be "lost" from the cache.)
So atoms make this one guarantee, allowing safe multithread
memoization, but at great risk for other types of applications,
because most "seemingly-obvious" uses for atoms would probably be
hosed by the possible retry.
I fear a lot of people are going to end up misusing atoms. I assume
they were necessary to make memoization perform better than under the
ref-with-commute approach.
Yes, but when you write your atom-based code, you have no way to know
whether you or others who want to reuse it will want to use it as part
of a transaction. atom-based code is not generally safe for
transactions, which is why I suggested it should be avoided. In your
example, if incserid is used in a transaction, it is possible that
certain serids will be skipped. This may be acceptable, in which
case, go ahead and use an atom, but often programs rely on subtle
assumptions (like the serids will be issued consecutively), and your
program can become brittle if there's a chance your code won't follow
these assumptions. Probably better off not to take the chance. Stick
with something like refs which will yield more predictable behavior,
and thus be easier to test. Memoization is a very special exception,
because it really doesn't matter if something gets cached more than
once.
The whole point of Clojure seems to make as much code as possible safe
for its software transactional memory. Thus the persistent data
structures, and other design details. Although interoperating with
Java also produces risk within transaction, generally speaking, if you
stay within the "Clojure core", you're safe for transactions. Except
atoms.
It is certainly not the whole point of Clojure to make as much code as
possible safe for its software transactional memory. Clojure is a set
of tools. They are designed to allow for robust programs to be built,
including multithreaded programs. STM is one of those tools, but is
not a universal answer.
I think it would be wise to avoid sweeping generalizations - sweeping
generalizations are always wrong :)
There are many things that will never be safe inside transactions -
I/O in particular, and you can look at many forms of mutation (e.g.
all the Java OO mutation) as I/O of a sort. That doesn't mean that
these things aren't useful, or should be avoided. Good practice
programming with STM involves segregating the I/O portion of your code
from the transactional portion, and minimizing the footprint of your
transactions in general.
I think many people look at the facilities provided by Clojure and
hope there's also some magic recipe for good multithreaded designs.
There isn't. Even with Clojure's (or any language's) tools, you have
to think.
I don't disagree that refs should be your first choice, and that atoms
are special purpose tools for more experienced users. But they are not
just for memoization caches.
Taking the case at hand, ID generation. A modern multithreaded program
would try to avoid monotonically increasing consecutive IDs, as any
implementation of that would be a severe concurrency bottleneck. If
you were to use a ref as an ID source, every transaction would have to
line up for access to that ref. Yes, it will be predictable - with
predictably bad scalability. Using an atom for this can seriously
improve the throughput of transactions, by removing what might be the
only ref they share.
I think everyone should try to avoid dispensing advice from
theoretical arguments. You need to understand your tools, the
tradeoffs they involve, the use case at hand, and make good decisions.
Rich
It's good to hear you say this, because I agree. I was "dispensing
advice" based on my perception of your philosophy (and because I sense
from discussions that most people are assuming atoms are safe in ways
they really aren't). But in fact, I think that Clojure's tools for
mutability don't yet go far enough, and I'd prefer to have a few more
"unsafe" things in the toolbox for experienced programmers.
For example, mutability is extremely useful for
constructing/initializing complex data structures, especially ones
that have cyclic references. (Or think about how StringBuffer is used
to set up a string with mutability, and then it is delivered as an
immutable String). Also, mutability of local variables can be handy
when implementing a complex "classic" algorithm that is described as a
sequence of imperative steps, in order to keep the form of the code as
close as possible to the source.
This sort of local mutability has no impact on referential
transparency, and is really quite safe when used properly. But none
of the existing types of mutability seem like a good fit for this
need. It seems like overkill to have to use a ref or atom with their
transaction or swapping syntaxes in order to mutate something that
will never be observed as mutable by the outside world.
After giving this some more thought, I think the absolutely simplest
way to further improve Clojure's mutability support would be to add an
atom-set function for the cases where you really want to clobber the
contents of atom and don't care what the contents already are. This
takes the "variable which needs no synchronization" thing one step
further, allowing for fastest speed in those situations.
The semantics would be like this:
(defn atom-set [a val]
(swap! a (constantly val)))
But presumably it could be implemented in the Clojure core in a way
that is even faster than the above implementation, since it doesn't
need to do the check that the contents haven't changed before setting.
I think this is consistent with the idea of atom as a power-user's
tool for getting the best possible performance when synchronization is
not required.
Not really. with-local-vars has somewhat surprising semantics.
For example, you'd expect this (contrived) function to generate an
"add 2" function:
(defn create-add-2 []
(with-local-vars [x 1]
(do
(var-set x 2)
(fn [y] (+ y (var-get x))))))
But in fact, it just generates a function which errors.
If Clojure had some sort of "mutable local" binding construct, I would
expect this to work:
(defn create-add-2 []
(mutable [x 1]
(do
(set! x 2)
(fn [y] (+ x y)))))
because you should be able to refer to a mutable local inside of a closure.
But any attempt to *set* the local in the closure would generate an
error, because you really should be using refs for something like
this:
(defn create-growing-adder []
(mutable [x 1]
(do
(set! x 2)
(fn [y] (do (set! x (inc x)) (+ x y))))))
I think if Clojure could do something like this (enforce a certain
kind of referentially transparent mutable local), that would be neat,
but just extending the interface for atoms with atom-set (as I
proposed in my previous post) is probably a perfectly fine and more
realistic solution.
Sure.
Use Case #1: Implementing classic imperative algorithms
Consider the binary gcd algorithm on page 338 of The Art of Computer
Programmiing, volume 2. This is a very clever implementation of gcd
using only bit shifting and parity checking. Like many classic
algorithms, it is written in a very imperative style. The following
is a direct translation to Clojure:
(defn binary-gcd [a b]
(let [k (atom 0), u (atom a), v (atom b), t (atom 0),
algsteps {:B1 (fn [] (if (and (even? @u) (even? @v))
(do (atom-set k (inc @k))
(atom-set u (bit-shift-right @u 1))
(atom-set v (bit-shift-right @v 1))
:B1)
:B2)),
:B2 (fn [] (if (odd? @u)
(do (atom-set t (- @v)) :B4)
(do (atom-set t @u) :B3))),
:B3 (fn [] (atom-set t (bit-shift-right @t 1)) :B4),
:B4 (fn [] (if (even? @t) :B3 :B5))
:B5 (fn [] (if (> @t 0)
(atom-set u @t)
(atom-set v (- @t)))
:B6),
:B6 (fn [] (atom-set t (- @u @v))
(if (not (zero? @t)) :B3 (bit-shift-left @u @k)))}]
(loop [step :B1]
(let [next ((algsteps step))]
(if (number? next) next (recur next))))))
To test this code, I used the following implementation of atom-set:
(defn atom-set [a val]
(swap! a (constantly val)))
Now, the code would certainly be cleaner if Clojure had tail-call
optimization and letrec, because you could set each algorithmic step
up as a mutually recursive function, rather than storing the steps in
a hash table, and setting up a driver loop to handle the state changes
as in a state machine. Or if Clojure just had letrec, this could be
expressed using the new trampoline construct.
But that's not really the point. The point here is that this code
took me no effort to translate from the Knuth book, it worked on the
first run with no debugging needed, and anyone can easily compare this
code against the Knuth pseudocode, and see at a glance that this is a
correct implementation of that algorithm. There may be ways to
convert this to a functional style (and I encourage others to try it
as an interesting exercise), but with any such transformation, it will
be significantly more difficult to confirm that the program correctly
implements the algorithm.
About half of the atom-sets are updates that could use the swap!
function, but many of these are replacing the atom's contents with
something completely unrelated to its existing contents.
Use Case #2: setting up mutually referential data structures
A common way to set up mutually referential data structures is to
start them off with links that are set to nil, and then use imperative
setting to direct the links at the right things. As a quick example,
consider the cyclic-linked-list solution to the classic Josephus
problem. Here is one way to set up a cyclic ring of people:
(defn make-person [n]
{:id n, :link (atom nil)})
(defn link-people [p1 p2]
(atom-set (p1 :link) p2))
(defn ring-people [n]
(let [vec-people (vec (map make-person (range n)))]
(dotimes [i (dec n)] (link-people (vec-people i) (vec-people (inc i))))
(link-people (vec-people (dec n)) (vec-people 0))
(vec-people 0)))
Now depending on how this is going to be used, you could argue that
maybe you're better served by refs. But if the whole point of atoms
is to give power-users another, faster choice when no coordination is
needed, why not give the programmer a choice here? If I know for sure
that I'm just going to use this ring within a single thread, or use it
internally in a function that generates and uses the ring without ever
exposing the ring outside that function, then an atom may be the right
tool for the job.
I don't see these as being compelling arguments for atom-set. You can
do these things with Java arrays or other Java things like
clojure.lang.Box.
OTOH, atom-set just invites the read-modify-write race conditions
swap! was meant to avoid.
There's simply no value for Clojure to add to a simple mutable box.
Clojure does provide the tools for low-level mutation - access to
Java. You can wrap that in whatever functions/macros you like.
Rich
There's no way to use the @ dereferencing syntax for a custom mutable
box, right? The sample I gave would be considerably harder to read if
you had to use (box-ref u) instead of @u everywhere.
> OTOH, atom-set just invites the read-modify-write race conditions
> swap! was meant to avoid.
Yes, but not every write to an atom is going to be a read-modify-write
kind of change, which is why it feels odd to always be limited to
swap!.
Since my examples emphasized the "simple mutable box" benefits of
having atom-set, and you didn't find that compelling, let's go back to
the discussion earlier in this thread about using an atom to maintain
a counter that increments to generate unique IDs. You've already
stated that this is one of the uses that is a good fit for atom.
It would be entirely reasonable to want to, under certain conditions,
to reset the counter to 0. When you reset to 0, that's just a
straight write. Having to say (swap! counter (constantly 0)) feels
convoluted, and somewhat obscures the fact that this change is not
dependent on the previous state of the atom. Furthermore, by having
to represent this reset as a swap, the counter reset might not happen
as quickly as one would like. If there is a lot of contention and
other threads are in the process of incrementing the counter, the
reset might fail to go through for a while, because the value is
frequently changed by other threads between the unnecessary read and
the write of 0. An atom-set would not only reflect the intentions of
the code better, but would provide better performance for the times
where a read is irrelevant to the value being written.
Yes, atom-set invites a certain potential for misuse, but atoms
already require a certain degree of care.
>
> On Tue, Dec 30, 2008 at 8:38 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> There's simply no value for Clojure to add to a simple mutable box.
>> Clojure does provide the tools for low-level mutation - access to
>> Java. You can wrap that in whatever functions/macros you like.
>>
>
> There's no way to use the @ dereferencing syntax for a custom mutable
> box, right?
Not true. Just implement clojure.lang.IRef and deref/@ will work.
> The sample I gave would be considerably harder to read if
> you had to use (box-ref u) instead of @u everywhere.
>
>> OTOH, atom-set just invites the read-modify-write race conditions
>> swap! was meant to avoid.
>
> Yes, but not every write to an atom is going to be a read-modify-write
> kind of change, which is why it feels odd to always be limited to
> swap!.
>
It doesn't matter that it's not a read-modify-write change, it matters
that you think about other consumers of the atom. swap! makes you do
that.
> Since my examples emphasized the "simple mutable box" benefits of
> having atom-set, and you didn't find that compelling, let's go back to
> the discussion earlier in this thread about using an atom to maintain
> a counter that increments to generate unique IDs. You've already
> stated that this is one of the uses that is a good fit for atom.
>
> It would be entirely reasonable to want to, under certain conditions,
> to reset the counter to 0. When you reset to 0, that's just a
> straight write. Having to say (swap! counter (constantly 0)) feels
> convoluted, and somewhat obscures the fact that this change is not
> dependent on the previous state of the atom. Furthermore, by having
> to represent this reset as a swap, the counter reset might not happen
> as quickly as one would like. If there is a lot of contention and
> other threads are in the process of incrementing the counter, the
> reset might fail to go through for a while, because the value is
> frequently changed by other threads between the unnecessary read and
> the write of 0. An atom-set would not only reflect the intentions of
> the code better, but would provide better performance for the times
> where a read is irrelevant to the value being written.
>
This is a theoretical argument again, and remains unconvincing. (swap!
counter (constantly 0)) may feel bad, but it should - you're trashing
a shared reference with no regard for its contents. And if you have
code where the counter is so hot a one-time swapping reset is a perf
issue, you have bigger problems.
One thing is certain, right now code like this is unlikely:
(let [val (inc @a)]
(swap! a (constantly val)))
Whereas with atom-set in the API, code like this is likely:
(atom-set a (inc @a)) ;broken - race condition
after all, it looks just like its (correct) ref counterpart:
(ref-set r (inc @r))
> Yes, atom-set invites a certain potential for misuse, but atoms
> already require a certain degree of care.
>
This is not the kind of criteria I want to use when designing. With
that logic Clojure could just be a free-for-all like Java, which also
requires a 'certain degree of care'.
That said, I think a big part of the problem here lies in the name. If
it was e.g. reinitialize-atom or reset-atom I might feel differently.
I also think that your use cases for atoms for local mutation are a
mismatch. atoms are about sharing. You really want something else for
private/local mutable references, and I have some ideas for that.
Rich
You're right that I'm basically asking for something that works well
for private/local mutable references. It's possible right now to use
with-local-vars, or atoms, or refs, but none of them quite fill that
need perfectly (by design).
My impression so far of atoms is that they are "underpowered", and
suitable for a very small set of use cases, most of which you could
have already done with refs and commute (albeit with a slight
performance penalty). Since atoms are side-effecting, their
composability with other transaction-based Clojure code is limited.
Yes, my feelings about this are entirely theoretical, and maybe in
practice, as more and more Clojure code is written, I'll get to see
more good uses for atoms. That's part of the fun of working with a
new and vibrant language that is full of possibilities.
But this perception I have is why I like the idea of extending atom's
capabilities to cover the local mutable reference use cases. I like
the idea of thinking of an atom as a simple box, with a cool bonus
feature of being able to atomically swap its contents, providing a
step up in safety without going as far as a ref.
Anyway, it's great to hear you have some more ideas on the subject,
and I look forward to seeing what you come up with.