ants.clj and render thread starvation


B Smith-Mannschott

Jun 29, 2009, 1:51:38 PM
to clo...@googlegroups.com
After watching most of Rich's Clojure presentations over the weekend,
I found myself playing with ants.clj again on my netbook. The ant
simulation runs brilliantly on my quad-core machine at work. Not so
much on my netbook. The problem seems to be that with only a single
(hyperthreaded) core the render agent is almost constantly interrupted
by some pesky ant while attempting to snapshot the world, forcing the
render agent to automatically retry. And so, the ants run merrily
around the world, only I can't see it.

This raises a question. Clojure's refs and dosync sure are neat, but
this experience would seem to indicate that there are potential
scalability problems when combining long-running and short-running
transactions. Under load (or on a slow machine) a long-running
transaction may never get a chance to complete and may be stuck
forever retrying, burning CPU but producing no useful output. This
makes me uneasy.
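
The pattern should be reproducible outside ants.clj with something
like this stand-alone sketch (hypothetical code, not from the demo):
many tiny write transactions against a vector of refs, plus one long
read transaction trying to snapshot them all.

(def cells (apply vector (map (fn [_] (ref 0)) (range 100))))

(defn writer [x]
  ;; tiny write transaction, like a single ant step
  (dosync (alter (cells (rand-int 100)) inc))
  (send-off *agent* writer) ; re-queue itself, like behave does
  x)

(defn snapshot []
  ;; one long read transaction scanning every ref, like render
  (dosync (apply vector (map deref cells))))

;; (def writers (doall (map agent (range 20))))
;; (dorun (map #(send-off % writer) writers))
;; (time (snapshot)) ; on a loaded single core, this can retry for a long time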

I was able to get ants.clj to work reliably on my netbook by playing
around with the sleep times in such a way as to increase the
probability of the renderer actually completing a snapshot, but this
process felt hacky and ad-hoc. What works on my netbook might well be
sub-optimal on another system.
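
(For reference, the knobs I was fiddling with are these constants
near the top of ants.clj:

(def animation-sleep-ms 100)
(def ant-sleep-ms 40)
(def evap-sleep-ms 1000)

Raising ant-sleep-ms, in particular, gives the renderer a better
chance of completing its snapshot between ant steps.)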

How could one change the design of ants.clj to work reliably (i.e.
update the screen periodically) even on slower systems?

// Ben

Krukow

Jun 30, 2009, 7:50:59 AM
to Clojure
On Jun 29, 7:51 pm, B Smith-Mannschott <bsmith.o...@gmail.com> wrote:
[snip...]
> much on my netbook. The problem seems to be that with only a single
> (hyperthreaded) core the render agent is almost constantly interrupted
> by some pesky ant while attempting to snapshot the world, forcing the
> render agent to automatically retry. And so, the ants run merrily
> around the world, only I can't see it.

If my understanding of the STM is correct (and it may very well not
be), the render function should not retry. The render function reads
all the refs that define ant positions, but does not modify any ref.
So I suspect that the missing renders are caused by thread
starvation rather than by retries.

But I'd like to hear if Rich agrees ;-)

[snip..]

/Karl

Daniel B Lucraft

Jun 30, 2009, 4:10:41 AM
to Clojure
I didn't think it was possible for the render agent to be interrupted
like that. Isn't the point of the MVCC system that the render agent
will see a consistent view of the world as it was when the render
started? And that the ants can continue modifying the world during the
render's processing but the render agent will not see any of it until
it has completed its transaction? It only reads the refs so it
shouldn't ever retry.

Sorry to jump in despite being inexperienced at this, but it makes me
think I don't understand something correctly.

Dan

John Newman

Jun 30, 2009, 10:15:44 AM
to clo...@googlegroups.com
I think I remember running the ants.clj demo on my netbook and it ran normally.  I'll check again this evening.
--
John

B Smith-Mannschott

Jun 30, 2009, 11:26:45 AM
to clo...@googlegroups.com
On Tue, Jun 30, 2009 at 10:10, Daniel B Lucraft <d...@fluentradical.com> wrote:
>
> I didn't think it was possible for the render agent to be interrupted
> like that. Isn't the point of the MVCC system that the render agent
> will see a consistent view of the world as it was when the render
> started? And that the ants can continue modifying the world during the
> render's processing but the render agent will not see any of it until
> it has completed its transaction? It only reads the refs so it
> shouldn't ever retry.
>
> Sorry to jump in despite being inexperienced at this, but it makes me
> think I don't understand something correctly.
>


To demonstrate my point, I made the following changes to my copy of ants.clj:

----------------------- [[ patch on ants.clj ]] ----------------------------
Changes in HEAD
Modified ants.clj
diff --git a/ants.clj b/ants.clj
index 5e2e012..43e3ba0 100644
--- a/ants.clj
+++ b/ants.clj
@@ -262,11 +262,14 @@
     (render-ant (:ant p) g x y)))
 
 (defn render [g]
-  (let [v (dosync (apply vector (for [x (range dim) y (range dim)]
-                                   @(place [x y]))))
+  (println "entering render")
+  (let [v (dosync (println "snapshotting world")
+                  (apply vector (for [x (range dim) y (range dim)]
+                                   @(place [x y]))))
         img (new BufferedImage (* scale dim) (* scale dim)
                  (. BufferedImage TYPE_INT_ARGB))
         bg (. img (getGraphics))]
+    (println "rendering world")
     (doto bg
       (.setColor (. Color white))
       (.fillRect 0 0 (. img (getWidth)) (. img (getHeight))))
@@ -278,7 +281,8 @@
       (.drawRect (* scale home-off) (* scale home-off)
                  (* scale nants-sqrt) (* scale nants-sqrt)))
     (. g (drawImage img 0 0 nil))
-    (. bg (dispose))))
+    (. bg (dispose))
+    (println "exiting render")))
 
 (def panel (doto (proxy [JPanel] []
                    (paint [g] (render g)))
--------------------- [[ end patch on ants.clj ]] --------------------------

After evaluating:

(def ants (setup))
(send-off animator animation)

I saw output like this:

entering render
snapshotting world
rendering world
exiting render
entering render
snapshotting world
rendering world
exiting render
...

After evaluating this, which starts up all the ants and the evaporator
thread:

(dorun (map #(send-off % behave) ants))
(send-off evaporator evaporation)

I'm now seeing this:

entering render
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
snapshotting world
...

Clearly "snapshotting world" is getting retried. This makes a certain
degree of sense from skimming LockingTransaction, which ultimately
implements dosync.

I'm *guessing* that it records a readPoint (a kind of transactional
'timestamp') when it starts working, and makes sure that no consulted
reference has a (last modification) point greater than this
readPoint. Such a situation indicates that the reference has been
changed since the start of the transaction. To recover from this,
Clojure does a retry, which it will do up to 10,000 times before
giving up. I don't believe I ever actually waited that long. ;-)

// ben

Rich Hickey

Jun 30, 2009, 12:01:51 PM
to clo...@googlegroups.com

MVCC history in Clojure's STM is dynamic, created by need. There is no
read tracking, and more important for this case, no transaction
tracking. So, if a read transaction is unable to satisfy its snapshot
view from history, it will flag the offending ref with a fault and
retry. When a writer sees a ref with a read fault it will grow history
for that ref. In this way only as much history is created as is needed
to satisfy the dynamic contention patterns, and
tracking/synchronization is minimized.
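
Schematically, in Clojure-flavored pseudocode (the helper names here
are made up; the real logic lives in LockingTransaction and Ref):

(defn txn-read [r txn]
  (if-let [v (history-value-before r (:read-point txn))] ; old enough?
    v
    (do (flag-read-fault! r) ; record that a reader missed here
        (retry! txn))))      ; restart the reading transaction

(defn txn-commit! [r v]
  (if (read-fault? r)
    (grow-history! r v)     ; grow history only where readers faulted
    (replace-oldest! r v))) ; otherwise recycle the oldest entry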

The problem for scanning readers like render is that the ref that
caused the fault this pass is unlikely to be the ref that causes the
fault next time, and it will take a while to accumulate even one step
of history for each scanned ref using the fault system.

This has caused me to implement (in git master) some long-planned
controls on ref history. You can now supply :min-history and
:max-history args to ref. The defaults are 0 and 10 respectively. By
setting min-history to some positive value, history will be
accumulated even in the absence of faults, providing a window, if you
will, for scanning readers like render.
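
For example:

(ref 0 :min-history 1 :max-history 30)

creates a ref that will always keep at least one step of history, and
never more than 30.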

You can see this history acquisition by periodically running:

(map ref-history-count (world 20))

while the ants demo is going.

So, now you can preemptively maintain history in the ants demo by
modifying the world init with some min-history (the value 2 below is
your 'knob' for accommodating the duration of the render sweep)

(def world ... (ref (struct cell 0 0) :min-history 2) ...)
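
Spelled out against the world init from ants.clj, where only the
:min-history arg is new:

(def world
     (apply vector
            (map (fn [_]
                   (apply vector
                          (map (fn [_] (ref (struct cell 0 0)
                                            :min-history 2))
                               (range dim))))
                 (range dim))))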

Please let me know how that works for you, and, everyone else, please
let me know if max-history default of 10 causes you any trouble
("Transaction failed after reaching retry limit").

There will eventually be more knobs coming, to control
transaction-level retry counts and timeouts.

Rich

Krukow

Jul 1, 2009, 1:50:31 AM
to Clojure

On Jun 30, 6:01 pm, Rich Hickey <richhic...@gmail.com> wrote:
> MVCC history in Clojure's STM is dynamic, created by need. There is no
> read tracking, and more important for this case, no transaction
> tracking. So, if a read transaction is unable to satisfy its snapshot
> view from history, it will flag the offending ref with a fault and
> retry. When a writer sees a ref with a read fault it will grow history
> for that ref. In this way only as much history is created as is needed
> to satisfy the dynamic contention patterns, and
> tracking/synchronization is minimized.

Ok - I understand better now. I guess I had an "idealized" system in
mind where each ref (in principle) had a complete history associated
with it, so reads would never need to be retried (of course, unneeded
entries would somehow be discarded once the system decides they can
never be read again). Certainly that would imply tracking
transactions. I guess you don't do this because of the overhead.

I like the pragmatics of :min-history, and I believe it would be
sufficient for many scenarios. However, I suspect we are now moving
closer to the situation that Cliff Click was predicting [1] where as a
programmer you need more precise knowledge about the STM
implementation to understand the behavior and tune its performance.

Kind Regards,
- Karl

[1] http://groups.google.com/group/clojure/browse_frm/thread/5c7a962cc72c1fe7/a1a0cb35070e9558

Daniel Lyons

Jul 1, 2009, 2:43:01 AM
to clo...@googlegroups.com

On Jun 30, 2009, at 11:50 PM, Krukow wrote:
> I like the pragmatics of :min-history, and I believe it would be
> sufficient for many scenarios. However, I suspect we are now moving
> closer to the situation that Cliff Click was predicting [1] where as a
> programmer you need more precise knowledge about the STM
> implementation to understand the behavior and tune its performance.


While that may be true for the time being I think that Rich's original
response still holds water: that usually, correctness is more
important than performance and correctness is the real win with STM.
It's a nascent technology but that doesn't mean it's useless. Most
people would probably rather utilize all four or eight of their cores
at half the theoretical efficiency with no loss of correctness than
have to create an old fashioned threading model with all the bugs and
headaches that they entail, trying to get closer to that dream of 100%
utilization. This is fast food parallelism. It isn't as optimal or as
hard won but from a cost vs. benefit perspective the choice is
obvious, especially when comparing to single threaded programming.

STM might be new, but I think an analogy to garbage collection is
valid. We don't have all the best STM implementation algorithms down
yet; certainly this is an active research area. But even back when GC
was extremely new it was already a win for LOC and correctness. Over
time the technologies got better and the performance question has
mostly faded away, and it's going to be the same with STM. But only if
the focus is on correctness first and performance second.

I say mostly faded away because there will always be applications
where GC cannot be used simply because it makes the system less
predictable or because it incurs its own cost, such as realtime
systems and extremely small-scale embedded devices. But most systems
are not realtime and GC is quite prevalent, even though realtime
performance characteristics seem generally desirable. It just isn't
worth the tradeoff in desktop software. Lots of software will benefit
from STM, even if the performance gains are minimal, even if it turns
out to be provably impossible to push performance past half that of
classically threaded programs.

Often there is a sound theoretical reason for throwing out an idea,
yet the problem it predicts occurs so infrequently in the wild that
working on the idea is a net win anyway. Ethernet is a good example
of a technology that shouldn't work well in the wild but does. A
stronger example would be Microsoft's Terminator project, which, in
attempting to create termination proofs for programs through static
analysis, flies in the face of CS's most famous mathematical proof.
It turns out the problem is important enough that even the partial,
imperfect solutions we are damned to would be useful in practice (as
long as you don't try to run it through itself, of course ;)

http://research.microsoft.com/en-us/um/cambridge/projects/terminator/

The rest of that original thread is a great read, by the way, and
thanks for bringing it up!


Daniel Lyons

Rich Hickey

Jul 1, 2009, 9:09:56 AM
to clo...@googlegroups.com

Yes, ok, but the key questions are, how complex is it really, and,
compared to what?

Any large lock-based application will have an ad hoc lock policy
approaching the complexity of Clojure's STM. Each programmer will have
to learn the lock strategy of each program. And what happens when
requirements or the hardware capabilities change? I'd much rather have
knobs like :min-history than to have to go through an app and change a
lock architecture that is woven throughout. In fact, I think the
possibility for knobs is a key strength of an STM. Furthermore, such
knobs can be dynamic, allowing for applications that analyze their own
runtime situation and adapt, in this case trading off memory for
concurrency. Try doing that with a lock policy. (I know you are not
advocating for locks, but Cliff was).

As usual, comparisons to GC hold. There are many flavors of GC, each
with different characteristics. An app will have to choose the right
flags, understanding the tradeoffs involved. It is still a lot better
than manual memory management for most apps.

At JavaOne I did a demo showing how an app could put its own runtime
'knob' on transaction granularity, to useful effect. A point I tried
to make was that this was the level at which one wants to think about
their concurrency issues. (Ok, fine, one would like magic that makes
it all go away :) Thinking about what constitutes a unit of work, what
are the contended resources etc. Being able to deal with that using
'knobs' on a system that is ensuring correctness seems like a huge win
to me.
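
To make that concrete, here is a minimal sketch of such a knob (a
hypothetical example, not the demo code):

(def batch-size (atom 50))

(defn process-all [refs f]
  ;; do the work in chunks of @batch-size refs per transaction
  (loop [rs refs]
    (when (seq rs)
      (let [[chunk more] (split-at @batch-size rs)]
        (dosync (doseq [r chunk] (alter r f)))
        (recur more)))))

;; Smaller chunks mean shorter transactions and less contention, at
;; the cost of a less atomic overall update:
;; (reset! batch-size 10)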

So, certainly many of the things Cliff said are true, but are they
negatives? You will have to understand the fundamental characteristics
of your STM, and any knobs it provides. You will have to choose
transaction boundaries. You will have to choose ref granularity. Ok.
But absolutely nothing is going to make the fundamental issues of
concurrency go away. The real question is, what facilities are
provided for dealing with them, and how closely do they map to the
nature of the problem? How much effort do you have to spend on
correctness?

What render is getting there is really a huge thing - a consistent
snapshot of the world, that spans multiple entities, without holding
up everyone else. I have yet to see a non-STM solution (that isn't an
ad hoc STM or MVCC) that can provide the same.

I'll try to expand upon "adaptive history queues", and the new knobs,
in the docs so people can better understand what they imply.

Rich

Krukow

Jul 1, 2009, 9:12:58 AM
to Clojure


On Jul 1, 8:43 am, Daniel Lyons <fus...@storytotell.org> wrote:
> While that may be true for the time being I think that Rich's original  
> response still holds water: that usually, correctness is more  
> important than performance and correctness is the real win with STM.  
[snip...]

Agreed - happy Clojure user here. Just thought it was interesting to
note.

> The rest of that original thread is a great read, by the way, and  
> thanks for bringing it up!

You're welcome - one of the problems with newsgroups is that lots
and lots of very useful information is out there, but it's not always
easy to find - so referring back once in a while is good ;-)

Krukow

Jul 1, 2009, 9:34:58 AM
to Clojure


On Jul 1, 3:09 pm, Rich Hickey <richhic...@gmail.com> wrote:
[snip]
> lock architecture that is woven throughout. In fact, I think the
> possibility for knobs is a key strength of an STM. Furthermore, such
> knobs can be dynamic, allowing for applications that analyze their own
> runtime situation and adapt, in this case trading off memory for
> concurrency. Try doing that with a lock policy. (I know you are not
> advocating for locks, but Cliff was).

That sounds very interesting. Can you share the JavaOne program?
(perhaps with some text explaining the points you made :-)

[snip]
> So, certainly many of the things Cliff said are true, but are they
> negatives? You will have to understand the fundamental characteristics
> of your STM, and any knobs it provides. You will have to choose
> transaction boundaries. You will have to choose ref granularity. Ok.
> But absolutely nothing is going to make the fundamental issues of
> concurrency go away. The real question is, what facilities are
> provided for dealing with them, and how closely do they map to the
> nature of the problem? How much effort do you have to spend on
> correctness?

Yes, perhaps this is a good way of looking at it. Also I like the
analogy with GC tuning: by default it does pretty well, and in
addition you have various knobs to tune performance and other
characteristics (e.g. stop-the-world or concurrent) - and of course
you have to understand the GC to use these knobs.

[snip]
> I'll try to expand upon "adaptive history queues", and the new knobs,
> in the docs so people can better understand what they imply.

This would be really useful. In order to turn the knobs correctly, we
need a more detailed understanding of the design/implementation.

B Smith-Mannschott

Jul 5, 2009, 2:47:11 PM
to clo...@googlegroups.com

Sorry it took me so long to try this out...

I've tried this with a build of
a1397390d8b3b63f2039359520629d87b152d717 (July 4). I tried
:min-history values of 2 and 9, which didn't help: the window stayed
blank because the rendering agent never ran to completion. I was able
to get something to display by dialing :min-history up to 100. It
draws less than one frame per second, but it does draw. (Perhaps it's
too much to expect a single core to juggle some 50 threads.)

I'm going to play around with it some more and see what I find.

// Ben

John Newman

Jul 5, 2009, 3:15:33 PM
to clo...@googlegroups.com
I'm on an HP Mini.  It runs on an Atom processor.  I ran ants.clj and it was frozen for a little while.  Then after maybe 30 seconds it looked normal -- a tick every second or half second.  I'm running a lot of applications, though.  I'm on Ubuntu here, too.
--
John