I've actually thought about concurrency in NetLogo a fair amount.
As for the pitfalls...—they're complicated. I'll ramble and
explain a bit.
First of all, this would be a pretty fundamental change to the
language, so one of the pitfalls is that, in order to even have a
chance of getting a good speed-up out of this change, I think we
would need to spend a lot of time on overhauling the existing
NetLogo code to operate in this way. On top of that, this added
level of complication—that is, model state changes passing through
an actor system—I would expect to greatly complicate the core
NetLogo code, further increasing the already-substantial maintenance
burden for future core NetLogo development work.
The complications do not stop there, though; there are much more
practical problems at hand. I'm not going to provide any proof of
these claims here, but here are some assumptions of mine:
- The cases where one most desires to parallelize NetLogo are
those involving large agentsets
- The vast majority of `ask` operations in NetLogo change world
(global) state (e.g. agent positions/variables)
- In a substantial percentage of the cases where `ask` is used,
the outcome depends on the global state (e.g.
positions/variables of other agents)
On top of those three premises, agents in `ask` blocks can change
the global state in a way that affects all of the other agents in
the agentset that is being `ask`ed (e.g. modifying a global
variable, moving towards/away from another agent). This makes
parallelization difficult. The typical actor-based solution to this
problem might be to hide the world state within an actor (or, in the
style of C or plain Java, you might use resource locks on world
state), but, since pretty much every agent is going to have to spend
time waiting on the world state to be available for reading or
writing, I would expect that you wouldn't get a very substantial
speed up—possibly even a
slowdown in many cases, because of
the concurrency overhead!—from this approach. The results depend on
that contents of the `ask` block, but I have severe doubts that
typical models would see much speedup from this sort of concurrency,
and the functionality would come at the cost of hundreds of hours of
lost development time over the years, I would expect, because of the
way in which it would have to tangle itself up in the guts of the
core NetLogo code. So that particular approach doesn't seem very
good to me, especially because it hasn't really solved the problem
of all the agents trying to change the global state simultaneously.
Let's not give up hope, though; this isn't the only way to approach
concurrency in NetLogo. Software transactional memory (STM) is a
style of concurrency that takes a cue from the database world,
treating concurrent actions as transactions upon the global state
that can be rolled back if some conflict arises in the global
state. One could imagine this being applied to `ask` operations. A
student at Utrecht University created a NetLogo-like program (HLogo)
that performed concurrency through STM and wrote
his Master's
thesis on this very topic. I actually wouldn't be surprised
if the author of the paper reads this board and might have something
to say on the matter. Regardless, on this topic, he concluded this:
Execution results show that HLogo is faster
than NetLogo for most cases, particularly where the number of
agents stay low. When the agent population grows as to produce
significant number of STM conflicts, the performance of HLogo
considerably drops.
If you look at the results in Section 4.4 of his thesis, you'll see
that the performance really does drop off quite noticeably as the
number of agents increases—dropping off to the point of being a
substantial speed
regression in comparison to standard
NetLogo. Unfortunately, though, the case where we lots of agents is
the very case we want concurrency to help us out with! My
view on the matter is that STM is inherently incapable of being able
to solve this problem; using STM for thousands of concurrent
operations that are each almost certain to conflict and cause a
rollback is merely going to lead to what is essentially just a
sequential processing of the agentset (but with a ton of concurrency
overhead).
On top of that, the STM approach also jettisons another thing that
NetLogo holds dear: reproducible results. That is, in NetLogo, you
can use the `random-seed` to set the model's random seed, and
running the same simulation with the same seed will consistently
yield the same results. However, with STM, the completion order of
the transactions is non-deterministic, so STM also fails to meet
NetLogo's reproducibility goals, as well. With that, it looks like
we'll need yet another approach.
After thinking about it a bit, we might conclude that the global
state is the biggest problem for us (as is pretty much always the
case with concurrency problems). Well, if you can't beat 'em...
just change the problem! How about we simply declare that changes
to global state won't take effect until after an `ask` command
ends? That is, if I did the following:
```
to-report silliness
clear-all
crt 10
ask turtles [ set color green ]
ask turtles [ set label (count turtles with [color = blue]) set
color blue]
report sort [label] of turtles
end
```
and then ran `silliness` in my proposed variant of NetLogo, I would
get back `[0 0 0 0 0 0 0 0 0 0]` (rather than `[0 1 2 3 4 5 6 7 8
9]`, which is what NetLogo currently returns), because each turtle,
when running the code block for that last `ask`, would use the
global state as of when the `ask` primitive was called, and they
wouldn't be able to see the changes that happened on the global
state until after the `ask` had entirely finished executing (meaning
that the agent's copy of the global state would also need to be
threaded through the call stack to any procedures called within the
`ask` block—
did
anyone else just get a monadic shiver going down their spine?).
This is definitely a bit weird, though. On the other hand, in some
ways, it's actually kind of consistent with some of NetLogo's
current behavior. For example, if I say `ask turtles [ hatch 1 ]`,
and we understand that our NetLogo's `ask` can mutate the global
state, shouldn't we expect that code to result in an infinite loop
(provided that are turtles in the world when it is called)? That
is, one might expect that the newly-hatched turtles would also be
`ask`ed, but they're not; NetLogo simply uses what the value was for
`turtles` when the `ask` began (just like I'm proposing using the
whole global state as of when the `ask` began). Of course,
within
the `ask`
block, the world state
does change in the
current version of NetLogo, as demonstrated by the code `ca crt 2
ask turtles [ hatch 1 show count turtles ]`, which prints out "3"
and then "4". Clearly, `turtles` changes within the block. I think
I might even find it a little awkward if my above "state snapshot"
suggestion were implemented and running this code printed out "2"
and then "2". But maybe not....
Either way, it doesn't seem like we could get away with removing the
serial `ask` altogether. For the sake of covering both angles, I
suppose there could be an `ask-in-parallel` and an
`ask-serially`—which I'm tempted to call "fold-ask", since `ask`ing
is serial is very much like a fold of world state, but it's kind of
a deceptive name, since folding together monoids can totally be done
in parallel, so let's just not go with that name...—even though have
two primitives for `ask`ing is less than ideal.
One
cool thing about `ask-in-parallel` is that it would not
only offer a good speedup simply from running multiple `ask` blocks
concurrently, but it could also offer a speedup in an unexpected
way: fewer random-number generator draws. RNG draws can actually
take up a pretty significant amount of CPU time in a model. When
we're doing `ask` serially, we need to do a bunch of RNG draws so we
can iterate through the agentset randomly. However, with
`ask-in-parallel`, we're baking in the assumption that order doesn't
matter. If order
did matter, we would have to worry once
again about conflicts to global state and reproducibility of
results. But, with those things out of the way, `ask` order
doesn't
matter, so we don't need to worry about doing any RNG draws for
determining `ask` order. This is an especially big win when
`ask`ing big agentsets.
But it's not all sunshine and lollipops here; we face some serious
problems if agents' actions within the `ask` block aren't
restricted. That is, if `ask` blocks can contain arbitrary NetLogo
code, what's to stop two different agents from concurrently making
conflicting changes—for example, setting global/procedure-level
variables concurrently, or calling another `ask` within the current
`ask`. The problem with setting variables within `ask` is that, if
multiple agents change the same variable, it's not clear whose end
value to use. Similarly, with nested `ask`s, what happens with `ask
turtles [ set xcor 0 ask other turtles [ set xcor 1 ] ]`? Is every
turtle's `xcor` 0? Or are they all 1? Or are all 1, except the
"last one" asked, which would have 0? How is "last one"
determined?—keep in mind that it's important that we get
reproducible results! None of the options seems particularly
promising to me. There are probably many other behaviors (e.g. file
operations) that would also be problematic in concurrent `ask`
blocks, so maybe things being forbidden should be the rule, rather
than the exception. Furthermore, anything that hits the random
number generator within the `ask` block would also be a huge problem
for getting reproducible results—but maybe you could get around that
by giving each agent its own RNG within the `ask`...?
I guess the solution would be to forbid an `ask` block from
containing these sorts of conflicting behaviors, but that's not a
very satisfying solution; it strikes me as very crippling to be
forbidding these kinds of not-terribly-uncommon behaviors in `ask`
blocks. I certainly know that
I like having my agents be
able to all write to global variables when I'm testing things out,
and I'm quite certain that many models have nested `ask`s, as well.
I guess it would make sense if only `ask-in-parallel` had these
restrictions, and `ask`s nested within an `ask-serially` could even
be `ask-in-parallel`s! It's a bit gross that there would be two
`ask`s in NetLogo with very different rules, though, and it's
definitely not an elegant solution, but maybe it's the best we could
realistically do...? Maybe there could be just one primitive
(`ask`) and NetLogo could first try to interpret all `ask`s as what
I've been proposing as `ask-in-parallel`, but, if restricted calls
were found within the block, NetLogo would then run the block as an
`ask-serially`? Maybe? I don't know—this is the point, I think, at
which I've been fully reduced to directionless babbling. Enough of
that.
So, ultimately, I don't have a great answer for you in terms of how
best to handle concurrency in agent-based modeling. It seems to be
a pretty tough problem. The world could probably produce a dozen
Ph.D theses on the matter and still not come up with a good solution
to the problem. If anyone thinks they know of a better solution,
I'd be glad to hear about it, if for no other reason than to sate my
curiosity.