Actor model of concurrency (Akka) in multi agent simulation software (NetLogo)

kosow...@gmail.com

unread,

Jan 24, 2015, 11:51:34 AM1/24/15

to netlog...@googlegroups.com

Have you considered using Akka for concurrent functioning of agents in NetLogo?

If yes, what are the pitfalls of this approach to concurrency?

Frank Duncan

unread,

Feb 4, 2015, 10:39:42 PM2/4/15

to kosow...@gmail.com, netlog...@googlegroups.com

To be honest, we haven't done a lot of thought on concurrent
functioning, mainly because reproducibility is useful in both
experiements and debugging. Between that and the fact that the legacy
code would take a lot of work to upgrade to a different architecture
means we probably wouldn't attack this problem any time soon.

Thanks for the suggestion!

Frank

On Sat, Jan 24, 2015 at 08:51:34AM -0800, kosow...@gmail.com wrote:
> Have you considered using Akka for concurrent functioning of agents in NetLogo?
>
> If yes, what are the pitfalls of this approach to concurrency?
>

> --
> You received this message because you are subscribed to the Google Groups "netlogo-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to netlogo-deve...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Ken Kahn

unread,

Feb 4, 2015, 11:02:38 PM2/4/15

to Frank Duncan, kosow...@gmail.com, netlog...@googlegroups.com

Concurrency can introduce additional randomness to a model's execution due to the non-determinacy of the arrival order of messages. As Frank points out this can be a problem (but not for all models). With effort one can have the system know which computation orders are a problem and which ones are not. There is a nice discussion of this in A Distributed Platform for Global-Scale Agent-Based Models of Disease Transmission by Parker and Epstein.

-ken

Jason Bertsche

unread,

Feb 5, 2015, 2:39:12 AM2/5/15

to kosow...@gmail.com, netlog...@googlegroups.com

I've actually thought about concurrency in NetLogo a fair amount. As for the pitfalls...—they're complicated. I'll ramble and explain a bit.

First of all, this would be a pretty fundamental change to the language, so one of the pitfalls is that, in order to even have a chance of getting a good speed-up out of this change, I think we would need to spend a lot of time on overhauling the existing NetLogo code to operate in this way. On top of that, this added level of complication—that is, model state changes passing through an actor system—I would expect to greatly complicate the core NetLogo code, further increasing the already-substantial maintenance burden for future core NetLogo development work.

The complications do not stop there, though; there are much more practical problems at hand. I'm not going to provide any proof of these claims here, but here are some assumptions of mine:

The cases where one most desires to parallelize NetLogo are those involving large agentsets
The vast majority of `ask` operations in NetLogo change world (global) state (e.g. agent positions/variables)
In a substantial percentage of the cases where `ask` is used, the outcome depends on the global state (e.g. positions/variables of other agents)

On top of those three premises, agents in `ask` blocks can change the global state in a way that affects all of the other agents in the agentset that is being `ask`ed (e.g. modifying a global variable, moving towards/away from another agent). This makes parallelization difficult. The typical actor-based solution to this problem might be to hide the world state within an actor (or, in the style of C or plain Java, you might use resource locks on world state), but, since pretty much every agent is going to have to spend time waiting on the world state to be available for reading or writing, I would expect that you wouldn't get a very substantial speed up—possibly even a slowdown in many cases, because of the concurrency overhead!—from this approach. The results depend on that contents of the `ask` block, but I have severe doubts that typical models would see much speedup from this sort of concurrency, and the functionality would come at the cost of hundreds of hours of lost development time over the years, I would expect, because of the way in which it would have to tangle itself up in the guts of the core NetLogo code. So that particular approach doesn't seem very good to me, especially because it hasn't really solved the problem of all the agents trying to change the global state simultaneously.

Let's not give up hope, though; this isn't the only way to approach concurrency in NetLogo. Software transactional memory (STM) is a style of concurrency that takes a cue from the database world, treating concurrent actions as transactions upon the global state that can be rolled back if some conflict arises in the global state. One could imagine this being applied to `ask` operations. A student at Utrecht University created a NetLogo-like program (HLogo) that performed concurrency through STM and wrote his Master's thesis on this very topic. I actually wouldn't be surprised if the author of the paper reads this board and might have something to say on the matter. Regardless, on this topic, he concluded this:

Execution results show that HLogo is faster than NetLogo for most cases, particularly where the number of agents stay low. When the agent population grows as to produce significant number of STM conflicts, the performance of HLogo considerably drops.

If you look at the results in Section 4.4 of his thesis, you'll see that the performance really does drop off quite noticeably as the number of agents increases—dropping off to the point of being a substantial speed regression in comparison to standard NetLogo. Unfortunately, though, the case where we lots of agents is the very case we want concurrency to help us out with! My view on the matter is that STM is inherently incapable of being able to solve this problem; using STM for thousands of concurrent operations that are each almost certain to conflict and cause a rollback is merely going to lead to what is essentially just a sequential processing of the agentset (but with a ton of concurrency overhead).

On top of that, the STM approach also jettisons another thing that NetLogo holds dear: reproducible results. That is, in NetLogo, you can use the `random-seed` to set the model's random seed, and running the same simulation with the same seed will consistently yield the same results. However, with STM, the completion order of the transactions is non-deterministic, so STM also fails to meet NetLogo's reproducibility goals, as well. With that, it looks like we'll need yet another approach.

After thinking about it a bit, we might conclude that the global state is the biggest problem for us (as is pretty much always the case with concurrency problems). Well, if you can't beat 'em... just change the problem! How about we simply declare that changes to global state won't take effect until after an `ask` command ends? That is, if I did the following:

```
to-report silliness
clear-all
crt 10
ask turtles [ set color green ]
ask turtles [ set label (count turtles with [color = blue]) set color blue]
report sort [label] of turtles
end
```

and then ran `silliness` in my proposed variant of NetLogo, I would get back `[0 0 0 0 0 0 0 0 0 0]` (rather than `[0 1 2 3 4 5 6 7 8 9]`, which is what NetLogo currently returns), because each turtle, when running the code block for that last `ask`, would use the global state as of when the `ask` primitive was called, and they wouldn't be able to see the changes that happened on the global state until after the `ask` had entirely finished executing (meaning that the agent's copy of the global state would also need to be threaded through the call stack to any procedures called within the `ask` block—did anyone else just get a monadic shiver going down their spine?).

This is definitely a bit weird, though. On the other hand, in some ways, it's actually kind of consistent with some of NetLogo's current behavior. For example, if I say `ask turtles [ hatch 1 ]`, and we understand that our NetLogo's `ask` can mutate the global state, shouldn't we expect that code to result in an infinite loop (provided that are turtles in the world when it is called)? That is, one might expect that the newly-hatched turtles would also be `ask`ed, but they're not; NetLogo simply uses what the value was for `turtles` when the `ask` began (just like I'm proposing using the whole global state as of when the `ask` began). Of course, within the `ask` block, the world state does change in the current version of NetLogo, as demonstrated by the code `ca crt 2 ask turtles [ hatch 1 show count turtles ]`, which prints out "3" and then "4". Clearly, `turtles` changes within the block. I think I might even find it a little awkward if my above "state snapshot" suggestion were implemented and running this code printed out "2" and then "2". But maybe not....

Either way, it doesn't seem like we could get away with removing the serial `ask` altogether. For the sake of covering both angles, I suppose there could be an `ask-in-parallel` and an `ask-serially`—which I'm tempted to call "fold-ask", since `ask`ing is serial is very much like a fold of world state, but it's kind of a deceptive name, since folding together monoids can totally be done in parallel, so let's just not go with that name...—even though have two primitives for `ask`ing is less than ideal.

One cool thing about `ask-in-parallel` is that it would not only offer a good speedup simply from running multiple `ask` blocks concurrently, but it could also offer a speedup in an unexpected way: fewer random-number generator draws. RNG draws can actually take up a pretty significant amount of CPU time in a model. When we're doing `ask` serially, we need to do a bunch of RNG draws so we can iterate through the agentset randomly. However, with `ask-in-parallel`, we're baking in the assumption that order doesn't matter. If order did matter, we would have to worry once again about conflicts to global state and reproducibility of results. But, with those things out of the way, `ask` order doesn't matter, so we don't need to worry about doing any RNG draws for determining `ask` order. This is an especially big win when `ask`ing big agentsets.

But it's not all sunshine and lollipops here; we face some serious problems if agents' actions within the `ask` block aren't restricted. That is, if `ask` blocks can contain arbitrary NetLogo code, what's to stop two different agents from concurrently making conflicting changes—for example, setting global/procedure-level variables concurrently, or calling another `ask` within the current `ask`. The problem with setting variables within `ask` is that, if multiple agents change the same variable, it's not clear whose end value to use. Similarly, with nested `ask`s, what happens with `ask turtles [ set xcor 0 ask other turtles [ set xcor 1 ] ]`? Is every turtle's `xcor` 0? Or are they all 1? Or are all 1, except the "last one" asked, which would have 0? How is "last one" determined?—keep in mind that it's important that we get reproducible results! None of the options seems particularly promising to me. There are probably many other behaviors (e.g. file operations) that would also be problematic in concurrent `ask` blocks, so maybe things being forbidden should be the rule, rather than the exception. Furthermore, anything that hits the random number generator within the `ask` block would also be a huge problem for getting reproducible results—but maybe you could get around that by giving each agent its own RNG within the `ask`...?

I guess the solution would be to forbid an `ask` block from containing these sorts of conflicting behaviors, but that's not a very satisfying solution; it strikes me as very crippling to be forbidding these kinds of not-terribly-uncommon behaviors in `ask` blocks. I certainly know that I like having my agents be able to all write to global variables when I'm testing things out, and I'm quite certain that many models have nested `ask`s, as well. I guess it would make sense if only `ask-in-parallel` had these restrictions, and `ask`s nested within an `ask-serially` could even be `ask-in-parallel`s! It's a bit gross that there would be two `ask`s in NetLogo with very different rules, though, and it's definitely not an elegant solution, but maybe it's the best we could realistically do...? Maybe there could be just one primitive (`ask`) and NetLogo could first try to interpret all `ask`s as what I've been proposing as `ask-in-parallel`, but, if restricted calls were found within the block, NetLogo would then run the block as an `ask-serially`? Maybe? I don't know—this is the point, I think, at which I've been fully reduced to directionless babbling. Enough of that.

So, ultimately, I don't have a great answer for you in terms of how best to handle concurrency in agent-based modeling. It seems to be a pretty tough problem. The world could probably produce a dozen Ph.D theses on the matter and still not come up with a good solution to the problem. If anyone thinks they know of a better solution, I'd be glad to hear about it, if for no other reason than to sate my curiosity.

Ken Kahn

unread,

Feb 5, 2015, 3:20:29 AM2/5/15

to Jason Bertsche, kosow...@gmail.com, netlog...@googlegroups.com

Jason I think you've provided evidence that none of the solutions people have thought of are work in general.

In A Distributed Platform for Global-Scale Agent-Based Models of Disease Transmission by Parker and Epstein they deal with a common special case: where agents interactions are spatially local. Agents only interact with agents on nearby patches. The world can then be divided into regions each implemented by their own processor. The processors need to send messages about changes that cross region boundaries. In general if one region gets ahead of another then it will need to roll back if its neighbours send it a message about changes in its past. BUT some changes are not a problem. In the paper they discuss an epidemic. If region-1 is at time 7 and gets a message from region-2 that agent-x was infected at time 4 then if the delay between infection and contagion is greater than 7-4 there is no need to roll back the computation since there are no consequences to being infected before becoming contagious.

-ken

Reply all

Reply to author

Forward