[Visit the blog:
http://misinformedcognoscenti.blogspot.com/]
ABSTRACT
The Actor model of concurrency readily lends itself to use in the
FIEA game engine, which is already set up as an event-driven system
composed of entities (which resemble actors). Adopting it prepares the
engine to exploit future massively parallel hardware platforms, avoids
some of the worst problems of concurrency -- synchronization and
deadlock -- and minimizes contention. The changes required do not
affect the configuration language -- just isolated parts of the back-
end and some environment access policies. But the asynchronous
communication scheme required by a strict Actor model implementation
can incur excessive performance hits for request-reply transactions,
so we probably have to make an exception for those transactions.
INTRODUCTION
In this series of articles, we review operating systems concepts in
the context of making a game engine. Operating system concepts include
process, memory and device management. Process management entails
processes, threads, scheduling and synchronization. This article
visits the topic of process management, and focuses on the Actor model
of concurrency as it relates to game engine architecture.
In the Actor model, an actor is a computational entity which sends and
receives messages, makes decisions and creates other actors. This
bears some resemblance to entities in the FIEA game engine, which send
and handle events, receive a share of CPU time each frame, and perform
arbitrary processing.
Though some cursory attempts have been made to thread it, the FIEA
game engine lacks an inherent concurrency infrastructure. Conventional
wisdom suggests that computers of the future will have many more
processing cores operating at about the same speed they do now. We
therefore explore the possibility of formally adopting the Actor model
of concurrency as an integral part of the FIEA game engine and its
configuration language.
Related Articles
Ruben Vermeersch describes how using the Actor model is easier than
using threads.
The blog "Valued Lessons" has an article about message passing
concurrency in Python.
James Leigh describes Utilizing a Multi-Core System with the Actor
Model.
The SALSA Tutorial provides a description of the Actor model which
focuses on practical use.
A microkernel expert discusses the microkernel performance debate,
which relates to this discussion in that microkernels are essentially
message passing systems.
THE CASE FOR ADOPTING THE ACTOR MODEL
The FIEA game engine processing model happens to resemble the Actor
model -- except that the engine is not (in its usual form) concurrent.
Although all the entities operate almost as though in an extremely
simplified cooperative multitasking system with rudimentary time
slicing, they do not actually run in separate threads (though
students have introduced actual multi-threading with some
consideration for synchronization). With other changes afoot regarding
how entities access their environment (with more strongly enforced
scoping), switching to the Actor model might be easy.
But why bother? What would it buy us? And what would it cost?
Avoiding Deadlocks
The question should not be whether to support concurrency, but how.
The fundamental problems boil down to deciding which processes to run
on what threads, and how those threads should communicate (and
synchronize). An obvious and common choice entails simply creating
threads and running processes on them, communicating via shared memory
which is synchronized via locks. This scheme, however common, invites
dreaded deadlocks and contention issues. So although we definitely
want concurrency, we would like a model free of deadlocks.
Recall the Coffman recipe for deadlocks:
The system must synchronize via mutex locks or their equivalent
(mutual exclusion). This is unavoidable at some level, but we can
limit the number of locks.
Processes already holding a lock can request another lock, i.e.
multiple locks are involved (hold and wait).
A process holding a lock cannot be forced to release that lock (no
preemption).
A circular chain of processes exists in which each waits for a lock
held by the next (circular wait).
I boil this down to the following: multiple processes must attempt to
obtain multiple locks in different orders. You can avoid deadlock by
strictly requiring that the order in which processes obtain locks is
globally uniform. That is guaranteed to eliminate deadlocks, but it
poses some problems, such as even being aware that this situation
can arise; in large code-bases with multiple programmers, that is
hard. Furthermore, obtaining locks in a strict order implies that in
many situations, a process will obtain locks that it does not usually
require, on the off chance that it might require them while holding
another lock; this creates otherwise avoidable contention.
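To make the globally uniform ordering concrete, here is a minimal
Python sketch (the helper names are invented for illustration, not any
library's API): every thread sorts the locks it needs by a global key
before acquiring them, so no two threads can ever wait on each other
in opposite orders.

```python
import threading

# Illustrative helpers (not a real library API): acquire a set of
# locks in a globally uniform order -- here, sorted by id() -- so the
# circular-wait condition can never arise between threads.
def acquire_in_order(*locks):
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()

a, b = threading.Lock(), threading.Lock()

def worker():
    # Every thread sorts the same way, even if it "thinks of" b first.
    held = acquire_in_order(b, a)
    try:
        pass  # critical section touching both resources
    finally:
        release_all(held)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that the cost described above is visible here: the worker takes
both locks up front even when it might not need the second one.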
The Actor model breaks this chain by disrupting the hold-and-wait
condition; actors never hold multiple locks because all of their
communication with other actors (which is the only communication the
model supports) happens via asynchronous message passing. From the
perspective of the Actor, it never holds any locks, ever. So, no
deadlocks.
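A minimal Python sketch of that idea, assuming nothing about the
engine (the class names are invented for illustration): each actor
owns a mailbox and a thread that drains it, and other actors interact
with it only through send().

```python
import queue
import threading

# Minimal actor sketch: a mailbox plus a thread that drains it. The
# only lock involved is the one hidden inside the queue; the actor's
# own code never holds a lock, so it can never hold-and-wait.
class Actor:
    def __init__(self):
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def send(self, message):
        # Asynchronous: enqueue and return immediately.
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # sentinel: stop the actor
                break
            self.receive(message)

    def receive(self, message):
        pass  # override with the actor's behavior

class Counter(Actor):
    def __init__(self):
        self.count = 0  # private state: no other actor can touch it
        super().__init__()

    def receive(self, message):
        self.count += message

counter = Counter()
for _ in range(10):
    counter.send(1)
counter.send(None)
counter.thread.join()
# counter.count is now 10, mutated only by the actor's own thread
```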
Adopting a strict Actor model also implies that theoretical results
written about that model apply to the engine. Although so far I am
not personally aware of thunderous, earth-jiggling benefits of that
fact, research into the Actor model is ongoing, and perhaps somebody
will conclude that using the Actor model solves Important Problems.
Each actor runs in its own thread and, as far as other actors are
concerned, that could be on another machine on another planet. The
language Erlang exploits this to support hot swapping code. As
Vermeersch states, "To allow for non-stop application, Erlang support
hot-swapping of code. This means that it is possible to replace code
while executing a program, making it possible to do upgrades and
maintenance without interrupting the running system." Imagine never
again having to log off of WoW for a client or server upgrade. That's
freaking cool.
Alright, so the Actor model looks attractive. How would we implement
that in the FIEA game engine? And if the Actor model has a feature or
limitation we do not like, what are the ramifications of departing
from this model?
Buffering Messages
The Actor model specifically avoids posing requirements on how
messages are ordered. In this regard, processes in such a model act
like remote peers in a packet-switched network using UDP -- the
delivery order is not guaranteed. The upside is that if a simulation
based on the Actor model actually uses the UDP protocol, then the
network code will not have to spend extra effort ordering packets.
The downside is that sometimes you want to guarantee delivery order.
Some systems (e.g. Scala) provide mechanisms for synchronous delivery
and therefore have the potential to violate the terms of our "no
deadlocks" coupon.
The ordering could happen inside the "mailbox" implicitly associated
with each actor. Messages could include ordering information. The
actor could simply allow messages to queue up until a contiguous
in-order sequence arrives, at which point the actor can process
them. This would not violate asynchronous delivery. It adds
complication to the message processing subsystem, but let us consider
the problem effectively solved (in theory), should the need arise.
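A sketch of that mailbox-side reordering in Python (the
sequence-number scheme is an assumption of this sketch, not something
the engine defines): messages carry a sequence number and are parked
until a contiguous in-order run is available.

```python
# Illustrative reordering buffer: hold out-of-order messages until a
# contiguous run starting at the next expected sequence number exists.
class ReorderBuffer:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # seq -> message, parked out of order

    def push(self, seq, message):
        """Accept one message; return what is now deliverable in order."""
        self.pending[seq] = message
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

buf = ReorderBuffer()
buf.push(2, "c")  # early arrival: parked, nothing deliverable yet
buf.push(0, "a")  # delivers "a"
buf.push(1, "b")  # delivers "b" and unblocks the parked "c"
```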
Often message-passing has the form of request-reply, where an entity
simply wants information from another entity. In the strict Actor
model, this transaction would require a pair of asynchronous messages,
which can introduce undue latencies and inefficiencies.
Recall that part 1 described how asynchronous message passing
introduces computational inefficiency; there are twice as many
synchronizations happening (each of which entails an expensive context
switch) and the message data has to be copied both into and out of the
message buffer. In contrast, a synchronous request has only a single
synchronization primitive and can avoid copying message data at all,
if the replier processes the message immediately (which is likely). L4
exploits this to make great performance improvements over other
microkernel designs (like Mach) which use asynchronous message
passing.
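The double cost is easy to see in a sketch. Here request-reply is
built from two one-way messages in Python (the reply-to mailbox
convention is an assumption of this sketch): the request is copied
into one queue, the reply into another, and each crossing costs a
synchronization.

```python
import queue
import threading

# Request-reply out of two asynchronous messages: the requester embeds
# a reply-to mailbox in the request, then blocks draining that mailbox.
# Two enqueue/dequeue crossings (and two copies) per transaction.
def serve(requests):
    while True:
        payload, reply_to = requests.get()
        if payload is None:
            break
        reply_to.put(payload * 2)  # message 2: the reply

requests = queue.Queue()
server = threading.Thread(target=serve, args=(requests,))
server.start()

reply_box = queue.Queue()
requests.put((21, reply_box))  # message 1: the request
answer = reply_box.get()       # requester stalls until the reply lands
requests.put((None, None))     # shut the server down
server.join()
# answer == 42
```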
Request-reply is a special case of the synchronous communication
problem, in that we can make special exceptions to how the request is
posed. We have several options:
Implement the policy that if a request is made, and the queried entity
is occupied at the time of the request, the request simply fails, and
code must be written to handle that. This seems complicated and
awkward. Worse, it could create livelock, which is even worse than
deadlock.
Rephrase request-reply cases so that any entity that might need
information from another simply subscribes to that information and
automatically receives it when it changes (i.e. uses the observer,
a.k.a. publish-subscribe, design pattern). This avoids latency at the
risk of some possibly avoidable overhead, if the information is not
needed each time it changes. This also avoids needing to treat the
situation as a special case; the system described so far can already
do this. Let's keep this in our back pocket.
Partition entities that communicate this way to run in the same
thread, in which case contention between them would be impossible.
This has the potential to have the least overhead since it avoids the
need for a lock, but it also coarsens the granularity of concurrency.
Allow synchronous communication only for request-reply transactions.
This increases the chances of contention, though, since the requester
has to wait on the replier, which in turn could be waiting on another
replier. This seems like a recipe for deadlock: A replier could block
while waiting on a reply from its requester (or in a transitive chain
of requesters). (But L4 implements synchronous message passing, how
does it avoid deadlock? By the intelligence and care of its authors.)
In practice, we can apply a combination of the above techniques,
depending on the situation. Synchronous communication is the most
efficient but it re-introduces the possibility of deadlock, so it
misses one of the major advantages of the Actor model. Aside from the
infamously poor performance, old microkernel systems are also
notorious for deadlocking. Since this article argues in favor of
adopting the Actor model based in part on the avoidance of deadlocks,
it seems inappropriate to simply toss that feature out for the sake of
performance. But performance is critical so we have a real dilemma
here.
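The publish-subscribe rephrasing is the one option the system
described so far can already express; a minimal Python sketch (class
and method names invented for illustration): an entity subscribes
once and is pushed each change, so it never blocks on a query.

```python
# Illustrative publisher: subscribers register a callback and are
# pushed the value whenever it actually changes -- no blocking query.
class Publisher:
    def __init__(self, value=None):
        self.value = value
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)
        callback(self.value)  # push the current value immediately

    def set(self, value):
        if value != self.value:  # notify only on an actual change
            self.value = value
            for callback in self.subscribers:
                callback(value)

position = Publisher((0, 0))
seen = []
position.subscribe(seen.append)
position.set((1, 0))
position.set((1, 0))  # no change: subscribers are not re-notified
position.set((2, 0))
# seen == [(0, 0), (1, 0), (2, 0)]
```

The risk noted above is visible too: the subscriber pays for every
change, needed or not.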
Locality: Who can send messages to whom.
In the Actor model, actors do not share any data whatsoever. Only
messages can route data between actors. That implies no access to
global variables or static methods in the traditional sense. I believe
that restriction would not pose undue problems, if pressed, but for
the sake of efficiency I believe we could make exceptions for
read-only data such as controller input and the current time.
HOW TO ADOPT THE ACTOR MODEL
It would be nice if we could implement the Actor model with minimal
change to the configuration language. Messages would be passed via the
event system. Scoping rules would disallow accesses to non-local data.
Then behind the scenes we serialize the event system and can run each
Actor in its own thread. Magic.
Heck, for the sake of debugging we could even randomly reorder events
in the queue -- to test the robustness of message handling out of
order.
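A sketch of that debugging trick in Python (the dispatch function is
hypothetical; the engine's actual event queue would differ): drain the
queue into a batch, shuffle it, then dispatch, verifying that handlers
tolerate any delivery order.

```python
import random

# Debugging sketch: before dispatch, shuffle whatever is queued to
# test that handlers tolerate out-of-order delivery, which the Actor
# model permits.
def dispatch_shuffled(event_queue, handler, rng=random.Random(42)):
    batch = list(event_queue)
    event_queue.clear()
    rng.shuffle(batch)  # deliberately scramble delivery order
    for event in batch:
        handler(event)

handled = []
events = ["spawn", "move", "collide", "despawn"]
dispatch_shuffled(events, handled.append)
# every event is still delivered exactly once, just in arbitrary order
```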
A few other details. In "traditional" terms, each actor runs in its
own thread and has only 1 lock -- a lock on itself. Effectively, every
"reaction" in the actor is a critical section. Actors only process
requests; their "process" is just a sequence of event handlers. That
means that even the "update" loop itself would have to be written as a
response to an "update" event, which gets thrown periodically. In
other words, it would NOT suffice to consider each entity as a process
that runs the "simulation" process all of the time, and handles
messages on-demand. The "simulation" process would be just another
event handler like any other. No big deal.
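In Python terms (a toy sketch; the event names and dt value are
assumptions, not engine conventions), the per-frame simulation becomes
an ordinary handler for a periodically posted "update" event:

```python
# Toy entity whose simulation step is just another event handler,
# invoked when the scheduler posts an "update" event each frame.
class Entity:
    def __init__(self):
        self.position = 0.0
        self.velocity = 2.0
        self.handlers = {"update": self.on_update}

    def handle(self, event, payload):
        self.handlers[event](payload)

    def on_update(self, dt):
        # The "simulation" lives here, not in a free-running loop.
        self.position += self.velocity * dt

e = Entity()
for frame in range(3):
    e.handle("update", 0.5)  # the scheduler would post this per frame
# e.position == 3.0
```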
Also, rendering would require some intermediary since it involves
sending data (a message) to yet another actor -- a special one which
has a monopoly on the GPU. That's not trivial, but it also aligns with
how rendering happens on multi-threaded game engines anyway. In fact,
usually the rendering of an actor happens in a monolithic process
(which we can model as an actor) which serializes all rendering.
Essentially, rendering consists of populating vertex buffers and
issuing draw calls. Populating vertex buffers can happen anywhere.
Only the draw calls (and the associated render state setup) have to
happen serially. So it seems prudent to isolate those two processes.
And indeed the FIEA game engine already does that. Currently, however,
it happens to issue the draw calls as though they come from the
entity itself. That would have to change to a scheme with a single
entity which mediates all draw call submission. The details of that
subsystem constitute a topic for another post.
CONCLUSION
The benefits of the Actor model outweigh the costs, so we should adopt
it. The Actor model provides concurrency free of deadlocks and
prepares the engine for the promised massively parallel game consoles
of the future. The process design of the FIEA game engine lends itself
to using the Actor model with minimal changes to the configuration
language.
This is just a start. We still have to consider process migration. For
architectures like the PS3 and its weird little SPU homunculi, process
migration is not just a whiz-bang nice-to-have, it's required. But it
seems like such changes can happen behind the scenes and, once again,
the configuration language would not have to change to support it.
PENDING ISSUE
It seems that we cannot have both the guarantee of a deadlock-free IPC
system and also have efficient request-reply transactions, so this
issue remains open for debate. I have proposed a few options for
request-reply. Perhaps it would be appropriate to try some or all of
them and see what works best. In any case, it makes for an excellent
discussion topic. How should it work?