Avoiding deadlocks in systems with bounded queues


Rajiv Kurian

Jan 13, 2014, 2:20:17 AM
to mechanica...@googlegroups.com
In producer-consumer topologies connected by bounded queues, how does one inherently avoid deadlocks? In systems with circular communication patterns (thread 1 -> thread 2 -> thread 1) deadlock could occur if the queues connecting the threads get full.

One argument is that these buffers/queues should never get full if the system is tested with realistic traffic patterns. Though the increased performance, predictability, and back-pressure of bounded queues are a great upside, the danger of deadlock in complex topologies makes them a tad risky.

For threads that are both consumers and producers, if they are blocked for a significant number of cycles trying to enqueue, they could decide to fail the enqueue request and go back to consuming from their inbound queues. Is there a list of best practices for handling or avoiding deadlocks in these systems?

Diogo Ferreira

Jan 13, 2014, 2:28:45 AM
to mechanica...@googlegroups.com

I typically cope with this by dispatching the message using the thread that enqueued it.

This gets a bit nasty though, because it can end up taking way too much time to finish work on the original handler, so typically I keep a thread-local queue per dispatcher thread and just enqueue these direct-dispatch requests there.

When my handler is done processing the message from the global queue or from its own SPSC queue, it checks the thread-local one and dispatches all of those before polling from any other.
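For concreteness, here is a minimal Java sketch of how I read that scheme; all names (DispatcherThread, localOverflow, Message) are invented for illustration, not taken from any real framework:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical dispatcher: a bounded inbound queue plus an unbounded
// thread-local queue for direct-dispatch requests that could not be
// enqueued downstream without blocking.
final class DispatcherThread implements Runnable {
    private final BlockingQueue<Message> inbound = new ArrayBlockingQueue<>(1024);
    private final Queue<Message> localOverflow = new ArrayDeque<>(); // only touched by this thread

    // Called by handlers running on THIS thread instead of blocking on a full queue.
    void dispatchLocally(Message m) { localOverflow.add(m); }

    @Override public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Drain direct-dispatch requests before polling anything else.
            Message m;
            while ((m = localOverflow.poll()) != null) handle(m);
            m = inbound.poll();
            if (m != null) handle(m);
        }
    }

    private void handle(Message m) { /* application logic */ }
}

final class Message {}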


Jeff Hodges

Jan 13, 2014, 2:37:35 AM
to mechanica...@googlegroups.com
If your system is overwhelmed, you drop the message as close as you can to where the client submitted it, send back an overload error message to the client, and increment a stat saying you lost it. This is the most common form of back-pressure. Most everything else will just cause your system to become unresponsive in ways you can't expect, intend, or manage well.
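As a Java sketch of that pattern (the queue size, stat counter, and reply call are all placeholders):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;

// Drop at the ingress point: non-blocking offer, overload reply, stat bump.
final class Ingress {
    private final BlockingQueue<Request> work = new ArrayBlockingQueue<>(4096);
    private final LongAdder dropped = new LongAdder();

    void onRequest(Request req) {
        if (!work.offer(req)) {      // full queue == the system is overwhelmed
            dropped.increment();     // we lost it, and we know we lost it
            req.replyOverloaded();   // explicit back-pressure to the client
        }
    }
}

interface Request { void replyOverloaded(); }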

But you're talking about it in the context of circular designs.

Circular communication patterns are a, uh, anti-pattern. What are you trying to do?



Diogo Ferreira

Jan 13, 2014, 2:48:51 AM
to mechanica...@googlegroups.com

That's definitely the best answer if you are allowed to drop messages.

It doesn't have to be circular for this to happen; it suffices to have one component that is substantially slower than the others, and its queue will grow and end up blocking. Alas, that is a problem that can be mitigated once the system is designed to accommodate those kinds of components.

Rajiv Kurian

Jan 13, 2014, 2:51:30 AM
to mechanica...@googlegroups.com


On Sunday, January 12, 2014 11:37:35 PM UTC-8, Jeff Hodges wrote:
If your system is overwhelmed, you drop the message as close as you can to where the client submitted it, send back an overload error message to the client, and increment a stat saying you lost it. This is the most common form of back-pressure. Most everything else will just cause your system to become unresponsive in ways you can't expect, intend, or manage well.

But you're talking about it in the context of circular designs.

Circular communication patterns are a, uh, anti-pattern. What are you trying to do?
An example is a single network-handler thread accepting connections and doing network reads and writes. It offloads the real work to worker threads, which finish the work and send replies back to the single network-handler thread, which writes them back on the wire. A single thread that reads and writes often makes the design simpler than separate reader and writer threads: you get a single event loop, and all the connection state is thread-local. It also makes disconnection events easier to handle than having two event loops learn about them at different times.
Another example is the SPSC queue mesh that was talked about in the actor-systems implementation thread a few days back: actors scheduled on arbitrary threads talk to each other, which can lead to complex circular communication patterns.
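A rough Java sketch of the first example (the single network thread owning the event loop, with workers feeding replies back through a queue); everything here is illustrative, and the reply queue is left unbounded for simplicity:

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ConcurrentLinkedQueue;

// One thread owns all reads, writes, and connection state. Workers complete
// the cycle (network thread -> worker -> network thread) via enqueueReply().
final class NetworkLoop implements Runnable {
    private final Selector selector;
    private final ConcurrentLinkedQueue<Reply> replies = new ConcurrentLinkedQueue<>();

    NetworkLoop() throws IOException { this.selector = Selector.open(); }

    // Called by worker threads; nudges the loop if it is parked in select().
    void enqueueReply(Reply r) {
        replies.add(r);
        selector.wakeup();
    }

    @Override public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    // accept / read here; hand decoded requests to workers
                }
                selector.selectedKeys().clear();
                Reply r;
                while ((r = replies.poll()) != null) {
                    // write the reply on the wire; connection state is only
                    // ever touched by this one thread
                }
            }
        } catch (IOException e) { /* tear down */ }
    }
}

final class Reply {}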



Rajiv Kurian

Jan 13, 2014, 3:00:24 AM
to mechanica...@googlegroups.com


On Sunday, January 12, 2014 11:48:51 PM UTC-8, Diogo Ferreira wrote:

That's definitely the best answer if you are allowed to drop messages.

Right, dropping messages and making sure every component knows how to handle dropped messages definitely helps.
 

It doesn't have to be circular for this to happen; it suffices to have one component that is substantially slower than the others, and its queue will grow and end up blocking. Alas, that is a problem that can be mitigated once the system is designed to accommodate those kinds of components.

Yeah, queues (even unbounded ones) can get overwhelmed, but that's not really a deadlock. The problem with naive bounded queues (which do not handle these scenarios well) is that, hypothetically, during a traffic burst the queues might get full. This could be followed by a long lull, but the deadlock would already have set in and would prevent further queue processing even though there was enough juice to process everything.


Rüdiger Möller

Jan 13, 2014, 3:26:12 PM
to mechanica...@googlegroups.com



That's exactly why I abandoned actors. As soon as you try to contain queue size using bounded queues, you get deadlocks. The only solution is to use unbounded queues and then "manually" balance your actor graph.
I am currently testing an implementation of the actor model on a "CSP" basis. In this model all queues are bounded (to something like 10,000 elements). The price is that one needs one thread per actor (even callbacks get their own thread). If the JVM supported continuations, one could simply reschedule a "blocking" actor call. Since the OS schedules threads and balances load, the CSP-based actors (using lock-free bounded queues) do not perform that badly. In fact, in my tests they do pretty well unless one uses a lot more actors than cores are available (e.g. 60 (active) actors on 4 cores).

In the end, for many cases this performs better than actors (especially with many events and little processing per event), because allocation of huge queues can be costly (it's easy to get queues of >500k elements if programming naively).
(Note that Akka performance is similar to the blue line on Xeon processors.) This test is from a 2-socket Opteron, 16 cores each (= 8 real cores per socket because of shared FPUs).

Michael Barker

Jan 13, 2014, 3:50:08 PM
to mechanica...@googlegroups.com
Hi,

Interesting results. Do you have the source for the benchmark?

Mike.



Rüdiger Möller

Jan 13, 2014, 5:58:09 PM
to mechanica...@googlegroups.com
Akka (I talked to the Akka guys to get a decent configuration; hint: the config batches event processing, and when switching to single-event processing, results get ~10-15% worse):


Threads: of course this could be optimized by divide & conquer to avoid contention on the result object + latch, but I wanted to get a comparison of naive threading vs. naive actors.


And with the experimental prototype actors (there are two branches: one with bounded queues + many threads, one with unbounded queues + one thread per CPU core):


Of course one can get deadlocks as well; however, there is a difference with bounded queues/many threads ('CSP'). If there is an imbalance in processing, the thread running the slower actor automatically gets more CPU assigned because of the back-pressure created by blocking. This cannot be simulated in 'normal' actor implementations without continuations, so imbalance grows the queues (+ latency).
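A bare-bones sketch of that arrangement (capacity, names, and shutdown handling are placeholders):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One thread per actor, bounded mailbox. put() blocks when the mailbox is
// full, so a slow actor back-pressures its senders and the OS scheduler
// shifts CPU time toward the bottleneck.
abstract class CspActor<M> implements Runnable {
    private final BlockingQueue<M> mailbox = new ArrayBlockingQueue<>(10_000);

    final void send(M msg) throws InterruptedException {
        mailbox.put(msg); // blocking back-pressure
    }

    @Override public final void run() {
        try {
            while (true) receive(mailbox.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interrupt as shutdown
        }
    }

    protected abstract void receive(M msg);
}

Each actor is started with new Thread(actor).start(), which is exactly where the one-thread-per-actor cost comes from.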

Rüdiger Möller

Jan 13, 2014, 6:04:00 PM
to mechanica...@googlegroups.com
Addition:

When forced to use unbounded queues, actor implementations usually use concurrent linked lists (expensive in allocation + access). With a bounded-queue model, one can use array-based queues, which is a big part of the performance advantage observed.

Peter Lawrey

Jan 13, 2014, 6:16:08 PM
to mechanica...@googlegroups.com
I use an unbounded queue in Chronicle, as this avoids issues around returning memory to the producer; i.e. data only goes from the producer to the consumer, it doesn't have to go back again.
It can be GC free. You can write/read tens of millions of messages in a 32 MB heap without triggering a minor GC.
Persistence gives you a bonus of transparency of your system with reproducibility in testing.

You get very little slowdown (and no message loss) if the producer is a long way ahead of the consumer. I have tested situations where the consumer is more than main memory behind the producer, with the loss of about half the performance (and no loss of messages). This makes recovery much simpler.



Rüdiger Möller

Jan 13, 2014, 6:40:48 PM
to mechanica...@googlegroups.com
But in an actor or CSP concurrency model, the objective is to create arbitrary interaction patterns without worrying about contention, synchronization, and deadlocks. Bounded queues can be GC-free, as one can reuse elements. However, when I tested this I realized that the fields of the enqueued data structure then have to be volatile, which eats up the no-allocation advantage, at least in the scenario I tested (regarding latency it's preferable, of course).

Peter Lawrey

Jan 14, 2014, 2:18:58 AM
to mechanica...@googlegroups.com

All good points, and I agree with what you are saying. Ideally you want a generic framework which can cater for a wide range of applications or even changing requirements.
This flexibility comes at a cost to performance and to your ability to understand the system. If you are forced to understand your system from the start and specialise, at the cost of flexibility, you can have a faster, more robust system (assuming your structure doesn't have to change too much or too often, in which case you risk getting the worst of both scenarios).


Rüdiger Möller

Jan 15, 2014, 4:51:15 AM
to mechanica...@googlegroups.com
I found some papers addressing this. An actor system using (blocking) bounded queues is basically a CSP system (a pure CSP system has queue size = 1; in practice one uses (short) bounded queues).
In order to avoid deadlocks, CSP systems introduce "non-determinism": in case they can't send (receiver queue full), they don't block but just try to receive a message instead.
Technically this means (if continuations are not available) one uses the stack as a queue. Additionally, this has implications for the state of an actor/process: each time a message is sent, the sender's state may be altered by the messages it processes while its send is blocked.
like:

// Can't enqueue? Drain our own inbox instead of blocking. Note the stack
// grows with each nested dispatch, so this effectively uses the stack as a queue.
while (!receiver.offer(msg))
  myqueue.pollAndProcessMsg();
 
So you end up with these options: a) unbounded queues with actors, risking out-of-memory; b) deadlock; c) stack overflow; d) manual handling of queue overflow, as pointed out above. Still, I'd expect CSP to work better, as imbalance in circular message flow is reduced inherently (so the max stack size depends on the amount of imbalance), while in an unbounded actor system no inherent mechanism is present due to the absence of back-pressure.


The models of "actors sending/receiving messages" vs. "processes using channels to communicate" are similar (an actor reference object can be seen as a channel), so the real difference between CSP and actors is the way messages are dispatched (blocking/non-deterministic vs. never-blocking unbounded queues).


Todd Hoff

Jan 15, 2014, 11:16:03 AM
to mechanica...@googlegroups.com


On Sunday, January 12, 2014 11:20:17 PM UTC-8, Rajiv Kurian wrote:
In producer-consumer topologies connected by bounded queues, how does one inherently avoid deadlocks? In systems with circular communication patterns (thread 1 -> thread 2 -> thread 1) deadlock could occur if the queues connecting the threads get full.

Back-pressure is necessary, but you also need a timeout on your actor state machine, because back-pressure and timeout notifications can also be lost. Using a state-machine timeout you can unwind all your protocols and avoid deadlock at the protocol level. If the operation is idempotent or there's some transactional logic, you can simply retry or send the failure up the stack.
 
One argument is that these buffers/queues should never get full if the system is tested with realistic traffic patterns. Though the increased performance, predictability, and back-pressure of bounded queues are a great upside, the danger of deadlock in complex topologies makes them a tad risky.

Since you can't control producers in a system, how would you make such a guarantee? With unbounded queues all you are doing is increasing latency, which kicks in timeouts and clogs the system with garbage. It also puts pressure on your memory subsystem, increasing paging and weakening latency guarantees, so a bounded queue with timeouts will work well. IMHO you generally want a request waiting on the client's outbound queue rather than sitting in a queue on the server side, so you can react faster to any change in the system that would make the request unnecessary or cause it to be augmented.
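A small sketch of the bounded-queue-with-timeout idea (the exception type and names are made up):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Enqueue with a deadline; on timeout, unwind instead of blocking forever.
final class TimedSubmit {
    static <T> void submit(BlockingQueue<T> queue, T task, long timeoutMs)
            throws InterruptedException, RejectedException {
        if (!queue.offer(task, timeoutMs, TimeUnit.MILLISECONDS)) {
            // Back-pressure alone was not enough; send the failure up the stack.
            throw new RejectedException("queue full after " + timeoutMs + " ms");
        }
    }
}

final class RejectedException extends Exception {
    RejectedException(String message) { super(message); }
}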
 

For threads that are both consumers and producers, if they are blocked for a significant number of cycles trying to enqueue, they could decide to fail the enqueue request and go back to consuming from their inbound queues. Is there a list of best practices for handling or avoiding deadlocks in these systems?

You can execute multiple state machines in parallel with an actor as long as they don't cross streams, so the actor can always use as much CPU as possible at its priority level. 

Rajiv Kurian

Jan 15, 2014, 12:47:05 PM
to mechanica...@googlegroups.com


On Wednesday, January 15, 2014 1:51:15 AM UTC-8, Rüdiger Möller wrote:
I found some papers addressing this. An actor system using (blocking) bounded queues is basically a CSP system (a pure CSP system has queue size = 1; in practice one uses (short) bounded queues).
In order to avoid deadlocks, CSP systems introduce "non-determinism": in case they can't send (receiver queue full), they don't block but just try to receive a message instead.
Technically this means (if continuations are not available) one uses the stack as a queue. Additionally, this has implications for the state of an actor/process: each time a message is sent, the sender's state may be altered by the messages it processes while its send is blocked.
like:

while (!receiver.offer(msg))
  myqueue.pollAndProcessMsg();
 
So you end up with these options: a) unbounded queues with actors, risking out-of-memory; b) deadlock; c) stack overflow; d) manual handling of queue overflow, as pointed out above. Still, I'd expect CSP to work better, as imbalance in circular message flow is reduced inherently (so the max stack size depends on the amount of imbalance), while in an unbounded actor system no inherent mechanism is present due to the absence of back-pressure.
This could be altered to prevent the re-entrant processing problem if we poll and put the messages on a thread-local queue instead of processing them immediately (like we talked about in the actor thread). There is still some back-pressure here.

void sendMsg(Receiver receiver, Msg msg) {
  boolean sent = false;
  int numTries = 0;
  while (!sent) {
    sent = outgoingQueues.get(receiver.index()).offer(msg);
    if (sent) break;

    // Try for a while with back-offs.
    if (numTries >= YIELD_THRESHOLD && numTries < POLL_THRESHOLD) {
      Thread.yield(); // Or sleep or something.
    } else if (numTries >= POLL_THRESHOLD) {
      // Still not working? Probable deadlock. Poll the incoming SPSC queue
      // from the receiver we are trying to send to, to ease the deadlock.
      Msg incoming = incomingQueues.get(receiver.index()).poll();
      if (incoming != null) pushToLocalQueue(incoming);
    }
    numTries += 1;
  }
}

After each receive block we could then check if the local queue is non-empty and, if it is, process its contents. This approach sweeps the dust under the carpet, though. What happens if the thread-local queue is bounded and full? And if we make the thread-local queue unbounded, that causes other problems.
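For completeness, the receive side might look like the following sketch; pollAnyInboundQueue and process are hypothetical stand-ins, and Msg is the type from the snippet above:

import java.util.ArrayDeque;

// After each inbound message, drain the thread-local queue of messages that
// were parked by sendMsg() during blocked sends.
abstract class DispatcherLoop {
    private final ArrayDeque<Msg> localQueue = new ArrayDeque<>(); // unbounded!
    private volatile boolean running = true;

    final void pushToLocalQueue(Msg m) { localQueue.add(m); }

    final void runLoop() {
        while (running) {
            Msg m = pollAnyInboundQueue();
            if (m != null) process(m);
            Msg local;
            while ((local = localQueue.poll()) != null) process(local);
        }
    }

    abstract Msg pollAnyInboundQueue();
    abstract void process(Msg m);
}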

xeus...@googlemail.com

Jan 15, 2014, 2:20:33 PM
to mechanica...@googlegroups.com
I would solve this issue using global signals. So you have two static atomic integers; let's call them QueueWarning and QueueFull. As soon as any inbound queue is filled to 3/5ths, it increments QueueWarning. As soon as the queue is filled to 4/5ths, it increments QueueFull. If the queue falls back below 3/5ths it decrements QueueFull, and if it falls below 2/5ths it decrements QueueWarning; so the hysteresis bands are 2/5 to 3/5 for Warning and 3/5 to 4/5 for Full.

This allows the threads that accept incoming connections and read sockets to prevent a real overflow. As soon as QueueWarning is larger than zero, you may decide not to accept new incoming connections, and maybe tell the load balancer in a health-status response that you're at your limit and don't want to accept new incoming connections, while not being really down.

If QueueFull is signaled (> 0), respond to any new incoming request with some kind of BUSY message without placing it into any queue, effectively protecting your service while continuing to process the already queued tasks. The client may reconnect and retry the query, so that the load balancer should be able to forward it to another instance. The client may still wait until all open queries are answered.

If you now log which queues signal Warning and Full, you can fix the issue by either enlarging those queues or giving them more processing power. Logging the queue sizes can be very helpful in this model.
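A sketch of the two signals with the hysteresis bands described above (thresholds, names, and the synchronized bookkeeping are illustrative simplifications):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Global signals, read by the accepting thread, written only when a queue
// crosses a band edge (hysteresis avoids flapping and constant contention).
final class QueueSignals {
    static final AtomicInteger QUEUE_WARNING = new AtomicInteger();
    static final AtomicInteger QUEUE_FULL = new AtomicInteger();
}

final class SignallingQueue<T> {
    private final int capacity;
    private final ArrayBlockingQueue<T> q;
    private boolean warning, full; // guarded by this

    SignallingQueue(int capacity) {
        this.capacity = capacity;
        this.q = new ArrayBlockingQueue<>(capacity);
    }

    boolean offer(T t) { boolean ok = q.offer(t); updateSignals(); return ok; }
    T poll() { T t = q.poll(); updateSignals(); return t; }

    private synchronized void updateSignals() {
        int size = q.size();
        if (!warning && size * 5 >= capacity * 3) { warning = true; QueueSignals.QUEUE_WARNING.incrementAndGet(); }
        else if (warning && size * 5 < capacity * 2) { warning = false; QueueSignals.QUEUE_WARNING.decrementAndGet(); }
        if (!full && size * 5 >= capacity * 4) { full = true; QueueSignals.QUEUE_FULL.incrementAndGet(); }
        else if (full && size * 5 < capacity * 3) { full = false; QueueSignals.QUEUE_FULL.decrementAndGet(); }
    }
}

The accepting thread then only reads QueueSignals.QUEUE_WARNING.get() > 0 (or QUEUE_FULL) before accepting or queuing anything.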


my 2 cents



Rüdiger Möller

Jan 16, 2014, 4:39:13 PM
to mechanica...@googlegroups.com
The trick of "non-determinism" (polling when blocked) reliably contains queue sizes in my tests. The stack used is capped by the amount of imbalance. E.g. if A sends 1 message to B and B replies with 10 messages, the max stack used will be 10 entries (if queue size is 1).

BUT: 
This can mess up message ordering. Your proposal of relying on a thread-local, single-threaded queue has the same problem, so both variants are not an option. However, there is some beef in the idea of queuing at the sender side, where things are single-threaded, and then only using very short bounded concurrent queues for the inter-thread transition.

Rüdiger Möller

Jan 16, 2014, 4:50:03 PM
to mechanica...@googlegroups.com


Since you can't control producers in a system, how would you make such a guarantee? With unbounded queues all you are doing is increasing latency, which kicks in timeouts and clogs the system with garbage. It also puts pressure on your memory subsystem, increasing paging and weakening latency guarantees, so a bounded queue with timeouts will work well. IMHO you generally want a request waiting on the client's outbound queue rather than sitting in a queue on the server side, so you can react faster to any change in the system that would make the request unnecessary or cause it to be augmented.


(Blocking) back-pressure with bounded queues always induces a risk of deadlock in actor graphs containing cycles. The simplest form: A sends 10 messages to B, and B replies with 10 messages to each message received from A. If the queues are bounded to, say, 5, they will deadlock as soon as B's incoming queue reaches size 5. A is blocked waiting for B, so it does not accept messages from B anymore; B cannot send to A anymore. It's not about external sources of input, but about deadlocks occurring in complex circular actor graphs. Unbounded queues trade deadlock risk against out-of-memory risk.
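That two-actor cycle is easy to reproduce; this demo (intentionally) stops making progress once both put() calls block:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Deliberately deadlocks: bounded queues, a cycle, and amplification.
public final class CircularDeadlockDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> toB = new ArrayBlockingQueue<>(5);
        BlockingQueue<Integer> toA = new ArrayBlockingQueue<>(5);

        Thread a = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) toB.put(i); // blocks once toB is full,
                                                          // and A is not draining toA yet
                while (true) toA.take();
            } catch (InterruptedException ignored) { }
        });
        Thread b = new Thread(() -> {
            try {
                while (true) {
                    int m = toB.take();
                    for (int i = 0; i < 10; i++) toA.put(m); // blocks once toA is full
                }
            } catch (InterruptedException ignored) { }
        });
        a.start();
        b.start(); // both threads end up blocked in put(): deadlock
    }
}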

Rüdiger Möller

Jan 16, 2014, 4:53:00 PM
to mechanica...@googlegroups.com

On Wednesday, January 15, 2014 8:20:33 PM UTC+1, xeus...@googlemail.com wrote:
I would solve this issue using global signals. So you have two static atomic integers; let's call them QueueWarning and QueueFull. As soon as any inbound queue is filled to 3/5ths, it increments QueueWarning. As soon as the queue is filled to 4/5ths, it increments QueueFull. If the queue falls back below 3/5ths it decrements QueueFull, and if it falls below 2/5ths it decrements QueueWarning; so the hysteresis bands are 2/5 to 3/5 for Warning and 3/5 to 4/5 for Full.


This reduces the performance of a 12-core machine to that of a single-core machine. With high event rates you get too much contention on your globals.

xeus...@googlemail.com

Jan 16, 2014, 11:31:44 PM
to mechanica...@googlegroups.com
In normal operation the queues shouldn't fill up, right? So under normal operation there is no contention at all, because the globals stay unchanged and are only read by the accepting thread. Contention starts as soon as the queues are too full, but even in that case you only have a maximum of two increments per queue, and two decrements per queue as load decreases; nothing I would worry about.

Rüdiger Möller

Jan 17, 2014, 5:01:06 AM
to mechanica...@googlegroups.com
Ah .. you are right. Misunderstood your proposal at first glance ;)

Jason Koch

Jan 17, 2014, 6:47:21 PM
to mechanica...@googlegroups.com
I agree with Rüdiger, and I'd probably go further: bounded queues introduce a risk of deadlocks, and I don't think that's a problem the framework should have to solve. The developer needs to understand when they are working in a concurrent system and think through the consequences.

I'm not sure there's an easy answer to "concurrent operations contain a circular dependency". It's one of the challenges of concurrent programming with dependencies between actors (/ threads / callbacks / etc.). If you design a loop, it needs to be carefully thought through depending on your circumstances.

Perhaps a modified design would help.
For example, if actor A receives requests, sends to B, then B sends to A and A responds: perhaps insert an actor upstream of A which can throttle requests to a total of N outstanding requests, as in the sketch below. Queues between A and B can then be sized to 2N+1 and you should have sufficient capacity. Take care that A does not depend on the upstream actor to send the response, of course. Or maybe add a third actor C to track the work of A and B, to which A and B send their results.
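One hedged reading of the throttle idea, using a semaphore to cap outstanding requests at N (all names and wiring are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

// Upstream throttle: at most N requests are in flight through the
// A -> B -> A loop, so queues sized 2N + 1 can never fill.
final class Throttle {
    private static final int N = 64;
    private final Semaphore inFlight = new Semaphore(N);
    private final BlockingQueue<Req> toA = new ArrayBlockingQueue<>(2 * N + 1);

    void accept(Req r) throws InterruptedException {
        inFlight.acquire(); // block new work at the edge, not inside the loop
        toA.put(r);
    }

    // A (or whoever writes the final response) calls this on completion.
    void onCompleted() { inFlight.release(); }
}

interface Req {}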

To me, the difference between an unbounded queue and a very large bounded queue seems academic for this scenario. Make the queues significantly larger than needed by designing it in. If the concern is that you run out of memory by making them so large, then I question how an unbounded queue would really help you; an unbounded queue would just as soon run out of memory if you're expecting to have that much work in flight.




Francis Stephens

Jan 22, 2014, 4:41:52 AM
to mechanica...@googlegroups.com
My expectation is that, although your global flags would be mostly cheap and read-only, the requirement that the size of the queue be read and written on every push and pop operation would create significant cache-coherency traffic. Or is there a way to get around that?

Rüdiger Möller

Jan 22, 2014, 11:54:41 AM
to mechanica...@googlegroups.com
Well, trying to solve problems by yourself makes you understand existing solutions much better. :-))

Nice wrap-up from the Disruptor documentation describing exactly the issues I had when fiddling with actors (bounded/unbounded queue based):

3.5 The Problems of Queues
 
Queues typically use either linked-lists or arrays for the underlying storage of elements.  If an in-memory queue is allowed to be unbounded then for many classes of problem it can grow unchecked until it reaches the point of catastrophic failure by exhausting memory.  This happens when producers outpace the consumers.  Unbounded queues can be useful in systems where the producers are guaranteed not to outpace the consumers and memory is a precious resource, but there is always a risk if this assumption doesn’t hold and queue grows without limit.  To avoid this catastrophic outcome, queues are commonly constrained in size (bounded).  Keeping a queue bounded requires that it is either array-backed or that the size is actively tracked. 
Queue implementations tend to have write contention on the head, tail, and size variables.  When in use, queues are typically always close to full or close to empty due to the differences in pace between consumers and producers.  They very rarely operate in a balanced middle ground where the rate of production and consumption is evenly matched.  This propensity to be always full or always empty results in high levels of contention and/or expensive cache coherence.  The problem is that even when the head and tail mechanisms are separated using different concurrent objects such as locks or CAS variables, they generally occupy the same cache-line. 
The concerns of managing producers claiming the head of a queue, consumers claiming the tail, and the storage of nodes in between make the designs of concurrent implementations very complex to manage beyond using a single large-grain lock on the queue.  Large grain locks on the whole queue for put and take operations are simple to implement but represent a significant bottleneck to throughput.  If the concurrent concerns are teased apart within the semantics of a queue then the implementations become very complex for anything other than a single producer – single consumer implementation. 
In Java there is a further problem with the use of queues, as they are significant sources of garbage.  Firstly, objects have to be allocated and placed in the queue.  Secondly, if linked-list backed, objects have to be allocated representing the nodes of the list.   When no longer referenced, all these objects allocated to support the queue implementation need to be re-claimed. 


