Strategies for Back Pressure


Patrick Julien

Aug 25, 2014, 3:56:04 PM
to lmax-di...@googlegroups.com
I have been reading and re-reading Martin Fowler's architecture overview of the LMAX exchange[1].  It's still not clear to me how back pressure and errors were handled in this design.  

For back pressure, I think it was accomplished by blocking when acquiring the next sequence number from the disruptor when it's completely full.  I think that makes sense if this thing was using blocking I/O to read from the network.

If it was using asynchronous I/O, it's not clear to me how back pressure was applied to new incoming connections.  Can anyone provide some insight here?

For error handling: if the journal could not be written to, say because the disk was full, how did the design:

- Prevent the business logic from continuing?
- Allow resuming when disk space was cleared?
- Report the error back to the client?  Would putting something like a Netty channel inside a disruptor event be something that is recommended?  I have a feeling it's not, hence why I'm asking for advice here.

Rajiv Kurian

Aug 26, 2014, 10:07:17 PM
to lmax-di...@googlegroups.com


On Monday, August 25, 2014 12:56:04 PM UTC-7, Patrick Julien wrote:
I have been reading and re-reading Martin Fowler's architecture overview of the LMAX exchange[1].  It's still not clear to me how back pressure and errors were handled in this design.  

For back pressure, I think it was accomplished by blocking when acquiring the next sequence number from the disruptor when it's completely full.  I think that makes sense if this thing was using blocking I/O to read from the network.

If it was using asynchronous I/O, it's not clear to me how back pressure was applied to new incoming connections.  Can anyone provide some hindsight here?
Asynchronous I/O still needs to make select/poll/epoll calls with a timeout to get the channels that have activity on them. So if you keep waiting on your next sequence on a ring buffer, you increase the mean time between your select* calls, which amounts to back pressure. This is one form of back pressure, but not the only form of load shedding. In many cases, say you were designing a game server or some VOIP application, strict queuing of network packets is not a good thing. You'd rather drop things if you can't keep up than wait for a ring buffer slot to open up, because processing old packets is near useless. In these cases you can use something like a tryNext() on your ring buffer to see if you can put an item into the buffer. If not, then you can:
a) Buffer the packet for later processing. You might have to take special care to make sure that FIFO constraints are maintained, i.e. drain the overflow buffer into the ring buffer as and when you can instead of putting new packets straight into the ring buffer.

b)  Just drop your packet if you don't have a ring buffer slot.

c) Use an asynchronous form of back pressure. You need to (in your application-level protocol) communicate to your clients how much load you are prepared to take. So if your ring buffer starts getting full, you will communicate this fact to your clients, who will themselves adapt (maybe in different ways) to the reduced capacity on the server. Sadly a lot of this is done by TCP already, so you'd be rewriting a lot of what's already been written. The one big use case for this is when you want back pressure and load shedding but don't need strict ordering of packets. You might also want to use some UDP-derived protocol that helps you get there.
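Options (a) and (b) can be sketched without the Disruptor itself. In the toy class below, `ArrayBlockingQueue.offer()` stands in for a non-blocking `tryNext()`-style claim on the ring buffer; the class and method names are made up for illustration, not part of the LMAX API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;

public class LoadSheddingSketch {
    // Stand-in for the ring buffer: offer() is the non-blocking "tryNext".
    final ArrayBlockingQueue<byte[]> ring = new ArrayBlockingQueue<>(4);
    // Option (a): overflow buffer, drained first so FIFO order is preserved.
    final Deque<byte[]> overflow = new ArrayDeque<>();
    int dropped = 0;

    /** Option (a): buffer on overflow; always drain the overflow buffer first. */
    public void onPacketBuffered(byte[] packet) {
        overflow.addLast(packet);
        while (!overflow.isEmpty() && ring.offer(overflow.peekFirst())) {
            overflow.removeFirst();
        }
    }

    /** Option (b): shed load by dropping the packet when no slot is free. */
    public void onPacketDropOnFull(byte[] packet) {
        if (!ring.offer(packet)) {
            dropped++; // old packets are near useless anyway
        }
    }

    public static void main(String[] args) {
        LoadSheddingSketch s = new LoadSheddingSketch();
        for (int i = 0; i < 6; i++) s.onPacketDropOnFull(new byte[]{(byte) i});
        // capacity is 4, so 2 of the 6 packets are shed
        System.out.println("queued=" + s.ring.size() + " dropped=" + s.dropped);
    }
}
```

The point of draining `overflow` before enqueuing the new packet in (a) is exactly the FIFO caveat above: a fresh packet must never jump ahead of ones that arrived while the ring was full.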

For error handling.  If the journal could not be written to, say the disk was full, how did this prevent: 

- The business logic from continuing?
The canonical disruptor example has the journalling thread as one of the initial processors. So if the journalling thread cannot keep up with the load it will create back pressure on the thread generating requests (potentially your network thread). 
- Allow resuming when disk space was cleared?
Again presumably if the journalling thread manages to clean space or it magically appears from somewhere, it will start draining the ring buffer and things will start to flow through your system again. In the general case FIFO order is still maintained and this is where you might want to put in some smarts to ignore old things etc. 
- How was an error reported back to the client?  Would putting something like a Netty channel inside a disruptor event be something that is recommended?  I have a feeling it's not, hence why I'm asking for advice here.
Point (c) that I made during the back pressure section is applicable here. Don't block on the next slot in the ring buffer. Instead use tryNext() to check for a slot. If it's not available, send a message to your client telling them that their request could not be processed and/or to slow down. In my experience, ring-buffer-like topologies are easiest to use if you are in control of your run loop. AFAIK Netty takes over your run loop; its model is to partition connections onto threads (fixed partitions) and then issue read/write calls from those threads, with your logic manifesting through callbacks.
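A minimal sketch of that reject-and-reply idea, again with `offer()` standing in for `tryNext()`; the `RejectOnFull` name and the BUSY message are purely illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;

public class RejectOnFull {
    // Stand-in for the ring buffer; tiny capacity so the demo fills it quickly.
    final ArrayBlockingQueue<String> ring = new ArrayBlockingQueue<>(2);

    /** Try to enqueue; on failure, reply to the client instead of blocking. */
    public boolean submit(String request, Consumer<String> replyToClient) {
        if (ring.offer(request)) {
            return true; // slot claimed; a consumer thread would process it
        }
        replyToClient.accept("BUSY: request not processed, please slow down");
        return false;
    }

    public static void main(String[] args) {
        RejectOnFull r = new RejectOnFull();
        r.submit("order-1", System.out::println);
        r.submit("order-2", System.out::println);
        r.submit("order-3", System.out::println); // ring full: client sees BUSY
    }
}
```

The client-side half of the protocol (reacting to BUSY by backing off) is the application-level part TCP cannot do for you.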

Here is a link to two sample applications I wrote a while back to test some of this.

i) https://github.com/RajivKurian/Java-NIO-example - This is a mix of Scala and Java. I have a very simple binary buddy allocator that I use to get byte buffers to process incoming requests. These requests are length-prefixed. Once a request is complete (I receive the expected number of bytes), it is handed over to another thread (via a ring buffer) to do the actual processing. There is an abstraction that reclaims the byte buffers once they are processed by the processing thread and puts them back in the binary buddy pool. So really there is one big unsafe chunk that is allocated, and the binary buddy pool operates via offsets from it. None of the back pressure stuff that I talked about is in here, but you should get ideas from it.

ii) https://github.com/RajivKurian/epoll-example - This is a C/C++ example which does a lot less than (i). It has a super ghetto ring buffer in C++ which does not align itself to a cache line (though the internal fields themselves are cache-line aligned). A better ring buffer implementation which uses posix_memalign (sorry, Unix only) to align the start of the ring buffer (internal fields aligned using std C++11) - https://github.com/RajivKurian/image-processor/blob/master/project/src/queues/ring_buffer.hpp

Michael Barker

Aug 27, 2014, 4:51:41 AM
to lmax-di...@googlegroups.com
Hi Patrick,

I have been reading and re-reading Martin Fowler's architecture overview of the LMAX exchange[1].  It's still not clear to me how back pressure and errors were handled in this design.  

For back pressure, I think it was accomplished by blocking when acquiring the next sequence number from the disruptor when it's completely full.  I think that makes sense if this thing was using blocking I/O to read from the network.

You can use both blocking and non-blocking calls to acquire the next sequence number.  With data arriving from the network, we tend to use the non-blocking approach and drop messages when the buffer is full.  We could elect to block, but that would just cause packet loss elsewhere.  Also, our messaging bus can lock up if you block the receiving thread.

If it was using asynchronous I/O, it's not clear to me how back pressure was applied to new incoming connections.  Can anyone provide some hindsight here?

See above.
 
For error handling.  If the journal could not be written to, say the disk was full, how did this prevent:

- The business logic from continuing?

The Disruptor supports gating between independent consumers.  When setting up the Disruptor, you declare that the application event handler is gated on the journaler.  The application thread will then only consider messages processed by the journaler to be available.
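The gating described above can be modelled with two plain threads and atomic sequence counters. This is a toy model of the dependency-gating idea, not the Disruptor's actual implementation (which uses its own Sequence and wait-strategy machinery); all names here are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Two consumers on one buffer: the app handler is gated on the journaler. */
public class GatingSketch {
    static final int N = 4;                                  // events to publish
    static final long[] ring = new long[8];                  // toy ring buffer
    static final AtomicLong published = new AtomicLong(-1);  // producer cursor
    static final AtomicLong journaled = new AtomicLong(-1);  // journaler progress
    static volatile long appProcessed = 0;
    static volatile boolean gateViolated = false;

    public static void main(String[] args) throws InterruptedException {
        Thread journaler = new Thread(() -> {
            for (long seq = 0; seq < N; ) {
                if (published.get() >= seq) {     // entry available to journal
                    journaled.set(seq);           // "written": open the gate
                    seq++;
                }
            }
        });
        Thread app = new Thread(() -> {
            for (long seq = 0; seq < N; ) {
                if (journaled.get() >= seq) {     // only see journaled entries
                    if (published.get() < seq) gateViolated = true;
                    appProcessed++;
                    seq++;
                }
            }
        });
        journaler.start();
        app.start();
        for (long seq = 0; seq < N; seq++) {      // producer publishes entries
            ring[(int) seq] = seq;
            published.set(seq);
        }
        journaler.join();
        app.join();
        System.out.println("appProcessed=" + appProcessed
                + " gateViolated=" + gateViolated);
    }
}
```

Because the app thread's availability check reads `journaled` rather than `published`, a stalled journaler (full disk) automatically stalls the business logic, which is exactly the behaviour being described.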
 
- Allow resuming when disk space was cleared?

We don't resume in this situation.  We log a fatal error message and mark the whole service as failed in the case of an IOException.  You could handle resuming by having the Journaler thread retry indefinitely until the write succeeds.
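The retry-indefinitely alternative mentioned above might look like the sketch below. The class and method names are made up for illustration; in a real system the journaler would also refrain from advancing its Disruptor sequence until the write succeeds, so gated consumers stall and back pressure builds naturally:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Retry loop sketch: keep retrying the journal write until it succeeds. */
public class RetryingJournaler {
    public static int writeWithRetry(Path journal, byte[] entry, long backoffMillis)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                Files.write(journal, entry,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                return attempts;                 // success: sequence may advance
            } catch (IOException e) {
                // e.g. disk full: back off and wait for space to be cleared
                Thread.sleep(backoffMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("journal", ".log");
        int attempts = writeWithRetry(p, "order-1\n".getBytes(), 10);
        System.out.println("attempts=" + attempts);
    }
}
```

The trade-off versus failing fast, as described above, is that an indefinitely retrying journaler silently stalls the whole pipeline, whereas an errored state is visible to monitoring.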
 
- How was an error reported back to the client?  Would putting something like a Netty channel inside a disruptor event be something that is recommended?  I have a feeling it's not, hence why I'm asking for advice here.

If it is an application error, then we simply send a response message that indicates the failure.  If it is a technical failure, like IO errors in the journaler, then we have a separate tool in our infrastructure that monitors the health of various components.  The journaler would put itself into an errored state and it would be picked up by external monitoring.  We wouldn't communicate this error to the client.

Mike.

Patrick Julien

Aug 27, 2014, 7:36:58 AM
to lmax-di...@googlegroups.com
thank you both