Improving remote throughput for large bursts


Patrik Nordwall

Apr 11, 2014, 2:40:59 AM
to akka...@googlegroups.com
I'm working on improving the throughput of sending messages to a remote system for a scenario that seems to be alarmingly common: sending many messages in one go.

What happens is that the TCP buffer fills up and stops accepting writes. Then we must buffer, back off, and try again. The first step was to replace the stashing in the endpoint writer with a more efficient internal buffer.

That made things worse for some buffer sizes. The reason is probably that the inefficient stashing accidentally provided the needed backoff.

I have now implemented an adaptive backoff strategy that seems to be a step in the right direction. The results of my tests are attached: throughput is better for all tested combinations and, most importantly, it handles bursts of 300000 messages without degraded throughput or false failure detection.
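
To make the idea concrete, here is a minimal sketch of a buffer-and-back-off writer. It is not the actual endpoint writer code: the names `Transport`, `AdaptiveBackoff`, and `BackoffWriter`, and the double-on-rejection, halve-on-success policy, are all assumptions chosen only to illustrate the general technique.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a transport whose send buffer can be full. */
interface Transport<T> {
    /** Returns false when the write is not accepted (for example, buffer full). */
    boolean tryWrite(T message);
}

/** Assumed policy: double the delay on each rejection, halve it on each success. */
final class AdaptiveBackoff {
    private final long minNanos;
    private final long maxNanos;
    private long currentNanos;

    AdaptiveBackoff(long minNanos, long maxNanos) {
        if (minNanos <= 0 || maxNanos < minNanos) {
            throw new IllegalArgumentException("require 0 < minNanos <= maxNanos");
        }
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
        this.currentNanos = minNanos;
    }

    /** Returns the delay to wait now, then lengthens the next delay (capped). */
    long onRejected() {
        long delay = currentNanos;
        currentNanos = Math.min(maxNanos, currentNanos * 2);
        return delay;
    }

    /** Shortens the next delay after a successful write (floored at the minimum). */
    void onAccepted() {
        currentNanos = Math.max(minNanos, currentNanos / 2);
    }

    long currentDelayNanos() {
        return currentNanos;
    }
}

/** Buffers messages internally and reports how long to wait before retrying. */
final class BackoffWriter<T> {
    private final Deque<T> buffer = new ArrayDeque<>();
    private final Transport<T> transport;
    private final AdaptiveBackoff backoff;

    BackoffWriter(Transport<T> transport, AdaptiveBackoff backoff) {
        this.transport = transport;
        this.backoff = backoff;
    }

    /** Enqueues the message and tries to drain the buffer in FIFO order. */
    long send(T message) {
        buffer.addLast(message);
        return flush();
    }

    /** Returns 0 when the buffer is drained, else nanoseconds to wait before retrying. */
    long flush() {
        while (!buffer.isEmpty()) {
            if (transport.tryWrite(buffer.peekFirst())) {
                buffer.removeFirst();
                backoff.onAccepted();
            } else {
                return backoff.onRejected();
            }
        }
        return 0L;
    }

    int buffered() {
        return buffer.size();
    }
}
```

In this sketch the caller is responsible for scheduling the next `flush()` after the returned delay. The delay grows while the transport keeps rejecting writes and shrinks again once writes succeed, which is the kind of feedback the stashing apparently provided by accident.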

This is only one of many tests that should be done, but I wanted to share the good news so far.

Cheers,
Patrik


--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

remote-bench-result2.pdf

Björn Antonsson

Apr 11, 2014, 3:02:11 AM
to Patrik Nordwall, akka...@googlegroups.com
Awesome improvements.

B/
--
You received this message because you are subscribed to the Google Groups "Akka Developer List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson

Roland Kuhn

Apr 11, 2014, 3:22:07 AM
to akka-dev
Great results, you are absolutely right that they must be shared! :-)




Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn


Patrik Nordwall

Apr 25, 2014, 9:28:19 AM
to akka...@googlegroups.com
These improvements have been merged to the master and release-2.3 branches. I also made another improvement that I have high hopes for: heartbeat messages for the remote and cluster death watch now have priority over other messages, which gives them a better chance of getting through even when bursts of many messages are sent. Heartbeats of the transport failure detector were changed to piggyback on normal message payloads, so those should also get through.
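
As a rough illustration of giving heartbeats priority, the sketch below keeps two FIFO queues and always drains the heartbeat queue first. The class name `PrioritySendBuffer` and its methods are hypothetical and are not taken from the Akka source; the real change may be structured quite differently.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical send buffer in which heartbeats overtake regular messages. */
final class PrioritySendBuffer<T> {
    private final Queue<T> heartbeats = new ArrayDeque<>();
    private final Queue<T> regular = new ArrayDeque<>();

    /** Adds a message to the heartbeat queue or to the regular queue. */
    void enqueue(T message, boolean isHeartbeat) {
        (isHeartbeat ? heartbeats : regular).add(message);
    }

    /** Returns the next message to write, or null when both queues are empty. */
    T poll() {
        T heartbeat = heartbeats.poll();
        return heartbeat != null ? heartbeat : regular.poll();
    }

    int size() {
        return heartbeats.size() + regular.size();
    }
}
```

Ordering is preserved within each class of message; only the relative order between heartbeats and regular messages changes, so a heartbeat enqueued behind a large burst is still written next.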

I encourage anyone with heavy usage of akka remote/cluster to try this timestamped snapshot: 2.3-20140425-151510
that is published to repo http://repo.akka.io/snapshots/

Cheers,
Patrik
