Performance of Akka-Http 2.5.4


Jakub Kahovec

Aug 11, 2017, 9:53:21 AM
to Akka User List
Hi,

We recently did a little research into the performance of current JVM-based HTTP servers (with Nginx for comparison), and we were rather unpleasantly surprised by the Akka-Http results.

Here are the assumptions for the benchmark:
  • A single handler for GET /benchmark returns HTTP 200 OK with the payload "benchmark" and the HTTP headers Date, Server, Content-Type and Content-Length.
  • We use mostly default settings and change them only when it's obvious that the defaults do not work well in the benchmark.
  • We are interested in the maximum number of requests per second and in latencies.
  • Three rounds: the first is a warmup and the other two test the frameworks under different loads (each test runs for 60 seconds).
Hardware (client and benchmark machines): 40 CPUs (Intel(R) Xeon(R) CPU E5-2630L v4 @ 1.80GHz), 64GB of RAM

Results:

Warmup

| metric          | nginx         | akka-http | colossus  | finagle   | http4s    | netty     | spring    | vertx     |
|---              |---            |---        |---        |---        |---        |---        |---        |---        |
| req/s           | 377627.54     | 161957.96 | 341140.06 | 271592.28 | 147829.99 | 337583.43 | 149290.22 | 356968.36 |
| latency avg     | 99.18μs       | 656.77μs  | 386.72μs  | 579.93μs  | 8.28ms    | 336.11μs  | 21.98ms   | 345.04μs  |
| latency 75th    | 98.00μs       | 255.00μs  | 115.00μs  | 149.00μs  | 248.00μs  | 131.00μs  | 250.00μs  | 101.00μs  |
| latency 99th    | 149.00μs      | 815.00μs  | 271.00μs  | 820.00μs  | 156.60ms  | 198.00μs  | 846.85ms  | 188.00μs  |
| cpu idle %      | 90            | 36        | 76        | 61        | 13        | 75        | 57        | 76        |


Round 1

| metric          | nginx         | akka-http | colossus   | finagle   | http4s    | netty      | spring    | vertx     |
|---              |---            |---        |---         |---        |---        |---         |---        |---        |
| req/s           | 988019.69     | 245979.16 | 1007300.12 | 460936.09 | 149288.46 | 1021265.73 | 258088.17 | 990153.39 |
| latency avg     | 642.29μs      | 3.31ms    | 831.16μs   | 4.49ms    | 122.44ms  | 2.12ms     | 9.57ms    | 1.96ms    |
| latency 75th    | 537.00μs      | 2.79ms    | 572.00μs   | 5.12ms    | 157.41ms  | 0.88ms     | 2.20ms    | 1.17ms    |
| latency 99th    | 1.06ms        | 13.84ms   | 3.77ms     | 33.31ms   | 931.98ms  | 18.51ms    | 321.54ms  | 16.68ms   |
| cpu idle %      | 59            | 63        | 5          | 1         | 16        | 2          | 4         | 2         |


Round 2

| metric           | nginx      | akka-http | colossus   | finagle   | http4s    | netty        | spring    | vertx     |
|---               |---         |---        |---         |---        |---        |---           |---        |---        |
| req/s            | 988149.09  | 242432.27 | 1028527.92 | 463750.89 | 166627.09 | 1068348.53   | 257553.62 | 991568.36 |
| latency avg      | 1.05ms     | 9.36ms    | 1.25ms     | 4.91ms    | 89.26ms   | 3.82ms       | 11.00ms   | 3.31ms    |
| latency 75th     | 1.10ms     | 10.66ms   | 1.14ms     | 6.40ms    | 95.28ms   | 5.10ms       | 4.46ms    | 4.61ms    |
| latency 99th     | 1.41ms     | 43.96ms   | 3.50ms     | 33.56ms   | 629.33ms  | 31.28ms      | 275.59ms  | 22.80ms   |
| cpu idle %       | 57         | 62        | 4          | 1         | 14        | 2            | 5         | 1         |


As you can see from the results, Akka-Http (2.5.4 + Scala 2.12) didn't perform very well, in terms of both requests per second and latency. We attribute this to CPU utilisation: compared to the other servers, the CPUs were idle around 60% of the time. We tried tweaking the parallelism (setting parallelism-max to the number of cores or higher), using different executors (fork-join-executor, affinity-pool-executor and thread-pool-executor) and changing some other stream and http.server settings, but none of that helped much.

Are we missing something fundamental, or are there other settings which might help to increase CPU utilisation or performance?

Thank you

Jakub





Konrad “ktoso” Malawski

Aug 11, 2017, 9:58:19 AM
to Akka User List, Jakub Kahovec
If you want to discuss benchmarks, please share the actual code, as otherwise it's impossible to comment on what you're actually benchmarking. The same goes for the benchmark setup: you did not explain how the benchmark was run, what network it was on, etc.

We have continuously confirmed the same performance as the netty 4 backend in play apps, as well as beating spray in performance. Also, you're not comparing apples to apples if you compare raw netty without any DSLs to the high-level routing API that Akka provides. So, please share the code.

Jakub Kahovec

Aug 11, 2017, 10:43:53 AM
to Akka User List, jakub....@gmail.com
It was run on a 1 Gb network, but I don't think that's actually relevant here, as the other servers ran on the same network and performed better. I understand that raw netty should perform better, but the other servers performed pretty well too. As I said before, we were mainly surprised by the CPU utilisation, which was pretty low compared to the others.

I'm attaching the code used for benchmarking.
benchmark.zip

Michael Zhong

Nov 10, 2017, 7:32:14 AM
to Akka User List
That's awful. I've recently started using akka-http to write a performance-critical server; maybe I should switch to some other library.

On Friday, August 11, 2017 at 10:43:53 PM UTC+8, Jakub Kahovec wrote:

lutzh

Nov 11, 2017, 10:57:36 AM
to Akka User List
Hi Jakub,

I can't download the zip file. (I'm using the web interface; it tells me "The requested document, benchmark.zip (0x38bdd40db96ca part 0.1), could not be found: DOCID_NOT_FOUND".)

Could you maybe share the sources on Github, Bitbucket or the like?

Thanks,
Lutz

lutzh

Nov 11, 2017, 11:01:29 AM
to Akka User List
Hi Michael,

Don't throw out Akka HTTP just yet, do your own measurements first. With maybe the exception of high frequency trading, I think you'll be ok...

Patrik Nordwall

Nov 12, 2017, 6:51:36 AM
to akka...@googlegroups.com
First of all, I think your benchmarking methodology is flawed. Start by watching a presentation by Gil Tene about coordinated omission, such as https://www.youtube.com/watch?v=lJ8ydIuPFeU

You should use wrk2 instead of wrk.
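
To make the coordinated-omission point concrete, here is a minimal virtual-time sketch in Java. It is not how wrk or wrk2 are implemented; it only contrasts two ways of recording latency on a single connection: from the moment a request was actually sent (which silently skips the time requests spent waiting while the server was stalled) and from the moment it was scheduled to be sent. The class name and all numbers (1,000 requests at one per millisecond, 500 μs service time, one 100 ms stall) are made up for illustration.

```java
import java.util.Arrays;

/** Virtual-time sketch of coordinated omission; every number used here is made up. */
final class CoordinatedOmissionSketch {

    /** Nearest-rank percentile (p in 1..100) of the samples. */
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // integer ceil(p * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Simulates one connection with a planned send every intervalUs microseconds.
     * The request with index stallAt takes stallUs, all others take serviceUs.
     * Returns {naive, corrected} latency arrays in microseconds.
     */
    static long[][] simulate(int requests, long intervalUs, long serviceUs,
                             int stallAt, long stallUs) {
        long[] naive = new long[requests];
        long[] corrected = new long[requests];
        long clock = 0; // time at which the previous response arrived
        for (int i = 0; i < requests; i++) {
            long intended = i * intervalUs;         // when the request should go out
            long start = Math.max(clock, intended); // it cannot go out before the previous one returns
            long end = start + (i == stallAt ? stallUs : serviceUs);
            naive[i] = end - start;                 // measured from the actual send
            corrected[i] = end - intended;          // measured from the planned send
            clock = end;
        }
        return new long[][] {naive, corrected};
    }

    public static void main(String[] args) {
        long[][] r = simulate(1000, 1000, 500, 100, 100_000);
        System.out.println("naive p99 (us):     " + percentile(r[0], 99));
        System.out.println("corrected p99 (us): " + percentile(r[1], 99));
    }
}
```

With these made-up numbers the single stall barely shows in the naive 99th percentile, while the schedule-based one reflects the backlog the stall caused.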

If you are interested in latencies you must first decide on a throughput rate that the system can handle and that must be lower than the maximum throughput. E.g.

./wrk -t2 -c100 -d60s --rate 20000 --latency http://localhost:8080/benchmark

If you are interested in finding the maximum throughput you can increase the rate until it can't reach the target, and then you will also see that the latencies become very high, which is expected since the system is fully saturated.
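
The search for the maximum throughput described above can be sketched as a simple ramp. This is a rough Java sketch, not part of any tool: `measure` is a hypothetical hook that would run the load generator at the given target rate and return the achieved req/s. Here it is replaced by a fake server that saturates at a made-up 240,000 req/s, and the step size and 2% tolerance are arbitrary choices.

```java
import java.util.function.LongUnaryOperator;

/** Ramps the target rate until the measured rate can no longer keep up. */
final class RateSearchSketch {

    /**
     * measure maps a target rate (req/s) to the achieved rate (req/s).
     * Returns the highest target that was still met within the tolerance,
     * or 0 if even the first target was missed.
     */
    static long highestSustainedRate(LongUnaryOperator measure, long startRate,
                                     long step, long maxRate, double tolerance) {
        long best = 0;
        for (long target = startRate; target <= maxRate; target += step) {
            long achieved = measure.applyAsLong(target);
            if (achieved < target * (1.0 - tolerance)) {
                break; // the system is saturated, latencies from here on are meaningless
            }
            best = target;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real measurement run.
        LongUnaryOperator fakeServer = target -> Math.min(target, 240_000L);
        System.out.println(highestSustainedRate(fakeServer, 50_000, 50_000, 1_000_000, 0.02));
    }
}
```

Latency runs would then use a rate comfortably below the value this returns.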

Secondly, your configuration doesn't look good. Start out with defaults, possibly reducing the number of threads in the default dispatcher:

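akka {
  actor {
    default-dispatcher {
      fork-join-executor {
        # min = max pins the pool at exactly 6 threads
        parallelism-min = 6
        parallelism-max = 6
      }
      # max messages an actor processes before its thread moves on to the next actor
      throughput = 20
    }
  }
}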
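
How these settings turn into a thread count can be sketched in plain Java. This assumes the fork-join-executor sizes its pool as ceil(available cores × parallelism-factor), bounded by parallelism-min and parallelism-max, and it assumes reference defaults of 8 / 3.0 / 64. Both assumptions come from memory of the Akka 2.5 reference.conf and should be checked against the version in use. `PoolSizeSketch` and `parallelism` are illustrative names, not Akka API.

```java
/** Sketch of the assumed pool sizing rule: ceil(cores * factor) clamped to [min, max]. */
final class PoolSizeSketch {

    static int parallelism(int cores, double factor, int min, int max) {
        int scaled = (int) Math.ceil(cores * factor);
        return Math.min(Math.max(scaled, min), max);
    }

    public static void main(String[] args) {
        int cores = 40; // the benchmark machine from this thread
        System.out.println("assumed defaults (8 / 3.0 / 64): " + parallelism(cores, 3.0, 8, 64));
        System.out.println("min = max = 6:                   " + parallelism(cores, 3.0, 6, 6));
        System.out.println("factor 1.0, max 64:              " + parallelism(cores, 1.0, 8, 64));
        System.out.println("factor 3.0, max raised to 120:   " + parallelism(cores, 3.0, 8, 120));
    }
}
```

Under these assumptions the config above pins the pool at 6 threads regardless of the core count, while a factor of 3 on 40 cores only yields 120 threads if parallelism-max has been raised that far.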

Thirdly, the benchmark does not represent a real-world application.

Regards,
Patrik



--

Patrik Nordwall
Akka Tech Lead
Lightbend -  Reactive apps on the JVM
Twitter: @patriknw

johannes...@lightbend.com

Nov 13, 2017, 7:46:13 AM
to Akka User List
I missed this post before. 

I'd like to add another point. Akka Http hasn't been performance tested on a 40 core machine. The high idle CPU percentage means that either Akka / Akka Http is not configured correctly for this number of cores or that there are actual contention issues at this scale. It would definitely be interesting to know what the problem is, so that we can offer a better default experience for running Akka Http on this kind of hardware.

If you are still listening in, Jakub, it would be nice if you could set parallelism-max to the number of cores on your machine and/or set `parallelism-factor = 1` as Patrik suggested. One reason for bad performance could be that the default parallelism-factor of 3 would lead to 120 threads battling for resources, starving each other of CPU time, maybe even while holding on to some resource. If this alone doesn't increase performance, a few stack dumps from the server process during steady state would help, because they would likely point out places with high contention.
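
A lightweight complement to full stack dumps is a per-state thread count. The Java sketch below uses the standard `ThreadMXBean`; it only sees the JVM it runs in, so it would have to be called from inside the server process, and the class name is illustrative. Full dumps from outside the process can be taken with the JDK's `jstack <pid>`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

/** Counts the live threads of the current JVM by thread state. */
final class ThreadStateSketch {

    static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip lock details, only the thread states are needed here
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip threads that ended while the dump was taken
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countStates());
    }
}
```

Sampled a few times during steady state, many BLOCKED threads would point to lock contention, while mostly WAITING or TIMED_WAITING pool threads would suggest the workers are simply parked without work.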

For anyone else listening in here, I also wanted to stress that you need to put any kind of performance numbers into perspective. We cannot test everything in every environment, and details usually matter in benchmarks. High CPU idle times like in this case mean that something currently just doesn't work correctly in this setting. For best performance, you need to benchmark for yourself on your own hardware and then be prepared to dig into issues.

Johannes

Unmesh Joshi

Nov 27, 2017, 11:03:21 AM
to Akka User List
I am curious to know if anyone found the root cause of this. For HTTP processing, an actor, ActorGraphInterpreter, is created to process each connection with back pressure. The way the graph stages get executed should give a similar effect to the EatWhatYouKill policy in Jetty (https://webtide.com/eat-what-you-kill/), providing mechanical sympathy. I haven't done any performance measurements, but code inspection suggests that.

Jakub Kahovec

Nov 28, 2017, 8:31:03 AM
to Akka User List
Thank you, Johannes, for your comments. As you pointed out, a 40 core machine isn't typical hardware; we chose it just for measurement purposes and were quite surprised by the results.
For quite a while we've been using a home-grown server built on top of Netty, which works pretty well for us, but from time to time we consider switching to something 'more standard', and Akka is often mentioned.
I've been following Akka for many years, and I must say I'm really impressed by the project; you guys have done amazing work. I had used Spray before and was impatiently waiting for Akka Http, so when it finally came out and was said to be on par with, or faster than, Spray, I couldn't wait to try it out. The test itself was actually done by colleagues of mine, and after finding out that Akka Http didn't perform very well, I kind of blamed them for not setting it up correctly. I tried out every possible setting myself, including parallelism-factor = 1 and parallelism-max = 40, changing dispatchers etc., to persuade them that Akka Http can beat other JVM-based servers, and was eventually sad that I wasn't able to achieve it. That's why I created this post and included the benchmark code, so that others, or your team, can help me find out what needs to be tweaked to make Akka Http shine again.

Jakub

Michael Zinsmaier

Dec 5, 2017, 5:11:43 PM
to Akka User List
Hi Jakub,

Out of curiosity, do you know the split between garbage collection and actual work time, and the memory used by the different frameworks?
(At least in the Netty example it looks as if you are reusing views on the same buffer with "duplicate"; not sure of the impact though.)
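
One rough way to get at the GC share from inside a JVM is the standard `GarbageCollectorMXBean`. The Java sketch below is an assumption about how one might measure it, not something taken from the benchmark code. The class and method names are illustrative, the one-second sleep stands in for the 60-second test window, and the reported collection time is approximate accumulated time, so the result is only an estimate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Estimates the share of wall-clock time the current JVM spent in garbage collection. */
final class GcShareSketch {

    /** Sum of the collection times (ms) of all collectors; undefined values (-1) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time divided by elapsed wall-clock time between two samples (all values in ms). */
    static double gcShare(long gcBefore, long gcAfter, long wallBefore, long wallAfter) {
        long wall = wallAfter - wallBefore;
        return wall <= 0 ? 0.0 : (double) (gcAfter - gcBefore) / wall;
    }

    public static void main(String[] args) throws InterruptedException {
        long gc0 = totalGcMillis();
        long t0 = System.nanoTime() / 1_000_000;
        Thread.sleep(1000); // stand-in for the measurement window
        long gc1 = totalGcMillis();
        long t1 = System.nanoTime() / 1_000_000;
        System.out.printf("GC share: %.2f%%%n", 100 * gcShare(gc0, gc1, t0, t1));

        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("heap used (MB): " + usedMb);
    }
}
```

Running the same sampling in each server under load would give comparable GC and heap numbers across the frameworks.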

Best Michael