Hi,
I'll try to explain what I'm experiencing in my akka-http app.
I noticed that under load a lot of connections (~1-2%) were dropped or timed out. I started investigating, tuning OS and Akka parameters and trimming down my sample app until I got this:
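// N.B.: this is a test
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.Http.ServerBinding
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

implicit val system = ActorSystem()
implicit val mat: ActorMaterializer = ActorMaterializer()
implicit val ec = system.dispatcher

val binding: Future[ServerBinding] = Http().bind("0.0.0.0", 1104).map { conn ⇒
  val promise = Promise[Unit]()
  // I don't even wait for the end of the flow, only for the first request
  // (trySuccess, so a second request on a keep-alive connection doesn't throw)
  val handler = Flow[HttpRequest].map { _ ⇒ promise.trySuccess(()); HttpResponse() }
  // to be sure it's not a mapAsync(1) problem I use map and block here, same result
  val t0 = System.currentTimeMillis()
  println(s"${Thread.currentThread().getName} start")
  conn handleWith handler
  Await.result(promise.future, 10.seconds)
  println(s"${Thread.currentThread().getName} end ${System.currentTimeMillis() - t0}ms")
}.to(Sink.ignore).run()

Await.result(binding, 10.seconds)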
When I run a small test with ab using "-c 1000" or more concurrent connections (even though I'm handling only one connection at a time here), some of the requests immediately start showing unusual delays:
default-akka.actor.default-dispatcher-3 start
default-akka.actor.default-dispatcher-3 end 2015ms -> gets bigger
This keeps getting worse. After a while I can kill ab, wait a few minutes, and make a single request, and it either gets refused or times out. The server is basically dead.
In case you're wondering about all the blocking and printing above, I get exactly the same result with this:
import akka.stream.scaladsl.{Flow, Keep, Sink}

val handler = Flow[HttpRequest].map(_ ⇒ HttpResponse()).alsoToMat(Sink.ignore)(Keep.right)
val binding: Future[ServerBinding] = Http().bind("0.0.0.0", 1104).mapAsync(1) { conn ⇒
  conn handleWith handler
}.to(Sink.ignore).run()

The same happens if I use bindAndHandle with a simple route.
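To make the mapAsync(1) part concrete: with parallelism 1, mapAsync only calls its function on the next element once the Future for the previous element has completed, and in this snippet that Future is the one materialized by Sink.ignore inside handler, which completes when that connection's request stream completes. The sketch below mimics that ordering with plain Scala Futures and no Akka; mapAsyncSequential and handleConnection are hypothetical helpers (not Akka's implementation), and the 50 ms sleep is an arbitrary stand-in for a connection staying open.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object SequentialMapAsyncSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical helper (not Akka's code): like mapAsync(1), it calls f on the
  // next element only after the Future for the previous element has completed.
  def mapAsyncSequential[A, B](xs: List[A])(f: A => Future[B]): Future[List[B]] =
    xs.foldLeft(Future.successful(List.empty[B])) { (acc, x) =>
      // f(x) is invoked inside flatMap, i.e. only once acc has completed
      acc.flatMap(done => f(x).map(b => done :+ b))
    }

  // Start/end events in the order they happen, guarded by synchronized
  val events: ListBuffer[String] = ListBuffer.empty[String]

  // Hypothetical stand-in for one connection: its Future completes only after
  // `millis`, like a Future[Done] that completes when the connection ends
  def handleConnection(id: Int, millis: Long): Future[Int] = Future {
    events.synchronized { events += s"start $id" }
    blocking(Thread.sleep(millis))
    events.synchronized { events += s"end $id" }
    id
  }

  def main(args: Array[String]): Unit = {
    val handled = mapAsyncSequential(List(1, 2, 3))(id => handleConnection(id, 50))
    println(Await.result(handled, 5.seconds))
    println(events.synchronized(events.mkString(", ")))
  }
}
```

Running main prints List(1, 2, 3) and then the events in strict start/end pairs: no simulated connection starts before the previous one has finished, so a long-lived connection holds up every connection behind it.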
In my standard setup (bindAndHandle, 1k to 10k concurrent connections, keep-alive enabled for the requests) between 1% and 3% of connections fail.
This is what I get calling a simple route with bindAndHandle, MaxConnections(10000) and connection keep-alive enabled on the client: lots of timeouts after only 10k calls:
Concurrency Level:      4000
Time taken for tests:   60.605 seconds
Complete requests:      10000
Failed requests:        261
   (Connect: 0, Receive: 87, Length: 87, Exceptions: 87)
Keep-Alive requests:    9913
...
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7   31.3      0     191
Processing:     0  241 2780.8      5   60396
Waiting:        0   92 1270.8      5   60396
Total:          0  248 2783.5      5   60459
Percentage of the requests served within a certain time (ms)
...
  90%     13
  95%    255
  98%   2061
  99%   3911
 100%  60459 (longest request)
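As a quick check of this run against the 1 to 3% figure: taking the 10000 "Complete requests" as the denominator, 261 failures is about 2.6%, and the non-zero categories in ab's breakdown add up to the 261 total. A stdlib-only Scala sketch of that arithmetic follows; AbSummary is just a hypothetical name, and the numbers are copied from the output above.

```scala
object AbSummary {
  // Figures copied from the ab output above
  val completeRequests: Int = 10000
  val failedRequests: Int = 261
  // ab's breakdown of the failed requests
  val breakdown: Map[String, Int] =
    Map("Connect" -> 0, "Receive" -> 87, "Length" -> 87, "Exceptions" -> 87)

  // Failed requests as a percentage of the completed requests
  val failureRatePercent: Double = failedRequests * 100.0 / completeRequests

  def main(args: Array[String]): Unit = {
    println(f"failure rate: $failureRatePercent%.2f%%")
    println(s"breakdown total: ${breakdown.values.sum}")
  }
}
```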
The same seems to happen on my local machine (a Mac), but I'm not 100% sure. I'm running the tests on an Ubuntu VM with 8 cores and 24 GB of RAM.
I really don't know what to do. I've been trying every combination of system parameters and Akka config I can think of, but nothing changes.
Basically everything I tried (changing /etc/security/limits.conf, changing sysctl params, changing Akka's concurrent connections, backlog, dispatchers, etc.) led to the same result: connections doing nothing and then timing out, as if execution were somehow queued.
Is there something I'm missing? Some tuning parameter, config setting, or something else?
It looks like the piece of code that times out is conn handleWith handler, even though 'handler' does nothing, and it keeps happening even after the load stops. In other words, the connection is established correctly, but the processing is stuck.
This is my ulimit -a output, followed by vm.swappiness from sysctl:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 96360
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
vm.swappiness = 0
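For a rough sense of file-descriptor headroom: assuming the JVM process inherits the open-files limit shown above and each accepted TCP connection holds one descriptor, 4000 concurrent ab connections sit far below 100000. In the sketch, FdHeadroom is a hypothetical name and jvmOverheadAssumed is an arbitrary placeholder for the JVM's own descriptors (jars, logs, the listening socket), not a measured value.

```scala
object FdHeadroom {
  // "open files (-n)" from the ulimit -a output above
  val openFilesLimit: Int = 100000
  // "Concurrency Level" from the ab run above
  val concurrentConnections: Int = 4000
  // Assumption: arbitrary allowance for the JVM's own descriptors
  // (jars, log files, the listening socket); not a measured value
  val jvmOverheadAssumed: Int = 1000

  // Assumes one descriptor per accepted TCP connection on the server
  val headroom: Int = openFilesLimit - concurrentConnections - jvmOverheadAssumed

  def main(args: Array[String]): Unit =
    println(s"descriptors left: $headroom of $openFilesLimit")
}
```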
Cheers