CLOSE_WAIT Problem

Hila

Jun 25, 2015, 6:09:14 AM
to spray...@googlegroups.com
Hi,

Our application receives about 50k POST (JSON) requests per second, and we are facing some problems in production: the application gets stuck and stops accepting connections.
During this time, this warning appears in our logs:
 [default-akka.actor.default-dispatcher-11] WARN  s.can.server.HttpServerConnection - Configured registration timeout of 1 second expired, stopping
Also, many (~20K) connections are in the CLOSE_WAIT state.
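For context on the state itself: a socket sits in CLOSE_WAIT once the peer has sent its FIN but the local process has not yet closed its own end, so a large CLOSE_WAIT count on the server usually points at sockets the server process is still holding. The minimal Python sketch below (standard library only; `observe_close_wait` is a hypothetical name, and nothing here is spray code) reproduces the state on loopback.

```python
import socket

def observe_close_wait() -> bool:
    """Open a loopback connection, close only the client side, and report
    whether the server side saw end-of-stream. Between client.close() and
    server_side.close() below, the server socket is in CLOSE_WAIT
    (visible with `ss -t state close-wait` while it lasts)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    try:
        client.close()  # client sends FIN; the server side enters CLOSE_WAIT
        # recv() returning b"" is how the application learns of the peer's FIN.
        return server_side.recv(1024) == b""
    finally:
        # The local close() sends our own FIN and moves the socket out of CLOSE_WAIT.
        server_side.close()
        listener.close()
```

If the application never reaches that final close() for a connection, the socket stays in CLOSE_WAIT indefinitely.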

We are using spray-can 1.3.1 and Akka 2.3.6, and haven't changed spray's default configuration.

When I run ss -lt, I see this output:

State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      128             127.0.0.1:9000                     *:*
LISTEN     0      128                    :::11211                   :::*
LISTEN     0      128                     *:11211                    *:*
LISTEN     0      128                     *:6379                     *:*
LISTEN     0      128                    :::6379                    :::*
LISTEN     0      128             127.0.0.1:11212                    *:*
LISTEN     0      128             127.0.0.1:11213                    *:*
LISTEN     0      128             127.0.0.1:11214                    *:*
LISTEN     129    128                     *:8080                     *:*
LISTEN     0      128                     *:http                     *:*
LISTEN     0      128                     *:81                       *:*
LISTEN     0      128                     *:82                       *:*
LISTEN     0      128                     *:83                       *:*
LISTEN     0      128                     *:84                       *:*
LISTEN     0      128                    :::ssh                     :::*
LISTEN     0      128                     *:ssh                      *:*
LISTEN     0      128                     *:4568                     *:*
LISTEN     0      128                     *:https                    *:*
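For a LISTEN socket, ss reports in Recv-Q the number of connections waiting in the accept queue and in Send-Q that queue's maximum size. The *:8080 row (Recv-Q 129, Send-Q 128) therefore suggests the accept queue on that port was full; on Linux a full queue typically shows Recv-Q one above Send-Q. A small Python sketch for spotting such rows, assuming the column layout shown above (`overflowing_listeners` is a hypothetical helper, and the sample rows are copied from the output above):

```python
from typing import List, Tuple

def overflowing_listeners(ss_output: str) -> List[Tuple[str, int, int]]:
    """Return (local_address, recv_q, send_q) for LISTEN rows whose accept
    queue (Recv-Q) has reached its backlog limit (Send-Q)."""
    flagged = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a LISTEN row.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        recv_q, send_q, local = int(fields[1]), int(fields[2]), fields[3]
        if recv_q >= send_q:
            flagged.append((local, recv_q, send_q))
    return flagged

# A few rows copied from the `ss -lt` output above.
SAMPLE = """\
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      128             127.0.0.1:9000                     *:*
LISTEN     129    128                     *:8080                     *:*
LISTEN     0      128                     *:http                     *:*
"""

# On a live host the input could come from:
#   subprocess.run(["ss", "-lt"], capture_output=True, text=True).stdout
```

Applied to SAMPLE, only the *:8080 listener is flagged.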

We've tried increasing the backlog on Http.Bind (it's currently set to 2048) and also running sysctl -w net.ipv4.tcp_max_syn_backlog=50000, but neither made any difference.
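One possible reason the change had no visible effect, assuming *:8080 is the spray listener: its Send-Q still reads 128, not 2048. On Linux the backlog passed to listen() is silently capped at net.core.somaxconn (128 by default on kernels before 5.4), while net.ipv4.tcp_max_syn_backlog sizes the queue of half-open (SYN_RECV) connections rather than the accept queue. A sketch of the capping rule (both helper names are hypothetical):

```python
from typing import Optional

def effective_backlog(requested: int, somaxconn: int) -> int:
    """Accept-queue size Linux actually uses for listen(fd, requested)."""
    # Values above net.core.somaxconn are silently truncated to it.
    return min(requested, somaxconn)

def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn") -> Optional[int]:
    """Current net.core.somaxconn, or None where /proc is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# Example: a 2048 backlog on a host still at the old default of 128.
# effective_backlog(2048, 128) -> 128
```

If the cap is the issue, it would be raised with something like sysctl -w net.core.somaxconn=<value>. A larger queue only buys time, though, if the application cannot accept connections as fast as they arrive.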

I've read some information about this here, but didn't understand what the suggested solution is.

Do you think it's related? If so, do you have any idea how I can solve this?

We would really appreciate any help with this problem.

Thanks

Richard Bradley

Jun 25, 2015, 8:59:41 AM
to spray...@googlegroups.com
We had the same problem.

There are some OS parameters you can tune, but the only real fix is to use connection pooling on the client side (i.e. don't open a new connection for each request; send multiple requests down the same connection).
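To illustrate the difference between a new connection per request and a reused (keep-alive) connection, here is a self-contained Python sketch using only the standard library. It stands in for whatever HTTP client the real callers use and is not spray code; all names are hypothetical. A throwaway local server records the client port of each request, so the number of distinct ports equals the number of TCP connections opened.

```python
import http.client
import http.server
import threading

class RecordingHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep the connection open between requests.
    protocol_version = "HTTP/1.1"
    seen_ports = set()  # client ports observed by the server

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        RecordingHandler.seen_ports.add(self.client_address[1])
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

JSON_HEADERS = {"Content-Type": "application/json"}

def post_with_new_connections(port, n):
    """Anti-pattern: a fresh TCP connection for every request."""
    for _ in range(n):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("POST", "/", body=b"{}", headers=JSON_HEADERS)
        conn.getresponse().read()
        conn.close()

def post_with_reused_connection(port, n):
    """Keep-alive: every request travels over the same TCP connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        for _ in range(n):
            conn.request("POST", "/", body=b"{}", headers=JSON_HEADERS)
            conn.getresponse().read()  # drain the body before the next request
    finally:
        conn.close()

def distinct_client_ports(send, n=5):
    """Run `send` against a throwaway local server and return how many
    distinct client ports (i.e. TCP connections) the server saw."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        RecordingHandler.seen_ports.clear()
        send(server.server_address[1], n)
        return len(RecordingHandler.seen_ports)
    finally:
        server.shutdown()
        server.server_close()
```

With five requests, distinct_client_ports(post_with_reused_connection) should report a single connection, while post_with_new_connections opens one per request; each of those extra connections also leaves a TIME_WAIT entry on whichever side closes first.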
You could also offer the API on multiple ports and have the clients spread the load across those server ports.
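A sketch of combining both suggestions on the client side: keep one persistent connection per server port and rotate through them. MultiPortPool and the port numbers are hypothetical, and the class is deliberately minimal (no reconnect, retry, or error handling).

```python
import http.client
import itertools
from typing import Iterable, Tuple

class MultiPortPool:
    """Hypothetical client-side pool: one persistent HTTP/1.1 connection per
    server port, handed out round-robin so load is spread across every port
    the server listens on without opening a new connection per request.

    Not thread-safe: an http.client connection must not be shared between
    threads, so a multi-threaded client would need a lock or a pool per thread.
    """

    def __init__(self, host: str, ports: Iterable[int]):
        # HTTPConnection does not connect until the first request is sent.
        self._conns = [http.client.HTTPConnection(host, p) for p in ports]
        self._next = itertools.cycle(self._conns)

    def next_connection(self) -> http.client.HTTPConnection:
        return next(self._next)

    def post_json(self, path: str, body: bytes) -> Tuple[int, bytes]:
        conn = self.next_connection()
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        # Read the whole body so the connection can carry the next request.
        return resp.status, resp.read()

    def close(self) -> None:
        for conn in self._conns:
            conn.close()
```

For example, a pool over ports 8080, 8081 and 8082 hands out connections to 8080, 8081, 8082 and then 8080 again.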

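There is a hard limit on how many new connections per second the OS can accept between a given client IP and a given server IP and port, caused by the TIME_WAIT part of the TCP algorithm: every connection needs a distinct 4-tuple of (client IP, client port, server IP, server port), and a 4-tuple cannot be reused while it is in TIME_WAIT.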
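A back-of-the-envelope version of that limit, assuming Linux defaults (net.ipv4.ip_local_port_range of 32768 to 60999 on recent kernels, and a 60-second TIME_WAIT) and assuming every connection is closed after a single request. The function name and parameters are hypothetical.

```python
def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_s: float,
                                   server_ports: int = 1,
                                   client_ips: int = 1) -> float:
    """Rough sustained ceiling on new connections per second when each
    4-tuple is held in TIME_WAIT for time_wait_s after it is closed.

    For one (client IP, server IP, server port) combination only the client
    port varies, so at most (port_high - port_low + 1) 4-tuples can be in
    use or in TIME_WAIT at once. Extra server ports or client IPs multiply
    the number of distinct 4-tuples available."""
    ephemeral_ports = port_high - port_low + 1
    return ephemeral_ports * server_ports * client_ips / time_wait_s

# max_new_connections_per_second(32768, 60999, 60) -> about 470.5
```

Under these assumptions that is roughly 470 new connections per second for one client IP talking to one server port. This is why reusing connections, or spreading clients over more server ports, raises the ceiling.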

Hope this helps,


Rich