Add connection state to RequestHeader


Paul Draper

Sep 13, 2016, 11:43:10 PM9/13/16
to Play framework dev
Filed #5973 and cchantep suggested the mailing list.

This is the problem: when a Play server slows down, it builds up a backlog of requests, unlimited in size because Netty accepts them on a dedicated thread. Clients connect, write an HTTP request into the socket buffer, time out, leave, and perhaps even retry. Play keeps processing requests for old clients who have closed the TCP connection and no longer care, which slows response times for newly arriving clients. This creates an inherently unstable system: a brief period of high load causes sustained client timeouts long after.

I applied this 30-line patch to a fork of Play 2.3 to add RequestHeader#isClientConnected: https://github.com/lucidsoftware/playframework/pull/1/commits/a7a7b6781971e16680439cf7b736a2d12d1ed14f
I then have a filter that checks this value for GETs and HEADs and skips processing.

Is there a better approach than this patch?

Christian Schmitt

Sep 14, 2016, 3:50:01 AM9/14/16
to Play framework dev
Netty 4.x has an IdleStateHandler which could be used to kill idle connections. At the moment we only allow setting the read/write side via -Dplay.server.http.idleTimeout, but I guess allowing the read side to be set separately would make it more flexible, and we could kill requests if there is no read within a certain time. I'm not sure if that helps, but I guess it might work. akka-http also has a request timeout which simply kills off the request if it can't be processed within a certain amount of time.
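To illustrate what an IdleStateHandler-style read timeout does (this is a sketch, not Play's or Netty's internals; IdleWatchdog and its parameters are hypothetical names), every read resets a timer, and if the timer fires before the next read, the connection is killed:

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Hypothetical sketch: a per-connection watchdog that invokes onIdle (e.g.
// closing the socket) if no read occurs within idleMillis.
final class IdleWatchdog(idleMillis: Long, onIdle: () => Unit) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  @volatile private var pending: ScheduledFuture[_] = null

  // Call on every inbound read; cancels and re-arms the idle timer.
  def readOccurred(): Unit = {
    val prev = pending
    if (prev != null) prev.cancel(false)
    pending = scheduler.schedule(new Runnable {
      def run(): Unit = onIdle()
    }, idleMillis, TimeUnit.MILLISECONDS)
  }

  def shutdown(): Unit = scheduler.shutdownNow()
}
```

In Netty itself the equivalent would be adding an IdleStateHandler to the channel pipeline and closing the channel from userEventTriggered when the idle event fires.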

Dominik Dorn

Sep 14, 2016, 5:31:08 AM9/14/16
to Play framework dev
hmm.. your change introduces mutability to a previously immutable class, which is probably not going to be accepted. 

It might be better to track the state of a request (with its request.id) in another object, e.g. (simplified)

import java.util.concurrent.ConcurrentHashMap

object ConnectionTracker {

  // a ConcurrentHashMap used as a set; ConcurrentHashMap.newKeySet() is another option
  private val stateMap = new ConcurrentHashMap[Long, Boolean]()

  def createNewRequest(requestId: Long): Unit = stateMap.put(requestId, true)

  def setConnectionClosed(requestId: Long): Unit = stateMap.remove(requestId)

  def setRequestCompleted(requestId: Long): Unit = stateMap.remove(requestId)

  def isConnected(requestId: Long): Boolean = stateMap.containsKey(requestId)
}

but maybe that is obsolete with the IdleStateHandler Christian mentioned? 

Cheers,
Dominik

James Roper

Sep 14, 2016, 6:12:30 AM9/14/16
to Paul Draper, Play framework dev

I think a better approach, rather than reacting to the server not responding, is to implement request limits. It's better to proactively reject requests before you become overloaded than to let yourself become overloaded and have every request impacted. Currently we don't support this. There are two limits we could implement. One is a connection limit: when the number of connections exceeds a certain limit, we stop accepting them. This is actually quite straightforward. We're already using Reactive Streams to accept connections; instead of doing a foreach to accept them, we could do a mapAsyncUnordered with the parallelism set to the connection limit, and return a future that is redeemed when the connection is closed.
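As a rough illustration of that bounded-parallelism idea (not Play code; ConnectionLimiter is a hypothetical name), a Semaphore with one permit per live connection provides the same back-pressure as mapAsyncUnordered with parallelism set to the limit:

```scala
import java.util.concurrent.Semaphore

// Hypothetical sketch: acquire a permit when accepting a connection,
// release it when the connection closes; accepting stops at the limit.
final class ConnectionLimiter(maxConnections: Int) {
  private val slots = new Semaphore(maxConnections)

  // Returns false once maxConnections are in flight.
  def tryAccept(): Boolean = slots.tryAcquire()

  // Release the permit when the connection closes.
  def closed(): Unit = slots.release()
}
```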

The second limit would be an outstanding request limit. This can be implemented with a shared AtomicLong in the Netty request handler that is incremented and decremented for each request; once the limit is exceeded, start returning 503 errors.
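A minimal sketch of that counter (RequestLimiter is a hypothetical wrapper, not the actual Netty handler):

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: increment per request, decrement on completion,
// and reject with 503 once the limit is exceeded.
final class RequestLimiter(maxOutstanding: Long) {
  private val outstanding = new AtomicLong(0)

  // None: the request may proceed. Some(503): reject; the counter is rolled back.
  def begin(): Option[Int] =
    if (outstanding.incrementAndGet() > maxOutstanding) {
      outstanding.decrementAndGet()
      Some(503)
    } else None

  // Call when the response has been sent (or the request otherwise finishes).
  def end(): Unit = outstanding.decrementAndGet()
}
```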


--
You received this message because you are subscribed to the Google Groups "Play framework dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framework-dev+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dominik Dorn

Sep 14, 2016, 6:36:26 AM9/14/16
to James Roper, Paul Draper, Play framework dev
Hmm.. this does not prevent the request's work from hitting the server once the client disconnects.
Paul's idea was to track the state of the connection and, right before the server actually starts processing the request, check whether the connection is still alive.
If it isn't, just drop the request. This way we could accept all requests but only handle those where the client is really waiting for a response. If I read correctly, the max connection / max request proposal would make a DoS easy....

James Roper

Sep 14, 2016, 6:59:16 AM9/14/16
to Dominik Dorn, Paul Draper, Play framework dev
They can both serve to mitigate DoS actually.

With the outstanding requests limit, a growing number of outstanding requests is an indication that the server cannot process them as quickly as they arrive. Rather than letting clients time out, just reject them; this gives the server an opportunity to recover at a reasonable load, rather than letting the queue of requests continually build up and only shrink when clients time out. It actually helps mitigate DoS: in a DoS attack, the malicious clients aren't going to time out (timing out would defeat the purpose). Instead of the queue of malicious requests growing continually while legitimate requests never reach the front of the queue (because their clients time out first), all requests are blocked equally, meaning some legitimate and some malicious requests get through.

Capping connections can also help mitigate DoS, since clients with existing connections are given priority over a flood of malicious connections. The other benefit is that when the number of connections hits the limit, you can start raising errors and get someone to respond to them.




--
James Roper
Software Engineer

Lightbend – Build reactive apps!
Twitter: @jroper

Christian Schmitt

Sep 14, 2016, 7:07:15 AM9/14/16
to Play framework dev, dom...@dominikdorn.com, pauld...@gmail.com
Wouldn't it be better to have a request timeout then? One which just closes the socket when the request never produces a response. A malicious client that sends meaningless packets to get around a read timeout could still fill all the available slots (if we implement something like you said). Something like: http://doc.akka.io/docs/akka/2.4.10/scala/http/common/timeouts.html#Request_timeout I guess most servers do this. It would not be the same as a read/write timeout; rather, it's a timeout regardless of whether a read/write happens.
I guess paired with rate limiting the server, it would be the "best" option to defend against overloading?
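A request timeout of this kind can be sketched with plain Scala futures (an illustration, not akka-http's implementation; RequestTimeout.within is a hypothetical helper): race the response against a scheduled failure, and whichever completes first wins.

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object RequestTimeout {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Fails the returned future with TimeoutException if the response
  // doesn't complete within the given timeout.
  def within[T](timeout: FiniteDuration)(response: Future[T])
               (implicit ec: ExecutionContext): Future[T] = {
    val deadline = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = deadline.tryFailure(new TimeoutException("request timed out"))
    }, timeout.toMillis, TimeUnit.MILLISECONDS)
    Future.firstCompletedOf(Seq(response, deadline.future))
  }
}
```

On timeout a server would then write a 503 (or simply close the socket) instead of waiting on the stalled response.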

Paul Draper

Sep 21, 2016, 6:50:57 PM9/21/16
to Play framework dev, dom...@dominikdorn.com, pauld...@gmail.com
I think a better approach, rather than reacting to the server not responding, is to implement request limits.

Oh, nice. I figured that had a lower chance of being palatable. It was discussed in 2014 https://groups.google.com/forum/#!topic/play-framework/2ZAkswaTmH0
 
one is a connection limit, when the number of connections exceeds a certain limit, we can stop accepting them

That would be nice. Problem is, it won't work on Linux, because of how terrible its networking is.

#!/usr/bin/env scala
import java.io._
import java.nio.charset.StandardCharsets
import java.net._
object Test {
  def main(args: Array[String]) = {
    val server = new ServerSocket(9000, 1) // backlog is 1
    try {
      for (i <- 1 to 10) {
        val client = new Socket(InetAddress.getLocalHost(), 9000)
        try {
          val output = new OutputStreamWriter(client.getOutputStream, StandardCharsets.US_ASCII)
          output.write(s"GET /client$i HTTP/1.0\r\n")
          output.write("\r\n")
          output.flush()
          println(s"Sent $i")
        } finally {
          client.close()
        }
      }
      val accepts = (1 to 10).map { i =>
        new Thread(new Runnable() {
          def run() = {
            val client = server.accept()
            try {
              val input = new BufferedReader(new InputStreamReader(client.getInputStream, StandardCharsets.US_ASCII))
              println(s"Read $i: ${input.readLine()}")
            } finally {
              client.close()
            }
          }
        })
      }
      accepts.foreach(_.start())
      accepts.foreach(_.join())
    } finally {
      server.close()
    }
  }
}

^ You'll eventually receive all 9 requests on Linux, despite having a socket backlog of 1 and not having accepted any of them.

second limit would be an outstanding request limit

I've done this too. It took some hacking to get 503 responses to not depend on the (probably saturated) default Play thread pool.

It's better to proactively reject requests before you become overloaded than to let yourself become overloaded and let every request be impacted.

Either way, requests will be affected. The main difference is between client-specified vs server-specified. 

Connection/request limits are the server saying "I won't be able to respond to you soon enough. 503"...even if the client would have been willing to wait.
Request timeouts are the client saying "You aren't responding to me fast enough. Bye"...even if the server would have been willing to wait.

I've done both. Either one would be a welcome official addition.
Message has been deleted

James Roper

Sep 21, 2016, 9:37:30 PM9/21/16
to Paul Draper, Play framework dev, Dominik Dorn
On 22 September 2016 at 08:55, Paul Draper <pauld...@gmail.com> wrote:
hmm.. your change introduces mutability to a previously immutable class, which is probably not going to be accepted. 

TCP is a stateful protocol. And I wish to access that state. Fits the problem domain.

Not at all. There's nothing about HTTP that ties it to TCP; RFC 2616 makes this explicit:

   HTTP communication usually takes place over TCP/IP connections. The
   default port is TCP 80 [19], but other ports can be used. This does
   not preclude HTTP from being implemented on top of any other protocol
   on the Internet, or on other networks.

And this isn't just some hypothetical provision: Google's QUIC protocol puts HTTP on top of UDP, and if you're using Chrome, your browser is likely using it to talk to Google's servers. In that case, there is no connection state to report on. We've also investigated using Akka remoting as a mechanism for proxying HTTP requests; in that case HTTP sits on top of stateless actor messaging, not TCP. In fact, proxying in general offers no way to report on the state of the end-user connection.

HTTP itself defines messages as immutable, so modelling them as such makes a lot of sense. Handling TCP state is best done at a lower level.




--
James Roper
Software Engineer

Lightbend – Build reactive apps!
Twitter: @jroper


Paul Draper

Sep 26, 2016, 5:15:15 PM9/26/16
to Play framework dev, pauld...@gmail.com, dom...@dominikdorn.com
Good point about HTTP being stateless.

At least in theory. The HTTP/1.1 Connection header, HTTP/1.1 pipelining, and HTTP/2 multiplexing are built right into the HTTP RFCs.

These concepts are meaningless without a stateful connection. So....HTTP is mostly stateless.

---

Having used both connection limits and connection status in production for quite some time, here's another difference:

Connection limits are arbitrary and vary by operation. I/O-bound responses (say, querying an upstream server) don't really need a limit; CPU-bound responses (say, image/video processing) do.
It's undesirable to take down a server by exceeding a request limit with a large number of harmless I/O-only requests. I've manually whitelisted/blacklisted endpoints, but that makes it easy to miss things.

Connection state has very simple criteria: if the client doesn't care (probably because we're too slow), don't waste further time generating a response.

---

I think I could reduce the arbitrariness of the first. I could wrap the Play default thread pool and maintain a count of queued tasks.
As long as that stays below a certain limit, I can accept a new request.
That way, async operations don't count against my limit.
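A sketch of that wrapper (CountingExecutionContext is a hypothetical name, not Play's actual thread-pool plumbing): count tasks that have been submitted but not yet started, so a filter can consult the queue depth before admitting a new request.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.ExecutionContext

// Hypothetical sketch: wrap an ExecutionContext and track submitted-but-not-
// yet-started tasks. Async operations waiting on I/O don't occupy the pool,
// so they don't count against the limit.
final class CountingExecutionContext(underlying: ExecutionContext) extends ExecutionContext {
  private val queued = new AtomicInteger(0)

  // Tasks submitted but not yet started.
  def queuedTasks: Int = queued.get()

  def execute(task: Runnable): Unit = {
    queued.incrementAndGet()
    underlying.execute(new Runnable {
      def run(): Unit = {
        queued.decrementAndGet() // now running, no longer queued
        task.run()
      }
    })
  }

  def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
}
```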