[2.3 scala] How to throttle / queue requests


Martin Grotzke

Oct 21, 2014, 3:14:55 PM
to play-fr...@googlegroups.com

Hi,

in our async/non-blocking app (part of a microservice architecture) we sometimes get several thousand requests within a couple of seconds from another service, which leads to an OutOfMemoryError ("GC overhead limit exceeded").

I'd like not to fail those requests, but to limit the number of concurrently handled requests, e.g. to 100.

To achieve this I'm thinking about the following solution:

Use an EssentialFilter that counts the concurrent requests.
If the active request count exceeds the threshold, add a promise plus the nextFilter function to a queue and return the promise's future.
When a request finishes, take the first promise/nextFilter pair from the queue and complete the promise with the nextFilter result (perhaps run in a Future, not sure).
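Sketched in plain Scala (no Play types; `R` and the `run` thunk stand in for Play's Result and the nextFilter invocation), the idea would look roughly like this:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Future, Promise}

// Sketch only. Note the counter check and increment are not atomic together,
// so the limit is soft, not hard.
class ConcurrentRequestsLimiter[R](maxConcurrent: Int) {
  private val active = new AtomicInteger(0)
  private val queue  = new ConcurrentLinkedQueue[(Promise[R], () => Future[R])]()

  def apply(run: () => Future[R]): Future[R] =
    if (active.incrementAndGet() <= maxConcurrent) {
      run().andThen { case _ => onFinished() }
    } else {
      // over the threshold: park the request behind a promise
      active.decrementAndGet()
      val p = Promise[R]()
      queue.add((p, run))
      p.future
    }

  // Called when a request completes: hand the freed slot to the next queued request.
  private def onFinished(): Unit = {
    active.decrementAndGet()
    val next = queue.poll()
    if (next != null) {
      val (p, run) = next
      active.incrementAndGet()
      p.completeWith(run().andThen { case _ => onFinished() })
    }
  }
}
```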

(Because I'm on vacation I can't try this out; I can only work on it in theory.)

What do you think about it?
Are there other / better solutions?

I'd like to solve it within the application, to be able to restrict it based on the request (e.g. user agent or route), and I'd like to keep it as simple as possible (transparent to the app logic, no "big" tools like Hystrix).

Cheers,
Martin

Martin Grotzke

Oct 22, 2014, 2:19:53 AM
to play-fr...@googlegroups.com

Hi James,

I'm especially interested in your thoughts about this problem / solution, so I'd be really happy if you'd find the time to answer.

TIA && cheers,
Martin

Megazord

Oct 22, 2014, 1:07:02 PM
to play-fr...@googlegroups.com
If you don't want to touch your application, I think that handling throttling in nginx could be easier:

http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
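For example (values are illustrative; `app_backend` stands for whatever your upstream is called):

```nginx
# shared zone keyed by client address, limited to 100 requests/second
limit_req_zone $binary_remote_addr zone=throttle:10m rate=100r/s;

server {
    listen 80;
    location / {
        # queue short bursts of up to 100 requests instead of rejecting them
        limit_req zone=throttle burst=100;
        proxy_pass http://app_backend;
    }
}
```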

HTH

Will Sargent

Oct 22, 2014, 1:31:03 PM
to play-fr...@googlegroups.com
Hi Martin,

You've got it right.  This is an application backpressure problem: you need a non-blocking bounded work queue, and then when you reach the "high water mark" you can start sending back 429 results.

Will Sargent
Consultant, Professional Services
Typesafe, the company behind Play Framework, Akka and Scala

--
You received this message because you are subscribed to the Google Groups "play-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to play-framewor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Martin Grotzke

Oct 22, 2014, 6:47:13 PM
to play-fr...@googlegroups.com
Hi,

thanks for your feedback! I've now put together a solution that does
what I had in mind, plus the number of requests to queue can be
configured (so that requests that would grow the queue beyond that limit
are rejected with 429). So far it seems to be working; at least the test
is green :-)

You can see the solution here:
https://github.com/inoio/play-requests-limiter/blob/master/app/io/ino/play/ConcurrentRequestsLimiter.scala
(the spec:
https://github.com/inoio/play-requests-limiter/blob/master/test/io/ino/play/FunSpec.scala)

Cheers,
Martin



James Roper

Oct 22, 2014, 9:00:00 PM
to play-framework
Hi Martin,

Looks fine to me. There are a few race conditions in your code that could allow more than the maximum number of requests to be active - if many requests come in at the same time, for example, or if a request comes in at the same time as another request completes - but they're unlikely to have a significant impact as long as you don't expect the active request limit to be a hard limit. If you wanted hard limits, you could send all requests off to an actor and have the actor manage the state - actors update their own state atomically, eliminating any potential race conditions.
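To sketch the actor idea without pulling in Akka itself: all mutable state is only ever touched from one single-threaded executor (the "mailbox"), so the check-and-update is serialized and the limit becomes hard. An Akka actor gives you the same guarantee with less ceremony; the names below are illustrative.

```scala
import java.util.concurrent.Executors
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future, Promise}

class ActorStyleLimiter[R](maxConcurrent: Int, maxQueued: Int) {
  // Single-threaded "mailbox"; daemon thread so it doesn't block JVM exit.
  private val mailbox = ExecutionContext.fromExecutor(
    Executors.newSingleThreadExecutor { r =>
      val t = new Thread(r); t.setDaemon(true); t
    })
  private var active = 0
  private val queued = mutable.Queue.empty[(Promise[R], () => Future[R])]

  def apply(run: () => Future[R]): Future[R] = {
    val p = Promise[R]()
    mailbox.execute(() => admit(p, run))
    p.future
  }

  // The handlers below run only on the mailbox thread -- no locks needed.
  private def admit(p: Promise[R], run: () => Future[R]): Unit =
    if (active < maxConcurrent) start(p, run)
    else if (queued.size < maxQueued) queued.enqueue((p, run))
    else p.failure(new RuntimeException("429 Too Many Requests"))

  private def start(p: Promise[R], run: () => Future[R]): Unit = {
    active += 1
    p.completeWith(run())
    p.future.onComplete(_ => finished())(mailbox)
  }

  private def finished(): Unit = {
    active -= 1
    if (queued.nonEmpty) {
      val (p, run) = queued.dequeue()
      start(p, run)
    }
  }
}
```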

Cheers,

James
--
James Roper
Software Engineer

Typesafe – Build reactive apps!
Twitter: @jroper

Martin Grotzke

Oct 23, 2014, 6:48:34 PM
to play-fr...@googlegroups.com

Hi James,

thanks for your feedback! You're right, I don't expect the active request limit to be a hard limit; that's the cost of this very simple solution. In fact, my assumption is that there won't be many requests (e.g. > 200) hitting the active request count check before the first of them increases the counter. I hope the CPU scheduler is on my side here - perhaps I should try harder to create such a situation in the test.

What do you think about synchronizing the counter check/modification vs the actor solution?

Cheers,
Martin


James Roper

Oct 23, 2014, 8:21:24 PM
to play-framework
On Fri, Oct 24, 2014 at 9:48 AM, Martin Grotzke <martin....@googlemail.com> wrote:

> What do you think about synchronizing the counter check/modification vs the actor solution?

Don't synchronize - it will kill you under high load. As soon as a monitor gets contended, performance goes down by an order of magnitude.  Actors don't suffer from this problem.

Martin Grotzke

Oct 24, 2014, 3:17:30 PM
to play-fr...@googlegroups.com

Ok, thanks. What would you say counts as high load, and what would it mean to get killed? Do you know of any benchmarks that shed some light on this?

Cheers,
Martin

James Roper

Oct 26, 2014, 8:49:47 PM
to play-framework
What you really should do is read "Java Concurrency in Practice" by Brian Goetz.

Short of that: when you use synchronized, two things happen. One is that the JVM needs to sync its memory to RAM, to ensure that the CPU this code is executing on sees all the most recent writes to memory. The JVM does a lot of optimisations where fields can be stored or accessed in registers, or just from CPU cache, but synchronized prevents it from using any of those caches. The problem here is that you're probably only interested in seeing the most recent writes to two fields, the queue and the count, but you force the entire world that the thread views to be needlessly synchronized back with RAM. This can be expensive.

The other thing that happens is that a queue is created for all threads waiting on that monitor. If there's no contention (i.e. you never have more than one thread entering the synchronized block at once), the JVM can actually optimize that check away into just a single CAS. But as soon as there's contention, the JVM has to switch to maintaining a queue of threads waiting to enter that block, and suddenly, rather than the throughput of that block of code hitting a ceiling and staying flat, the throughput actually starts going down: at load X it may be handling 20K req/s, and at a higher load it handles only 18K req/s. As contention increases, this throughput will often keep going down.

Now, if increases in load lead to decreased throughput, you end up with a spiral: there are more and more requests currently being handled and fewer getting processed, which starts straining other resources, like memory, which means the GC has to do more work, and performance goes down further... you get the point.

Think this is unlikely in your application? The fact that you believe you need rate limiting is evidence that this may well happen: you are expecting loads high enough that you need to throttle requests. You want to make sure, more than anything else, that the code doing the throttling is not itself vulnerable to these issues where throughput goes down as contention increases. By using only lockless structures, you help ensure that throughput remains constant as contention increases.

What are the exact numbers?  I can't tell you, it really depends on your app, the hardware, etc.
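To make the contrast concrete, here is a sketch of the two counter styles (class names are illustrative; the explicit retry loop is essentially what AtomicLong.incrementAndGet does internally):

```scala
import java.util.concurrent.atomic.AtomicLong

// Monitor-based: uncontended, the JVM can reduce acquisition to a single CAS;
// contended, threads queue up on the monitor and get descheduled.
class SynchronizedCounter {
  private var n = 0L
  def next(): Long = synchronized { n += 1; n }
}

// Lockless: a losing thread simply retries the CAS instead of blocking, so
// there is no monitor queue for throughput to collapse on.
class CasCounter {
  private val n = new AtomicLong(0)
  @annotation.tailrec
  final def next(): Long = {
    val current = n.get()
    val updated = current + 1
    if (n.compareAndSet(current, updated)) updated else next()
  }
}
```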