[2.0.1-scala] Actor problems in production - webserver stops responding to some requests


Jxtps

May 21, 2012, 6:18:34 PM
to play-fr...@googlegroups.com
I'm running into some problems in production - after a while the webserver just stops responding to certain requests and I get weird timeouts ("play - Cannot invoke the action, eventually got an error: Thrown(akka.pattern.AskTimeoutException: Timed out)").
 
I've made some changes since then. It superficially looked like an Async { Akka.future { // code that throws an exception } } block might be causing it, so I fixed some such issues. That would be a very weird cause, though (testing it in dev mode in isolation works just fine), and the server effectively hung again shortly after redeploying with those fixes, this time without the "play - Cannot invoke the action..." log entries. There's nothing in the log and it's very unclear what actually goes wrong.
 
Since the problems only appear in production, and I don't really want to run a broken site, this makes debugging a bit tricky.
 
The website relies heavily on external APIs that are called with simple http web requests (this is done in a scala library that was written prior to play2, so it's not using the WS stuff). These web requests have all sorts of issues with them - delayed responses, badly formatted returns leading to SAX XML exceptions, etc. Such is life on the internet, and those APIs are not under my control, so I can't fix those issues at the source.
 
All such calls are wrapped in Async { Akka.future { blocks to handle their asynchronous nature, and AFAIK only such calls hang.
 
Reading https://github.com/playframework/Play20/wiki/AkkaCore and http://doc.akka.io/docs/akka/2.0.1/scala/dispatchers.html it seems like the default promises-dispatcher parallelism-factor is 1, which on my 2-core production machines sounds like it would lead to only 2 actor threads handling these calls. That sounds like it's much too low - the calls take 0% CPU, but block for several seconds, and with only 2 processing pipelines they would quickly get backlogged.
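
The arithmetic behind that reading can be sketched as follows. This is a minimal sketch, assuming the sizing rule described in the Akka dispatcher documentation for fork-join-executor: ceil(cores * parallelism-factor), clamped between a minimum and parallelism-max. The object name, function name and the bounds passed in are illustrative, and whether Play 2.0.1's promises-dispatcher inherits a larger minimum from Akka's defaults is not established in this thread.

```scala
object PoolSizeSketch {
  // ceil(cores * factor), then clamped into the range [min, max].
  def forkJoinParallelism(cores: Int, factor: Double, min: Int, max: Int): Int =
    math.min(math.max(math.ceil(cores * factor).toInt, min), max)

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors
    // factor = 1.0 and max = 24 mirror the values discussed above; min = 1 is an assumption.
    println("cores=" + cores + " threads=" + forkJoinParallelism(cores, 1.0, 1, 24))
  }
}
```

Under these assumptions, 2 cores with a factor of 1.0 and a minimum of 1 give 2 threads. If a minimum of 8 applied instead, the same call would give 8.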
 
  Q: Is my reading of those docs correct? If so, why is the parallelism-factor set so low?
 
  Q: How can I see the actual number of actors/threads allocated to the promises-dispatcher? (or any other dispatcher)
 
  Q: Play1 had a very useful "status" command that gave insight into any such backlogging - how can I get similar insight into what's going on in a play2 application?
 
Reading http://doc.akka.io/docs/akka/2.0.1/scala/logging.html it seems like one can enable per-actor-and-message logging, but how do I do that for play's internal actors?
 
I tried play.akka.loglevel="DEBUG" but then the server hangs on startup and play.akka.loglevel="INFO" didn't give anything useful.
 
Thanks!

peter hausel

May 22, 2012, 10:11:09 AM
to play-fr...@googlegroups.com


Which play version are you on? What's the use case? Can you reliably replicate the issue?

> Q: Is my reading of those docs correct? If so, why is the parallelism-factor set so low?

I can't remember the reason off the top of my head, but you should be able to override every parameter (please see the akka doc http://doc.akka.io/docs/akka/2.0.1/general/configuration.html for more information). Also, on master, promise dispatching has changed: promises are now executed on a separate actor system, and that actor system is also fully configurable.

> Q: How can I see the actual number of actors/threads allocated to the promises-dispatcher? (or any other dispatcher)

https://github.com/playframework/Play20/blob/master/framework/src/play/src/main/resources/reference.conf lists the defaults we set, all of which you can override from your app (please note: this reference file is for 2.1 SNAPSHOT). Please see the akka documentation for more details.

> Q: Play1 had a very useful "status" command that gave insight into any such backlogging - how can I get similar insight into what's going on in a play2 application?

I would suggest changing the log level.

> Reading http://doc.akka.io/docs/akka/2.0.1/scala/logging.html it seems like one can enable per-actor-and-message logging, but how do I do that for play's internal actors?
>
> I tried play.akka.loglevel="DEBUG" but then the server hangs on startup and play.akka.loglevel="INFO" didn't give anything useful.

I could not replicate this using a semi-complex play scala app, but I will try it with various sample apps.

Thanks,
Peter

Jxtps

May 22, 2012, 2:54:39 PM
to play-fr...@googlegroups.com
Play version: 2.0.1
 
Replicate: seemingly somewhat reliably. I've upped the parallelism-factor and that seems to have alleviated the problem. There are too many moving parts and the load is varying too much to be super-conclusive, but the hung-request problem appears to have gone away (the database is at 50% continuous CPU load, spiking to 80-90% at times, which then causes some queries to start taking several seconds, which then contributes to the requests getting hung - but they should still recover, and right now they seem to).
 
It seems like the play1 version is causing less CPU load on both the web box and the DB. Some minor changes in cache policies (it's behind nginx, which is caching some parts) muddle things a bit, though. This is where the "play status" command would have been very useful.
 
The play2 request object is very limited - in play1 you had access to the controller & method, which helps tremendously when trying to add some in-process monitoring. Reading the source of https://github.com/playframework/Play20/blob/master/framework/src/play/src/main/scala/play/core/router/Router.scala and the in-project generated \target\scala-2.9.1\src_managed\main\routes_routing.scala, there doesn't seem to be a way to get the HandlerDef out of the router (which AFAICT is the only thing that knows the controller + action combo) - it only gives out Option[Handler] (which is actually an Option[Action] in my case).
 
This makes it harder to do generic instrumentation - or am I missing something?
 
play1 had great thread-state reporting with the "play status" command - there's nothing similar for Akka actors & dispatchers!? (couldn't find anything on google)
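
In the absence of a built-in command, a rough thread-state summary can be pulled from the JVM itself. The following is a minimal sketch using only the standard java.lang.Thread API, not a Play or Akka feature. The object and function names are illustrative, and it assumes that dispatcher threads carry the dispatcher's name followed by a numeric suffix, which should be checked against a real thread dump.

```scala
object ThreadReportSketch {
  // Strip a trailing "-<number>" so that pool threads such as "worker-1"
  // and "worker-2" fall into the same group.
  def poolName(threadName: String): String =
    threadName.replaceAll("-\\d+$", "")

  // Count live JVM threads per (pool name, thread state).
  def report(): Map[(String, Thread.State), Int] = {
    val threads = Thread.getAllStackTraces.keySet.toArray(new Array[Thread](0))
    threads.toSeq
      .groupBy(t => (poolName(t.getName), t.getState))
      .map { case (key, ts) => key -> ts.size }
  }

  def main(args: Array[String]): Unit =
    report().toSeq.sortBy(_._1._1).foreach { case ((pool, state), n) =>
      println(pool + " " + state + " " + n)
    }
}
```

A pool whose threads are all in TIMED_WAITING or BLOCKED inside the HTTP client would be consistent with the backlog suspected above. Calling report() from a diagnostic action or a scheduled log statement is one way to get at it in production.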

Jxtps

May 22, 2012, 3:00:07 PM
to play-fr...@googlegroups.com
Something like http://miniprofiler.com/ would be a godsend. It's by the stack overflow folks, who seem to generally know what they're doing...
 
Any community interest in such?

Sadache Aldrobi

May 22, 2012, 3:43:53 PM
to play-fr...@googlegroups.com
On Tue, May 22, 2012 at 8:54 PM, Jxtps <jxtp...@gmail.com> wrote:
> It seems like the play1 version is causing less CPU load on both the web box and the DB. Some minor changes in cache policies (it's behind nginx, which is caching some parts) muddle things a bit, though. This is where the "play status" command would have been very useful.

Is this comparison for the same number of requests per second? Maybe the fact that Play2 can handle more rps puts more pressure on the db.
 
 




--
www.sadekdrobi.com
ʎdoɹʇuǝ

Guillaume Bort

May 22, 2012, 4:05:32 PM
to play-fr...@googlegroups.com
The problem is that since you are not using the built-in async WS API,
your webservice calls are all blocking. And as you said yourself they
can be blocking for a very long time.

Using Akka.future is not magic and it doesn't solve the problem. By
doing this you will block the main Akka system thread-pool. Once all
threads are used and blocked waiting for a webservice response, the
whole app is blocked and won't accept new requests.
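
The effect can be reproduced outside Play with a plain JVM thread pool. This is a minimal sketch, assuming a fixed-size pool stands in for the dispatcher and Thread.sleep stands in for a slow webservice call. The names and numbers are illustrative and it does not use Akka.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

object BlockedPoolSketch {
  // Run `tasks` blocking calls of `blockMillis` each on a pool of `threads`
  // threads and return the elapsed wall-clock time in milliseconds.
  def elapsedMillis(threads: Int, tasks: Int, blockMillis: Long): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val start = System.nanoTime()
    try {
      val futures = (1 to tasks).map { _ =>
        pool.submit(new Callable[Unit] {
          // Stands in for a slow, blocking HTTP call.
          def call(): Unit = Thread.sleep(blockMillis)
        })
      }
      futures.foreach(_.get()) // wait for every task to finish
    } finally pool.shutdown()
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    println("2 threads, 6 tasks: " + elapsedMillis(2, 6, 100) + " ms")
    println("6 threads, 6 tasks: " + elapsedMillis(6, 6, 100) + " ms")
  }
}
```

With 2 threads the six calls run in three rounds, so later calls wait in the queue even though the CPU is idle. With 6 threads they all run at once.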

So yes, adding more threads will solve your problem. It is not the
perfect solution though, since each thread consumes a lot of memory and
you can't grow the thread pool indefinitely. The real solution is to
make the webservice calls asynchronously.
Guillaume Bort

Sadache Aldrobi

May 22, 2012, 4:20:02 PM
to play-fr...@googlegroups.com
When using Akka for blocking calls, I would strongly recommend using a different dispatcher and allocating the threads necessary for that system. Never use the main Play Akka dispatcher for blocking calls.
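
The same idea can be sketched with a dedicated thread pool. This is a minimal sketch, assuming the Scala standard-library futures available from Scala 2.10 onward rather than the Akka.future helper of Play 2.0.1. The pool size, object name and slowLookup function are hypothetical placeholders, not Play or Akka APIs.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingPoolSketch {
  // A separate, explicitly sized pool reserved for blocking work.
  // The size 50 is an arbitrary illustrative value.
  private val pool = Executors.newFixedThreadPool(50)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Hypothetical stand-in for the legacy blocking HTTP client.
  def slowLookup(id: Int): String = { Thread.sleep(100); "result-" + id }

  // Run the blocking call on the dedicated pool, not on the framework's pool.
  def lookupAsync(id: Int): Future[String] = Future(slowLookup(id))(blockingEc)

  def shutdown(): Unit = pool.shutdown()

  def main(args: Array[String]): Unit = {
    println(Await.result(lookupAsync(1), 5.seconds))
    shutdown()
  }
}
```

Blocked threads then pile up only in this pool, so the pool that serves incoming requests stays free. The pool still has to be sized for the expected number of concurrent blocking calls.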
www.sadekdrobi.com
ʎdoɹʇuǝ

Jxtps

May 22, 2012, 8:06:11 PM
to play-fr...@googlegroups.com
Thanks for the pointers guys. I hear you on using WS (async) instead of blocking calls - definitely want to get there, but one step at a time.
 
I fixed some other performance issues, so things seem to be ok now. Hopefully there aren't too many additional issues hiding behind the ones knocked down so far.

PANKAJ GUPTA

Oct 12, 2013, 4:06:49 AM
to play-fr...@googlegroups.com
Hi Guys,
I am also having the same issue.

I am new to the Play framework and am maintaining an existing Play app.

I have a Play 2.0 app which uses a third-party API library (to access/create/manage virtual machines on a cloud platform) to test a given cloud platform provider's cloud service.
The calls are blocking and take a long time, anywhere from a few seconds to minutes.

Problem definition (occurs mainly in the production environment):
I have a series of AJAX calls which are initiated by clicking a button on a web page.
The first one is a GET call which initializes all the test objects, saves them in the DB, stores them in the session, and responds on completion.
After that call, 9 POST AJAX calls are sent to the server serially, each one depending on the previous server response.
If an AJAX call returns 'true', the next test is executed, i.e. the next AJAX call is sent.
Each call makes multiple API calls, which take a long time to run, and stores the result in the database.

Error: Sometimes the server doesn't respond to an AJAX call. Inspecting the call in Firebug, I noticed that it has not completed and is still waiting for a response from the server.

In my app I haven't used { Akka.future { // code that throws an exception } }
I need all API calls to be blocking.

Do I need to create a different dispatcher as Sadache said, or do I need to add more threads as Guillaume Bort said? If the number of threads needs to be increased, can you please specify which setting I have to change in my application.conf's akka configuration?

Is there any possibility that the configuration below is not taking effect in production? Please advise me how I can check that.

Also, how can I enable logging of every incoming request to the server, as we get in a Rails application?

My application.conf's akka configuration is:
------------------------------------------------------------------------------------------------------------------------------------------------------------------
play {

    akka {
        event-handlers = ["akka.event.slf4j.Slf4jEventHandler"]

        actor {

            deployment {

                /actions {
                    router = round-robin
                    nr-of-instances = 100
                }

                /promises {
                    router = round-robin
                    nr-of-instances = 100
                }

            }

            retrieveBodyParserTimeout = 10 second

            actions-dispatcher = {
                fork-join-executor {
                    parallelism-factor = 100
                    parallelism-max = 100
                }
            }

            promises-dispatcher = {
                fork-join-executor {
                    parallelism-factor = 100
                    parallelism-max = 100
                }
            }

            websockets-dispatcher = {
                fork-join-executor {
                    parallelism-factor = 1.0
                    parallelism-max = 24
                }
            }

            default-dispatcher = {
                fork-join-executor {
                    parallelism-factor = 100
                    parallelism-max = 100
                }
            }

        }

    }

}
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thanks for your help.











