PassengerMaxPoolSize exceeded; excess processes stuck shutting down


bry...@gmail.com

Dec 19, 2013, 6:14:53 PM12/19/13
to phusion-...@googlegroups.com
Recently while encountering database issues I observed far more Rails processes on a Passenger server than the configured PassengerMaxPoolSize. The processes in excess of PassengerMaxPoolSize were listed by passenger-status as "Shutting down..." which makes me wonder if I found a case where Passenger was not able to terminate processes properly.

Here is some of the passenger-status output captured at the time:

Version : 4.0.27
Date    : Tue Dec 17 23:09:44 -0600 2013
Instance: 974

----------- General information -----------
Max pool size : 5
Processes     : 5
Requests in top-level queue : 0

----------- Application groups -----------
/srv/ruby/rails-app/current#default:
  App root: /srv/ruby/rails-app/current
  Requests in queue: 25
  * PID: 8761   Sessions: 1   Processed: 100   Uptime: 6m 28s
    CPU: 1%   Memory: 52M   Last used: 2m 5s ago
  * PID: 8764   Sessions: 1   Processed: 31    Uptime: 6m 27s
    CPU: 0%   Memory: 43M   Last used: 5m 35s ago
  * PID: 8769   Sessions: 1   Processed: 38    Uptime: 6m 27s
    CPU: 0%   Memory: 50M   Last used: 4m 35s ago
  * PID: 8786   Sessions: 1   Processed: 104   Uptime: 5m 57s
    CPU: 2%   Memory: 50M   Last used: 1m 34s ago
  * PID: 8893   Sessions: 1   Processed: 27    Uptime: 4m 16s
    CPU: 0%   Memory: 46M   Last used: 2m 33s ago
  * PID: 1190   Sessions: 1   Processed: 301   Uptime: 2h 5m 56s
    CPU: 0%   Memory: 55M   Last used: 1h 4m 5s
    Shutting down...
  * PID: 1196   Sessions: 1   Processed: 362   Uptime: 2h 5m 56s
    CPU: 0%   Memory: 55M   Last used: 1h 0m 5s
    Shutting down...

(There were then 18 more processes listed as "Shutting down...", excluded here for brevity, for a total of 25 processes under passenger-status, whereas PassengerMaxPoolSize is 5. I'm unsure whether it's a coincidence that this matches the number of requests in the application queue. A `ps ax|grep ruby` at the time showed these 25 processes, plus one additional Passenger AppPreloader process.)

The Apache logs were full of application and Passenger errors... I did notice lines similar to the following repeated many times:

[ 2013-12-17 21:01:43.4349 2265/7ff935ed7700 Pool2/Implementation.cpp:849 ]: Could not spawn process for group /srv/ruby/rails-app/current#default: An error occured while starting up the preloader.
  in 'void Passenger::ApplicationPool2::SmartSpawner::handleErrorResponse(Passenger::ApplicationPool2::SmartSpawner::StartupDetails&)' (SmartSpawner.h:459)
  in 'std::string Passenger::ApplicationPool2::SmartSpawner::negotiatePreloaderStartup(Passenger::ApplicationPool2::SmartSpawner::StartupDetails&)' (SmartSpawner.h:570)
  in 'void Passenger::ApplicationPool2::SmartSpawner::startPreloader()' (SmartSpawner.h:210)
  in 'virtual Passenger::ApplicationPool2::ProcessPtr Passenger::ApplicationPool2::SmartSpawner::spawn(const Passenger::ApplicationPool2::Options&)' (SmartSpawner.h:756)
  in 'void Passenger::ApplicationPool2::Group::spawnThreadRealMain(const Passenger::ApplicationPool2::SpawnerPtr&, const Passenger::ApplicationPool2::Options&, unsigned int)' (Implementation.cpp:782)
[ 2013-12-17 21:01:43.4352 2265/7ff9377ad700 agents/HelperAgent/RequestHandler.h:1997 ]: [Client 68] Cannot checkout session.
Error page:
Error page:

(Then an application backtrace where it was unable to connect to its database.)

To be clear, I absolutely do not expect Passenger to work when it can't spawn processes due to infrastructure/application issues. But it would have been nice if it were able to terminate the failed processes so that they didn't consume all memory... In this case the solution was to fix the database and reboot the Passenger server to clear out the excess processes. Before the reboot, we tried restarting Passenger after the database was fixed, but that only resulted in

[ 2013-12-17 21:38:20.6676 981/7f4894edd700 Pool2/Group.h:331 ]: Request queue is full. Returning an error

over and over in the logs until the reboot.

Is there a Passenger setting I'm missing that might weather this sort of situation better? "Don't break the database" is of course a good strategy too, but not a Passenger setting! :) At the moment, the only settings I have beyond PassengerRoot and PassengerDefaultRuby are:

PassengerMaxPoolSize 5
PassengerFriendlyErrorPages off

From a recent discussion I see that OOB GC can cause the process count to exceed PassengerMaxPoolSize, but this application does not (yet) do any OOB GC as it only recently started using Passenger 4.

Additional logs and full passenger-status output available on request. Thanks for any insight! Sorry if I've provided too much or too little information.

Hongli Lai

Dec 20, 2013, 4:18:26 AM12/20/13
to phusion-passenger
Passenger is unable to shut down those processes because they are
still busy handling requests (see the "Sessions: 1" indicator; a
non-zero value means there are open requests). Normally Passenger
waits until all requests for a process are finished, and then it tells
the process to exit. Passenger then gives that process at most 1
minute to exit, after which it forcefully kills the process.
But since your processes never finish their requests, the shutdown
time limit never kicks in.

There are two other time limits you can use:
- You can use an application-level time limiter, e.g. Rack::Timeout
(if you're using Ruby). It aborts the request by using the 'timeout'
library. In extreme cases where your app is stuck in native
extensions, this library will not be able to abort your request.
- You can use the PassengerMaxRequestTime feature. It aborts the
process by using the SIGKILL signal so it's always capable of aborting
the process. It's an Enterprise feature though.
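For illustration, a minimal Rack::Timeout setup might look like the
following. This is only a sketch: it assumes the rack-timeout gem is
installed, the gem's configuration API has varied between versions, and
the 15-second value and `MyApp` name are placeholders, not recommendations:

```ruby
# config.ru -- minimal sketch; assumes the rack-timeout gem is installed
# (the configuration API differs between gem versions)
require 'rack/timeout'

use Rack::Timeout, service_timeout: 15  # abort requests running longer than 15s
run MyApp                               # placeholder for your Rack application
```

The Enterprise alternative is a single directive in the Apache config,
e.g. `PassengerMaxRequestTime 60`, which SIGKILLs any process whose
request runs longer than 60 seconds.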

The issue where OOB GC caused the process count to exceed
PassengerMaxPoolSize was fixed a few releases ago.



--
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: in...@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

bry...@gmail.com

Dec 20, 2013, 12:31:36 PM12/20/13
to phusion-...@googlegroups.com
Thank you for the prompt response and clear explanation!

In our case anything using the 'timeout' library might not help, because for legacy (aka bad) reasons the application is full of

begin
  # do something
rescue Exception => e
  # log and continue, sometimes retry up to a couple of times
end

which is very bad indeed, as it thwarts Timeout::timeout (especially when retry is used), among other things. The correct fix would be to refactor our internal exception hierarchies to extend StandardError instead of Exception and then use a bare rescue => e, but for this application it may be simpler to upgrade to Passenger Enterprise. I will suggest this to my co-workers.
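To show the effect concretely, here is a small self-contained sketch
(the method name, timings, and return values are made up for the demo):
Timeout.timeout delivers the timeout as an exception raised inside the
block, so a handler that rescues Exception without re-raising simply
swallows it and the timeout never propagates.

```ruby
require 'timeout'

# Legacy-style handler: `rescue Exception` also catches the asynchronous
# exception that Timeout.timeout raises inside the block.
def legacy_work
  sleep 1  # stands in for a hung database call
rescue Exception
  # "log and continue" -- this eats the timeout interrupt too
  :finished_anyway
end

result =
  begin
    Timeout.timeout(0.2) { legacy_work }
  rescue Timeout::Error
    :timed_out
  end

# The rescue swallows the timeout, so this prints "finished_anyway"
# rather than "timed_out".
puts result
```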