Passenger/Apache Deadlock

calm...@gmail.com

May 9, 2008, 2:17:16 PM
to Phusion Passenger Discussions
I'm testing passenger 1.0.5 on Ubuntu 8.04 with Apache 2.2.8.

Running with these settings:

LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-1.0.5/ext/apache2/mod_passenger.so
PassengerRoot /var/lib/gems/1.8/gems/passenger-1.0.5
RailsRuby /usr/bin/ruby1.8
RailsEnv production
RailsMaxPoolSize 70
RailsPoolIdleTime 300

And:
ServerLimit 512
MaxClients 512

This is a 16GB box with 4 cores. It's behind a load balancer with
quite a number of other systems (they all run Mongrel.)
The server does around 20 requests per second, and so far I haven't
been able to keep it running for more than about 8 hours.
Most recently, I restarted it at 12AM last night and it ran fine until
around 4AM when apache started a slow ramp up to MaxClients. That ramp
up accelerated starting around 6:50 (maybe even exactly.) By about
7:04 it was adding 20 - 40 new processes per minute. It really hit the
wall at 7:12. Once it hits MaxClients and RailsMaxPoolSize I haven't
seen it free any up. Currently that system has been pulled from our
load balancer because it was too slow. It's been out for 90 minutes or
so and every apache process is stuck in "W" Sending Reply.
The Rails processes never time out, and Apache doesn't free up any of its
processes.
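As a rough sanity check (an addition, not from the original report), Little's law says the average number of requests in a system equals the arrival rate times the average time each request spends there. If roughly 20 requests per second keep arriving while all 512 Apache workers are busy, each request would have to hold a worker for about 512 / 20 = 25.6 seconds on average. That is far longer than a typical Rails response, which points toward requests getting stuck rather than merely running slowly. The sketch assumes a steady state, which a load ramp only approximates; the function name is hypothetical.

```python
def implied_time_in_system(busy_workers: int, arrivals_per_second: float) -> float:
    """Little's law (L = lambda * W) solved for W, the mean seconds per request."""
    if arrivals_per_second <= 0:
        raise ValueError("arrival rate must be positive")
    return busy_workers / arrivals_per_second


if __name__ == "__main__":
    # Figures from the report: roughly 20 req/s arriving, all 512 Apache workers busy.
    print(f"{implied_time_in_system(512, 20):.1f} s per request")  # 25.6 s per request
```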

Strace on the apache children shows:
Process 31139 attached - interrupt to quit
read(10,

And lsof:
apache2 31139 www-data 10u unix 0xffff810172dfb600 12202033 socket

Strace on rails:
Process 25070 attached - interrupt to quit
read(5,

And lsof:
ruby1.8 25070 ilike 5u unix 0xffff8103bd1c0dc0 11792040 socket

Rails also shows established connections to our mysql and memcached
servers.
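Both straces show a process parked in read() on a Unix-domain socket. Assuming these are the two ends of the same Apache-to-Rails connection (the lsof output alone does not prove that), each side is waiting for the other to send something first, so neither ever proceeds. The Python sketch below reproduces that pattern with socket.socketpair(); the helper names are hypothetical, and unlike the real processes it uses a timeout so the demo terminates.

```python
import socket
import threading


def blocked_read(sock: socket.socket, timeout: float, results: dict, name: str) -> None:
    """Wait for data on sock and record what arrived (None if nothing did)."""
    sock.settimeout(timeout)
    try:
        results[name] = sock.recv(4096)
    except socket.timeout:
        results[name] = None  # nothing arrived: this side was stuck waiting


def demo(timeout: float = 0.5) -> dict:
    # A connected pair of Unix-domain sockets standing in for the Apache <-> Rails link.
    apache_end, rails_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    results: dict = {}
    threads = [
        threading.Thread(target=blocked_read, args=(apache_end, timeout, results, "apache")),
        threading.Thread(target=blocked_read, args=(rails_end, timeout, results, "rails")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    apache_end.close()
    rails_end.close()
    return results


if __name__ == "__main__":
    # Both entries come back None: each side read first, so neither side ever wrote.
    print(demo())
```

Without the timeout, both threads would block forever, just like the straced processes.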

I wrote a little script to pick out the most interesting parts of
passenger-memory-stats and some other metrics. Here's the current
output:

Fri May 9 11:08:34 PDT 2008
### Processes: 513
### Total private dirty RSS: 234.80 MB
### Processes: 71
### Total private dirty RSS: 10206.32 MB
Apache Server Status:
CPULoad: .0731585
Uptime: 39927
ReqPerSec: 4.87988
BytesPerSec: 163222
BytesPerReq: 33447.9
BusyWorkers: 512
IdleWorkers: 0

             total       used       free     shared    buffers     cached
Mem:      16512988   14486708    2026280          0     244408    2559632
-/+ buffers/cache:   11682668    4830320
Swap:      9968292          0    9968292
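The "-/+ buffers/cache" row of this older `free` output is derived from the Mem: row by not counting kernel buffers and page cache as used. A quick Python check with the figures above (kilobytes, `free`'s default unit) reproduces that row: about 11.7 million KB used by processes and about 4.8 million KB still available, with the Swap: row showing no swap in use. The function name is hypothetical.

```python
def buffers_cache_row(used: int, free: int, buffers: int, cached: int) -> tuple[int, int]:
    """Reproduce the '-/+ buffers/cache' row of old-style `free` output."""
    # Memory held by processes, excluding kernel buffers and page cache.
    app_used = used - buffers - cached
    # Memory available to processes if buffers and cache were reclaimed.
    app_free = free + buffers + cached
    return app_used, app_free


if __name__ == "__main__":
    # 'Mem:' row values from the output above, in kilobytes.
    print(buffers_cache_row(14486708, 2026280, 244408, 2559632))  # (11682668, 4830320)
```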

That Apache requests-per-second number is slowly decreasing, since it is
averaged over the uptime and the system is now out of rotation. A restart
fixes this problem, but restarting several times per day isn't a very
workable solution.

So far I love the promise of passenger so I'd be really happy if
anyone has a clue about how to fix this.

Thanks!

Hongli Lai

May 9, 2008, 3:01:21 PM
to phusion-...@googlegroups.com

Hi.

I have some potential issues in mind that could be the cause of this.
All those issues have been fixed in the development version (git
repository). Could you give the development version a try?

Yeah I know, it sounds a bit awkward to ask someone with a large server
to try a development version. :) But we've made great progress in
reducing memory usage and increasing stability in the development
version, and we're quite confident that it's at least as stable as,
if not more stable than, version 1.0.5.

You can download the development version here:
http://github.com/FooBarWidget/passenger/tarball/master
Please extract the tarball, then run
'bin/passenger-install-apache2-module' inside.

We're eager to hear your findings on this.

With kind regards,
Hongli Lai
--
Phusion | The Computer Science Company

Web: http://www.phusion.nl/
E-mail: in...@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

calm...@gmail.com

May 9, 2008, 4:38:57 PM
to Phusion Passenger Discussions


Thanks for the quick reply. It's no problem to test development
versions; this host is in a pool with 20 servers behind a load
balancer. If it starts performing poorly, the load balancer just pulls
it out of rotation, so the user impact is limited, and I know that by
far the best test is a real-world load.

I extracted that and ran 'rake package', installed the resulting gem,
then ran passenger-install-apache2-module.
I'm putting it back into production now and I'll let you know how
it goes.

Since it takes several hours for the issue to show up, it could be a
while before I know if it was fixed.

Thanks!

Hongli Lai

May 13, 2008, 5:36:17 AM
to Phusion Passenger Discussions
Hi kelp. Any luck with the development version?

Travis Cole

May 13, 2008, 7:42:53 PM
to phusion-...@googlegroups.com
Sadly no. I tried the development version on Friday and ran it for several hours. It showed the same problems. To make sure it wasn't an issue with some other part of my system, I switched over to a Mongrel + Apache mod_proxy_balancer config, and it's been running solid for 24 hours now. So it looks like it is a Passenger issue.

Anything else I can try? 

Hongli Lai

May 14, 2008, 8:24:30 PM
to phusion-...@googlegroups.com
I've added a debugging mechanism to the latest development version in
the git repository. You can now inspect Passenger's internal application
pool state by reading the file /tmp/passenger_status.*.fifo. Could you
try the latest version, and post the contents of that file when your web
server locks up?
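A minimal Python sketch for collecting that dump, assuming the fifo files live directly in /tmp as described and that the reader has permission to open them (Travis later reads one with sudo cat). The function name read_passenger_status is hypothetical and not part of Passenger.

```python
import glob
import os


def read_passenger_status(directory: str = "/tmp") -> dict[str, str]:
    """Read every passenger_status.*.fifo in `directory`; return {path: contents}."""
    reports = {}
    for path in sorted(glob.glob(os.path.join(directory, "passenger_status.*.fifo"))):
        # Opening a FIFO for reading blocks until some process opens it for writing,
        # so a stale fifo with no writer will hang here, just as `cat` would.
        with open(path, "r") as fh:
            reports[path] = fh.read()
    return reports


if __name__ == "__main__":
    for path, text in read_passenger_status().items():
        print(f"== {path}")
        print(text)
```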

Regards,

Travis Cole

May 19, 2008, 8:03:08 PM
to phusion-...@googlegroups.com
I'm trying this with the latest code from this morning. I should have something to report back by tomorrow.

Ninh Bui

May 19, 2008, 8:31:08 PM
to phusion-...@googlegroups.com
Hi Travis,

I couldn't help but notice "ilike" in your processes: is this perhaps because you're responsible for keeping www.ilike.com up and running? :) If so, could we talk about the possibility of including your situation as a case study in our talk at RailsConf? We're eager to find out whether Passenger can hold its own when put in charge of keeping a large Rails app such as iLike up and running.

Cheers,
Ninh

Travis Cole

May 20, 2008, 8:35:32 PM
to phusion-...@googlegroups.com
Yeah, I'm on the Ops team at iLike. I'm currently testing Passenger on one server in a pool of many. So far it looks like this most recent code fixes the deadlock issues we saw before. We've been running nearly 24 hours now without any problems. Total memory use is much lower than on our Mongrel servers, since Passenger expires idle Rails processes and thus avoids the slow leak we see on all our other Rails systems.
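To illustrate why expiring idle processes keeps a slow leak from accumulating, here is a toy Python model of an idle-timeout pool. It is a hypothetical sketch, not Passenger's actual code: it assumes a process becomes eligible for shutdown only when it has no open sessions and has gone unused for longer than the configured idle time (the role RailsPoolIdleTime plays in the configuration at the top of the thread). The Worker class and reap_idle function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for one Rails application process in the pool."""
    pid: int
    sessions: int     # requests currently attached to this process
    last_used: float  # time (seconds) when it last finished a request


def reap_idle(workers: list[Worker], idle_time: float, now: float) -> list[Worker]:
    """Return the workers that survive one cleanup pass."""
    survivors = []
    for w in workers:
        idle_for = now - w.last_used
        if w.sessions == 0 and idle_for > idle_time:
            continue  # expired: shutting it down returns whatever memory it leaked
        survivors.append(w)
    return survivors


if __name__ == "__main__":
    pool = [
        Worker(pid=1, sessions=0, last_used=0.0),    # idle for 300 s: expired
        Worker(pid=2, sessions=0, last_used=290.0),  # idle for 10 s: kept
        Worker(pid=3, sessions=8, last_used=0.0),    # sessions still open: kept
    ]
    print([w.pid for w in reap_idle(pool, idle_time=150.0, now=300.0)])  # [2, 3]
```

In this model a process whose session count never drops to zero is never reaped, whatever the timeout.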

I'll want to run it for a good while longer before pushing to more servers.

Feel free to email me privately to discuss whatever you need for the case study. kelp at ilike-inc.com also reaches me.

Thanks!

Travis Cole

May 21, 2008, 6:04:37 PM
to phusion-...@googlegroups.com
OK, this latest build (from the 19th) is much, much more solid. I've been running it for almost 48 hours.

So far I just have one puzzling thing.

I have this process sitting around:

ilike    17468  0.0  0.7 545140 119356 ?       S    May20   0:15 Rails: /home/ilike/production.ilike.com/web/root 

All the rest of the Rails processes are no more than 2 or 3 hours old. Is there supposed to be one that never expires?
I have the timeout set to 150 seconds.

Output from that debug fifo:

kelp@app89:/tmp$ sudo cat passenger_status.5376.fifo 
----------- General information -----------
max      = 70
count    = 32
active   = 5
inactive = 27

----------- Applications -----------
  PID: 4339      Sessions: 0
  PID: 4291      Sessions: 0
  PID: 4271      Sessions: 0
  PID: 4441      Sessions: 0
  PID: 4259      Sessions: 0
  PID: 4424      Sessions: 0
  PID: 4443      Sessions: 0
  PID: 4435      Sessions: 0
  PID: 4345      Sessions: 0
  PID: 4255      Sessions: 0
  PID: 4417      Sessions: 0
  PID: 4308      Sessions: 0
  PID: 4431      Sessions: 0
  PID: 4428      Sessions: 0
  PID: 4392      Sessions: 0
  PID: 1080      Sessions: 0
  PID: 4413      Sessions: 0
  PID: 4285      Sessions: 0
  PID: 4438      Sessions: 0
  PID: 4398      Sessions: 0
  PID: 4379      Sessions: 0
  PID: 4273      Sessions: 0
  PID: 4269      Sessions: 0
  PID: 4333      Sessions: 0
  PID: 4261      Sessions: 0
  PID: 4267      Sessions: 0
  PID: 4281      Sessions: 0
  PID: 17468     Sessions: 8
  PID: 4265      Sessions: 1
  PID: 4275      Sessions: 1
  PID: 4277      Sessions: 1
  PID: 1147      Sessions: 1
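To sanity-check a dump like this one, here is a small Python parser (the helper names are hypothetical). It assumes `active` counts processes with at least one open session. The dump above agrees with that reading: exactly five PIDs show nonzero sessions and active = 5, the 32 listed PIDs match count = 32, and 5 + 27 = 32. It also surfaces PID 17468 as the process holding the most sessions.

```python
import re

GENERAL = re.compile(r"\s*(max|count|active|inactive)\s*=\s*(\d+)\s*$")
APP = re.compile(r"\s*PID:\s*(\d+)\s+Sessions:\s*(\d+)\s*$")


def parse_status(text: str) -> dict:
    """Split a passenger_status dump into general counters and {pid: sessions}."""
    general, sessions = {}, {}
    for line in text.splitlines():
        if m := GENERAL.match(line):
            general[m.group(1)] = int(m.group(2))
        elif m := APP.match(line):
            sessions[int(m.group(1))] = int(m.group(2))
    return {"general": general, "sessions": sessions}


def check(report: dict) -> list[str]:
    """Return inconsistencies between the counters and the PID list (empty if none)."""
    g, s = report["general"], report["sessions"]
    problems = []
    if g["count"] != len(s):
        problems.append(f"count={g['count']} but {len(s)} applications listed")
    busy = [pid for pid, n in s.items() if n > 0]
    if g["active"] != len(busy):
        problems.append(f"active={g['active']} but {len(busy)} PIDs have sessions")
    if g["active"] + g["inactive"] != g["count"]:
        problems.append("active + inactive != count")
    return problems


def busiest(report: dict) -> tuple[int, int]:
    """Return (pid, sessions) for the process holding the most sessions."""
    return max(report["sessions"].items(), key=lambda kv: kv[1])


SAMPLE = """----------- General information -----------
max      = 70
count    = 3
active   = 1
inactive = 2

----------- Applications -----------
  PID: 4339      Sessions: 0
  PID: 4291      Sessions: 0
  PID: 17468     Sessions: 8
"""

if __name__ == "__main__":
    report = parse_status(SAMPLE)
    print(check(report))    # []
    print(busiest(report))  # (17468, 8)
```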

In general it seems our Rails processes last at most 3 hours.
 

Hongli Lai

May 21, 2008, 7:14:31 PM
to phusion-...@googlegroups.com
Travis Cole wrote:
> Ok this latest build (from the 19th) is much much more solid. I've been
> running it for almost 48 hours.
>
> So far I just have one puzzling thing.
>
> I have this process sitting around:
>
> ilike 17468 0.0 0.7 545140 119356 ? S May20 0:15 Rails:
> /home/ilike/production.ilike.com/web/root
>
> All the rest of the Rails processes are no more than 2 or 3 hours old.
> Is there supposed to be one that never expires?

That's not normal. It would seem that Apache is holding onto a Rails
application without ever releasing the connection. Normally the
connection is released after an idle timeout; since that doesn't
happen here, this might be a bug in Apache.

Other than that, everything seems fine.
