Monday morning I upgraded a production server to REE 2010.01 and Passenger 2.2.10. Typical behavior for this server is 7-10 requests per second, though only maybe 1 of those ends up in the Rails application.
Over the next couple of days, I experienced symptoms similar to a SYN flood attack, where I'd see 60-70 connections in TCP SYN_RECV state, with all of the available Apache children running, many of them stuck in W/"Sending Reply" state according to server-status. Fully stopping and restarting Apache would fix the problem for a period of time, but it'd come back within a couple of hours.
Last night I backed off to REE 2009.10 and Passenger 2.2.9 and the site has been fine since. I can't do further testing at the moment because I'm going out of town tomorrow and it's a production server. When I come back I'd like to narrow down if it's Passenger (which I suspect; Googling tells me that children stuck in W state is usually due to a problematic module) or REE.
Has anyone else seen similar behavior after upgrading to 2.2.10 or with REE 2010.01?
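For anyone who wants to check for the same SYN_RECV pile-up, here is a minimal Python sketch that tallies TCP states from `netstat -ant`-style text. The function name `count_tcp_states` and the sample rows are made up for illustration (they are not data from the server described above), and it assumes the usual Linux netstat layout where the state is the sixth column.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -ant`-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" (or "tcp6"); the state is column 6.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = """\
Proto Recv-Q Send-Q Local Address    Foreign Address   State
tcp        0      0 0.0.0.0:80       0.0.0.0:*         LISTEN
tcp        0      0 10.0.0.5:80      192.0.2.10:51000  SYN_RECV
tcp        0      0 10.0.0.5:80      192.0.2.11:51001  SYN_RECV
tcp        0      0 10.0.0.5:80      192.0.2.12:51002  ESTABLISHED
"""

states = count_tcp_states(sample)
print(states["SYN_RECV"])
```

On a live box the same function could be fed the real output of `netstat -ant`; a count that keeps climbing while all Apache children are busy would match the symptom described here.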
I experienced two Apache crashes after upgrading to 2.2.10 earlier this week. After downgrading back to 2.2.8 I've had no problems. This server is running ...
I can't confirm that 2.2.10 was the problem, but it started after upgrading. I don't have enough apache-fu to diagnose it the way you did, so I'll be interested to know what you find out.
matte
On Feb 24, 5:04 pm, Steve Madsen <st...@lightyearsoftware.com> wrote:
> Over the next couple of days, I experienced symptoms similar to a SYN flood attack, where I'd see 60-70 connections in TCP SYN_RECV state, with all of the available Apache children running, many of them stuck in W/"Sending Reply" state according to server-status. Fully stopping and restarting Apache would fix the problem for a period of time, but it'd come back within a couple of hours.
> Has anyone else seen similar behavior after upgrading to 2.2.10 or with REE 2010.01?
We're seeing almost exactly the same problem with Passenger 2.2.10. The only difference is the problems start immediately after restarting Apache - each request immediately gets stuck in the W (Sending Reply) state for a long time. passenger-status reports no or few active requests.
We are now running Passenger 2.2.9 again, which is working without any problems.
We're running Apache 2.2.3 on Red Hat Enterprise Linux 5.4 with Ruby 1.8.6p369. Our Passenger config is as follows:
Global config:
  PassengerMaxPoolSize 8
  PassengerMaxInstancesPerApp 7
  PassengerPoolIdleTime 0
  RailsAppSpawnerIdleTime 1200

Virtual host (the only Passenger vhost):
  PassengerUseGlobalQueue on
  PassengerMaxRequests 30000
I can confirm seeing this behavior on our production box. I upgraded to 2.2.10 today after a mysterious Apache crash. Recalling that 2.2.10 fixed a file descriptor issue, I upgraded to it. After an hour, haproxy took the app server out of rotation. I saw that there were tons of httpd child processes while passenger-status itself showed no active connections. A restart of Apache didn't help.
I downgraded to 2.2.9 and things are running again.
My config:
Ruby 1.8.6-p111 (non-REE)
Apache 2.2.3
CentOS 5

Global config:
  PassengerUseGlobalQueue On
  PassengerMaxPoolSize 20
  PassengerMaxInstancesPerApp 8
  PassengerPoolIdleTime 0
I'm seeing the same problem too after upgrading Passenger to 2.2.10 (REE is 2010.01). We run 3 load-balanced servers and were wondering why only one particular server, which we commissioned later and so had 2.2.10, had lots of timeouts when restarting. We're using Passenger 2.2.9 on the other 2 servers.
I'll downgrade to 2.2.9 and report back but I suspect our timeout problems will be gone.
Cheers, Chu Yeow
On Mar 1, 9:58 pm, Phil Ross <phil.r...@gmail.com> wrote:
> We're seeing almost exactly the same problem with Passenger 2.2.10. The only difference is the problems start immediately after restarting Apache - each request immediately gets stuck in the W (Sending Reply) state for a long time. passenger-status reports no or few active requests.
> We are now running Passenger 2.2.9 again, which is working without any problems.
> We're running Apache 2.2.3 on Red Hat Enterprise Linux 5.4 with Ruby 1.8.6p369. Our Passenger config is as follows:
On Tue, Mar 2, 2010 at 6:43 AM, chuyeow <chuy...@gmail.com> wrote:
> I'm seeing the same problem too after upgrading Passenger to 2.2.10 (REE is 2010.01). We run 3 load-balanced servers and were wondering why only one particular server, which we commissioned later and so had 2.2.10, had lots of timeouts when restarting. We're using Passenger 2.2.9 on the other 2 servers.
> I'll downgrade to 2.2.9 and report back but I suspect our timeout problems will be gone.
I guess the file descriptor fixes caused some regressions.
Can those who can reproduce the problem run 'passenger-status --show=backtraces' and post the output?
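If it helps with posting the output, here is a small Python sketch that runs a command and saves everything it prints to a file. The helper name `capture_command` and the output file name are my own choices, not part of Passenger; the only thing taken from this thread is the `passenger-status --show=backtraces` command itself.

```python
import subprocess


def capture_command(cmd, out_path):
    """Run cmd, save its combined stdout/stderr to out_path, return the exit code."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    with open(out_path, "wb") as fh:
        fh.write(result.stdout)
    return result.returncode
```

On an affected server the call would look like `capture_command(["passenger-status", "--show=backtraces"], "passenger-backtraces.txt")`, possibly run as root, and ideally while Apache is in the stuck state so the backtraces show where things are hanging.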
-- Phusion | The Computer Science Company
Web: http://www.phusion.nl/ E-mail: i...@phusion.nl Chamber of commerce no: 08173483 (The Netherlands)
On Mar 2, 12:44 pm, Scottie35 <sc...@sott.net> wrote:
> So far, so good!
Oops. Apache just bit the bullet. Back to 2.2.9!
From my Apache logs, looks like I got a ton of these:
[ pid=29278 file=ext/apache2/Hooks.cpp:727 time=2010-03-02 07:39:32.406 ]:
  Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32)
  Backtrace:
     in 'Passenger::ApplicationPoolPtr Passenger::ApplicationPoolServer::connect()' (ApplicationPoolServer.h:746)
     in 'int Hooks::handleRequest(request_rec*)' (Hooks.cpp:523)
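To put a number on "a ton of these", a rough Python sketch like the following counts how many log lines mention the ApplicationPool connection failure. The function name `count_pool_errors` is hypothetical, and the sample list reuses the error text quoted above plus one unrelated, made-up line.

```python
NEEDLE = "Could not connect to the ApplicationPool server"


def count_pool_errors(lines):
    """Count log lines that mention the ApplicationPool connection failure."""
    return sum(1 for line in lines if NEEDLE in line)


# Error text from the message above, plus one made-up unrelated line.
sample_log = [
    "[ pid=29278 file=ext/apache2/Hooks.cpp:727 time=2010-03-02 07:39:32.406 ]:",
    "  Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32)",
    "[notice] unrelated line, made up for illustration",
]

print(count_pool_errors(sample_log))
```

Against a real server the function can take an open file handle for the Apache error log instead of a list; the log path depends on the install and is not given in this thread.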
On Wed, Mar 3, 2010 at 6:46 PM, Nash <n...@kabbara.us> wrote:
> Any ETA on when the next gem will be released with this fix? We're about to update our production gems, but waiting on this to be corrected.
I want to release it as soon as a few more people confirm that the fix works.
Looks like the patch is working for me as well. I'll be running some stress tests today, but the hanging problems I was seeing in the first few requests have gone away.
Thanks,
-Jon
On Mar 3, 12:56 pm, Hongli Lai <hon...@phusion.nl> wrote:
> On Wed, Mar 3, 2010 at 6:46 PM, Nash <n...@kabbara.us> wrote:
> > Any ETA on when the next gem will be released with this fix? We're about to update our production gems, but waiting on this to be corrected.
> I want to release it as soon as a few more people confirm that the fix works.