Frozen processes

johnsbrn

Oct 3, 2008, 11:32:46 AM
to Phusion Passenger Discussions
I am getting frozen processes on a regular basis on CentOS 4 64-bit
with REE and Passenger 2.0.2, 2.0.3 and the latest from GitHub. strace
always shows the process blocked on read, and sending SIGABRT only
occasionally kills the process and produces a stack trace. This is
causing a big headache for me because I never noticed it before we
went live. Please help, this is a huge problem right now. Also, I just
had Apache jump up to 75 processes and freeze up completely.

johnsbrn

Oct 3, 2008, 11:59:27 AM
to Phusion Passenger Discussions
I've backed down to 2.0.1 and it looks OK so far, but there hasn't been
enough time to know for sure. I'll report back, but there is definitely
an issue with the newer versions, including git.

johnsbrn

Oct 3, 2008, 1:44:05 PM
to Phusion Passenger Discussions
2.0.1 still has issues. I'm having to watch this app constantly to
keep it alive, please help.

amos

Oct 3, 2008, 2:18:26 PM
to Phusion Passenger Discussions
Try using the 'conservative' spawning method and non-enterprise Ruby.

johnsbrn

Oct 3, 2008, 3:05:38 PM
to Phusion Passenger Discussions
I'm not sure if I have all the correct gems in the standard ruby and
I've done all my testing on REE, so I'm hesitant to switch that right
now. I changed spawning to conservative, but now I have this:

----------- General information -----------
max = 12
count = 14
active = 1
inactive = 13


Why is count > max?

johnsbrn

Oct 3, 2008, 3:15:15 PM
to Phusion Passenger Discussions
Ok, this is even more confusing

----------- General information -----------
max = 12
count = 14
active = 0
inactive = 14

----------- Applications -----------
/home/admin/mysite3/releases/20080810033939:
PID: 4784 Sessions: 0

/home/admin/mysite1/releases/20080810033925:
PID: 4549 Sessions: 0
PID: 4582 Sessions: 0
PID: 4651 Sessions: 0
PID: 4523 Sessions: 0
PID: 4678 Sessions: 0
PID: 4571 Sessions: 0

/home/admin/mysite2/releases/20080810033921:
PID: 4637 Sessions: 0
PID: 4751 Sessions: 0
PID: 4607 Sessions: 0
PID: 4686 Sessions: 0
PID: 4692 Sessions: 0
PID: 4728 Sessions: 0
PID: 4591 Sessions: 0

The per-application max is 4.
I killed 4591 but it still shows up.
passenger-memory-stats shows about 10 processes each for mysite1 and
mysite2, and none for mysite3.

I'm still on 2.0.1, but will try moving back up to 2.0.3 and see if
that changes anything
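
For what it's worth, the listing above can be tallied mechanically to see which applications are over their limit. This is only a rough Ruby sketch: it assumes the status text looks exactly like the paste above (an application path ending in a colon, followed by "PID: n Sessions: n" lines), and `tally_status` is a made-up helper name, not part of Passenger.

```ruby
# Hypothetical helper (not a Passenger API): counts the "PID:" lines listed
# under each application path in status text pasted in the format shown above.
def tally_status(text)
  counts = Hash.new(0)
  current_app = nil
  text.each_line do |line|
    line = line.strip
    if line.start_with?('/') && line.end_with?(':')
      current_app = line.chomp(':')   # application root, without the colon
      counts[current_app] = 0
    elsif current_app && line =~ /\APID:\s*\d+/
      counts[current_app] += 1
    end
  end
  counts
end

# The "Applications" section exactly as pasted in this post.
STATUS = <<~TEXT
  ----------- Applications -----------
  /home/admin/mysite3/releases/20080810033939:
  PID: 4784 Sessions: 0

  /home/admin/mysite1/releases/20080810033925:
  PID: 4549 Sessions: 0
  PID: 4582 Sessions: 0
  PID: 4651 Sessions: 0
  PID: 4523 Sessions: 0
  PID: 4678 Sessions: 0
  PID: 4571 Sessions: 0

  /home/admin/mysite2/releases/20080810033921:
  PID: 4637 Sessions: 0
  PID: 4751 Sessions: 0
  PID: 4607 Sessions: 0
  PID: 4686 Sessions: 0
  PID: 4692 Sessions: 0
  PID: 4728 Sessions: 0
  PID: 4591 Sessions: 0
TEXT

per_app_max = 4   # the per-application limit mentioned in this post
counts = tally_status(STATUS)
counts.each do |app, n|
  flag = n > per_app_max ? ' (over the per-application max)' : ''
  puts "#{app}: #{n}#{flag}"
end
puts "total: #{counts.values.sum}"
```

On the paste above this comes to 1, 6 and 7 processes (14 in total, matching count = 14), so both mysite1 and mysite2 are over the per-application max of 4.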

johnsbrn

Oct 3, 2008, 4:37:00 PM
to Phusion Passenger Discussions
Crashed again on 2.0.1; trying 2.0.3... I'm getting desperate, but I
can't change Ruby right now because I can't afford for the site to be
down for more than a few seconds.

johnsbrn

Oct 3, 2008, 5:09:40 PM
to Phusion Passenger Discussions
2.0.3 is also having issues. It nearly ran out of memory as it started
way more processes than it was supposed to. If I can't get this fixed
by tonight, I have no choice but to dump Passenger and go back to
Mongrel. I know some big sites are using it, so I don't understand why
it's so unstable for me.

johnsbrn

Oct 3, 2008, 6:39:11 PM
to Phusion Passenger Discussions
Ok, I now have standard ruby and conservative spawning. I hope this
works...

johnsbrn

Oct 4, 2008, 5:12:16 PM
to Phusion Passenger Discussions
So far it seems more stable, but it hasn't been under load yet. One
thing I noticed is that occasionally I get a "can't connect" message in
my browser. It doesn't time out; it comes back instantly with a message
saying it can't contact the server. Sometimes it seems to miss images
or JavaScript files too.

johnsbrn

Oct 6, 2008, 9:02:28 PM
to Phusion Passenger Discussions
I think this is related to GC.start which is run by the fleximage
plugin. Is there an open issue with GC and REE?

Hongli Lai

Oct 7, 2008, 4:26:07 AM
to phusion-...@googlegroups.com
johnsbrn wrote:
> I think this is related to GC.start which is run by the fleximage
> plugin. Is there an open issue with GC and REE?

There are some rumors about the REE garbage collector having an infinite
loop bug. It's not officially confirmed because so far we've been unable
to reproduce it, and nobody has been able to submit a test case in which
the problem is reproducible.

--
Phusion | The Computer Science Company

Web: http://www.phusion.nl/
E-mail: in...@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

johnsbrn

Oct 7, 2008, 10:15:50 AM
to Phusion Passenger Discussions
Well, I don't have a test case, but I do have plenty of these in the
error log prior to switching to standard Ruby

/releases/20080810033921/vendor/plugins/fleximage/lib/fleximage/model.rb:354: [BUG] Segmentation fault

Line 354 is GC.start

Since switching I have none. The fleximage plugin is available here:
http://github.com/Squeegy/fleximage/tree/master

Hongli Lai

Oct 7, 2008, 11:12:57 AM
to phusion-...@googlegroups.com
johnsbrn wrote:
> Well, I don't have a test case, but I do have plenty of these in the
> error log prior to switching to standard Ruby
>
> /releases/20080810033921/vendor/plugins/fleximage/lib/fleximage/model.rb:354: [BUG] Segmentation fault
>
> Line 354 is GC.start
>
> Since switching I have none. The fleximage plugin is available here:
> http://github.com/Squeegy/fleximage/tree/master

I see that Fleximage uses RMagick.

Ruby Enterprise Edition is not necessarily binary compatible with gems
compiled for the system's Ruby. To be more specific, our observations so
far are as follows:
- On 32-bit Linux systems, Ruby Enterprise Edition works fine with gems
compiled for the system's Ruby, at least for RedHat and Debian-based
distros.
- On MacOS X, REE does *not* work with gems compiled for the system's
Ruby; it will result in a segmentation fault. So gems that provide
native extensions must be reinstalled for REE.

We don't know about 64-bit Linux, or FreeBSD. We do have a 32-bit
FreeBSD server, but we've been using REE on that server since day 1, and
haven't encountered a situation in which REE uses a native extension
compiled for the system's Ruby.

Could you try reinstalling RMagick for REE, and check whether Fleximage
still crashes?

johnsbrn

Oct 7, 2008, 12:01:56 PM
to Phusion Passenger Discussions
I am nearly positive I did install rmagick for REE. I certainly did
not copy anything from the default ruby. Is there something special I
have to do besides the following?

/opt/ruby-enterprise-1.8.6-20080810/bin/ruby /opt/ruby-enterprise-1.8.6-20080810/bin/gem install --no-rdoc --no-ri --no-update-sources --backtrace rmagick

Unfortunately, this site is now in production so there is no way I can
experiment with it. Is there a way I can verify the version I was
using was built for REE?
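
One rough check, offered as a sketch under assumptions rather than an official procedure: run a few lines of Ruby with the interpreter in question and compare its install prefix with the directories RubyGems searches. A gem directory outside the prefix may be shared with another Ruby, so native extensions found there could have been compiled by a different interpreter. `outside_prefix` is a made-up helper name, and this only narrows things down; it does not prove how a given extension was built.

```ruby
require 'rbconfig'
require 'rubygems'

# Hypothetical helper: returns the entries of `paths` that do not live
# under the interpreter install prefix `prefix`.
def outside_prefix(paths, prefix)
  root = prefix.chomp('/') + '/'
  paths.reject { |path| (path.chomp('/') + '/').start_with?(root) }
end

# Run this with the interpreter under suspicion, e.g. the REE binary.
prefix = RbConfig::CONFIG['prefix']
puts "interpreter prefix: #{prefix}"
puts "gem search path:    #{Gem.path.join(', ')}"

shared = outside_prefix(Gem.path, prefix)
if shared.empty?
  puts 'all gem directories are under this interpreter\'s prefix'
else
  # These directories may also be used by another Ruby installation.
  shared.each { |dir| puts "gem directory outside the prefix: #{dir}" }
end
```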

Hongli Lai

Oct 7, 2008, 12:17:36 PM
to phusion-...@googlegroups.com
johnsbrn wrote:
> I am nearly positive I did install rmagick for REE. I certainly did
> not copy anything from the default ruby. Is there something special I
> have to do besides
> /opt/ruby-enterprise-1.8.6-20080810/bin/ruby /opt/ruby-enterprise-1.8.6-20080810/bin/gem install --no-rdoc --no-ri --no-update-sources --backtrace rmagick

That command is correct. Then it seems there is a legit problem in your
case.

johnsbrn

Oct 7, 2008, 12:36:28 PM
to Phusion Passenger Discussions
I just tested and I see the same problem on OS X

amos

Oct 7, 2008, 4:04:19 PM
to Phusion Passenger Discussions
I can reproduce the infinite loop by switching one server to REE and
pointing live traffic at it for about 10 minutes. Unfortunately,
because our app is so complicated, it's difficult to figure out when
the problem arises. We also use rmagick, but we do not resize on the
affected servers.

Hongli Lai

Oct 7, 2008, 4:43:30 PM
to phusion-...@googlegroups.com
amos wrote:
> I can reproduce the infinite loop by switching one server to REE and
> pointing live traffic at it for about 10 minutes. Unfortunately,
> because our app is so complicated, it's difficult to figure out when
> the problem arises. We also use rmagick, but we do not resize on the
> affected servers.

Anybody not using RMagick experiencing similar problems?

johnsbrn

Oct 12, 2008, 12:51:30 PM
to Phusion Passenger Discussions
Some preliminary testing on OS X is showing that this may not be an
issue with REE, but rather with smart spawning. I had trouble with
standard ruby and REE with smart spawning enabled. I switched to
standard ruby and conservative and my problems stopped. I then
switched to REE and conservative and it still appears to be working
OK. I'm hesitant to move back to REE on my production servers because
it was a nightmare last time. Is there any advantage to running REE
over standard Ruby without smart spawning?

Hongli Lai

Oct 12, 2008, 1:21:09 PM
to phusion-...@googlegroups.com
johnsbrn wrote:
> Some preliminary testing on OS X is showing that this may not be an
> issue with REE, but rather with smart spawning. I had trouble with
> standard ruby and REE with smart spawning enabled. I switched to
> standard ruby and conservative and my problems stopped. I then
> switched to REE and conservative and it still appears to be working
> ok. I'm hesitant to move back to REE on my production servers because
> it was a nightmare last time, is there any advantage to running REE
> over standard ruby without smart spawning?

Yes, improved performance because of the faster memory allocator. But in
your specific case I think it would be a better idea to stick with the
Ruby installation you already have.

johnsbrn

Nov 25, 2008, 2:20:55 PM
to Phusion Passenger Discussions
This is definitely an issue with the spawning method (conservative vs.
smart). Using REE or standard Ruby does not make any difference. I
would really like to be able to use smart spawning because the spawn
time under conservative spawning is just too slow for this app. I'm
forced to set an absurdly long PassengerPoolIdleTime so my instances
rarely die. Any idea why these problems might be occurring? Where
should I start looking? One thing I can't remember whether I tried is
smart spawning with standard Ruby, so I'll try that out and see if it's
a problem with smart spawning alone or a combination of smart spawning
and REE. Thanks.

johnsbrn

Nov 25, 2008, 2:52:17 PM
to Phusion Passenger Discussions
Nope, it's smart spawning. It doesn't matter whether it's REE or
standard Ruby, it freezes either way. What is it about smart spawning
that would cause this issue, I wonder?
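
One general property of fork-based preloading (the approach the Passenger documentation describes for smart spawning) is that every forked child inherits the file descriptors the preloaded parent already had open. Whether that has anything to do with the freezes here is not established; the Ruby sketch below only illustrates the mechanism with a temp file. `shared_offset_demo` is a made-up name, and the example assumes a platform where `fork` is available.

```ruby
require 'tempfile'

# Illustration only: a file opened before fork is shared by the children,
# including its current offset, much like any handle opened while an
# application is preloaded.
def shared_offset_demo
  Tempfile.create('fd-share') do |file|
    file.sync = true                # write straight through, no Ruby buffering
    2.times do |i|
      pid = fork do
        file.write("child#{i}\n")   # uses the descriptor inherited from the parent
        exit!(0)                    # leave without running at_exit handlers
      end
      Process.wait(pid)
    end
    file.rewind
    file.read                       # both children's writes, one after the other
  end
end

puts shared_offset_demo.inspect
```

The second child does not start writing at offset 0; it continues where the first one stopped, because both are using the same underlying open file. The same kind of sharing applies to sockets and other handles opened before the fork.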

johnsbrn

Jan 24, 2009, 10:49:38 AM
to Phusion Passenger Discussions
It looked like the issue may have been fixed in REE 20090113, but I
think it is just happening less frequently. I found a frozen process
this morning running Passenger 2.0.6 and REE 20090113 on CentOS 4
64-bit. I was really hoping this was the fix I have been looking for,
but it seems there is still an issue somewhere. Also, I still can't
debug the processes because sending SIGABRT doesn't kill them.