Slow rolling restart

daMirda

May 4, 2016, 4:36:50 AM
to Phusion Passenger Discussions
I am running Passenger Enterprise 5.0.23 on a CentOS 7 server, and lately we have frequently been getting "Resisting deployment error!"
The first thing I tried was increasing the timeout (passenger_start_timeout), but that doesn't really solve anything, and the error still appears from time to time.

I ran strace on the process to see what really happens, and I found one block repeating over and over again:
===========================
select(17, [0 16], NULL, NULL, NULL)    = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=20649, si_uid=1000} ---
write(4, "!", 1)                        = 1
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TKILL, si_pid=11872, si_uid=1000} ---
rt_sigreturn()                          = 1
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
write(2, "[ 2016-04-27 13:05:34.3980 11872"..., 45802) = 45802
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
write(2, "[ 2016-04-27 13:05:34.3990 11872"..., 283) = 283
sched_yield()                           = 0
select(17, [0 16], NULL, NULL, NULL)    = ? ERESTARTNOHAND (To be restarted if no handler)
. . .
===========================
It seems like Passenger is trying to acquire some handle (a socket or file) but can't, so it retries over and over again until the timeout is reached.
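For reference, ERESTARTNOHAND in strace output means the select() call was interrupted by a signal delivery (the `--- SIG... ---` lines). To see how often that happens across a longer trace, and which process sends the signals, you could tally them. This is a minimal Python 3 sketch, assuming strace's default one-line-per-call output like the excerpt above saved to a file (e.g. with `strace -p <process id> -o trace.txt`); the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches signal-delivery lines such as:
# --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=20649, si_uid=1000} ---
SIGNAL_RE = re.compile(r"^--- (SIG[A-Z0-9]+) \{.*?si_pid=(\d+)")
# Matches a syscall line, capturing the call name and everything after " = ".
SYSCALL_RE = re.compile(r"^([a-z_0-9]+)\(.*\)\s+=\s+(.*)$")


def summarize_strace(lines):
    """Count signals (by name and sender pid) and syscalls interrupted by signals."""
    signals = Counter()
    interrupted = Counter()
    for line in lines:
        line = line.strip()
        m = SIGNAL_RE.match(line)
        if m:
            signals[(m.group(1), int(m.group(2)))] += 1
            continue
        m = SYSCALL_RE.match(line)
        # ERESTART* and EINTR both indicate the call was cut short by a signal.
        if m and ("ERESTART" in m.group(2) or "EINTR" in m.group(2)):
            interrupted[m.group(1)] += 1
    return signals, interrupted
```

Usage: `signals, interrupted = summarize_strace(open("trace.txt"))`, then print `signals.most_common()` to see which sender pids show up most.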

When everything works fine, it looks like this:
===========================
select(17, [0 16], NULL, NULL, NULL)    = 1 (in [0])
futex(0xed76a4, FUTEX_WAIT_BITSET_PRIVATE, 5, {32760110, 107706373}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0xed7718, FUTEX_WAKE_PRIVATE, 1)  = 0
===========================

I can't find these errors in any log.
What exactly is Passenger trying to do here? Any ideas where to look for the cause of the problem?

Thanks!

Daniel Knoppel

May 4, 2016, 5:47:09 AM
I would advise upgrading to the latest Passenger (5.0.28) first to rule out any known issues. Also check that your server isn't running out of resources (CPU / memory / file handles (ulimit)).

A deployment error means your app didn't successfully spawn, and that must be mentioned somewhere in the logs (assuming the default loglevel 3). It's not clear what you did exactly and what is and isn't in your logs, so if the above doesn't help please send an email to the enterprise support desk (sup...@phusionpassenger.com) with your PHU- license number, a reference to this thread, and your logfiles.

- Daniel

daMirda

May 4, 2016, 6:03:53 AM
I will upgrade it today; it is planned.
This does happen under heavier load, but there is still enough RAM and there are no errors regarding file handles.

A deployment error is also raised if the application writes nothing (no output) within the timeout, and that is exactly what happens.
The only error I can find is about not being able to start a worker ("it did not write a startup response in time."), but that doesn't explain what went wrong while the worker was initializing.

https://github.com/phusion/passenger/wiki/Debugging-application-startup-problems - "Application startup freeze" describes it.

How can I get more information from the preloader, or where would those errors end up? I'm still trying to figure out what happens (the execution loop I mentioned).

I will contact Enterprise support as well; I just thought that maybe someone here could help me out.

Daniel Knoppel

May 4, 2016, 7:10:50 AM
On Wednesday, May 4, 2016 at 12:03:53 PM UTC+2, daMirda wrote:
> I will upgrade it today, it is planned.
> This does happen under heavier load, but still there is enough of RAM and there are no errors regarding file handles.

And CPU? That is especially suspect when you get start timeouts. Maybe it's the 100% CPU bug that was fixed in 5.0.24, or maybe there are simply too many instances for the server to handle at full load.
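A rough first check for CPU saturation is the load average divided by the number of CPUs: values that stay above 1 mean more runnable (or uninterruptibly blocked) tasks than cores. A minimal Python 3 sketch; it only looks at the machine as a whole, not at Passenger specifically, and the function name is made up.

```python
import os


def load_per_cpu():
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    cpus = os.cpu_count() or 1
    return tuple(load / cpus for load in os.getloadavg())
```

Sampling this while a rolling restart is in progress would show whether spawning coincides with the machine being saturated.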

- Daniel

daMirda

May 4, 2016, 7:14:22 AM
Yes, CPU usage is higher (because of the load on the server), but not at 100%, and not because of the Passenger restart.

Passenger (or Ruby, to be more precise) is waiting for (and retrying) something. I can't find out what it is.

Daniel Knoppel

May 4, 2016, 7:35:40 AM
On Wednesday, May 4, 2016 at 1:14:22 PM UTC+2, daMirda wrote:
> Yes, CPU is higher (because of the load on the server), but not 100% and not because of the passenger restart.
>
> Passenger (or ruby, to be more precise) is waiting (and retrying) for something. I can't find what is it.

If you find the <process id> of an application process that is very slow at spawning (e.g. with `ps aux | grep ruby`, looking for something like "ruby /path/passenger/src/helper-scripts/rack-preloader.rb"), you can get a backtrace of what that process is doing with gdb: `gdb -batch -ex "attach <process id>" -ex "bt" 2>&1`
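If several preloader processes are involved, the lookup and the gdb call can be scripted. A minimal Python 3 sketch, assuming Linux /proc and that gdb is installed and allowed to attach (usually root, or the same user with ptrace permitted); the helper names are hypothetical, and gdb's `bt` shows the native (C-level) stack of the interpreter, not Ruby-level frames.

```python
import os
import subprocess


def find_pids(pattern, proc_root="/proc"):
    """Return pids whose command line contains `pattern` (Linux /proc only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # cmdline is NUL-separated; join the arguments with spaces.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def gdb_backtrace(pid):
    """Run the gdb command from this thread against `pid`; return its output."""
    result = subprocess.run(
        ["gdb", "-batch", "-ex", f"attach {pid}", "-ex", "bt"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
    )
    return result.stdout.decode(errors="replace")


if __name__ == "__main__":
    for pid in find_pids("rack-preloader.rb"):
        print(f"=== {pid} ===")
        print(gdb_backtrace(pid))
```

Running it two or three times a few seconds apart makes it easier to tell a process that is stuck in one place from one that is merely slow.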