Service restart failed after deploy task

niristotle okram

May 18, 2015, 3:14:52 PM
to capis...@googlegroups.com
Versions:
  • Ruby 2.1.1
  • Capistrano 2 
  • Rake / Rails / etc
Platform:
  • Working on.... RHEL 6
  • Deploying to... RHEL 6

A part of deploy.rb:

# before "deploy", "deploy:stop_app"
# after  "deploy", "deploy:start_app"
after "deploy", "deploy:restart_app"

namespace :deploy do

  task :update_code, :roles => :web, :except => { :no_release => true } do
    on_rollback { puts "DO NOT WANT TO ROLL BACK?" }
    strategy.deploy!
    finalize_update
  end

  task :stop_app, :roles => :web do
    run "sudo /etc/init.d/xyz stop", :shell => :bash
  end

  task :start_app, :roles => :web do
    run "sudo /etc/init.d/xyz start", :shell => :bash
  end

  task :restart_app, :roles => :web do
    run "sudo /etc/init.d/xyz restart", :shell => :bash
  end

end


I have this parameter in deploy.rb:

set :user, 'a_user'

Q: Which user runs the task that restarts the service (xyz) after the app is deployed? I am getting errors saying that xyz.pid doesn't exist, when it actually does. That message comes from the part of the shell script that stops the service.


A part of /etc/init.d/xyz:

case "$1" in
start)
        printf "%-50s" "Starting $DAEMON_NAME..."
        cd $DIR
        [ -d $LOGPATH ] || mkdir $LOGPATH
  [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
        PID=`$PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 & echo $!`
        #echo "Saving PID" $PID " to " $PIDFILE
        if [ -z $PID ]; then
            printf "%s\n" "Fail"
        else
            echo $PID > $PIDFILE
            printf "%s\n" "Ok"
        fi
;;
status)
        printf "%-50s" "Checking $DAEMON_NAME..."
        if [ -f $PIDFILE ]; then
            PID=`cat $PIDFILE`
            if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
                printf "%s\n" "Process dead but pidfile exists"
            else
                echo "Running"
            fi
        else
            printf "%s\n" "Service not running"
        fi
;;
stop)
        printf "%-50s" "Stopping $DAEMONNAME"
            PID=`cat $PIDFILE`
            cd $DIR
        if [ -f $PIDFILE ]; then
            kill -HUP $PID
            printf "%s\n" "Ok"
            rm -f $PIDFILE
        else
            printf "%s\n" "pidfile not found"
        fi
;;

restart)
        $0 stop
        $0 start
;;

*)
        echo "Usage: $0 {status|start|stop|restart}"
        exit 1
esac


Capistrano log:

  * executing `deploy:restart_app'
  * executing multiple commands in parallel
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    servers: ["server1", "server2", "server3", "server4"]
    [server1] executing command
    [server2] executing command
    [server3] executing command
    [server4] executing command
 ** [out :: server1] Stopping
 ** [out :: server1] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server1] pidfile not found
 ** [out :: server1] Starting xyz...
 ** [out :: server2] Stopping
 ** [out :: server2] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server2] pidfile not found
 ** [out :: server2] Starting xyz...
 ** [out :: server2] Ok
 ** [out :: server1] Ok
 ** [out :: server3] Stopping
 ** [out :: server3] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server3] pidfile not found
 ** [out :: server4] Stopping
 ** [out :: server4] Ok
 ** [out :: server3] Starting xyz...
 ** [out :: server3] Ok
 ** [out :: server4] Starting xyz...
 ** [out :: server4] Ok
    command finished in 659ms

Finished: SUCCESS

 


I can cat the file as the deploy user just fine. 
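If it helps narrow it down, I could add a quick diagnostic task to confirm which user the commands actually run as (a rough sketch only, the task name is made up):

task :who_runs_this, :roles => :web do
  # a plain run should go in as the :user from deploy.rb (a_user);
  # the sudo'd variant should show who the init script actually runs as (likely root)
  run "whoami"
  run "sudo whoami"
end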




Lee Hambley

May 18, 2015, 3:24:50 PM
to Capistrano
You wrote that /etc/init.d/xyz is invoked via "sudo", so the deploy user apparently has passwordless sudo (at least for some actions). That would mean the file is not visible to `root`, which I don't believe or expect.

You included a part of /etc/init.d/xyz but not the full thing, so I can't see what the value of $PIDFILE should be in this case, nor where you got the template (please adhere to the list guidelines and paste long files to an external service, then link them).

I also don't understand the logic behind setting shell: 'bash' on the run() lines that interface with the init script.

Your task:

task :stop_app, :roles => :web do
  run "sudo /etc/init.d/xyz stop", :shell => :bash
end

I might suggest you extend that (or make a similar task, "debug_initd_stuff") to do something like:

task :debug_initd_stuff, :roles => :web do
  run "sudo whoami"
  run "sudo ls -l /etc/init.d"
  run "sudo ls -l /var/run"
end

You might also want to run the init.d script through shellcheck.net; there are quite a few violations and bad practices already in sight there, and shellcheck might help you iron some of them out. (That said, honestly, the problem is probably something much simpler.)


niristotle okram

May 18, 2015, 8:26:49 PM
to capis...@googlegroups.com
Hi Lee,

Here is the full /etc/init.d script: http://pastebin.com/02G5tpgH


So I placed a task to stop the app, then deploy, and then start the service/app. The task has the below commands to check:

whoami       ----->  ** [out :: server4] root
ll /var/run/ ----->  this shows the xyz.pid file

So I see the service stops just fine:

 ** [out :: server1] Stopping
 ** [out :: server1] Ok



And the service also starts just fine:

 ** [out :: server1] Starting xyz...
 ** [out :: server1] Ok



But on checking manually with "service xyz status", I get "Process dead but pidfile exists". I can stop and start the service just fine manually.




Lee Hambley

May 19, 2015, 3:10:50 AM
to Capistrano
Sorry, I can't see anything wrong with it. :-\

niristotle okram

May 19, 2015, 4:23:31 PM
to capis...@googlegroups.com
I apologize, I made a mistake in reporting the issue. :(

The original report about "xyz.pid doesn't exist, when it actually does" is not an error; it's working as designed. The service had been stopped manually before we tried to stop it again, which is why it printed that message. The real issue is: when Capistrano starts the service as part of the "deploy:restart_app" task after the deploy, the service doesn't start up properly. Checking the status with "service xyz status" after the deploy returns "Process dead but pidfile exists".

This start-failure behavior is only seen after the cap deploy and cannot be reproduced manually.

Lee Hambley

May 19, 2015, 4:28:32 PM
to Capistrano
It almost certainly has something to do with your process failing to daemonize properly. Ruby processes struggle with this too; see http://stackoverflow.com/a/688448/119669. It comes down to the forked process (your Python daemon in this case) still inheriting resources that are attached to the Capistrano SSH session.

This unfortunately falls outside what I can help with reasonably or remotely. You *might* have some success learning enough strace to watch your process and see how it behaves when Cap disconnects.
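For example (untested, and assuming the usual Capistrano 2 run helper), you could try fully detaching the init script from the session so the daemon isn't holding the session's streams open:

task :restart_app_detached, :roles => :web do
  # hypothetical variant of deploy:restart_app: nohup plus redirecting
  # stdin/stdout/stderr so nothing keeps the Capistrano SSH session's
  # file descriptors open after Cap disconnects
  run "nohup sudo /etc/init.d/xyz restart > /tmp/xyz_restart.log 2>&1 < /dev/null"
end

No promises that alone is enough; the cleaner fix is usually proper daemonization inside the init script itself.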

niristotle okram

May 20, 2015, 7:43:25 PM
to capis...@googlegroups.com
Hey Lee,

Just to put an end to this thread: the cause was not cap2 per se. It was the process spawned by cap being killed when cap exits its task. We had to modify the init script to fix it. 'nohup' keeps the daemon in the background even after cap exits the SSH session. Below is the modified portion of the script.


case "$1" in
start)
    #checking to see if the process is already running, if it is, display a message and exit
    if [ -n "`ps aux |grep  /opt/mount1/oss/current/src/main.py | grep -v grep`" ]; then
      echo "Service is already running, try stopping it first with 'service oss stop'"
      exit
    fi

printf "%-50s" "Starting $DAEMON_NAME..."
cd $DIR
[ -d $LOGPATH ] || mkdir $LOGPATH
  [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
nohup $PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 &
echo $! > $PIDFILE
    sleep 5
;;

thanks
Okram

niristotle okram

May 29, 2015, 7:34:10 PM
to capis...@googlegroups.com

A quick question, if I may: I have 4 nodes, and if one of them is offline, it appears that the entire job fails. Is this by design? Should I put in timeout values to move on?
I thought the commands were executed in parallel.



Deploying the ABC application via Capistrano

ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
  * executing `staging'
ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
    triggering start callbacks for `deploy'
  * executing `multistage:ensure'
  * executing `deploy'
    triggering before callbacks for `deploy'
  * executing `deploy:stop_app'
 * executing multiple commands in parallel
       servers: ["node1", "node2", "node3", "node4"]
connection failed for: node2 (Errno::ETIMEDOUT: Connection timed out - connect(2) for "node2" port 22)

Build step 'Execute shell' marked build as failure
Finished: FAILURE
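As far as I can tell, the commands do run in parallel, but a connection failure on any host aborts the whole run by default. One thing I may try (untested, and assuming Capistrano 2 still honours a per-task :on_error option and the Net::SSH connect timeout via ssh_options) is to fail the connection faster and let the run continue past unreachable hosts:

# sketch only -- values are guesses
ssh_options[:timeout] = 10   # seconds to wait for the SSH connection before giving up

task :restart_app, :roles => :web, :on_error => :continue do
  # :on_error => :continue should keep the run going on the remaining
  # hosts instead of aborting the whole job on the first failure
  run "sudo /etc/init.d/xyz restart", :shell => :bash
end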
