Service restart failed after deploy task

niristotle okram

May 18, 2015, 3:14:52 PM
to capis...@googlegroups.com
Versions:
  • Ruby 2.1.1
  • Capistrano 2 
  • Rake / Rails / etc
Platform:
  • Working on.... RHEL 6
  • Deploying to... RHEL 6

A part of deploy.rb:

# before "deploy", "deploy:stop_app"
# after  "deploy", "deploy:start_app"
after "deploy", "deploy:restart_app"

namespace :deploy do

  task :update_code, :roles => :web, :except => { :no_release => true } do
    on_rollback { puts "DO NOT WANT TO ROLL BACK?" }
    strategy.deploy!
    finalize_update
  end

  task :stop_app, :roles => :web do
    run "sudo /etc/init.d/xyz stop", :shell => :bash
  end

  task :start_app, :roles => :web do
    run "sudo /etc/init.d/xyz start", :shell => :bash
  end

  task :restart_app, :roles => :web do
    run "sudo /etc/init.d/xyz restart", :shell => :bash
  end

end


I have this parameter in deploy.rb:

set :user, 'a_user'

Q: Which user runs the task that restarts the service (xyz) after the app is deployed? I am getting errors saying that xyz.pid doesn't exist, when it actually does. That message comes from the part of the shell script that stops the service.


A part of /etc/init.d/xyz:

case "$1" in
start)
        printf "%-50s" "Starting $DAEMON_NAME..."
        cd $DIR
        [ -d $LOGPATH ] || mkdir $LOGPATH
  [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
        PID=`$PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 & echo $!`
        #echo "Saving PID" $PID " to " $PIDFILE
        if [ -z $PID ]; then
            printf "%s\n" "Fail"
        else
            echo $PID > $PIDFILE
            printf "%s\n" "Ok"
        fi
;;
status)
        printf "%-50s" "Checking $DAEMON_NAME..."
        if [ -f $PIDFILE ]; then
            PID=`cat $PIDFILE`
            if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
                printf "%s\n" "Process dead but pidfile exists"
            else
                echo "Running"
            fi
        else
            printf "%s\n" "Service not running"
        fi
;;
stop)
        printf "%-50s" "Stopping $DAEMONNAME"
            PID=`cat $PIDFILE`
            cd $DIR
        if [ -f $PIDFILE ]; then
            kill -HUP $PID
            printf "%s\n" "Ok"
            rm -f $PIDFILE
        else
            printf "%s\n" "pidfile not found"
        fi
;;

restart)
        $0 stop
        $0 start
;;

*)
        echo "Usage: $0 {status|start|stop|restart}"
        exit 1
esac


Capistrano log:

  * executing `deploy:restart_app'
  * executing multiple commands in parallel
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    -> "else" :: "sudo /etc/init.d/xyz restart"
    servers: ["server1", "server2", "server3", "server4"]
    [server1] executing command
    [server2] executing command
    [server3] executing command
    [server4] executing command
 ** [out :: server1] Stopping
 ** [out :: server1] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server1] pidfile not found
 ** [out :: server1] Starting xyz...
 ** [out :: server2] Stopping
 ** [out :: server2] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server2] pidfile not found
 ** [out :: server2] Starting xyz...
 ** [out :: server2] Ok
 ** [out :: server1] Ok
 ** [out :: server3] Stopping
 ** [out :: server3] cat: /var/run/xyz.pid: No such file or directory
 ** [out :: server3] pidfile not found
 ** [out :: server4] Stopping
 ** [out :: server4] Ok
 ** [out :: server3] Starting xyz...
 ** [out :: server3] Ok
 ** [out :: server4] Starting xyz...
 ** [out :: server4] Ok
    command finished in 659ms

Finished: SUCCESS

 


I can cat the file as the deploy user just fine. 
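If it helps narrow it down, I could add a quick diagnostic task to confirm which user the commands actually run as (a rough sketch only, the task name is made up):

task :who_runs_this, :roles => :web do
  # a plain run should go in as the :user from deploy.rb (a_user);
  # the sudo'd variant should show who the init script actually runs as (likely root)
  run "whoami"
  run "sudo whoami"
end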




Lee Hambley

May 18, 2015, 3:24:50 PM
to Capistrano
You wrote that /etc/init.d/xyz is invoked via "sudo", so the deploy user apparently has passwordless sudo (at least for some actions). That would mean the file is not visible to `root`, which I don't believe or expect.

You included a part of /etc/init.d/xyz but not the full thing, so I can't see what the value of $PIDFILE should be in this case, nor where you got the template (please adhere to the list guidelines and paste long files to an external service, then link them).

I also don't understand the logic behind setting shell: 'bash' on the run() lines that interface with the init script.

Your task:

task :stop_app, :roles => :web do
  run "sudo /etc/init.d/xyz stop", :shell => :bash
end

I might suggest you extend that (or make a similar task, "debug_initd_stuff") to do something like:

task :debug_initd_stuff, :roles => :web do
  run "sudo whoami"
  run "sudo ls -l /etc/init.d"
  run "sudo ls -l /var/run"
end

You might also want to run the init.d script through shellcheck.net; there are quite a few violations and bad practices already in sight there, and shellcheck might help you iron some of them out. (That said, honestly, the problem is probably something much simpler.)


niristotle okram

May 18, 2015, 8:26:49 PM
to capis...@googlegroups.com
Hi Lee,

Here is the full /etc/init.d script: http://pastebin.com/02G5tpgH


So I placed a task to stop the app, then deploy, and then start the service/app. The task has the below commands to check:

whoami       ----->  ** [out :: server4] root
ll /var/run/ ----->  this shows the xyz.pid file

So I see the service stops just fine:

 ** [out :: server1] Stopping
 ** [out :: server1] Ok



And the service also starts just fine:

 ** [out :: server1] Starting xyz...
 ** [out :: server1] Ok



But on checking manually with "service xyz status", I get "Process dead but pidfile exists". I can stop and start the service just fine manually.




Lee Hambley

May 19, 2015, 3:10:50 AM
to Capistrano
Sorry, I can't see anything wrong with it. :-\

niristotle okram

May 19, 2015, 4:23:31 PM
to capis...@googlegroups.com
I apologize, I made a mistake in reporting the issue. :(

The original report about "xyz.pid doesn't exist, when it actually does" is not an error; it's working as designed. The service had been stopped manually before we tried to stop it again, which is why it printed that message. The real issue is: when Capistrano starts the service as part of the "deploy:restart_app" task after the deploy, the service doesn't start up properly. Checking the status with "service xyz status" after the deploy returns "Process dead but pidfile exists".

This start-failure behavior is only seen after the cap deploy and cannot be reproduced manually.

Lee Hambley

May 19, 2015, 4:28:32 PM
to Capistrano
It almost certainly has something to do with your process failing to daemonize properly. Ruby processes struggle with this too; see http://stackoverflow.com/a/688448/119669. It comes down to the forked process (your Python daemon in this case) still inheriting resources that are attached to the Capistrano SSH session.

This unfortunately falls outside what I can help with reasonably or remotely. You *might* have some success learning enough strace to watch your process and see how it behaves when Cap disconnects.
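For example (untested, and assuming the usual Capistrano 2 run helper), you could try fully detaching the init script from the session so the daemon isn't holding the session's streams open:

task :restart_app_detached, :roles => :web do
  # hypothetical variant of deploy:restart_app: nohup plus redirecting
  # stdin/stdout/stderr so nothing keeps the Capistrano SSH session's
  # file descriptors open after Cap disconnects
  run "nohup sudo /etc/init.d/xyz restart > /tmp/xyz_restart.log 2>&1 < /dev/null"
end

No promises that alone is enough; the cleaner fix is usually proper daemonization inside the init script itself.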

niristotle okram

May 20, 2015, 7:43:25 PM
to capis...@googlegroups.com
Hey Lee,

Just to put an end to this thread: the cause was not cap2 per se. It was the process spawned by cap being killed when cap exits its task. We had to modify the init script to fix it. 'nohup' keeps the daemon in the background even after cap exits the SSH session. Below is the modified portion of the script.


case "$1" in
start)
    #checking to see if the process is already running, if it is, display a message and exit
    if [ -n "`ps aux |grep  /opt/mount1/oss/current/src/main.py | grep -v grep`" ]; then
      echo "Service is already running, try stopping it first with 'service oss stop'"
      exit
    fi

printf "%-50s" "Starting $DAEMON_NAME..."
cd $DIR
[ -d $LOGPATH ] || mkdir $LOGPATH
  [ -f $LOGFILE ] || su $DAEMON_USER -c 'touch $LOGFILE'
nohup $PYTHON $DAEMON $DAEMON_OPTS > $LOGFILE  2>&1 &
echo $! > $PIDFILE
    sleep 5
;;

thanks
Okram

niristotle okram

May 29, 2015, 7:34:10 PM
to capis...@googlegroups.com

A quick question, if I may: I have 4 nodes, and if one of them is offline, it appears that the entire job fails. Is this by design? Should I put in timeout values to move on?
I thought the commands were executed in parallel.



Deploying the ABC application via Capistrano

ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
  * executing `staging'
ffi-yajl/json_gem is deprecated, these monkeypatches will be dropped shortly
    triggering start callbacks for `deploy'
  * executing `multistage:ensure'
  * executing `deploy'
    triggering before callbacks for `deploy'
  * executing `deploy:stop_app'
 * executing multiple commands in parallel
       servers: ["node1", "node2", "node3", "node4"]
connection failed for: node2 (Errno::ETIMEDOUT: Connection timed out - connect(2) for "node2" port 22)

Build step 'Execute shell' marked build as failure
Finished: FAILURE
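As far as I can tell, the commands do run in parallel, but a connection failure on any host aborts the whole run by default. One thing I may try (untested, and assuming Capistrano 2 still honours a per-task :on_error option and the Net::SSH connect timeout via ssh_options) is to fail the connection faster and let the run continue past unreachable hosts:

# sketch only -- values are guesses
ssh_options[:timeout] = 10   # seconds to wait for the SSH connection before giving up

task :restart_app, :roles => :web, :on_error => :continue do
  # :on_error => :continue should keep the run going on the remaining
  # hosts instead of aborting the whole job on the first failure
  run "sudo /etc/init.d/xyz restart", :shell => :bash
end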
