strange issue with shell or command modules never returning

Jonathan

Jun 29, 2012, 3:01:36 PM
to ansible...@googlegroups.com
Hi everyone,

I am stumped by an issue running a command (/sbin/service jetty start) through ansible: whether I use the command line or a playbook, and whether I use the shell or command module, ansible never exits. The jetty instance does get started, but the task never returns. I am using ansible 0.4. For context, this work is part of a web application deployment process. If anyone has any input on this I'd be much obliged.

I'll walk through a command module and shell module example as they are easier to explain than the playbook.

here is the example command line:

ansible test2 -D --module-name=shell -a "/sbin/service jetty start"

or 

ansible test2 -D -a "/sbin/service jetty start"

Either of these commands actually does the "work" of starting jetty on the remote server, but ansible never exits from the task. Running the same command directly on the target host returns right away with an exit code of 0 (i.e. it works as expected when done manually).

The content of the test2 group is a single server; hosts file content below:

----

[test2]
swup-ua-lt02v.swup

----

The commands above never exit; this command:

[root@swup-mgt-util01v playbooks]# ansible test2 -a "/sbin/service jetty start"
 
has been running for over an hour. The "work" of starting the jetty instance does happen, so I know the command is being executed; it just never finishes. Ansible *will* exit if the init script returns a non-zero exit code, for example if the service is already running:

[root@swup-mgt-util01v playbooks]# ansible test2 -D --module-name=shell -a "service jetty start"
swup-ua-lt02v.swup | FAILED | rc=1 >>
Starting Jetty: Already Running!

[root@swup-mgt-util01v playbooks]# 


other commands seem to run fine:

[root@swup-mgt-util01v playbooks]# ansible test2 -D -a "/bin/ls -l /usr/share/ansible"
swup-ua-lt02v.swup | success | rc=0 >>
total 152
-rwxr-xr-x 1 root root  5299 Jun  1 16:40 apt
-rwxr-xr-x 1 root root  2761 Jun  1 16:40 async_status
-rwxr-xr-x 1 root root  6141 Jun  1 16:40 async_wrapper
-rwxr-xr-x 1 root root  2687 Jun  1 16:40 command
-rwxr-xr-x 1 root root  1880 Jun  1 16:40 copy
-rwxr-xr-x 1 root root   879 Jun  1 16:40 facter
-rwxr-xr-x 1 root root   948 Jun  1 16:40 failtest
-rwxr-xr-x 1 root root   857 Jun  1 16:40 fetch
-rwxr-xr-x 1 root root 10658 Jun  1 16:40 file
-rwxr-xr-x 1 root root  6053 Jun  1 16:40 git
-rwxr-xr-x 1 root root  4715 Jun  1 16:40 group
-rwxr-xr-x 1 root root   784 Jun  1 16:40 ohai
-rwxr-xr-x 1 root root   972 Jun  1 16:40 ping
-rwxr-xr-x 1 root root   876 Jun  1 16:40 raw
-rwxr-xr-x 1 root root  7199 Jun  1 16:40 service
-rwxr-xr-x 1 root root 14545 Jun  1 16:40 setup
-rw-r--r-- 1 root root   230 Jun  1 16:40 shell
-rwxr-xr-x 1 root root  1944 Jun  1 16:40 slurp
-rwxr-xr-x 1 root root   939 Jun  1 16:40 template
-rwxr-xr-x 1 root root 10640 Jun  1 16:40 user
-rwxr-xr-x 1 root root 11481 Jun  1 16:40 virt
-rwxr-xr-x 1 root root 10451 Jun  1 16:40 yum

and, as noted, the /sbin/service jetty start command runs fine when executed manually on the target host, exiting with a code of 0 right away.

As far as I can tell there is something specific about the service command exiting with a 0 that is causing ansible to never exit.  I get the same behavior when calling the /etc/init.d/jetty script directly too.

The strace output for ansible in its "wait state" is not overly useful, though I honestly can't read this output very well:

wait4(27480, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27485, 0x7fffe2227414, WNOHANG, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f75fd34a9d0) = 27490
wait4(27480, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27485, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27490, 0x7fffe2227414, WNOHANG, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f75fd34a9d0) = 27491
wait4(27480, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27485, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27491, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27490, 0x7fffe2227414, WNOHANG, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f75fd34a9d0) = 27492
wait4(27480, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27485, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27491, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27492, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27490, 0x7fffe2227414, WNOHANG, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f75fd34a9d0) = 27494
wait4(27485, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27492, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27494, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27480, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27490, 0x7fffe2227414, WNOHANG, NULL) = 0
wait4(27491, 0x7fffe2227414, WNOHANG, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f75fd34a9d0) = 27497
wait4(27490, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 27490
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(27491, 0x7fffe22272b4, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(27491, 0x7fffe22272b4, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(27491, 0x7fffe22272b4, 0, NULL)   = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(27491,

If I can provide any other information I'd be happy to.

thanks for everyone's time,
Jonathan



Michael DeHaan

Jun 29, 2012, 3:06:12 PM
to ansible...@googlegroups.com
I'd be suspicious of the init script and something it is doing with the console (for some reason it seems common for Java application server init scripts to misbehave... I am looking at you too, WebLogic and JBoss).

You could possibly invoke it in async mode with a time limit, and ansible should kill the service start process after that time limit expires.
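For example, something along these lines (the 60 second limit is just an illustration) should fire the task off asynchronously and give up on it when the limit expires:

ansible test2 -B 60 -m shell -a "/sbin/service jetty start"

In a playbook the equivalent would be the async/poll settings on the task.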

Dave Coutu

Jun 29, 2012, 5:32:35 PM
to ansible...@googlegroups.com
Jonathan,

I ran into exactly the same problems you are hitting now on my RHEL boxes. The jetty init scripts as shipped with recent versions of Jetty are somewhat broken: they don't present any real exit status that Ansible can read, since they basically hand things off to java and the JVM, which never returns status after the shell detaches. I made things work by rewriting portions of the init scripts so they source the RedHat functions library (like most daemons on RHEL) and use the daemonize wrapper, which launches the java process, detaches, and properly returns an exit status to the system and to Ansible. If you want, I can share my init scripts once I sanitize them; you would also need to pull down the daemonize source and compile it into an RPM. You can get it at http://software.clapper.org/daemonize/
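As a rough sketch of the approach (the user, paths and jar location below are placeholders, not my actual script), the start function ends up looking something like this:

#!/bin/sh
# /etc/init.d/jetty (fragment) - RHEL-style init script using daemonize
. /etc/rc.d/init.d/functions

start() {
    echo -n $"Starting jetty: "
    # daemonize forks, detaches from the terminal, writes a pidfile and
    # exits immediately, so the init script gets a real return code back
    daemonize -u jetty -p /var/run/jetty.pid \
        -o /var/log/jetty/console.log -e /var/log/jetty/console.err \
        /usr/bin/java -jar /opt/jetty/start.jar
    RETVAL=$?
    [ $RETVAL -eq 0 ] && success || failure
    echo
    return $RETVAL
}

Because daemonize itself returns as soon as the JVM is detached, service gets a proper exit status and ansible can move on.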

I tried the async mode that Michael suggested when I had the same problems, but it caused more issues at the time since it always fired off and reported success even if jetty bailed somewhere down the line. If you are on RHEL or a derivative, let me know if you'd like the help. Even if you aren't, you could probably tweak the scripts to make them work. The key piece for me was the daemonize wrapper. Thanks!

Dave

Jonathan Claybaugh

Jun 29, 2012, 7:20:01 PM
to ansible...@googlegroups.com
Hi Michael and Dave,

So as far as I can tell our init scripts (from jetty-hightide-server-8.1.3) were working properly in terms of providing exit codes. Actually stopping all the java processes spawned by jetty is another matter entirely. That said, I'd be happy to hoover up a sanitized copy of your init scripts if you are willing, though that's low priority.

As for the "not exiting" issue, I've refactored my approach. The jetty restart was one step in a long list of handlers for an RPM installation (i.e. restart the app on yum update), so I've moved the somewhat complicated logic involved (pull the machine out of the load balancer, stop/start, test the app / pre-populate caches, re-attach to the LB) into a script that ships in the package being updated. That single script is now the sole handler associated with the play that runs the yum module.

Even after moving the jetty start into this separate script I was *still* hitting the issue of ansible not exiting (with either command or shell as the module), until a co-worker pointed out that what I was seeing was ssh waiting for output.
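That explanation fits what I was seeing: ssh keeps the session open as long as anything started during it still holds the session's stdout/stderr, so a background process spawned by the script that inherits those descriptors keeps the connection, and therefore ansible, hanging. If that's what is happening it should be reproducible without ansible at all; a plain ssh call like

ssh root@swup-ua-lt02v.swup "/sbin/service jetty start"

(the host from my inventory above) would hang the same way, even though the init script itself has already exited.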

I changed the command I was running so that it redirects all of its output:

/opt/foo/restart_application.pl >& /dev/null

and that got things working.

I went back to the init script to see if the same fix would work ...

ansible test2 -m command -a "/sbin/service jetty start >& /dev/null"

and it didn't... so I'm really not sure what the story is, but I suspect this same issue could happen to others relying on ssh for transport.

cheers,
Jonathan

Michael DeHaan

Jun 29, 2012, 7:45:41 PM
to ansible...@googlegroups.com

>
> and that got things working.
>
> I went back to the init script to see if the same fix would work ...
>
> ansible test2 -m command -a "/sbin/service jetty start >& /dev/null"

You will want the 'shell' module if you are going to be doing shell things.

The command module does not use the shell.
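With -m command that >& /dev/null ends up being passed to the service command as literal arguments rather than being interpreted as a redirect. Something like this should behave the way your wrapper script did:

ansible test2 -m shell -a "/sbin/service jetty start > /dev/null 2>&1"

(> /dev/null 2>&1 is the portable equivalent of >& /dev/null, in case the remote /bin/sh is not bash). With the shell module the redirection actually happens on the remote side, so whatever the init script leaves running inherits /dev/null instead of the ssh channel.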

james....@clientcatalyst.com.au

Sep 24, 2013, 8:05:32 PM
to ansible...@googlegroups.com
I'm still having very similar issues using pm2, a Node.js process manager distributed as an npm module. Tasks like
shell: pm2 start pm2-start.json chdir=~/ executable=/bin/bash
will correctly run the command but won't move on afterwards. Using >& /dev/null allows ansible to move forward correctly, but is not ideal.

Michael DeHaan

Sep 24, 2013, 8:24:59 PM
to ansible...@googlegroups.com
Hi James,

I can definitely see that would be confusing.

I'm not familiar with pm2, but does it normally return when you run it from the shell?

From what you said it may not be daemonizing properly, so the redirect may be a good option; another would be to launch it in async mode in Ansible (fire and forget) and not poll.
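Fire and forget from the command line would look something like this (host pattern and timeout are placeholders):

ansible webservers -B 3600 -P 0 -m shell -a "pm2 start pm2-start.json chdir=~/"

and in a playbook the equivalent is async: 3600 with poll: 0 on the task. The trade-off Dave mentioned earlier still applies: with poll set to 0 ansible kicks the command off and moves on, so you won't see a failure if pm2 bails later.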




--
Michael DeHaan <mic...@ansibleworks.com>
CTO, AnsibleWorks, Inc.
http://www.ansibleworks.com/

james....@clientcatalyst.com.au

Sep 24, 2013, 8:38:14 PM
to ansible...@googlegroups.com
It does return when run from the shell; the module's purpose is to run and maintain node servers in the background. I'll give the async option a go. Thanks.