Re: Jobs hanging

JC

Aug 23, 2012, 1:50:13 PM
to rundeck...@googlegroups.com
Luiz,

I'll try that!

Thanks!

On Thursday, August 23, 2012 at 12:44:45 UTC-3, Luiz Casey wrote:
On Wednesday, August 22, 2012 3:26:59 PM UTC-4, JC wrote:
> I have created some jobs to restart different Linux services, but whenever I try to start or restart a service (service something restart/start), the job runs and hangs.
>
>
> Where am I going wrong?
>
>
> Thanks!

You should try the latest dev build. There was a bug related to this that has been fixed.

Luiz Casey

Aug 23, 2012, 1:53:04 PM
to rundeck...@googlegroups.com
Also make sure it's not a tty issue but an actual hang of the job.

Anthony Shortland

Aug 24, 2012, 10:38:49 AM
to rundeck...@googlegroups.com
If the services are not properly set up as daemons, they may hold I/O channels (e.g. standard input) open when they spawn, which will prevent the Rundeck ssh connection from closing.
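
The mechanism can be reproduced without Rundeck. A minimal sketch, assuming a POSIX shell: `sleep 3` stands in for a service process, and `cat` stands in for the ssh channel, which reads the remote command's output until end-of-file. The helper name `elapsed` is made up for this illustration.

```shell
#!/bin/sh
# "sleep 3" is a stand-in for a service; cat is a stand-in for the ssh
# channel that waits for end-of-file on the command's output.

elapsed() {
    # Run the given shell snippet with its stdout piped into cat and
    # print how many whole seconds the pipeline took.
    start=$(date +%s)
    sh -c "$1" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The background child inherits the pipe, so cat sees end-of-file only
# after the child exits.
held=$(elapsed 'sleep 3 &')

# The background child has all three channels pointed at /dev/null, so
# nothing keeps the pipe open once the launching shell exits.
released=$(elapsed 'sleep 3 </dev/null >/dev/null 2>&1 &')

echo "inherited pipe: ${held}s, redirected: ${released}s"
```

The first case should take about as long as the sleep, because the background child still holds the write end of the pipe. The second should return almost at once.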

Can you post a specific example?

Anthony.

On Aug 22, 2012, at 12:26 PM, JC <julio....@trustux.com.br> wrote:

I have created some jobs to restart different Linux services, but whenever I try to start or restart a service (service something restart/start), the job runs and hangs.

Where am I going wrong?

Thanks!

Anthony Shortland
Professional Services | DTO Solutions, Inc. | mobile: 650.215.3117 aim: anthony....@me.com yahoo: anthony.shortland skype: anthony.shortland ]

JC

Aug 29, 2012, 4:58:11 PM
to rundeck...@googlegroups.com
Anthony,

Here is an example:

05:51:59 [sysop@PBX  1-exec][INFO] Shutting down asterisk: [  OK  ]
05:51:59 [sysop@PBX  1-exec][INFO] Starting asterisk: [  OK  ] (Hangs...)
05:52:42 [sysop@PBX  1-exec][SEVERE] java.lang.InterruptedException (Kill the job)
05:52:42 [null@null null null][SEVERE] Execution failed on the following 1 nodes: {PBX=[Workflow : exception: com.dtolabs.rundeck.core.execution.workflow.WorkflowStepFailureException: Step 1 of the workflow threw exception: Execution failed on the following 1 nodes: {PBX=[jsch-ssh] result was failure, resultcode: -1: java.lang.InterruptedException}]}

Note that the init script was written using "/etc/rc.d/init.d/functions" (Red Hat based server). The strange thing is that another init script (httpd) doesn't hang.

Can I do something in the scripts that hang so that they don't hold I/O channels open?

Thanks!

Julio Cesar

Anthony Shortland

Aug 29, 2012, 5:29:46 PM
to rundeck...@googlegroups.com
Good to see the command output, but could you post your Rundeck job definition and the init script it calls? 

Are you starting your service using the "daemon" function in the init.d functions? It is possible to close I/O channels and prevent ssh from hanging.

Anthony.

JC

Aug 29, 2012, 6:29:04 PM
to rundeck...@googlegroups.com
Sorry man! Now I've attached the job definition file and the init script.

Thanks!
77d55f17-7efc-4fc5-a134-46db97a2bb6e.yaml
asteriskd.zip

Anthony Shortland

Aug 30, 2012, 11:26:46 AM
to rundeck...@googlegroups.com
Hi Julio,

Great. So your Rundeck job simply executes "sudo /sbin/service asterisk ${option.method}" and I guess from the log below you're calling it with "restart".

It is a well written init script whose start command ultimately calls "daemon nice -n -20 $DAEMON $ASTARGS" (a function from /etc/rc.d/init.d/functions).

Given you're not using the "--user" option to daemon, on my system you end up starting the service as: $cgroup $nice /bin/bash -c "$corelimit >/dev/null 2>&1 ; $*"

I'm looking at CentOS version 6.2, what version of Redhat/CentOS are you on? It'd be great to see the equivalent line from your version of the OS.

All this said, though, ultimately the function assumes that your program ("DAEMON=$AST_SBIN/asterisk") is a well-behaved Linux daemon: i.e. it should behave per the daemon(3) system library function.

You can examine the running process to check whether this is the case. For example, my login shell has:

[anthony@centos62-mcollective ~]$ ps -p 28299 
  PID TTY          TIME CMD
28299 pts/0    00:00:00 bash
[anthony@centos62-mcollective ~]$ lsof -p 28299
COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash    28299 anthony  cwd    DIR  253,0     4096 916192 /home/anthony
bash    28299 anthony  rtd    DIR  253,0     4096      2 /
bash    28299 anthony  txt    REG  253,0   938672    125 /bin/bash
bash    28299 anthony  mem    REG  253,0   156872 261519 /lib64/ld-2.12.so
bash    28299 anthony  mem    REG  253,0    22536 261522 /lib64/libdl-2.12.so
bash    28299 anthony  mem    REG  253,0  1979000 261520 /lib64/libc-2.12.so
bash    28299 anthony  mem    REG  253,0   138280 261560 /lib64/libtinfo.so.5.7
bash    28299 anthony  mem    REG  253,0    65928 261152 /lib64/libnss_files-2.12.so
bash    28299 anthony  mem    REG  253,0 99158704 394755 /usr/lib/locale/locale-archive
bash    28299 anthony    0u   CHR  136,0      0t0      3 /dev/pts/0
bash    28299 anthony    1u   CHR  136,0      0t0      3 /dev/pts/0
bash    28299 anthony    2u   CHR  136,0      0t0      3 /dev/pts/0
bash    28299 anthony  255u   CHR  136,0      0t0      3 /dev/pts/0
[anthony@centos62-mcollective ~]$ ls -l /proc/28299/cwd 
lrwxrwxrwx. 1 anthony anthony 0 Aug 27 22:47 /proc/28299/cwd -> /home/anthony

... a terminal ("pts/0"), standard output, error and input attached to the terminal (file descriptors 0,1,2) and a current working directory of "/home/anthony".

Whereas the Apache http server has:

[root@centos62-mcollective ~]# ps -p 16760
  PID TTY          TIME CMD
16760 ?        00:00:00 httpd
[root@centos62-mcollective ~]# lsof -p 16760
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
.
.
.
httpd   16760 root    0r   CHR    1,3      0t0   3820 /dev/null
httpd   16760 root    1w   CHR    1,3      0t0   3820 /dev/null
httpd   16760 root    2w   REG  253,0     1475 783622 /var/log/httpd/error_log
httpd   16760 root    3u  sock    0,6      0t0  28885 can't identify protocol
httpd   16760 root    4u  IPv6  28886      0t0    TCP *:http (LISTEN)
httpd   16760 root    5r  FIFO    0,8      0t0  28903 pipe
httpd   16760 root    6w  FIFO    0,8      0t0  28903 pipe
httpd   16760 root    7w   REG  253,0     4544 783638 /var/log/httpd/access_log
httpd   16760 root    8r   CHR    1,9      0t0   3825 /dev/urandom
[root@centos62-mcollective ~]# ls -l /proc/16760/cwd
lrwxrwxrwx. 1 root root 0 Aug 27 22:51 /proc/16760/cwd -> /

... no terminal, and hence no file descriptors attached to a terminal, and a current working directory of "/".
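
On Linux the same check can be scripted without lsof by reading the symlinks under /proc/<pid>/fd. A minimal sketch; `fd_targets` is a hypothetical helper name, and reading another user's process usually requires root.

```shell
#!/bin/sh
# fd_targets is a hypothetical helper: it prints where the standard
# channels (descriptors 0, 1 and 2) of a process point. Linux only.

fd_targets() {
    pid=$1
    for fd in 0 1 2; do
        # Each entry in /proc/<pid>/fd is a symlink to the open file.
        target=$(readlink "/proc/$pid/fd/$fd" 2>/dev/null) || target="(unreadable)"
        echo "fd $fd -> $target"
    done
}

# Example: inspect the shell running this script.
fd_targets "$$"
```

A process that still shows a pty (/dev/pts/N), a pipe or a socket on descriptors 0, 1 or 2 is a candidate for keeping the ssh session open.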

I believe your hanging problem is caused by $AST_SBIN/asterisk not handling the terminal appropriately to properly become a background process.  After checking whether this is the case, you could try setting "DAEMON=nohup $AST_SBIN/asterisk" since nohup(1) handles this issue (or you could fix $AST_SBIN/asterisk itself).
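
One caveat worth checking: per POSIX, nohup redirects standard output and standard error only when they are terminals. If the job runs over ssh without a pty, those channels are likely pipes, and nohup leaves them as they are. A hedged sketch of a launcher that redirects all three channels explicitly; `start_detached` and the log path are hypothetical, not part of /etc/rc.d/init.d/functions.

```shell
#!/bin/sh
# start_detached is a hypothetical wrapper. It starts a command in the
# background with none of the caller's standard channels left open.

start_detached() {
    logfile=$1
    shift
    # nohup makes the command ignore SIGHUP; the explicit redirections
    # are what release stdin, stdout and stderr.
    nohup "$@" </dev/null >>"$logfile" 2>&1 &
}

# Example with a placeholder command and a placeholder log path.
start_detached "${TMPDIR:-/tmp}/detached-demo.log" sleep 2
```

Whether this helps depends on the daemon itself: a program that reopens the terminal or spawns children with inherited descriptors can still hold the session open.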

Anthony.

JC

Aug 31, 2012, 12:15:21 PM
to rundeck...@googlegroups.com
Hi Anthony!

Even with nohup I got the same issue. The CentOS version is 4.4.

I got the same problem with other distros (Debian, CentOS 5.x) and FreeBSD (with other scripts).

Thanks for helping!

Old Schepperhand

Sep 1, 2012, 10:51:12 PM
to rundeck...@googlegroups.com
Hi Julio

I once had a very similar problem, not related to Rundeck but with
similar effects: a service started over ssh stopped after the
connection was closed.

On RedHat/CentOS 5, sudoers requires a tty by default, and with some
services this doesn't work, even when a pty is assigned and the
service is started with nohup.

In my case it helped to set

Defaults:username !requiretty

in /etc/sudoers
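
To see whether requiretty is in effect, the relevant Defaults lines can be listed first. A minimal sketch; `show_requiretty` is a hypothetical helper, and the `sysop` user in the comment is only an example taken from the job log earlier in the thread. Changes to /etc/sudoers are best made with visudo, which checks the syntax before saving.

```shell
#!/bin/sh
# show_requiretty is a hypothetical helper: it lists, with line numbers,
# the uncommented Defaults lines in a sudoers-format file that mention
# requiretty. It returns non-zero when there are none.

show_requiretty() {
    grep -nE '^[[:space:]]*Defaults.*requiretty' "$1"
}

# Typically run as root against the real file:
#   show_requiretty /etc/sudoers
# An exemption for the example user would then be added with visudo:
#   Defaults:sysop !requiretty
```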

Regards
Markus

Anthony Shortland

Sep 4, 2012, 11:01:08 AM
to rundeck...@googlegroups.com
Hi Julio,

Have you tried switching Rundeck to use OpenSSH as opposed to the JSch node executor it uses by default? I posted "Using sudo and OpenSSH with the Rundeck script-plugin" on the subject some time ago ... the terminal handling is probably different with OpenSSH and I'd like to hear if this fixes your problem.

Did you run lsof against the Asterisk daemon process? It'd be interesting to see what file descriptors it has open.

Anthony.

JC

Sep 6, 2012, 3:52:52 PM
to rundeck...@googlegroups.com
Hi Anthony,

I have tried this too, but without success. I've also tried some other tricks, like adding the "-t/-ttt" arguments to the ssh command, but without success.

It's not only a problem with the Asterisk daemon; other scripts have the same problem. For now I'm killing the job manually at the end of the execution.

Julio Cesar.

Sebastien Wains

Aug 23, 2013, 5:17:03 AM
to rundeck...@googlegroups.com
I'm having the same problem with Salt Minion restart.

Here's the init script :

RHEL 5.x/6.x servers.

Steven Liu

Sep 12, 2014, 1:46:11 AM
to rundeck...@googlegroups.com
I have the same problem with a script that starts a Java program: the job always shows as running in Rundeck, even though the script has succeeded on the node.

On Thursday, August 23, 2012 at 3:26:59 AM UTC+8, JC wrote:

Alex Honor

Sep 12, 2014, 11:17:22 AM
to rundeck...@googlegroups.com
The script that starts the java program might not be closing file handles or properly backgrounding. It's possible the salt-minion script has the same problem. 

--
You received this message because you are subscribed to the Google Groups "rundeck-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rundeck-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Alex Honor

[SimplifyOps, Inc | a...@simplifyops.com ]

Be sure to comment and vote on Rundeck Feature Development!

Sebastien Wains

Sep 12, 2014, 12:40:32 PM
to rundeck...@googlegroups.com
It was indeed a bug in Salt, which has been fixed since then.
