Re: [ansible-project] Service Module: Service state recognition


Michael DeHaan

Jun 14, 2012, 4:50:03 PM
to ansible...@googlegroups.com


    running = False
    if status_stdout.find("stopped") != -1 or rc == 3:
        running = False
    elif status_stdout.find("running") != -1 or rc == 0:
        running = True
    elif name == 'iptables' and status_stdout.find("ACCEPT") != -1:
        # iptables status command output is lame
        # TODO: lookup if we can use a return code for this instead?
        running = True

I've tested on Ubuntu 10.04 because that is the release we mainly use at the moment.

Let me give you a few examples of the output of the service <servicename> status command.

MySQL:
  • Output in stopped state: mysql stop/waiting

Ok so it would return "running = False" in this case, which is correct.
 
  • Return Code in stopped state: 0
  • Output in started state: mysql start/running, process 20846
  • Return Code in started state: 0

So the logic above would match "running" first and return running = True, which is ALSO correct.
 

Apache2:
  • Output in stopped state: Apache is NOT running.
  • Return Code in stopped state: 1
  • Output in started state: Apache is running (pid 21454).
  • Return Code in started state: 0

The problem here is that the current service module always sees my Apache as running, even when it's not.
This is because Ubuntu uses two kinds of init scripts: the old init scripts and the new upstart jobs.

This means the code could also check that the string does not contain "not". Patches accepted; sounds like a trivial fix.

Maybe it would be a better solution to check first if the service has an upstart script. (e.g. with initctl list)
With upstart the service status output is standardized and you can check for the keywords.

Not sure this is necessary per the above.    

The output of the old init scripts is just "free human text" and therefore not reliable for a keyword-based check.

Yeah, though I think the "not" is the only case of this we have had so far.


P.S.: Sorry for the long post, but this topic is really important to me, and I think it's a core function of Ansible that's broken.

FWIW, you're the first one out of hundreds to mention it.   

Patch is pretty simple though.  Send me a pull request.




Ingo Gottwald

Jun 15, 2012, 1:51:34 PM
to ansible...@googlegroups.com


On Thursday, June 14, 2012 at 22:50:03 UTC+2, Michael DeHaan wrote:


    running = False
    if status_stdout.find("stopped") != -1 or rc == 3:
        running = False
    elif status_stdout.find("running") != -1 or rc == 0:
        running = True
    elif name == 'iptables' and status_stdout.find("ACCEPT") != -1:
        # iptables status command output is lame
        # TODO: lookup if we can use a return code for this instead?
        running = True

I've tested on Ubuntu 10.04 because that is the release we mainly use at the moment.

Let me give you a few examples of the output of the service <servicename> status command.

MySQL:
  • Output in stopped state: mysql stop/waiting

Ok so it would return "running = False" in this case, which is correct.

No, it would not return "running = False". It would return "running = True", because the output contains "stop" (not "stopped", which is what the find is looking for) and because of the return code of the service binary. The binary was able to tell us correctly that the service is not running, so its return code is 0, which leads to "running = True" in this if clause.
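
To make this concrete, here is a minimal sketch that wraps the quoted check in a function and feeds it the outputs and return codes listed in this message. The function name is_running is hypothetical (it is not taken from the module); the body is the snippet quoted above.

```python
def is_running(name, status_stdout, rc):
    """Replay the status check quoted in this thread (wrapper name is hypothetical)."""
    running = False
    if status_stdout.find("stopped") != -1 or rc == 3:
        running = False
    elif status_stdout.find("running") != -1 or rc == 0:
        running = True
    elif name == 'iptables' and status_stdout.find("ACCEPT") != -1:
        running = True
    return running


# Stopped MySQL (upstart job): "stop" is not "stopped", and rc is 0.
print(is_running("mysql", "mysql stop/waiting", 0))          # True, although stopped

# Stopped Apache (old init script): the text contains "running".
print(is_running("apache2", "Apache is NOT running.", 1))    # True, although stopped
```

Both stopped services are reported as running: MySQL through the rc == 0 branch, Apache through the "running" substring match.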
 
 
  • Return Code in stopped state: 0
  • Output in started state: mysql start/running, process 20846
  • Return Code in started state: 0

So the logic above would match "running" first and return running = True, which is ALSO correct.
 

Apache2:
  • Output in stopped state: Apache is NOT running.
  • Return Code in stopped state: 1
  • Output in started state: Apache is running (pid 21454).
  • Return Code in started state: 0

The problem here is that the current service module always sees my Apache as running, even when it's not.
This is because Ubuntu uses two kinds of init scripts: the old init scripts and the new upstart jobs.

This means the code could also check that the string does not contain "not". Patches accepted; sounds like a trivial fix.

This would fix the problem for Apache, but only for this one case (there is another example at the bottom of this message).

Maybe it would be a better solution to check first if the service has an upstart script. (e.g. with initctl list)
With upstart the service status output is standardized and you can check for the keywords.

Not sure this is necessary per the above.    

The output of the old init scripts is just "free human text" and therefore not reliable for a keyword-based check.

Yeah, though I think the "not" is the only case of this we have had so far.


P.S.: Sorry for the long post, but this topic is really important to me, and I think it's a core function of Ansible that's broken.

FWIW, you're the first one out of hundreds to mention it.   

Patch is pretty simple though.  Send me a pull request.



Try atop on Ubuntu 10.04, for example.
The atop init script is not yet an upstart job (on 10.04) and has no status command.
So there are no words like "running" or "stopped", and the return code is 1.
We cannot determine the running state of this process, but we still report that it is not running and run the start script, which will fail when the service is already running.

I know it's not Ansible's fault that there are crappy init scripts around, but they're out there, and I'm trying to figure out a better way of handling them.

Maybe I should just try it in a branch of my fork on github and you could review it when I'm done. 
Would that be ok for you?

Michael DeHaan

Jun 15, 2012, 2:01:44 PM
to ansible...@googlegroups.com

Maybe I should just try it in a branch of my fork on github and you could review it when I'm done. 
Would that be ok for you?

That sounds good

I am OK with some upstart-specific code, though we should also add the "not" check above and change the search to look for "stop" rather than "stopped", since not everyone is using upstart.
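
A minimal sketch of those two keyword changes applied to the quoted check, assuming the output is lowercased first so that Apache's upper-case "NOT" matches. The function name is_running_patched and the lowercasing step are assumptions for illustration, not part of the module.

```python
def is_running_patched(name, status_stdout, rc):
    """Quoted check with a "not" test added and "stop" instead of "stopped"."""
    out = status_stdout.lower()  # assumption: match keywords case-insensitively
    running = False
    if "not" in out or "stop" in out or rc == 3:
        running = False
    elif "running" in out or rc == 0:
        running = True
    elif name == "iptables" and "ACCEPT" in status_stdout:
        running = True
    return running


print(is_running_patched("mysql", "mysql stop/waiting", 0))        # False
print(is_running_patched("apache2", "Apache is NOT running.", 1))  # False
```

A plain substring test for "not" is still fragile: a service whose own name contains "not" would be reported as stopped whenever that name appears in the output.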

--Michael

 

Ingo Gottwald

Jun 17, 2012, 10:30:29 AM
to ansible...@googlegroups.com
Hi,

I've created a patch now.

Basically, what I've done is:

I created a method for the state recognition because it got bigger.
There's now an initial state "None", which prevents scripts from being run by mistake.

Then I ordered the state recognition methods by how reliable their output is.
Upstart is first (when it's there), because its output is always consistent.
If the state is not found via upstart, it is checked by the init script's response code.

If it is not found by that either, it is checked by the output of the init script.
Additionally, the service name is stripped from the init script output, and the output is converted to lower case.
This should prevent false positives in case a daemon is called "notify-daemon" or something like that.
Otherwise the search for the word "not" would lead to a false positive...
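
The ordering described above can be sketched roughly as follows. This is not the actual commit, which is not shown in this thread. All function names are hypothetical, and two details are assumptions: only return codes 0 and 3 are treated as decisive, and "not" is only read as a negation when "running" also appears. The command outputs are passed in as strings so the sketch runs without a real init system.

```python
def state_from_upstart(name, initctl_stdout):
    """Look for the job in `initctl list`-style output; None if it is not listed."""
    for line in initctl_stdout.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            job_state = parts[1].rstrip(",")
            if job_state == "start/running":
                return True
            if job_state == "stop/waiting":
                return False
    return None


def state_from_rc(rc):
    """Assumption: only 0 (running) and 3 (not running) are trusted."""
    if rc == 0:
        return True
    if rc == 3:
        return False
    return None


def state_from_output(name, status_stdout):
    """Keyword check on lowercased output with the service name removed."""
    cleaned = status_stdout.lower().replace(name.lower(), "")
    if "stop" in cleaned:
        return False
    if "running" in cleaned:
        return "not" not in cleaned
    return None


def detect_state(name, status_stdout, rc, initctl_stdout=None):
    """Return True (running), False (stopped) or None (unknown)."""
    state = None  # initial state: unknown
    if initctl_stdout is not None:
        state = state_from_upstart(name, initctl_stdout)
    if state is None:
        state = state_from_rc(rc)
    if state is None:
        state = state_from_output(name, status_stdout)
    return state


print(detect_state("mysql", "mysql stop/waiting", 0, "mysql stop/waiting"))  # False
print(detect_state("apache2", "Apache is NOT running.", 1))                  # False
print(state_from_output("notify-daemon", "notify-daemon is running"))        # True
print(detect_state("atop", "", 1))                                           # None
```

The last call stands in for an init script with no usable status output: the result stays None instead of defaulting to "not running".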

I did leave the special section for iptables in there, but I think these special cases might be better covered by writing init scripts that do have a status method.
Atop on Ubuntu 10.04 is just the same. It has no status method and therefore always sends the response code 1.
Since atop creates a pid file when running it should be quite simple to fix that init script instead of tweaking this in ansible.

So here's the commit. It works very well for me.

Could you please test it on your systems?

Tell me if you want to change something or if I should send you the pull request.

Best Regards 

Ingo

Michael DeHaan

Jun 17, 2012, 11:25:13 AM
to ansible...@googlegroups.com


Could you please test it on your systems?

Tell me if you want to change something or if I should send you the pull request.


Yeah, send the pull request.  It's easier for folks to test when it's already in the main tree. 
Thanks!

Ingo Gottwald

Jun 17, 2012, 12:35:31 PM
to ansible...@googlegroups.com
done.

Rodney Quillo

Jul 9, 2012, 10:01:07 AM
to Ansible Project
Hi,

Hmm. I also have nginx on Ubuntu 12.04, and its status command gives the
following when it's not running:

(ansible)ubuntu@ubuntu:~/ansible$ sudo /etc/init.d/nginx status
* could not access PID file for nginx
(ansible)ubuntu@ubuntu:~/ansible$ echo $?
4

>there are no words like "running" or "stopped" and the return code is 1
nginx here returns 4 instead of 1.

My simple solution is to fix the init script (hmm. but there might be
more init scripts out there that might be doing the same?)

Ingo Gottwald

Jul 9, 2012, 11:27:49 AM
to ansible...@googlegroups.com
Hi,

You did the right thing.
This is a bug in the init file, and it should be fixed in the init file.

According to the Linux Standard Base Core Specification, the exit code 4 stands for "program or service status is unknown".
So when the init file itself says that the status is unknown, we have no other choice.
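
For reference, the exit codes that the LSB Core Specification defines for an init script's status action can be written down as a small lookup table. The dictionary and function names below are hypothetical; the descriptions follow the specification's wording.

```python
# Exit codes of the "status" action as defined by the LSB Core Specification.
LSB_STATUS_CODES = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def describe_status_rc(rc):
    """Return the LSB meaning of a status exit code (helper name is hypothetical)."""
    return LSB_STATUS_CODES.get(rc, "reserved or non-standard exit code")


print(describe_status_rc(4))  # program or service status is unknown
```

Only code 3 plainly says "not running". Code 4, which nginx returns here, makes no claim either way.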

Seems like ubuntu has a few crappy init files.
Especially the ones not ported to upstart.

Regards,

Ingo

Jérémie Tarot

Jul 9, 2012, 2:15:46 PM
to ansible...@googlegroups.com
Hi,

2012/7/9 Ingo Gottwald <in.go...@gmail.com>:
> Hi,
>
> Seems like ubuntu has a few crappy init files.
> Especially the ones not ported to upstart.
>

Same trouble here with ejabberd on Debian... had to fix the init
script from top to bottom :-/

Bests

PS: Thanks for reminding me to take a look at LSB Core to verify that what I
did is not even worse!

--
Jérémie Tarot
http://about.me/silopolis

Mark Theunissen

Jul 10, 2012, 11:39:37 AM
to ansible...@googlegroups.com
I just ran into this with nginx. Has anyone added bug reports to Ubuntu/Debian, perhaps with their fixed init scripts?

Rodney Quillo

Jul 10, 2012, 9:01:46 PM
to Ansible Project
Hmm, I'm not sure if there is a bug report for this.
My idea for now is to create an init script for nginx and include it in
the playbook that does the setup.

I've got another problem with the service status for uwsgi.
As Ingo pointed out, there are more crappy init scripts out there. :)


On Jul 10, 11:39 pm, Mark Theunissen <mark.theunis...@gmail.com>
wrote:
> I just ran into this with nginx. Has anyone added bug reports to
> Ubuntu/Debian, perhaps with their fixed init scripts?

Lorin Hochstein

Jul 12, 2012, 10:28:15 PM
to ansible...@googlegroups.com
I'm not sure what the issue is with the nginx startup script. I reported an issue to Ubuntu (since they package it), but it was marked invalid: https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1023389

(They mentioned checking the return value, not sure what Ansible does with this right now).

Today I ran into an issue where I tried to restart the (stopped) nginx service from ansible, and it didn't start. Haven't tried to reproduce it, though.


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.




Rodney Quillo

Jul 12, 2012, 11:35:34 PM
to Ansible Project
> (They mentioned checking the return value, not sure what Ansible does with this right now).

With Ansible, when /etc/init.d/nginx status is run, it returns 4 (which
means unknown) when nginx is not running.
Could the service module treat any non-zero exit status as not
running?

Ingo Gottwald

Jul 13, 2012, 3:21:36 AM
to ansible...@googlegroups.com
Nope, that would be a very bad idea!
An unknown state is an unknown state, not a stopped state:

Imagine you have a running process and someone accidentally removes your pid file.
If we treated this as a stopped state, you would soon have two running processes, which in some cases can cause big trouble...
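
A minimal sketch of that rule, assuming a three-valued state (True, False, None) and a hypothetical helper plan_action. It is not taken from the module. It only shows that an unknown state never leads to a start.

```python
def plan_action(desired_state, running):
    """Decide what to do for desired_state ("started" or "stopped").

    running is True, False, or None (unknown). Helper name is hypothetical.
    """
    if running is None:
        return "unknown"  # never guess: starting blindly could create a second process
    if desired_state == "started" and not running:
        return "start"
    if desired_state == "stopped" and running:
        return "stop"
    return "noop"


print(plan_action("started", None))   # unknown
print(plan_action("started", False))  # start
```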

I registered at Launchpad and left a comment on this bug.
Please read it; it explains how you can fix this properly.
You only have to change one line, since one LSB function is used slightly incorrectly.

Regards 

Ingo

Rodney Quillo

Jul 13, 2012, 9:37:06 PM
to Ansible Project
Ingo, nice comments. I agree with them. :)