Re: [ansible-project] Feature RFC: allow task failures to trigger non-zero exit codes

Michael DeHaan

Sep 14, 2012, 4:47:01 PM
to ansible...@googlegroups.com
The new command line option is not something I want.

Why not just return non-zero if any host has failures?

Both ansible and ansible-playbook should be included, and it shouldn't
require any changes to the core.


On Fri, Sep 14, 2012 at 4:35 PM, Brice Burgess <bric...@gmail.com> wrote:
> I've introduced a feature that allows ansible-playbook to return non-zero
> exit codes via a command line option, and wanted to gather the community's
> response before submitting a pull request.
>
> You can see the commit (against Ansible 0.7) here:
> https://github.com/briceburg/ansible/commit/ab550bee941d12164eeea6a6caa8817c8febead4
>
> In a nutshell, it adds the `--host-failure-threshold` command line option,
> which, when set, directs the playbook to exit with a non-zero return code
> once a defined number of host failures has been reached. The option
> accepts 'any', 'all', or a specific number of host failures to allow before
> the command exits with an error. It is disabled by default (i.e. it
> doesn't break existing scripts, and always returns 0). Please note that
> 'any' is equivalent to --host-failure-threshold=0, and that 'all' is
> equivalent to --host-failure-threshold=[number of hosts in playbook - 1].
>
> The reason *I* needed this change was we needed Jenkins to fail a build if
> the [ansible-playbook based] deployment had a task failure.
>
> Do you think this patch is OK as-is? Any features to change? I'd like to add
> unit tests, although I was unable to get them to run. What about also allowing
> it for the standalone [non-playbook] ansible command?
>
> Thanks,
>
> ~ Brice

Brice Burgess

Sep 14, 2012, 4:56:30 PM
to ansible...@googlegroups.com
Michael,

Thanks for the feedback. I thought the flexibility of specifying the number of host failures would be helpful/necessary for some scenarios. E.g. I could imagine an administrator who wants a non-zero exit if, and only if, all hosts had failures. Does this make sense?

As for not making any changes to core -- I imagine that I've done this by modifying bin/ansible-playbook and am at a loss as to how to accomplish this without modifying this (or the bin/ansible) file. Can you point me in the right direction?

~ Brice

Brice Burgess

Sep 14, 2012, 4:58:25 PM
to ansible...@googlegroups.com
Ack forgot to answer...


On Friday, September 14, 2012 3:47:01 PM UTC-5, Michael DeHaan wrote:

Why not just return non-zero if any host has failures?


Wouldn't this potentially break backwards compatibility? I imagine some production scripts written before this merge would now exit 1 instead of 0.

Michael DeHaan

Sep 14, 2012, 4:58:24 PM
to ansible...@googlegroups.com
On Fri, Sep 14, 2012 at 4:56 PM, Brice Burgess <bric...@gmail.com> wrote:
> Michael,
>
> Thanks for the feedback. I thought the flexibility of specifying the number
> of host failures would be helpful/necessary for some scenarios. E.g. I could
> imagine an administrator who wants a non-zero exit if, and only if, all
> hosts had failures. Does this make sense?

Not really.

>
> As for not making any changes to core -- I imagine that I've done this by
> modifying bin/ansible-playbook and am at a loss as to how to accomplish this
> without modifying this (or the bin/ansible) file. Can you point me in the
> right direction?

You misread.

Modify bin/ansible and bin/ansible-playbook; do NOT modify the core source code.

In other words, no new command line parameters, just iterate over the results.
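
A minimal sketch of what "just iterate over the results" could look like. The `results` mapping and its `failures` key are hypothetical names chosen for illustration; they are not taken from the Ansible source discussed in this thread.

```python
def exit_code_for(results):
    """Return 1 if any host reported at least one failure, else 0.

    `results` is a hypothetical mapping of host name -> summary dict.
    The "failures" key is an assumption, not Ansible's actual structure.
    """
    for host, summary in results.items():
        if summary.get("failures", 0) > 0:
            return 1
    return 0


# Hypothetical per-host summaries after a run.
example = {
    "host1": {"ok": 2, "failures": 0},
    "host2": {"ok": 1, "failures": 1},
}
code = exit_code_for(example)
```

A wrapper script would then pass `code` to `sys.exit()`, so that no new command line parameter is needed.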

Michael DeHaan

Sep 14, 2012, 5:01:29 PM
to ansible...@googlegroups.com
>
> Wouldn't this potentially break backwards compatibility? I imagine some
> production scripts written before this merge would now exit 1 instead of 0.

No, if anything, it's a bugfix.

Brice Burgess

Sep 14, 2012, 5:29:19 PM
to ansible...@googlegroups.com
Michael,

Got it. I think I'll maintain my patch -- as the above is such a trivial modification && doesn't provide the flexibility we'll eventually need. If you want me to submit a pull request I will.

Having 'all' in the command line option will become necessary for us in the future, when we start dealing with hundreds of webservers. I can imagine a task failing on one or two of them for one reason or another [such as failed provisioning or unresponsiveness], but not on all. A failure on all hosts would indicate a problem with the playbook / task itself.

The command line option also doesn't break backwards compat which I wouldn't want to be responsible for ;)

~ Brice
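
For reference, the threshold semantics described in the original proposal ('any' equals 0, 'all' equals the number of hosts minus 1, disabled by default) can be sketched roughly as follows. The function names here are hypothetical and are not taken from the linked commit.

```python
def parse_threshold(value, host_count):
    """Translate the option value into a number of allowed host failures.

    Returns None when the option is not set (feature disabled).
    """
    if value is None:
        return None
    if value == "any":
        return 0
    if value == "all":
        return host_count - 1
    return int(value)


def exit_code(failed_hosts, host_count, threshold=None):
    """Return 1 once failed hosts exceed the allowed number, else 0."""
    allowed = parse_threshold(threshold, host_count)
    if allowed is None:
        # Option not given: keep the old behavior and always return 0.
        return 0
    return 1 if failed_hosts > allowed else 0
```

With 100 hosts, `exit_code(2, 100, "all")` stays at 0, while `exit_code(100, 100, "all")` returns 1, which matches the "only fail when every host fails" use case.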

Michael DeHaan

Sep 14, 2012, 6:21:09 PM
to ansible...@googlegroups.com
I strongly disagree that this is a "break".

I do agree that your use case is a little weird, and probably should
be its own API script.

Michael DeHaan

Sep 14, 2012, 6:24:08 PM
to ansible...@googlegroups.com
On Fri, Sep 14, 2012 at 5:23 PM, Peter Mancini <peter....@gmail.com> wrote:
> It sounds very reasonable to me and the consideration for existing
> production by excluding the behavior from existing scripts without the
> command line addition is wise.

On the contrary, it should be true that failures result in non-zero
exit codes -- it's logical Unix behavior. Making this a
non-default behavior guarded by a CLI flag is confusing, and
increases the list of options folks have to understand.

If folks want to ignore the code, it's easy to do so.

I would bet most folks are already expecting ansible to return
non-zero in failure cases and are unaware that it only does so in the
case of a syntax error in the playbook.

--Michael

Mike Bain

Jan 10, 2014, 2:01:36 PM
to ansible...@googlegroups.com
+1 Having an option for ansible to return an exit code if a task fails would be good for us. We also have automated ansible scripts and rely on exit codes to know if the ansible commands worked or not.

I think of ansible as a great tool for running a user's tasks. From this user-centric point of view, when a task fails the whole process has failed, even if technically ansible itself hasn't failed. I care if ansible fails, and I also care if one of my tasks fails.

The current ansible return code setup feels like a browser not showing me an error response because it handled the web server's error correctly.

Currently, to get a non-zero exit code on ansible task failures, I pipe the ansible output into perl:

ansible-playbook <options> | perl -pe 'END { exit $status } $status=1 if /FAILED:|Failed:/;'
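
A rough Python equivalent of the perl filter above, for anyone who prefers it. This is only a sketch under the same assumption as the one-liner, namely that failures show up in the output as "FAILED:" or "Failed:"; the `scan` function and the script name in the comment are hypothetical.

```python
import re
import sys

# Same pattern as the perl one-liner.
FAILURE_PATTERN = re.compile(r"FAILED:|Failed:")


def scan(lines, out=sys.stdout):
    """Echo each line to `out`; return 1 if any line matched, else 0."""
    status = 0
    for line in lines:
        out.write(line)
        if FAILURE_PATTERN.search(line):
            status = 1
    return status


# Usage, assuming the script is saved as fail_on_output.py and ends with
#   sys.exit(scan(sys.stdin))
# then:
#   ansible-playbook <options> | python fail_on_output.py
```

Like the perl version, this matches on output text, so it depends on the output format staying the same, and the pipe discards ansible-playbook's own exit status unless the shell is told to keep it.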

Cheers

Mike

Stephen Bunn

Oct 18, 2016, 8:57:50 PM
to Ansible Project
+1 here; having ansible-playbook return a 0 exit code when tasks fail makes automating with a CI server more difficult than it has to be.

Josh Smift

Oct 18, 2016, 9:37:25 PM
to ansible...@googlegroups.com
SB> +1 here; having ansible-playbook return a 0 exit code when tasks fail
SB> makes automating with a CI server more difficult than it has to be.

Do you have an example of this behavior? I feel like Ansible generally
exits non-zero if a task fails. This playbook, for example:

- hosts: host1
  gather_facts: False
  tasks:
    - command: touch /tmp/exit-code-test

- hosts: host1:host2
  gather_facts: False
  tasks:
    - command: rm /tmp/exit-code-test

The rm fails on host2, where the file doesn't exist, and ansible-playbook
exits non-zero.

Similarly:

ansible host1 -a 'rm /tmp/exit-code-test'

(when the file doesn't exist) exits non-zero.

-Josh (j...@care.com)