how to exit the ansible run, or handle it, when any remote node fails a task


Kathy Allen

Aug 6, 2015, 3:05:11 PM
to Ansible Project
Hi.

I'm working on some orchestration where I need to run a task across sets of N remote nodes. If that task fails on any one of the remote nodes, the orchestration needs to halt (or be handled somehow). In my test, I caused one node to fail and expected the entire ansible run to bomb out, but that's not what happened. The failed node is reported, but the playbook continues on.

How can I make ansible exit upon the failure of any one of these nodes?

Or, how can I have some kind of handler to pause the run before continuing? (I've not yet looked into handlers)

Playbook, plays, tasks, and output are shown below. One question about the output: for the node that failed, the task "debug: var=output" is absent. That task only fires for the successful node. Should I expect that task to also fire for the failed node? I was surprised by that.

Thanks!
kallen


$ cat testplaybook.yml
---
- hosts: 127.0.0.1
  connection: local
  gather_facts: False
  tasks:

- include: "{{ playbook_dir }}/common/test.yml myhosts=app_set1"
- include: "{{ playbook_dir }}/common/test2.yml"


$ cat common/test.yml
---
- hosts: '{{ myhosts }}'
  gather_facts: false
  tasks:

  - debug: msg="working on {{ play_hosts }}."

  - name: Simple script exits 1 or 0
    shell: /usr/local/bin/check
    sudo: yes
    register: output

  - debug: var=output

  - debug: msg="FAILED - script exited non-zero"
    failed_when: output['rc'] != 0


$ cat common/test2.yml
---
- hosts: 127.0.0.1
  connection: local
  gather_facts: false
  tasks:

  - debug: msg="If any hosts failed prior to this, you shouldn't see this message."



$ ansible-playbook -i inventory/inv.ini testplaybook.yml

PLAY [127.0.0.1] **************************************************************
Thursday 06 August 2015  18:51:10 +0000 (0:00:00.018)       0:00:00.018 *******
===============================================================================

PLAY [app_set1] ***********************************************
Thursday 06 August 2015  18:51:10 +0000 (0:00:00.000)       0:00:00.018 *******
===============================================================================

TASK: [debug msg="working on {{ play_hosts }}."] ******************************
Thursday 06 August 2015  18:51:10 +0000 (0:00:00.004)       0:00:00.022 *******
ok: [webapp01b.aue1t.example.com] => {
    "msg": "working on ['webapp01b.aue1t.example.com', 'webapp01e.aue1t.example.com']."
}
ok: [webapp01e.aue1t.example.com] => {
    "msg": "working on ['webapp01b.aue1t.example.com', 'webapp01e.aue1t.example.com']."
}

TASK: [Simple script exits 1 or 0] ********************************************
Thursday 06 August 2015  18:51:10 +0000 (0:00:00.087)       0:00:00.110 *******
failed: [webapp01b.aue1t.example.com] => {"changed": true, "cmd": "/usr/local/bin/check", "delta": "0:00:00.014861", "end": "2015-08-06 18:51:11.585755", "rc": 1, "start": "2015-08-06 18:51:11.570894", "warnings": []}
stdout: exiting 1
changed: [webapp01e.aue1t.example.com]

TASK: [debug var=output] ******************************************************
Thursday 06 August 2015  18:51:11 +0000 (0:00:01.466)       0:00:01.576 *******
ok: [webapp01e.aue1t.example.com] => {
    "output": {
        "changed": true,
        "cmd": "/usr/local/bin/check",
        "delta": "0:00:00.009886",
        "end": "2015-08-06 18:51:11.654924",
        "invocation": {
            "module_args": "/usr/local/bin/check",
            "module_name": "shell"
        },
        "rc": 0,
        "start": "2015-08-06 18:51:11.645038",
        "stderr": "",
        "stdout": "exiting 0",
        "stdout_lines": [
            "exiting 0"
        ],
        "warnings": []
    }
}

TASK: [debug msg="FAILED - script exited non-zero"] ***************************
Thursday 06 August 2015  18:51:11 +0000 (0:00:00.023)       0:00:01.600 *******
ok: [webapp01e.aue1t.example.com] => {
    "failed": false,
    "failed_when_result": false,
    "msg": "FAILED - script exited non-zero"
}

PLAY [127.0.0.1] **************************************************************
Thursday 06 August 2015  18:51:11 +0000 (0:00:00.019)       0:00:01.619 *******
===============================================================================

TASK: [debug msg="If any hosts failed prior to this, you shouldn't see this message."] ***
Thursday 06 August 2015  18:51:11 +0000 (0:00:00.000)       0:00:01.620 *******
ok: [127.0.0.1] => {
    "msg": "If any hosts failed prior to this, you shouldn't see this message."
}

PLAY RECAP ********************************************************************
Thursday 06 August 2015  18:51:11 +0000 (0:00:00.002)       0:00:01.622 *******
===============================================================================
           to retry, use: --limit @/home/kallen/testplaybook.retry

127.0.0.1                  : ok=1    changed=0    unreachable=0    failed=0
webapp01b.aue1t.example.com : ok=1    changed=0    unreachable=0    failed=1
webapp01e.aue1t.example.com : ok=4    changed=1    unreachable=0    failed=0






Karl E. Jorgensen

Aug 6, 2015, 3:50:41 PM
to ansible...@googlegroups.com
Hi

On Thu, 2015-08-06 at 12:05 -0700, Kathy Allen wrote:
> Hi.
>
> I'm working on some orchestration where I need to run a task across
> sets of N remote nodes. If that task fails on any one of the remote
> nodes, the orchestration needs to halt (or be handled somehow). In my
> test, I caused one node to fail and expected the entire ansible run
> to bomb out, but that's not what happened. The failed node is
> reported, but the playbook continues on.

That is by design.

>
> How can I make ansible exit upon the failure of any one of these
> nodes?

http://docs.ansible.com/ansible/playbooks_delegation.html#maximum-failure-percentage

You can set max_fail_percentage: 0
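
Something like this (an untested sketch based on your test.yml;
serial: 2 is just an example batch size, and max_fail_percentage is a
play-level keyword) should abort the play once any host fails, instead
of just dropping that host:

- hosts: '{{ myhosts }}'
  gather_facts: false
  serial: 2
  max_fail_percentage: 0
  tasks:
  - name: Simple script exits 1 or 0
    shell: /usr/local/bin/check
    sudo: yes
    register: output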


>
> Or, how can I have some kind of handler to pause the run before
> continuing? (I've not yet looked into handlers)

Don't think so ....

>
> Playbook, plays, tasks, and output are shown below. One question about
> the output: for the node that failed, the task "debug: var=output" is
> absent. That task only fires for the successful node. Should I expect
> that task to also fire for the failed node? I was surprised by that.

No - once a node fails (without "ignore_errors: True"), it is no longer
part of the remainder of the play, so no further tasks will be executed
on the failed node.
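
If you do want the later tasks (e.g. your "debug: var=output") to run
on the failed node too, something like this (again untested) keeps the
host in the play and lets you decide when to fail it:

- name: Simple script exits 1 or 0
  shell: /usr/local/bin/check
  sudo: yes
  register: output
  ignore_errors: True

- debug: var=output

- fail: msg="script exited non-zero on {{ inventory_hostname }}"
  when: output['rc'] != 0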

Hope this helps

--
Karl E. Jorgensen

Kathy Allen

Aug 6, 2015, 4:15:36 PM
to Ansible Project
Ah! Fantastic, thank you. I put in max_fail_percentage, and it did exactly what I wanted.

I do wonder how to more elegantly handle one of the nodes failing, maybe with a handler. Something simple to start: a prompt like "pause here, go fix that node if you can; if you can't, ctrl-c now." Perhaps I should add "ignore_errors: true" and experiment?
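
Maybe something like this to start (just a guess to experiment with --
I haven't checked how the pause module behaves when only some of the
hosts fail):

- name: Simple script exits 1 or 0
  shell: /usr/local/bin/check
  register: output
  ignore_errors: True

- pause: prompt="A node failed the check. Fix it and press Enter to continue, or ctrl-c then 'a' to abort"
  when: output['rc'] != 0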

It's strange ... I do have another task, run per webapp node, that runs a local check script -- a ruby program that exits non-zero on an error condition. When any node fails that check, the ansible run comes to a screeching halt. That play contains no max_fail_percentage and no ignore_errors: true.

We use ansible 1.8.2.

I'll move forward with your advice. And, FWIW ... this bombs out the entire run when any node fails:

---
- hosts: '{{ myhosts }}'
  gather_facts: False
  serial: '{{ serial }}'
  tasks:

  - name: App port check
    shell: app-port-check --config /opt/app/conf/config.yaml
    when: port_check
    register: oslout

  - debug: var=oslout
    when: debug and port_check

  - name: Message private hipchat room
    hipchat: token={{ hipchat_token }} room={{ priv_hipchat_room }} from={{ hipchat_user }} msg="App sidedoor check successful for {{ inventory_hostname }}"
    when: verbose|bool and msg_private|bool and port_check|bool
    ignore_errors: true


