ansible-playbook exits if host list empty, even if later plays could still run


var...@seas.upenn.edu

Sep 13, 2014, 7:07:37 PM
to ansibl...@googlegroups.com
Given the following inventory:

[common:children]
groupA
groupB

[groupA]
hostA

[groupB]
hostB

And the following playbook (site.yml):

---
- include: groups/common.yml
- include: groups/groupA.yml
- include: groups/groupB.yml

If we run "ansible-playbook site.yml" when hostA is down, it runs the "common" play on hostB (while noticing that hostA doesn't respond), then fails on the "groupA" play saying "FATAL: no hosts matched or all hosts have already failed -- aborting".  It then exits without proceeding to run groupB.yml, even though hostB is up and could run its plays.  I can't tell whether this is desired behavior or a bug.  I think it would make more sense to continue on and run the next play(s) on any remaining hosts.  The following patch changes this behavior.  The comment on the existing code doesn't tell me a whole lot about why it currently works the way it does.  Is this something that should be changed?  If so, maybe the other places that playbook._run_play returns False should also be changed for consistency?

Kris
0001-playbook-should-continue-to-the-next-play-even-if-th.patch

Michael DeHaan

Sep 15, 2014, 4:17:28 PM
to var...@seas.upenn.edu, ansibl...@googlegroups.com
This is intended behavior.

Ansible removes a failed host from the pool of available hosts and will stop the deployment if no members of a group were successful.

This threshold can actually be made more strict with max_fail_percentage adjustments - which is not exclusively restricted to rolling updates and the serial keyword.
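(For illustration, a minimal sketch of that play-level setting; the group name, role name, and numbers here are made up rather than taken from this thread:)

- hosts: webservers
  serial: 5                   # roll through the group five hosts at a time
  max_fail_percentage: 30     # abort the run only if more than 30% of a batch fails
  roles:
    - webapp

The same keyword can also be set on a play that does not use serial, in which case the play's whole host list is treated as a single batch.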


If the plays you need are in the second included playbook, just run that second playbook directly and you can skip the offending portion.


Tony Kinsley

Nov 23, 2014, 1:41:27 AM
to ansibl...@googlegroups.com, var...@seas.upenn.edu
I also have a use case where I would like the playbook to continue even though I have plays that act on one host and that host may have failed earlier. I tried setting max_fail_percentage to 100%, which I expected could never be exceeded, so the playbook would continue; but it still fails out. Is that expected behavior, or is there no way to continue on in the playbook?

Michael DeHaan

Nov 24, 2014, 5:56:49 PM
to Tony Kinsley, ansibl...@googlegroups.com, var...@seas.upenn.edu
So let's make sure I'm understanding the use case -- you are using a rolling update and you want it to continue on and update as many hosts as possible, and keep going even if an entire previous "batch" fails?

This seems a bit dangerous, so I want to understand the "why", which may help answer the "how".

In the future this is probably a good question for ansible-project list, as it is usage related versus about developing code for Ansible.

Thanks!


Tony Kinsley

Nov 24, 2014, 9:19:59 PM
to ansibl...@googlegroups.com, tkins...@gmail.com, var...@seas.upenn.edu
I am actually not using a rolling update. I currently use a single large playbook (with lots of roles and includes) to install our entire environment. The system includes 5 different server types and some different clients as well. But for us, if one of the components fails a part of the install (like setting up the apache servers) it doesn't affect the installation of the rest of the system. We can go back and fix that one component after all the other components install. We cannot always do this, and all the components are installed together because there is sometimes a level of coordination required. For instance, you cannot set up apache until you enroll the machine with the pki system, which you cannot do until the pki system has been installed; however, there are multiple apache servers, and if one of them fails to be configured correctly, nothing later in the playbook would fail because of it. Obviously the system will not work correctly until the failure is fixed, but the one host failing prevents all the other hosts from finishing. I guess I was just looking to let all the others finish, and fix the one failure separately.

I have included a snippet of our playbook showing where I want to put max_fail_percentage: 100. I do not want to ignore the failures, I just would like to move on in the playbook (without the host that failed).

- hosts: common:!windows
  sudo: yes

  roles:
    - { role: common, tags: [ install ] }
    - auditd
    - ssh

- hosts: common:&windows 
  sudo: yes

  roles:
    - { role: win-common, tags: [ install ] }

#------------------------------------------------------------------------------
# Install specific server type configuration
#------------------------------------------------------------------------------
- hosts: pki # just one of the hosts
  sudo: yes

  roles:
    - pki

- hosts: common

  roles:
    - cert-setup

- hosts: central # just one host out of all of the hosts
  max_fail_percentage: 100
  sudo: yes

  roles:
    - central-gui

- hosts: ldap-servers # just two hosts out of all the hosts
  max_fail_percentage: 100
  sudo: yes

  roles:
    - ldap

- hosts: common
  
  roles:
    - deploy-config



My guess is the solution is to break up this playbook and use a wrapper script to run the plays separately (which could be nice, because then I could run some of the one-off plays in parallel). It just seems like it would be simpler to implement in ansible.
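(A rough sketch of that wrapper-script idea, assuming the playbook above were split into per-component playbook files; the file names below are hypothetical:)

#!/bin/sh
# Each component runs as its own ansible-playbook invocation, so a failed host
# in one component no longer stops the plays for the other components.
ansible-playbook -i hosts common.yml
ansible-playbook -i hosts pki.yml
ansible-playbook -i hosts cert-setup.yml
ansible-playbook -i hosts central-gui.yml &   # one-off plays can run in parallel
ansible-playbook -i hosts ldap.yml &
wait
ansible-playbook -i hosts deploy-config.yml

One caveat: the sequential parts still have to run in order, since the coordination described above (pki before cert-setup, for example) must hold across the separate runs.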

Tony

Also, is there a way to transfer this thread to the ansible-project list?

Mike Biancaniello

Mar 11, 2016, 2:51:36 PM
to Ansible Development
I've run into the same thing and am curious if there is a known better way to do it.

In my case, I've separated my playbooks and site.yml includes them all.

The problem is that if all hosts from play1 fail, then play2 never executes.


cat stuff-pass.yml
---
- name: expected to pass
  hosts: localhost

  tasks:
  - name: show stuff
    debug: msg="this should pass"


cat stuff-fail.yml
---
- name: expected to fail
  hosts: should_fail
  connection: local
  gather_facts: no

  tasks:
  - name: This should fail
    debug: msg="This should fail"
    failed_when: true


If the pass runs first, they all run:

cat stuff.yml
---
- include: stuff-pass.yml
- include: stuff-fail.yml


ansible-playbook -i hosts stuff.yml

PLAY [expected to pass] ********************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [expected to pass] ********************************************************
ok: [localhost] => {
    "msg": "this should pass"
}

PLAY [expected to fail] ********************************************************

TASK [This should fail] ********************************************************
fatal: [testhost1]: FAILED! => {
    "changed": false,
    "failed": true,
    "failed_when_result": true,
    "msg": "This should fail"
}

NO MORE HOSTS LEFT *************************************************************
    to retry, use: --limit @stuff.retry

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0
testhost1                  : ok=0    changed=0    unreachable=0    failed=1




If the pass runs second, they don't:

cat stuff.yml
---
- include: stuff-fail.yml
- include: stuff-pass.yml

ansible-playbook -i hosts stuff.yml

PLAY [expected to fail] ********************************************************

TASK [This should fail] ********************************************************
fatal: [testhost1]: FAILED! => {
    "changed": false,
    "failed": true,
    "failed_when_result": true,
    "msg": "This should fail"
}

NO MORE HOSTS LEFT *************************************************************
    to retry, use: --limit @stuff.retry

PLAY RECAP *********************************************************************
testhost1                  : ok=0    changed=0    unreachable=0    failed=1

 



Mike Biancaniello

Mar 11, 2016, 2:58:31 PM
to Ansible Development, var...@seas.upenn.edu
Never mind my last post, I didn't see this one when I replied.

I think the confusion is that 'stop deployment' apparently means 'stop deployment for all groups'. The documentation is ambiguous and my interpretation was always that it meant 'stop deployment for that group'.

I'm going to assume that there is no mechanism to change this behavior.