dynamically generated inventory (add_host) and playbook failures


Dmitry Makovey

Jul 14, 2016, 1:03:09 PM
to Ansible Project
I'm building a playbook for patching our servers, but I keep getting the following error:

2016-07-14 09:30:29,186 p=55840 u=dimon |  PLAY [report] ****************************************************************** 
2016-07-14 09:30:29,211 p=55840 u=dimon |  ERROR! invalid host (somerandomhost1.stanford.edu) specified for playbook iteration
2016-07-14 09:33:06,935 p=65806 u=dimon |   [WARNING]: provided hosts list is empty, only localhost is available


In a nutshell: I'm using our server inventory DB (Pakiti) to extract the list of registered hosts (I wrote my own module for that). Then I walk through those hosts and add the ones that are "alive" according to Pakiti to the "pakiti_hosts" group. We also have Facter facts on the machines identifying their patching priority, so I do a round of Facter fact gathering (which fails for some machines), and where it fails I set the patch priority to 0. Then I'd like to execute a certain set of commands across all of the pakiti_hosts (at present this is merely a template being generated for a report), but my playbook intermittently fails because some hosts are not responding or for other reasons, and I have to relaunch the entire playbook from the start.
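
To make the workflow above concrete, here is a rough sketch of the two stages as plays. The module name `pakiti_query`, its return structure (`pakiti.hosts` with `name` and `alive` fields) and the Facter fact name `patch_priority` are placeholders invented for illustration, not the actual names used in my setup:

```yaml
# Play 1: build the in-memory group from Pakiti data.
- hosts: localhost
  gather_facts: no
  tasks:
    - name: Query Pakiti for registered hosts
      pakiti_query:            # hypothetical custom module
      register: pakiti

    - name: Add hosts that Pakiti reports as alive to pakiti_hosts
      add_host:
        name: "{{ item.name }}"
        groups: pakiti_hosts
      with_items: "{{ pakiti.hosts }}"   # assumed return structure
      when: item.alive | bool

# Play 2: read the patch priority, falling back to 0 when Facter fails.
- hosts: pakiti_hosts
  gather_facts: no
  tasks:
    - name: Read patch priority from Facter
      command: facter patch_priority     # assumed fact name
      register: facter_out
      changed_when: false
      ignore_errors: yes

    - name: Use the Facter value, or 0 if the command failed or returned nothing
      set_fact:
        patch_priority: "{{ facter_out.stdout if (facter_out.rc | default(1) == 0 and facter_out.stdout) else '0' }}"
```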

It seems that Ansible is OK with connection failures and skips over those hosts; however, for some reason some of the other errors lead to the above message. I've tried to work around this by introducing a "blacklist" group in the playbook, but when working with 500+ machines there are always one or two that fail. I'd like to complete the execution and revisit those boxes later. I've tried "ignore_errors", but I'd rather not add it to *every* block. What are my options?

I realize that I could collect hosts via dynamic inventory as well, but this way seemed more natural to me, leveraging Ansible's own facilities for that.

In other words: how can I make my playbook more resilient? 
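
One direction that might avoid sprinkling "ignore_errors" everywhere is to wrap the tasks of the play in a single block with a rescue section (available since Ansible 2.0) that only flags the host, and then report the flagged hosts at the end. This is an untested sketch; the fact name `needs_revisit` and the placeholder task are made up for illustration, and as far as I can tell rescue covers task failures rather than unreachable hosts:

```yaml
- hosts: pakiti_hosts
  gather_facts: no
  tasks:
    - block:
        - name: Placeholder for the real patching/report tasks
          command: /bin/true
          changed_when: false
      rescue:
        # Runs only on hosts where a task in the block failed;
        # the host is then treated as recovered and the play goes on.
        - name: Flag this host for a later revisit
          set_fact:
            needs_revisit: true          # hypothetical fact name

- hosts: localhost
  gather_facts: no
  tasks:
    - name: List hosts that were flagged during the run
      debug:
        msg: "revisit {{ item }}"
      with_items: "{{ groups['pakiti_hosts'] | default([]) }}"
      when: hostvars[item].needs_revisit | default(false)
```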


J Hawkesworth

Jul 14, 2016, 4:22:20 PM
to Ansible Project
Maybe setting a max failure percentage for the play would help?

http://docs.ansible.com/ansible/playbooks_delegation.html#maximum-failure-percentage

I've not used maximum-failure-percentage myself so I don't know how easy it would be to identify the failed hosts to revisit though.
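
For reference, this is roughly how the setting from the linked page is applied to a play. The 10% threshold and the batch size of 50 are arbitrary values picked for illustration, and the group name is taken from the original post. As I understand the documentation, the percentage is evaluated per batch when "serial" is used, and it has to be exceeded, not merely reached, before the play aborts:

```yaml
- hosts: pakiti_hosts
  serial: 50                   # illustrative batch size
  max_fail_percentage: 10      # illustrative threshold
  tasks:
    - name: Placeholder for the real patching/report tasks
      command: /bin/true
      changed_when: false
```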

Jon

Dmitry Makovey

Jul 14, 2016, 5:06:09 PM
to Ansible Project


On Thursday, July 14, 2016 at 1:22:20 PM UTC-7, J Hawkesworth wrote:
> Maybe setting a max failure percentage for the play would help?
>
> http://docs.ansible.com/ansible/playbooks_delegation.html#maximum-failure-percentage
>
> I've not used maximum-failure-percentage myself so I don't know how easy it would be to identify the failed hosts to revisit though.


Thanks for the link Jon.

Interestingly enough, the documentation states: "By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed." However, that does not seem to be the case for me, as the playbook aborts on a single failure, which is what is confusing me.