I like the idea of having a powerful "--retry @retryfile" option
with sensible defaults. The retry file could be as simple as a YAML
file like the following:

    ---
    hosts:
      - host_who_failed_1
      - host_who_failed_2
      - host_who_failed_3
    start_at: "The task that caused abnormal interruption"
    notify:
      - A handler already notified before the
        abnormal interruption
      - Another handler already notified before the
        abnormal interruption
    tags: all

The "hosts" list would be auto-generated from the hosts that
failed, but it would be possible to add or remove hosts, or to
use a host selection pattern instead of a list.
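
For instance, assuming the "hosts" key accepted the same pattern
syntax as the "hosts:" line of a play, the list could be replaced
with a single pattern. The group name "webservers" is made up for
illustration:

```yaml
# Hypothetical: retry the whole "webservers" group except one host
# that is known to be down.
hosts: "webservers:!host_who_failed_2"
```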

The "start_at" key would be auto-set to the task that failed, since
you usually don't want to retry from the beginning. It could be
removed to retry from the beginning, or changed to another task
before or after the one that failed. The last option (starting
after the failed task) could be useful when you think the failure
is not that important and you don't want to spend time fixing it
right away: you bypass it as a quick workaround and fix it later.
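
As a sketch of the "skip it for now" case, "start_at" would simply
be edited to name a later task. The task name here is hypothetical:

```yaml
# Hypothetical: resume from the task that follows the failed one.
start_at: "The task right after the one that failed"
```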

The "notify" directive would force notification of the handlers
in the list. The list would initially be auto-generated from the
handlers that had already been notified before the failure. The
Ansible user would be free to edit the list according to what he
thinks is best for recovering from the failure: remove some items,
remove the whole list, or add any extra handlers he thinks are
necessary.
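
An edited list might look like this (both handler names are made
up for the example):

```yaml
notify:
  # Kept from the auto-generated list.
  - restart apache
  # Added by hand for this retry only.
  - flush cache
```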

The "tags" directive would be auto-set to "all" to retry tasks
regardless of their tags, but could also be restricted by passing
a list of specific tags.
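
Restricting the retry to a couple of tags could then be written as
follows (the tag names are hypothetical):

```yaml
# Hypothetical: only retry tasks carrying one of these tags.
tags:
  - configuration
  - packages
```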

To make it even more powerful, the retry file could also support
"pre_tasks" and "post_tasks" lists of one-time, ad-hoc tasks that
the Ansible user could quickly write to work around unforeseen
problems caused by the failure, before making a proper fix in his
playbooks.
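
Assuming these lists used the ordinary playbook task syntax, a
retry file could carry something like the following. The paths and
the service name are invented for the example:

```yaml
# Hypothetical one-off tasks run only for this retry.
pre_tasks:
  - name: remove the stale lock file left by the interrupted run
    file:
      path: /var/run/myapp.lock
      state: absent
post_tasks:
  - name: make sure the service is running after the retry
    service:
      name: myapp
      state: started
```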

What do you think?