Absurd run_once behavior, skipping entirely if first node fails a when test

Alex Hunt

Mar 19, 2018, 2:49:12 PM
to Ansible Project
When running a task with run_once, if the first node is skipped, the entire task is skipped, rather than running on the first host that is not skipped.

This behavior is not what is intuitively understood, this behavior is not mentioned in the docs, and this behavior is almost certainly not what most people want it to do. There are discussions of this in multiple GitHub issues, the most detailed of which is at https://github.com/ansible/ansible/issues/19966, but there are also at least https://github.com/ansible/ansible/issues/11496, https://github.com/ansible/ansible/issues/13226, and https://github.com/ansible/ansible/issues/23594.

I was told by @bcoco to take this here, rather than discuss it in the existing github issues.

Below is an untested simple example of a scenario that would skip the run_once task, when it should (according to the docs, and common sense) run on one of either host2 or host3.

Inventory
[all]
host1
host2
host3

Playbook
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml

outer-task.yml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'

inner-task.yml
- name: Inner task
  command: do_something
  run_once: True

This issue is exacerbated by the fact that the inner task may have no idea why the first host is skipped (IE: we're including a reusable task that may get run many times in different ways). In those cases, there is no way to work around the issue with a simple `when: inventory_hostname == something`, since we don't know what to check against.

In https://github.com/ansible/ansible/issues/19966, @bcoco proposes a scenario where one would rely on the existing behavior, but in my opinion and that of the other commenters on that ticket, that use case is an incredibly bad practice, as it relies on the specific order of the inventory file. I can think of no sane reason to want the current behavior. If the user is doing crazy things like this, they should just stay on the old broken versions forever, as any update is likely to break their fragile buggy code.

flowerysong

Mar 19, 2018, 3:15:13 PM
to Ansible Project
On Monday, March 19, 2018 at 2:49:12 PM UTC-4, Alex Hunt wrote:
> When running a task with run_once, if the first node is skipped, the entire task is skipped, rather than running on the first host that is not skipped.
>
> This behavior is not what is intuitively understood, this behavior is not mentioned in the docs, and this behavior is almost certainly not what most people want it to do. There are discussions of this in multiple GitHub issues, the most detailed of which is at https://github.com/ansible/ansible/issues/19966, but there are also at least https://github.com/ansible/ansible/issues/11496, https://github.com/ansible/ansible/issues/13226, and https://github.com/ansible/ansible/issues/23594.

It may confuse some people, but it's both the documented behaviour and the least surprising way to do things. Conditionals should not affect the number of times a run_once task is evaluated even if they result in the task being skipped.

https://docs.ansible.com/ansible/latest/playbooks_delegation.html#run-once says: "When “run_once” is not used with “delegate_to” it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play - e.g. webservers[0] if the play targeted “hosts: webservers”."
 
> Below is an untested simple example of a scenario that would skip the run_once task, when it should (according to the docs, and common sense) run on one of either host2 or host3.
>
> Inventory
> [all]
> host1
> host2
> host3
>
> Playbook
> - name: Test Play
>   hosts: all
>   tasks:
>     - include: outer-task.yml
>
> outer-task.yml
> - name: Outer task
>   include: inner-task.yml
>   when: inventory_hostname != 'host1'
>
> inner-task.yml
> - name: Inner task
>   command: do_something
>   run_once: True
>
> This issue is exacerbated by the fact that the inner task may have no idea why the first host is skipped (IE: we're including a reusable task that may get run many times in different ways). In those cases, there is no way to work around the issue with a simple `when: inventory_hostname == something`, since we don't know what to check against.

You're mixing different ways of limiting where a task runs, with predictable results (the task is assigned to one host, and the conditional results in it being skipped). If you don't care which host it runs on, use run_once without a conditional. If you want to run it on a specific host, use delegate_to with run_once or a conditional without run_once.

Alex Hunt

Mar 19, 2018, 4:33:09 PM
to Ansible Project
I think you're confused by what the issue is. Whether I use delegate_to or not is irrelevant. I don't care which host it runs on, and if I did, I would use delegate_to. Even if I use delegate_to, it will still be skipped, since it evaluates whether to run the task at all based on the first host. I'm sorry I didn't include a delegate_to in my example, which led to this confusion.

http://docs.ansible.com/ansible/latest/playbooks_delegation.html#run-once makes no mention of the fact that even with delegate_to it decides whether to run at all based on the first host. The mention of delegate_to actually makes this more confusing, since delegate_to should control where the execution happens (something irrelevant to this issue), not whether it runs at all. That part is at least consistent, since delegate_to does not control whether to run it.

The issue is that run_once is not actually running once. It is "run only if the first node in the play says to run it", not "run one time if it should run for any host in the play". The latter is intuitive behavior. You talk of predictable results, and it is not predictable to have behavior that changes based on the order of hosts in your inventory file (the current behavior).
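
To make the two readings concrete, here is a rough Python sketch. It is only a toy model of the behavior described in this thread, not Ansible's implementation, and the function names `run_once_as_described` and `run_once_any_host` are made up for illustration.

```python
from typing import Callable, List, Optional


def run_once_as_described(play_hosts: List[str],
                          when: Callable[[str], bool]) -> Optional[str]:
    """Behavior reported in this thread: only the first host is consulted."""
    if not play_hosts:
        return None
    first = play_hosts[0]
    # If the first host is skipped by the condition, no host runs the task.
    return first if when(first) else None


def run_once_any_host(play_hosts: List[str],
                      when: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical alternative: the first host that passes the condition runs."""
    for host in play_hosts:
        if when(host):
            return host
    return None


def not_host1(host: str) -> bool:
    # Mirrors `when: inventory_hostname != 'host1'` from the example.
    return host != "host1"


hosts = ["host1", "host2", "host3"]
print(run_once_as_described(hosts, not_host1))  # None: the task is skipped
print(run_once_any_host(hosts, not_host1))      # host2
```

Reordering the list to `["host2", "host1", "host3"]` makes the first function return `host2`, which is the dependence on inventory order being complained about.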

Please note that in my example, the when clause is NOT on the task with run_once. If we make reusable code, we may be including that piece in many places, with or without the when clause.

Matt Martz

Mar 19, 2018, 4:54:06 PM
to ansible...@googlegroups.com
The behavior is documented via the information provided above. `run_once` in its current form is designed to be consistent and predictable in which host is picked to execute the task against.

> When “run_once” is not used with “delegate_to” it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play

If the first host is failed, it is removed from the play list, and run_once will therefore be skipped.

Using `delegate_to` allows you to define what you believe is consistent or predictable. If you don't care what host it executes on, using `delegate_to` can be made to do what you want:

- command: whoami
  run_once: true
  delegate_to: "{{ ansible_play_hosts|first }}"

`ansible_play_hosts` is updated as hosts fail.

So if it started as:

    "ansible_play_hosts": [
        "host0",
        "host1",
        "host2",
        "host3",
        "host4"
    ]

and `host0` failed, that `delegate_to` above will utilize `host1`.  Instead of first, something like `random` could be used too.

If you wish to add further, constructive, clarification to the docs, and potentially examples such as the one I provide above, feel free to submit a documentation pull request.

--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-project+unsubscribe@googlegroups.com.
To post to this group, send email to ansible-project@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/1b542a8f-8f75-462a-8c11-0fd0c76054dc%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Matt Martz
@sivel
sivel.net

Alex Hunt

Mar 19, 2018, 5:18:31 PM
to Ansible Project
The issue has nothing to do with delegate_to. The issue has to do with whether it decides to run at all, which delegate_to has no effect on. I don't care which host it runs on, and if I did, I could use delegate_to as you have noted. The delegate_to directive works properly.

It is totally fine for it to execute on the first host in the inventory, as long as it runs when that host is skipped for the included task book. I apologize if I'm not being clear about what the issue is.

Here's the same example, updated with a delegate_to, since everyone seems to think that matters.

Inventory
[all]
host1
host2
host3

Playbook
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml

outer-task.yml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'

inner-task.yml
- name: Inner task
  command: do_something
  run_once: True
  delegate_to: 'host2'

In this example, it should run on host2, but it does not, since host1 skips the entire inner-task.yml. This is the problem.

In my original example, I didn't care if it ran on host1, as long as it ran, but it doesn't run at all.

Marcos Alano

Mar 19, 2018, 5:23:53 PM
to ansible...@googlegroups.com
Do you mean the task should run on the first available node? If host1 is
unavailable, the task should run on host2, and if host2 is also
unavailable, on host3, and so on?
I think that is a valid concern. Even delegate_to could be set to a host
that may be unavailable. The idea here is that the task must run, no
matter on which host. Maybe this could be an option?



--
Marcos H. Alano
Linux System Administrator
marcos...@gmail.com

Matt Martz

Mar 19, 2018, 5:27:46 PM
to ansible...@googlegroups.com
I fully understand what you are saying. However the difference here is that you have a misunderstanding about the feature.  You have an idea in your head, that doesn't match the implementation.

The way `run_once` works is that it defaults to executing on the first host in the list of hosts on the play, as defined by inventory.  If that host is failed, that task is then skipped.  Using `delegate_to` offers you a way to avoid your specific scenario, as it permits you to change what host ansible targets.  Take special care to re-read what I wrote, instead of ignoring it.  I recommend using `ansible_play_hosts` in `delegate_to` to ensure it always targets an available host.  But that may not meet every person's requirements.  You will have to implement a `delegate_to` on that host that properly reflects what host to operate on if the "first" host is not available.

Unfortunately, your expectation doesn't align with the implementation and our definition of what is expected here.

I'm telling you how to do what you want, within the context of how `run_once` actually works. We have no intention of changing how `run_once` works.  You'll have to operate within the confines of how run_once *actually* operates.


Alex Hunt

Mar 19, 2018, 5:28:10 PM
to Ansible Project
That'd be a perfectly fine solution, yes. I honestly don't even care if it always chooses to run on host1, as long as it doesn't only use host1 to determine if it should run at all.

Alex Hunt

Mar 19, 2018, 5:53:44 PM
to Ansible Project
@Matt Martz I have no problem with how delegate_to works. I don't care if it executes on host1. The stuff you wrote does not actually change if it gets run at all, only where it would be run if host1 had failed a prior task. It isn't that it tries to run it and fails, but that it doesn't even try to run. The host1 is still in the list of play_hosts, even if it is skipped, so it is still used to determine if we should run. I have actually run the code you wrote, and it does not solve this issue.

If you're worried about breaking some obscure code that relies on skipping the task entirely based on the order of the hosts in the inventory, that's fine, but the community needs a way to reliably decide to run exactly one time. It's totally fine for it to be a new directive "actually_run_once".

Where it executes doesn't matter, but that it executes at all does. It is trivial to use the properly working "delegate_to" clause to control where the task is actually run, but it has no effect on whether Ansible tries to run it in the first place.

My interpretation of your code is that you are trying to supply a host to execute the task on in the case that host1 has failed out of the execution due to a prior task failure. In my example, host1 is reachable, working properly, and has not failed any tasks. It is simply skipped due to a when clause that is not attached to the task with run_once. It would be perfectly acceptable and in line with the documentation for the task to execute on host1, but instead the entire task is skipped.

The problem is that the decision to run the task is tied to the first host in the play, not that the execution defaults to the first host in the play.
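
The skipped-versus-failed distinction can be sketched in a few lines of Python. This is only a toy stand-in for how `ansible_play_hosts` is described in this thread (failed hosts drop out, skipped hosts stay); `remaining_play_hosts` is a hypothetical helper, not an Ansible function.

```python
from typing import Iterable, List


def remaining_play_hosts(play_hosts: List[str], failed: Iterable[str]) -> List[str]:
    """Toy stand-in for ansible_play_hosts: failed hosts drop out, skipped hosts do not."""
    failed_set = set(failed)
    return [host for host in play_hosts if host not in failed_set]


hosts = ["host1", "host2", "host3"]

# host1 failed an earlier task: the first remaining host becomes host2.
first_after_failure = remaining_play_hosts(hosts, failed=["host1"])[0]

# host1 was only skipped by a `when`: nothing failed, so it is still first.
first_after_skip = remaining_play_hosts(hosts, failed=[])[0]

print(first_after_failure)  # host2
print(first_after_skip)     # host1
```

In the second case the first host is still host1, so a `delegate_to` built from the first remaining host changes nothing about whether the task is attempted.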

Alex Hunt

Mar 19, 2018, 6:06:13 PM
to Ansible Project
Here's some actual execution output for my second example, and for the one from @Matt Martz.

Mine
(.env) [exabeam@ip-10-10-2-162 test]$ ll
total 16
-rw-rw-r-- 1 exabeam exabeam  79 Mar 19 21:57 inner-task.yml
-rw-rw-r-- 1 exabeam exabeam 206 Mar 19 21:59 inventory
-rw-rw-r-- 1 exabeam exabeam  83 Mar 19 21:57 outer-task.yml
-rw-rw-r-- 1 exabeam exabeam  70 Mar 19 21:56 play.yml
(.env) [exabeam@ip-10-10-2-162 test]$ cat inventory
[all]
host1 ansible_host=10.10.2.162
host2 ansible_host=10.10.2.173
host3 ansible_host=10.10.2.206

[all:vars]
ansible_port=22
ansible_ssh_private_key_file=/home/exabeam/devkey.pem
ansible_ssh_user=exabeam
(.env) [exabeam@ip-10-10-2-162 test]$ cat play.yml
- name: Test Play
  hosts: all
  tasks:
    - include: outer-task.yml
(.env) [exabeam@ip-10-10-2-162 test]$ cat outer-task.yml
- name: Outer task
  include: inner-task.yml
  when: inventory_hostname != 'host1'
(.env) [exabeam@ip-10-10-2-162 test]$ cat inner-task.yml
- name: Inner task
  command: hostname
  run_once: True
  delegate_to: 'host2'
(.env) [exabeam@ip-10-10-2-162 test]$ ansible-playbook -i inventory play.yml

PLAY [Test Play] ***************************************************************

TASK [setup] *******************************************************************
ok: [host1]
ok: [host3]
ok: [host2]

TASK [Inner task] **************************************************************
skipping: [host1]

PLAY RECAP *********************************************************************
host1                      : ok=1    changed=0    unreachable=0    failed=0
host2                      : ok=1    changed=0    unreachable=0    failed=0
host3                      : ok=1    changed=0    unreachable=0    failed=0

@Matt Martz' (the only difference is the delegate_to line):
(.env) [exabeam@ip-10-10-2-162 test]$ cat inner-task.yml
- name: Inner task
  command: hostname
  run_once: True
  delegate_to: "{{ ansible_play_hosts|first }}"
(.env) [exabeam@ip-10-10-2-162 test]$ ansible-playbook -i inventory play.yml

PLAY [Test Play] ***************************************************************

TASK [setup] *******************************************************************
ok: [host1]
ok: [host3]
ok: [host2]

TASK [Inner task] **************************************************************
skipping: [host1]

PLAY RECAP *********************************************************************
host1                      : ok=1    changed=0    unreachable=0    failed=0
host2                      : ok=1    changed=0    unreachable=0    failed=0
host3                      : ok=1    changed=0    unreachable=0    failed=0

In both cases, the task in inner-task.yml is skipped, since host1 does not match the when clause in outer-task.yml. The delegate_to makes no difference. If I left that out, it would behave the same. The issue is not where it runs, but that it doesn't run at all. It would be perfectly acceptable for it to execute on host1, which is in line with the docs, but it doesn't run at all.

Josh Smift

Mar 19, 2018, 9:42:58 PM
to Ansible Project
My use case involved roles. I had something like

- hosts: web:app:db
  roles:
    - role: myrole
      when: color == "blue"

In the role, there was a task that ran on localhost (via delegate_to), but only once (via run_once) for the whole batch of hosts.

Everything worked fine, except that if the first host in inventory happened not to be blue, the run_once caused the localhost task to be skipped. The order of the hosts in inventory was completely arbitrary -- these were EC2 instances at AWS.

The eventual workaround was to add the `when` to every single task in the role *except* the run_once one, which made both the playbook and the role less readable.

I don't have any hope that the Ansible team will ever address this; for whatever reason, this use case is relatively common among people who aren't on the Ansible team, and impossible to explain to the Ansible team in a way that anyone finds convincing. We haven't yet found a blocker that we couldn't work around in one ugly-ass way or another.

flowerysong

Mar 20, 2018, 1:13:58 AM
to Ansible Project
On Monday, March 19, 2018 at 9:42:58 PM UTC-4, Josh Smift wrote:
> My use case involved roles. I had something like
>
> - hosts: web:app:db
>   roles:
>     - role: myrole
>       when: color == "blue"
>
> In the role, there was a task that ran on localhost (via delegate_to), but only once (via run_once) for the whole batch of hosts.
>
> Everything worked fine, except that if the first host in inventory happened not to be blue, the run_once caused the localhost task to be skipped. The order of the hosts in inventory was completely arbitrary -- these were EC2 instances at AWS.
>
> The eventual workaround was to add the `when` to every single task in the role *except* the run_once one, which made both the playbook and the role less readable.

Disregarding the implementation details, `run_once: true` is effectively the same as adding `when: inventory_hostname == ansible_play_batch.0` to the task. As I said before, if that's not what you want you should instead write a conditional that expresses your actual intent.

Here's one approach:

- group_by:
    key: color_{{ color | default('octarine') }}
    
- name: Run on localhost once
  delegate_to: localhost
  debug:
    msg: His pills, his hands, his jeans
  when: inventory_hostname == groups.color_blue.0

Or you might opt for something like this, which is overly clever and requires a very recent version of Jinja:

- name: Run on localhost once
  delegate_to: localhost
  debug:
    msg: Suddenly I was a lilac sky
  vars:
    first_host: "{{ ansible_play_hosts | map('extract', hostvars) | selectattr('color', 'defined') | selectattr('color', 'equalto', 'blue') | first }}"
  when: inventory_hostname == first_host.inventory_hostname

Or you might restructure the playbook so it only runs the role on blue hosts and doesn't need a separate conditional, and use `run_once` on the task. The best approach depends on personal taste and other decisions made in writing the playbook and setting up your Ansible environment.

Josh Smift

Mar 20, 2018, 9:56:54 AM
to Ansible Project
Yep, your suggestions there are the kind of things I had in mind with the phrase "ugly-ass workaround". :^) (They're task-level, and can't be applied to the inclusion of the role in the playbook; they require baking logic about the way you manage colors into the role; etc.)

> `run_once: true` is effectively the same as adding `when: inventory_hostname == ansible_play_batch.0` to the task.

This is a very clear and concise way to put it, and highlights exactly how run_once works, and why it doesn't mean "run this task once", but "run this task on the first host in the list of hosts in the play, not the first host that you're actually running tasks on". It's not a guarantee that a task will run once, it's an alias for a common `when` pattern.

I'm juggling too many other things to want to put in a documentation PR right now, but if anyone else does, I think this would be useful to clarify. In particular, where the docs say

> When “run_once” is not used with “delegate_to” it will execute on the first host, as defined by inventory, in the group(s) of hosts targeted by the play - e.g. webservers[0] if the play targeted “hosts: webservers”.
>
> This approach is similar to applying a conditional to a task
 
I think it'd be clearer if this said that it *always* executes in the context of the first host, as defined by inventory, in the group(s) of hosts targeted by the play; the delegate_to part doesn't change that, it just changes which host actually runs the task. It also isn't just "similar" to applying a conditional to a task, it's *identical* to applying one. In particular, this condition is logical-AND-ed with any other conditions on the task, so if the task has conditions that cause it to get skipped on the first host in inventory, this condition will cause it to get skipped on all the other hosts as well.

Brian Coca

Mar 20, 2018, 10:07:27 AM
to Ansible Project
A couple of clarifications, these are important when you hit the
corner case in which it matters:

- it's not 'run on the first host in play/inventory', it's 'run on the
first host that reaches the task' which means that hosts that fail
and/or are removed in previous tasks are not considered. Normally (in
the absence of failure) this does mean the first host in
play/inventory, changing to other strategies can affect this.

- it is 'mostly' equivalent to `when: inventory_hostname ==
ansible_play_batch.0` but there is one major difference, other hosts
are not 'skipped', they are all given the same status/return from the
single execution.


The feature should really be named
'only_first_host_tries_to_run_and_applies_status_to_rest'. To make it
work as 'run_first_host_that_matches_when' would make the part of
applying the status to all hosts a lot more difficult to do sanely ...
do we set 'skipped' for the ones we skipped? do we set same status for
all hosts?

At this point I don't see us modifying the feature (maybe clarifying
docs?), but I'm open to creating a new set of keywords that allows for
the difference in range of behaviors not already available via
conditional construction.
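
The status difference between the two forms can be illustrated with a small Python sketch. It is a toy model of the description above and not Ansible's code; `first_host_conditional`, `run_once_model` and the `"changed"` / `"skipped"` strings are illustrative assumptions.

```python
from typing import Callable, Dict, List


def first_host_conditional(batch: List[str],
                           run: Callable[[str], str]) -> Dict[str, str]:
    """Models `when: inventory_hostname == ansible_play_batch.0`:
    only the first host runs, every other host reports 'skipped'."""
    return {host: (run(host) if host == batch[0] else "skipped") for host in batch}


def run_once_model(batch: List[str],
                   run: Callable[[str], str]) -> Dict[str, str]:
    """Models run_once as described: the first host runs and its
    single result is applied to every host in the batch."""
    result = run(batch[0])
    return {host: result for host in batch}


def fake_task(host: str) -> str:
    # Pretend the task always reports a change on whichever host runs it.
    return "changed"


batch = ["host1", "host2", "host3"]
print(first_host_conditional(batch, fake_task))
# {'host1': 'changed', 'host2': 'skipped', 'host3': 'skipped'}
print(run_once_model(batch, fake_task))
# {'host1': 'changed', 'host2': 'changed', 'host3': 'changed'}
```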

--
----------
Brian Coca

Alex Hunt

Mar 20, 2018, 1:46:38 PM
to Ansible Project
Thanks everyone, I think the root of the issue is finally clear.

I'd love to have a 'run_first_host_that_matches_when' keyword, though I understand there might be technical issues getting in the way. I think the most intuitive way to set the status would be to set it for all hosts that match the when, but that's a bit of an arbitrary thing.

The current set of keywords makes it very difficult to make reusable code. If we want to include roles or tasks conditionally, we simply can't use run_once safely. Even the "ugly-ass workarounds" mentioned above are not possible in some instances, since the conditions might be completely different depending on where it gets included.

Another alternative would be to expose the list of hosts skipped by outer when clauses in another variable that can be used in inner when clauses. Something like "ansible_not_already_skipped_hosts". I know the logic is to skip the task, not the host, but when using includes, it's effectively the same. If we haven't done any includes with when clauses, it'd be effectively the same as ansible_play_hosts.

If we had a variable like that, I could put something like this in the inner-task.yml:
- name: Inner task
  command: hostname
  when: inventory_hostname == ansible_not_already_skipped_hosts[0]
If I needed to save the result to all hosts, I could register the result and then follow up with a set_fact using the var from that one host.

This isn't as clean as having a 'run_first_host_that_matches_when' keyword, but still prevents the "ugly-ass workarounds" from being needed on EVERY included task.

James 'zofrex' Sanderson

Mar 23, 2018, 6:23:39 AM
to Ansible Project
Thank you everyone who chimed in on this discussion, it's really helped me understand how run_once works.

As for workarounds - Alex I think you were on the right track with the include, but it needs to be dynamic. I think this does what you want:

inventory:

machine-a
machine-b

playbook.yml:

- hosts: all
  gather_facts: no
  tasks:
    - name: first task
      ping:

    - name: include for second task
      include_tasks: task.yml
      when: inventory_hostname == 'machine-b'

task.yml:

- name: second task
  ping:
  run_once: true

Because this include is dynamic instead of static, the first host - "machine-a" - doesn't encounter the "run_once" task at all, so it runs when the second host reaches it. (The reason this doesn't happen with your example in the first message on this thread is because the import is static: the include task runs for all machines, and the condition applies to the tasks it includes, so it's equivalent to having the "when" written on the run_once task.)
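
As a rough illustration of that difference, here is a toy Python model. It only encodes the explanation given in this post (a static include pushes the condition down onto the task, a dynamic include filters which hosts reach it); it is not how Ansible is implemented and the function names are made up.

```python
from typing import Callable, List, Optional


def static_include_run_once(hosts: List[str],
                            include_when: Callable[[str], bool]) -> Optional[str]:
    """Static include: every host reaches the task, the include's `when`
    is applied to the task itself, and only the first host is consulted."""
    if not hosts:
        return None
    first = hosts[0]
    return first if include_when(first) else None


def dynamic_include_run_once(hosts: List[str],
                             include_when: Callable[[str], bool]) -> Optional[str]:
    """Dynamic include: the `when` is evaluated on the include, so only
    matching hosts ever reach the run_once task."""
    reaching = [host for host in hosts if include_when(host)]
    return reaching[0] if reaching else None


def only_machine_b(host: str) -> bool:
    # Mirrors `when: inventory_hostname == 'machine-b'` from the playbook.
    return host == "machine-b"


hosts = ["machine-a", "machine-b"]
print(static_include_run_once(hosts, only_machine_b))   # None: skipped
print(dynamic_include_run_once(hosts, only_machine_b))  # machine-b
```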

This example is trivial, but in a larger setup this could be used to run a task only once, on a host matching certain conditions, in a way that is far less fragile than depending on inventory order. Hopefully this is helpful to anyone trying to combine run_once and conditions.

Brian Coca

Mar 23, 2018, 9:51:01 AM
to Ansible Project
I updated the docs in an effort to clarify this,
https://github.com/ansible/ansible/pull/37754, any suggestions that
help avoid more confusion on this subject are welcomed.


--
----------
Brian Coca

Alex Hunt

Mar 23, 2018, 3:06:11 PM
to Ansible Project
Thank you Brian for updating the docs. That makes it much more clear.

And thank you James for the workaround!!! The change to dynamic include_tasks rather than the older static include statement seems to work great.

Note for people on older releases, you have to be running at least Ansible 2.4 to have include_tasks.

James 'zofrex' Sanderson

Apr 3, 2018, 3:15:27 PM
to Ansible Project
Further note for people on older releases (2 or newer I believe, < 2.4):

You can achieve the same thing as include_tasks with include and static: no, like so:

    - name: include for second task
      include: task.yml
      static: no
      when: inventory_hostname == 'machine-b'

Hope that helps!