Force gathering facts on all hosts when using --tags or --limit


Nick Groenen

Sep 10, 2013, 9:22:34 AM
to ansible...@googlegroups.com
I have a very large playbook which configures our entire
infrastructure. Because of this, various steps are tagged so that only
specific parts of the playbook can be run, cutting down on runtime
when required.

Parts of this setup use facts/hostvars to automatically create correct
configuration files. For example, nginx config adding all the
application servers that are defined in the inventory to the correct
upstream definitions, and iptables on the appservers automatically
opening up the correct ports to the loadbalancers.

However, when running the playbook with --limit, or --tags, not all
hosts are contacted, and as a result, facts aren't available on every
system in the infrastructure. This causes all kinds of problems for my
setup, obviously.

Is there any way to force gathering of facts on all hosts, even when
specifying one of these options? Or another way to deal with this
situation that I haven't thought of?

Right now, I'm solving it for the --tags case by having one task at
the start of the playbook, which simply calls the ping module and has
every tag that's used listed. This way, this task is kicked off no
matter which tag is specified, causing facts to be gathered on every
system in our inventory.

This isn't a practical solution, however, nor does it solve
the case where --limit is used.
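
For readers unfamiliar with the workaround described above, it might look roughly like the following sketch. The tag names are placeholders and are not taken from the actual playbook.

```yaml
# Hypothetical first play: contact every host so that facts are gathered,
# whichever tag is passed on the command line.
- hosts: all
  tasks:
    - name: touch every host so fact gathering runs
      ping:
      tags:            # placeholder names; list every tag used in the playbook
        - nginx
        - iptables
        - appserver
```

Because the task carries every tag, it is selected by any --tags value, but it does nothing for hosts excluded by --limit.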

Michael DeHaan

Sep 10, 2013, 10:00:33 AM
to ansible...@googlegroups.com
Fact caching is something we want to look into for the 1.4 release.






--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Michael DeHaan <mic...@ansibleworks.com>
CTO, AnsibleWorks, Inc.
http://www.ansibleworks.com/

Walid

Sep 20, 2013, 5:05:18 PM
to ansible...@googlegroups.com
Hi,

How about being able to specify which facts you need back (some sort of white/blacklisting), so that in a large-scale infrastructure fact gathering does not cause extra overhead on the network or on the Ansible management console where this data gets reported back?

kind regards

Walid

Brian Coca

Sep 20, 2013, 5:17:44 PM
to ansible...@googlegroups.com
@walid, the setup module has this option: use a filter. From the examples:

ansible all -m setup -a 'filter=ansible_eth[0-2]'
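
The same filter can presumably be used from a playbook task as well as ad hoc. A minimal sketch, assuming the play turns off the implicit full gather and calls setup itself:

```yaml
- hosts: all
  gather_facts: no          # skip the implicit, unfiltered gather
  tasks:
    - name: gather only the eth0..eth2 interface facts
      setup: filter=ansible_eth[0-2]
```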

Walid

Sep 21, 2013, 2:48:37 AM
to ansible...@googlegroups.com
Thank you Brian, my mistake; I should have read the documentation first. This is perfect, thanks.



Michael DeHaan

Sep 21, 2013, 9:23:52 AM
to ansible...@googlegroups.com
Hi Walid,

It's also true that there really isn't a ginormous amount of data coming back from the facts modules, so it shouldn't be a problem.

If you are writing some modules of your own, you'll likely have a few facts modules that you might choose not to call, on a case-by-case basis, if they are intensive (walid_facts, etc.).

--Michael

Kerry Kurian

Jan 10, 2014, 2:20:01 PM
to ansible...@googlegroups.com
Have you found a way to solve this for the --limit case?

Nick Groenen

Jan 13, 2014, 11:52:35 AM
to ansible...@googlegroups.com
On Fri, Jan 10, 2014 at 8:20 PM, Kerry Kurian <kku...@gmail.com> wrote:
> Have you found a way to solve this for the --limit case?

No I haven't. In our case, we're simply not using --limit for the time
being, and have our tags set up to still allow pretty granular
targeting. Fact caching is going to help somewhat with this problem,
once it's implemented.

--
Nick Groenen | zoni | @NickGroenen
https://zoni.nl | GnuPG/GPG key ID: 0xAB5382F6

Brian Coca

Jan 13, 2014, 11:59:22 AM
to ansible...@googlegroups.com
implementing the offline caches might solve this as most host data should be available from previous runs.

Strahinja Kustudić

Mar 14, 2014, 12:28:46 PM
to ansible...@googlegroups.com
I know this is now an old topic, but it didn't make sense to open a new one.

Having an offline cache would be nice, but wouldn't it be easier to make something like:

gather_facts: force or gather_facts_force: True

or something similar, so that this overrides --limit / --tags and always gathers facts.

Michael DeHaan

Mar 14, 2014, 12:42:26 PM
to ansible...@googlegroups.com
We've discussed this and what we want to do with gather_facts is make a config setting

gather_facts_tendancy:  always or lazy, default always

and then if you want to force when it's lazy, you could just call the '- setup' module in the tasks section.
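
As a rough sketch of the "lazy" case described above: the gather_facts_tendancy setting is only a proposal at this point in the thread, so the example below stands in for it by disabling implicit gathering in the play itself. The group name is a placeholder.

```yaml
- hosts: webservers         # placeholder group name
  gather_facts: no          # stand-in for the proposed "lazy" default
  tasks:
    - name: force fact gathering for this play
      setup:
```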






Strahinja Kustudić

Mar 14, 2014, 1:01:00 PM
to ansible...@googlegroups.com
That is a cool solution. Do you have an estimate of when this feature will be added?

Michael DeHaan

Mar 14, 2014, 1:11:02 PM
to ansible...@googlegroups.com
I'm not actively working on it -- I believe Brian Coca had expressed interest in getting this in, also open to someone going ahead and doing it.

--Michael


Grzegorz Nosek

Mar 24, 2014, 1:43:34 PM
to ansible...@googlegroups.com
Hi,

Sorry to dig up this thread again but it's also an issue for me.

AIUI, Strahinja Kustudić needs (just as I do) a way to always gather facts on all hosts, regardless of tags/limits. What your proposal does is (again, AIUI) improve performance without the need of gather_facts: false in every play. It's cool but different. We need different functionality, not better performance.

Consider:

---
- hosts: vpn_clients
  gather_facts: force
  tasks:
  - apt: pkg=openvpn-or-whatever

- hosts: vpn_hub
  tasks:
  - template: src=uses_facts_from_vpn_clients.j2

I'd love this playbook to do exactly the same thing on vpn_hub whether it is run with -l vpn_hub or without. Otherwise, whenever I want to simply regenerate the vpn_hub's config, I'll either reconfigure all the clients again (time consuming), generate a totally broken config (no clients' facts on hub), or write a separate top-level playbook just for this one single task (not DRY and/or ravioli code).

FWIW, I'm totally fine with calling setup manually on these hosts, so maybe something like the following would be better? I'm willing to implement anything that stands a chance of being accepted.

- hosts: vpn_clients
  tasks:
  - setup:
    ignore_tags: true
    ignore_limit: true
  - apt: pkg=openvpn-or-whatever

- hosts: vpn_hub
  tasks:
  - template: src=uses_facts_from_vpn_clients.j2

Or make the ignore_* options attributes of the play instead of a particular task, if that makes anything easier.

All comments appreciated.

Best regards,
 Grzegorz Nosek

Brian Coca

Mar 24, 2014, 2:54:31 PM
to ansible...@googlegroups.com
I was thinking something like:

```
setup: target={{item}}
with_items: groups['webservers']
```

This gives people more fine-grained control: target would populate hostvars[target] rather than the current host's facts.

or would it be better to use delegate_to?

Grzegorz Nosek

Mar 24, 2014, 3:00:34 PM
to ansible...@googlegroups.com
On 24.03.2014 19:54, Brian Coca wrote:
I'd say delegate_to (with a group, which isn't allowed ATM, I think?)
would be cleaner as it would transparently support other modules (like
custom facts) without modifying them. Also, to support this AFAIK you
need local action plugins, not just a module to run on the remote host.

So either this:

- setup:
  delegate_to: webservers

Or this, which probably isn't valid Ansible either:

- setup:
  delegate_to: "{{ item }}"
  with_inventory_hostnames: webservers

Any of these would be fine with me.

Best regards,
Grzegorz Nosek

Brian Coca

Mar 24, 2014, 3:02:56 PM
to ansible...@googlegroups.com
It would require changing current behavior: right now, if you do setup + delegate_to, you populate the 'current' host with the delegated host's facts.

Grzegorz Nosek

Mar 24, 2014, 3:20:37 PM
to ansible...@googlegroups.com
On 24.03.2014 20:02, Brian Coca wrote:
> it would require changing current behavior, now if you do setup +
> delegate_to, you populate 'current' host with delegated host facts.

Right, my bad.

Still, I don't really like the target= as a parameter of the setup
module. It complicates the implementation too much IMHO. It effectively
duplicates the whole support of delegate_to with a minor change in
functionality (which I forgot and you pointed out).

Another play-level option (delegate_facts: true?) might be more
universal and would keep writing custom fact modules trivial.

Best regards,
Grzegorz Nosek

Strahinja Kustudić

Mar 24, 2014, 7:26:28 PM
to ansible...@googlegroups.com
I know that I suggested this, but I really don't see a downside to using gather_facts: force. It is simple, seems the easiest to implement, is easy to read, and feels natural. The only downside I see is that gather_facts is a Boolean, and if we allowed "force", it would no longer be one, which might feel a little strange.

On the other hand adding additional options to setup module like ignore_tags and ignore_limit is cool, but what I don't like about it is that you would need to do gather_facts: False before that. The same goes for using setup with delegate_to.

Don't get me wrong those additional parameters for the setup module are great ideas and I think Ansible should have them as well, so that you can gather custom facts, but Ansible is always being promoted as being extremely simple and gather_facts: force/always is as simple as it can be.

Michael DeHaan

Mar 24, 2014, 8:09:00 PM
to ansible...@googlegroups.com
So the force/always stuff was in fact made configurable in ansible.cfg for 1.6.

It's not "per play", but it can be selected. Thanks to Brian for this patch:

# plays will gather facts by default, which contain information about
# the remote system.
#
# smart - gather by default, but don't regather if already gathered
# implicit - gather by default, turn off with gather_facts: False
# explicit - do not gather by default, must say gather_facts: True
gathering = implicit





Brian Coca

Mar 24, 2014, 8:10:20 PM
to ansible...@googlegroups.com
it does not ignore limit or hosts clause though, which I think is what they are looking for

Michael DeHaan

Mar 24, 2014, 8:17:26 PM
to ansible...@googlegroups.com
Right, just pointing that out in case the thread had changed scope.

As we have too many fish in the basket right now (mixed metaphors FTW) I don't think this is going to get stomped anytime soon with respect to gathering hosts outside the loop.





Grzegorz Nosek

Mar 25, 2014, 2:26:19 AM
to ansible...@googlegroups.com
On 25.03.2014 01:17, Michael DeHaan wrote:
> Right, just pointing that out in case the thread had changed scope.
>
> As we have too many fish in the basket right now (mixed metaphors FTW) I
> don't think this is going to get stomped anytime soon with respect to
> gathering hosts outside the loop.

If time is the only issue, please choose a solution and I'll implement
it. The candidates so far are (Brian, Strahinja, please correct me if I
got something wrong/omitted something):

1.

- hosts: foo
  gather_facts: always

Pros: simple syntax, probably does 90% of the job
Cons: no way to support custom fact modules

2.

- hosts: localhost
  tasks:
  - setup: target={{ item }}
    with_inventory_hostnames: foo

Pros: compact (no separate play required just to gather facts)
Cons: probably (relative) hell to implement, custom fact modules have to
reimplement target= by themselves

3.

- hosts: localhost
  tasks:
  - setup:
    delegate_to: "{{ item }}"
    delegate_facts: true
    with_inventory_hostnames: foo

Pros: transparent wrt. custom facts, probably easy to implement
Cons: increasing line count (also: the variable name?)

4.

- hosts: foo
  ignore_tags: true
  ignore_limit: true
  tasks:
  - setup

Pros: doesn't require decorating every task if there are several,
explicit about what happens
Cons: "oh yeah, I lied, tags/limits do not always apply" surprise,
verbose for the 90% use case of just gathering some facts


If I were to decide, I'd go for 1 + 3 to make the easy thing easy (just
gather these facts whatever happens) and the hard thing possible
(explicitly collect custom facts from other hosts even when excluded by
tags/limit)


Best regards,
Grzegorz Nosek

Kerry Kurian

Mar 24, 2014, 2:31:14 PM
to ansible...@googlegroups.com
re: always gathering facts on all hosts, regardless of tags/limits.

What I’ve been doing is creating a facts.yml file that looks like this:

- name: gather facts for api_endpoints
  hosts: api_endpoints

- name: gather facts for zookeepers
  hosts: zookeepers

… and so on …


Then, as the first directive of every top-level playbook I write this:

- include: facts.yml


Maybe this doesn’t hit all the cases that you need (?) but it’s worked for me so far. Hope that helps.

Grzegorz Nosek

Mar 25, 2014, 3:33:08 PM
to ansible...@googlegroups.com
On 24.03.2014 19:31, Kerry Kurian wrote:
> re: always gathering facts on all hosts, regardless of tags/limits.
>
> What I’ve been doing is creating a facts.yml file that looks like this:
>
> ---
> - name: gather facts for api_endpoints
>   hosts: api_endpoints
>
> - name: gather facts for zookeepers
>   hosts: zookeepers
>
> … and so on …
>
>
> Then, as the first directive of every top-level playbook I write this:
>
> - include: facts.yml
>
>
> Maybe this doesn’t hit all the cases that you need (?) but it’s worked
> for me so far. Hope that helps.

When you run ansible-playbook something.yml -l zookeepers, you won't get
facts for the api_endpoints hosts (if they do not overlap).

So unless I'm mistaken, no, this doesn't help me at all.

Best regards,
Grzegorz Nosek

Thijs Cadier

Jul 15, 2014, 9:38:09 AM
to ansible...@googlegroups.com
I'm also running into this. Would be great if there was a way to enable fact gathering for all (or possibly a subset of) hosts when scoping on tags or hosts. Without something like that you always have to run on all machines to be able to get a list of ip addresses of machines for a firewall config, for example.

Henry Finucane

Jul 16, 2014, 12:28:41 AM
to ansible...@googlegroups.com

I have a similar problem with a more limited scope- I'd like to be able to inspect group variables as applied to hosts without gathering facts everywhere- I use them to generate monitoring configuration.

It's a little intractable because they could be dynamic and depend on fact gathered variables, but I'd be happy dealing with that restriction.


Thijs Cadier

Aug 14, 2014, 12:26:33 PM
to ansible...@googlegroups.com
Any news or other workarounds for this? We've now converted our staging system to Ansible, but I'm not sure how I can roll out to production. The problem is that we use the inventory to set up hosts files and firewall rules, but we can't run Ansible on the entire production cluster for the migration. We need to do it host by host and check the state in between. Does anybody know of any workarounds to do this?

Michael DeHaan

Aug 14, 2014, 6:33:00 PM
to ansible...@googlegroups.com
Yes.

Apologies for the weird archive link instead of the forum, but this is what Google juice turned up when I was looking for my post.





Thijs Cadier

Aug 15, 2014, 9:21:53 AM
to ansible...@googlegroups.com
Awesome, works like a charm. The docs in this commit were helpful as well: https://github.com/ansible/ansible/commit/160ddf6b046c1a7976f356ed02d506223b6cd0ae



Michael DeHaan

Aug 15, 2014, 10:54:30 AM
to ansible...@googlegroups.com

Craig Tracey

Oct 18, 2014, 4:31:58 PM
to ansible...@googlegroups.com
I also have this issue with some playbooks that we have.  

Michael, nice work on the caching! While fact caching is definitely useful and something that we will likely implement, I also liked the idea of 'gather_facts_force'. Therefore, I have gone ahead and implemented this via pull request:


Let me know what you all think.

Best,
Craig

Dmitry Zelenkovsky

Dec 19, 2014, 8:38:35 AM
to ansible...@googlegroups.com
That's exactly what I'm looking for, please implement it with "gather_facts: force" - in playbook to override --limit & --tags settings.
Is there any way to push it to official release? vote?

BR
Dmitry Z.

Michael DeHaan

Dec 19, 2014, 9:44:10 AM
to ansible...@googlegroups.com
You're bumping a bit of an old thread.

The trick here is to use fact caching, have one play gather your facts, and other plays can just rely on those facts.
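
A minimal sketch of the "one play gathers your facts" half of this approach, assuming fact caching has already been enabled in ansible.cfg (not shown) and using a hypothetical file name:

```yaml
# gather_facts.yml (hypothetical file name)
# Run without --limit or --tags, for example on a schedule, so that the
# cache holds facts for every host in the inventory.
- hosts: all
  gather_facts: yes
```

Later runs restricted with --limit or --tags could then read other hosts' facts through hostvars, as long as the cached entries have not expired.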



Reply all
Reply to author
Forward
0 new messages