Optimizing 'gather_facts' response time over networking devices

jean-christophe manciot

Apr 9, 2021, 1:31:32 PM
to Ansible Project
With:
- ansible 3.2.0 (pip3)
- ansible-base 2.10.7 (pip3)

The goal is twofold:
- gather the facts only from the local cache when the cache is not empty and its timeout <fact_caching_timeout> has not expired
- contact the remote network device only when the cache is empty or the timeout has expired

I made some tests with the following settings in /etc/ansible/ansible.cfg:
        fact_caching = redis
        fact_caching_timeout = 3600
        fact_caching_connection = localhost:6379:0:<redis_password>

I made sure that Redis is running, with requirepass <redis_password> set in /etc/redis/redis.conf:
        $ sudo systemctl status redis
        ● redis-server.service - Advanced key-value store
             Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
             Active: active (running) since Fri 2021-04-09 12:09:54 CEST; 7h ago
               Docs: http://redis.io/documentation,
                     man:redis-server(1)
           Main PID: 1610 (redis-server)
             Status: "Ready to accept connections"
              Tasks: 5 (limit: 18973)
             Memory: 5.7M
             CGroup: /system.slice/redis-server.service
                     └─1610 /usr/bin/redis-server 127.0.0.1:6379

        Apr 09 12:09:54 host systemd[1]: Starting Advanced key-value store...
        Apr 09 12:09:54 host systemd[1]: Started Advanced key-value store.

I ran the following simple play against an IOS device and observed that:
- "Gathering Facts" took several seconds, indicating that the remote device was contacted
- "Pinging remote device to create/update facts cache" was almost instantaneous

        - name: Gathering remote device facts preferably from the cache 
          hosts: all
          vars:
                ansible_connection: network_cli
          gather_facts: yes
          strategy: debug
          tasks:
                - name: Pinging remote device to create/update facts cache
                  ping:             

The run produces the following output:

        ansible-playbook 2.10.7
          config file = /etc/ansible/ansible.cfg
          ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
          executable location = /usr/local/bin/ansible-playbook
          python version = 3.9.4 (default, Apr  4 2021, 19:38:44) [GCC 10.2.1 20210401]
        Using /etc/ansible/ansible.cfg as config file
        Parsed lab/hosts inventory source with ini plugin
        redirecting (type: cache) ansible.builtin.redis to community.general.redis
        Redis connection: Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>
        redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
        redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
        Skipping callback 'default', as we already have a stdout callback.
        Skipping callback 'minimal', as we already have a stdout callback.
        Skipping callback 'oneline', as we already have a stdout callback.
         __________________________________________
        < PLAYBOOK: sdxlive_gather_facts_tests.yml >
         ------------------------------------------
                \   ^__^
                 \  (oo)\_______
                    (__)\       )\/\
                        ||----w |
                        ||     ||

        1 plays in sdxlive_gather_facts_tests.yml
         __________________________________________________________
        / PLAY [Gathering remote device facts preferably from the \
        \ cache]                                                   /
         ----------------------------------------------------------
                \   ^__^
                 \  (oo)\_______
                    (__)\       )\/\
                        ||----w |
                        ||     ||

         ________________________
        < TASK [Gathering Facts] >
         ------------------------
                \   ^__^
                 \  (oo)\_______
                    (__)\       )\/\
                        ||----w |
                        ||     ||

        task path: playbooks/sdxlive_gather_facts_tests.yml:1
        redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
        [WARNING]: Ignoring timeout(30) for cisco.ios.ios_facts
        Using module file /opt/ansible_collections/cisco/ios/plugins/modules/ios_facts.py
        Pipelining is enabled.
        <172.21.16.79> ESTABLISH LOCAL CONNECTION FOR USER: admin
        <172.21.16.79> EXEC /bin/bash -c '/usr/bin/python3 && sleep 0'
        ok: [XEv_Spine_31]
        META: ran handlers
         ___________________________________________________________
        < TASK [Pinging remote device to create/update facts cache] >
         -----------------------------------------------------------
                \   ^__^
                 \  (oo)\_______
                    (__)\       )\/\
                        ||----w |
                        ||     ||

        task path: playbooks/sdxlive_gather_facts_tests.yml:8
        redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
        Using module file /usr/local/lib/python3.9/dist-packages/ansible/modules/ping.py
        Pipelining is enabled.
        <172.21.16.79> ESTABLISH LOCAL CONNECTION FOR USER: admin
        <172.21.16.79> EXEC /bin/bash -c '/usr/bin/python3 && sleep 0'
        ok: [XEv_Spine_31] => changed=false 
          invocation:
            module_args:
              data: pong
          ping: pong
        META: ran handlers
        META: ran handlers

If I use gather_facts: no instead, the network facts are not gathered from the cache at all.

How can we ensure that the facts are read only from the cache, except when it is empty for that device or expired?

jean-christophe manciot

Apr 10, 2021, 1:39:08 AM
to Ansible Project
I retried the playbook and must correct my earlier statement: with 'gather_facts: no', the facts are read from the cache as expected.
However, I confirm that with 'gather_facts: yes', the facts are always gathered from the remote host, regardless of the cache timeout value.

I also tried the same playbook against a compute node (Ubuntu server), without 'ansible_connection: network_cli' of course, and got the same result: with 'gather_facts: yes', the facts are gathered from the remote device on every run.

It seems that I misunderstood what gather_facts really does: the primary goal of this thread does not appear to be implemented by Ansible and would have to be implemented manually by the user.
I suppose it also means that when the cache timeout expires, all the 'ansible_facts' data simply disappear from the cache. Unless the user gathers them again from the remote device, they are no longer accessible.
If the timeout expiration is supposed to trigger some background fact gathering from the remote device, it must happen during a playbook run; otherwise the facts are lost.

- If I run the first playbook with 'gather_facts: yes', then run it again with 'gather_facts: no' before the cache timeout expires, the 'ansible_facts' remain accessible.
- However, if I run the first playbook with 'gather_facts: yes', wait for the cache timeout to expire and then run it with 'gather_facts: no', the 'ansible_facts' are no longer accessible. Nothing is triggered.

Is the last behavior expected?

Brian Coca

Apr 12, 2021, 12:37:10 PM
to Ansible Project
https://docs.ansible.com/ansible/latest/reference_appendices/config.html#default-gathering
^ set to smart and you can ignore `gather_facts` except for those
plays in which you want to force it.




--
----------
Brian Coca

jean-christophe manciot

Apr 13, 2021, 8:57:13 AM
to Ansible Project
@Brian Coca

I have already tried setting gathering = smart in ansible.cfg or exporting ANSIBLE_GATHERING=smart.
Nothing happens during the second play run once fact_caching_timeout has expired after the first one.

Anyway, even if it worked, there would be a major drawback: its unpredictability.

For instance, let's assume:
  • fact_caching_timeout expires after the first play run
  • one of the ansible_facts, say ansible_net_version, is used at the beginning of the second run to perform a group_by

Even if the smart feature works and kicks in after the second run begins, there is a high probability that the playbook will fail due to ansible_net_version being undefined, depending on when exactly it does kick in and how long it takes to retrieve the data from the remote device. 

On top of that, there is no way to run a module to gather facts when that variable is undefined, because there is no equivalent of the setup module in the networking ecosystem.
IIUC, when gather_facts: yes is used, it launches the correct platform-dependent fact-gathering module based on the value of the predefined ansible_network_os. There is no single umbrella module that takes care of that logic.
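
A possible way around this, offered as an untested sketch: ansible-base ships ansible.builtin.gather_facts as a regular module that can be called as a task. The sketch assumes that this module selects the platform facts module from ansible_network_os over network_cli in the same way the play-level keyword does, and that ansible_net_version is a reliable marker of cached network facts. Neither assumption has been verified in this thread.

```yaml
- name: Gather network facts only when the cache has none
  hosts: all
  gather_facts: no          # never gather implicitly at play start
  vars:
    ansible_connection: network_cli
  tasks:
    # Assumption: ansible.builtin.gather_facts dispatches to the platform
    # facts module (e.g. cisco.ios.ios_facts) based on ansible_network_os.
    - name: Gather facts when the cached ones are missing or expired
      ansible.builtin.gather_facts:
      when: ansible_net_version is not defined

    # By this point the fact should exist, either from the cache or from
    # the task above.
    - name: Show the OS version
      ansible.builtin.debug:
        var: ansible_net_version
```

If this works as assumed, the device is contacted only on runs where the cached facts are absent, and any later task can rely on ansible_net_version being defined.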

Hence my initial, mistaken belief that gather_facts: yes would first check whether fact_caching_timeout is about to expire (for instance, halfway through the timeout) before deciding whether to gather facts from the remote device. With that type of logic, we could count on the ansible_facts being accessible after that call.

In summary:
  • the smart feature does not work over networking devices
  • even if it did, it would be:
    • unpredictable
    • unusable in some use cases where a gather_facts should be avoided unless absolutely necessary
Workaround:
  • gathering = explicit
  • run a background playbook against all remote devices at intervals of half the fact_caching_timeout
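
The workaround above could be sketched as a small refresh playbook, launched by a scheduler such as cron at intervals of half the fact_caching_timeout (every 1800 seconds with the 3600-second timeout used earlier). The file name and the scheduling are hypothetical, and the sketch assumes that each gather rewrites the cache entry and so restarts its timeout.

```yaml
# refresh_fact_cache.yml (hypothetical file name)
# Meant to be run periodically so that cached facts never reach expiry.
- name: Refresh the fact cache for all remote devices
  hosts: all
  vars:
    ansible_connection: network_cli
  gather_facts: yes   # explicit request, honoured even with gathering = explicit
```

The regular playbooks can then keep gather_facts: no and read everything from the cache.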

jean-christophe manciot

Apr 13, 2021, 8:59:20 AM
to Ansible Project
... with gather_facts: yes. (end of last sentence)

Brian Coca

Apr 13, 2021, 12:44:11 PM
to Ansible Project
The timeout only affects facts when they are first fetched; once in memory
they should not expire, or you would lose facts mid-run and become
inconsistent. If you want to force a retry, you can check the datetime facts
yourself and make fact gathering conditional on that in the 2nd play.

--
----------
Brian Coca
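
Brian's suggestion might look roughly like the following for a host whose facts come from the setup module, such as the Ubuntu server tested earlier. It is a sketch under assumptions: max_fact_age is a hypothetical variable, ansible_facts.date_time.epoch is taken as the time of the last gather, and the controller and remote clocks are assumed to be roughly in sync. Network facts modules such as ios_facts may not provide date_time at all, in which case a different marker would be needed.

```yaml
- name: Refresh facts only when the cached copy is missing or too old
  hosts: all
  gather_facts: no
  vars:
    max_fact_age: 1800   # seconds; hypothetical threshold, half of fact_caching_timeout
  tasks:
    # date_time.epoch is recorded on the remote host at gather time;
    # now() is evaluated on the controller when the condition is checked.
    - name: Gather facts when the cache is empty or stale
      ansible.builtin.setup:
      when: >-
        ansible_facts.date_time is not defined or
        (now().timestamp() | int) - (ansible_facts.date_time.epoch | int) > max_fact_age
```

Because the check runs before the cache entry expires, later tasks in the same run can count on the facts being present.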
