'ansible_system_vendor' is undefined


cl...@netsandbox.de

Apr 22, 2021, 5:59:51 AM
to Ansible Project

Hi, we have a playbook that runs once a day on all our Linux hosts.
Each day, the same task fails on a different host with "'ansible_system_vendor' is undefined".
I know that facts are gathered, because tasks that run before the failing one already use facts.
This started after upgrading Ansible from 2.3.3 to 2.10.8.

Any idea why this happens?

Regards Chris

Brian Coca

Apr 22, 2021, 10:00:44 AM
to Ansible Project
Facts are 'best effort': sometimes permissions or a busy system will
make a specific fact fail. You should get a warning, though.



--
----------
Brian Coca

cl...@netsandbox.de

Apr 30, 2021, 2:22:56 AM
to Ansible Project
I can't find any warning.
Permissions are also not a problem, because we gather facts as root.
The host was also not busy at the time of facts gathering.

Also, if I understand it correctly, facts should be "NA" if they can't be gathered, not undefined.

What is also strange is that it started after we switched to a newer Ansible version (2.10.8).
We never saw anything like this with our old Ansible version (2.3.3).

Brian Coca

Apr 30, 2021, 10:55:47 AM
to Ansible Project
Sadly not all the facts gathering code consistently uses N/A or
warnings. But in this case, for system_vendor, it can be populated by
either VM detection (hardcoded), query of /sys/devices or executing
dmidecode (these all seem to use the N/A standard). So afaict you
should not be getting undefined unless you are using the `subset`
option.

--
----------
Brian Coca

cl...@netsandbox.de

May 11, 2021, 4:03:23 AM
to Ansible Project
That is the point: I'm getting "undefined".
So I'm looking for someone who has an idea why I get "undefined" here.

This happens only sometimes when ansible-playbook is run in one of our Jenkins pipelines.
When I try to reproduce this with the ansible-playbook run from my workstation, no facts are undefined.

I can't find anything in the target hosts logs, Jenkins logs, or Jenkins host logs.

Brian Coca

May 11, 2021, 9:27:07 AM
to Ansible Project
From the code, this means you don't have access to stat
/sys/devices/virtual/dmi/id/product_name (or, if you do, you cannot
access /sys/devices/virtual/dmi/id/sys_vendor),
and executing dmidecode does not provide this info (not installed,
lack of permissions, etc.).
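
A minimal Python sketch of the kind of lookup described above, assuming a probe of product_name followed by per-file reads that fall back to "NA". The function name, the DMI_FILES mapping, and the None return value are hypothetical and are not Ansible's actual code. With logic of this shape, a key ends up either as a real value or as "NA", which is why an undefined fact is surprising.

```python
import os

# Hypothetical mapping for illustration only: fact name -> file under dmi/id.
DMI_FILES = {
    "product_name": "product_name",
    "system_vendor": "sys_vendor",
}


def read_dmi_facts(dmi_dir="/sys/devices/virtual/dmi/id"):
    """Return DMI facts read from sysfs, using "NA" for unreadable files.

    Returns None when the product_name probe fails, signalling that a
    caller would have to fall back to another source such as dmidecode.
    """
    if not os.path.exists(os.path.join(dmi_dir, "product_name")):
        return None
    facts = {}
    for fact_name, file_name in DMI_FILES.items():
        try:
            with open(os.path.join(dmi_dir, file_name)) as handle:
                value = handle.read().strip()
        except OSError:
            # Missing file or lack of permissions: treat as not available.
            value = ""
        facts[fact_name] = value or "NA"
    return facts
```

Calling read_dmi_facts() on a Linux host returns a dict with both keys set, or None if the sysfs probe fails.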

--
----------
Brian Coca

cl...@netsandbox.de

May 11, 2021, 9:50:20 AM
to Ansible Project
As stated before, we gather facts as the root user, so we have access to /sys/devices/virtual/dmi/id/{product_name,sys_vendor}.
dmidecode is also installed, but it is not used because the aforementioned paths are accessible.

Also, this happens only from time to time on one or two of our hosts (we have 1300 hosts).
On each ansible-playbook run, a different 1 to 2 hosts appear with an undefined ansible_system_vendor fact;
sometimes ansible_product_name is undefined as well.
Sometimes an ansible-playbook run finishes with no undefined Ansible facts.

And as also stated before, this started after updating our Ansible from 2.3.3 to 2.10.8.
We never saw this problem with Ansible 2.3.3, which was running fine for years.

Brian Coca

May 11, 2021, 10:54:43 AM
to Ansible Project
2.3.3 didn't do timeouts correctly, that might be the reason you are
seeing this now, but you should also get a warning about it.
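
A rough Python sketch of how a timeout around a whole collector could leave facts undefined rather than "NA". It uses concurrent.futures purely for illustration and is not Ansible's timeout helper; collect_with_timeout and slow_hardware_collector are hypothetical names. If the collector does not finish in time, none of its keys are set, so several facts can go missing at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def collect_with_timeout(collector, timeout_seconds):
    """Run collector() in a worker thread and stop waiting after the timeout.

    Returns (facts, warnings). On timeout the facts dict stays empty, so
    every key the collector would have set is undefined, not "NA".
    """
    facts, warnings = {}, []
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(collector)
    try:
        facts.update(future.result(timeout=timeout_seconds))
    except FutureTimeout:
        warnings.append("collector timed out after %ss" % timeout_seconds)
    finally:
        # Do not block here on the still-running worker thread.
        executor.shutdown(wait=False)
    return facts, warnings


def slow_hardware_collector():
    # Stand-in for a collector blocked on a slow external command.
    time.sleep(0.5)
    return {"system_vendor": "ExampleVendor", "product_name": "ExampleBox"}


facts, warnings = collect_with_timeout(slow_hardware_collector, timeout_seconds=0.1)
```

In this sketch the caller gets an empty dict plus a warning. The worker thread still runs to completion in the background, so the overall process is not necessarily shorter than the slow command.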


--
----------
Brian Coca

cl...@netsandbox.de

May 11, 2021, 11:30:41 AM
to Ansible Project
I just created the following test with a faked lsblk command, because this is called in

* created a /usr/local/bin/lsblk bash script containing "sleep 10"
* checked that my faked lsblk command is used: "which lsblk" returns /usr/local/bin/lsblk
* set "gather_timeout = 1" in ansible.cfg
* ran "time ansible testhost -b -m setup": this took 12 seconds, no warning shown
* ran "time ansible-playbook facts.yaml -b -l testhost" (facts.yaml is a playbook which just gathers facts): this took 12 seconds, no warning shown

I'm sure that my faked lsblk command is used, because when I change the sleep from 10 to 20,
the ansible and ansible-playbook runs take 22 instead of the previous 12 seconds.

I would expect a warning from the above ansible and ansible-playbook runs, but nothing is shown.
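
The timing and warning check from the steps above could be scripted roughly as follows. This is only a sketch: run_and_check is a hypothetical helper, and it assumes that Ansible warnings show up as output lines containing "[WARNING]".

```python
import subprocess
import time


def run_and_check(command):
    """Run a command and return (elapsed_seconds, warning_lines).

    Assumption: warnings appear as lines containing "[WARNING]" on
    stdout or stderr.
    """
    started = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - started
    output = result.stdout + result.stderr
    warning_lines = [line for line in output.splitlines() if "[WARNING]" in line]
    return elapsed, warning_lines
```

For example, run_and_check(["ansible", "testhost", "-b", "-m", "setup"]) would return the wall-clock time of the run together with any warning lines it printed.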

Brian Coca

May 12, 2021, 11:46:54 AM
to Ansible Project
Probably related to this, then: https://github.com/ansible/ansible/issues/74657

--
----------
Brian Coca

cl...@netsandbox.de

May 26, 2021, 6:09:00 AM
to Ansible Project
Besides the fake lsblk, I also created a fake udevadm (sleep 10), because this is called in _udevadm_uuid (https://github.com/ansible/ansible/blob/953aa26286db433c3509785e24f89f6616233841/lib/ansible/module_utils/facts/hardware/linux.py#L440-L463).
I then re-ran the steps from my comment above, and even then I don't get a timeout warning.

cl...@netsandbox.de

May 26, 2021, 6:23:43 AM
to Ansible Project
self.module.warn("Timed out while attempting to get extra information.")

cl...@netsandbox.de

Jun 8, 2021, 2:57:20 AM
to Ansible Project

Thanks, Brian, for helping to debug this.

Brian Coca

Jun 8, 2021, 4:16:46 AM
to Ansible Project
Expected: the first 2 are meant to show better errors and allow for
debugging, while the last one fixes concurrency issues with threads
for modules that call run_command.

----------
Brian Coca

cl...@netsandbox.de

Jun 8, 2021, 4:33:48 AM
to Ansible Project
One last question regarding this:
https://github.com/ansible/ansible/pull/74791 is currently labeled with affects_2.12, so do we have to wait for 2.12, or will this fix be backported to 2.11 and 2.10?