'ansible_system_vendor' is undefined

cl...@netsandbox.de

Apr 22, 2021, 5:59:51 AM

to Ansible Project

Hi, we have a playbook that runs once a day on all our Linux hosts.

Each day, the same task fails on a different host with "'ansible_system_vendor' is undefined".
I know that facts are gathered, because tasks that run before the failing one already use facts.

This started after upgrading Ansible from 2.3.3 to 2.10.8.

Any idea why this happens?

Regards Chris

Brian Coca

Apr 22, 2021, 10:00:44 AM

to Ansible Project

Facts are 'best effort'. Sometimes permissions or a busy system will
make a specific fact fail; you should get a warning, though.

--
----------
Brian Coca

cl...@netsandbox.de

Apr 30, 2021, 2:22:56 AM

to Ansible Project

I can't find any warning.
Permissions are also not a problem, because we gather facts as root.

The host was also not busy at the time of facts gathering.

Also, if I understand it correctly, facts should be "NA" if they can't be gathered, and not undefined.

What is also strange is that it started after we switched to a newer Ansible version (2.10.8).
We never saw anything like this with our old Ansible version (2.3.3).

Brian Coca

Apr 30, 2021, 10:55:47 AM

to Ansible Project

Sadly not all the facts gathering code consistently uses N/A or
warnings. But in this case, for system_vendor, it can be populated by
either VM detection (hardcoded), query of /sys/devices or executing
dmidecode (these all seem to use the N/A standard). So afaict you
should not be getting undefined unless you are using the `subset`
option.

--
----------
Brian Coca

cl...@netsandbox.de

May 11, 2021, 4:03:23 AM

to Ansible Project

That is the point, I'm getting "undefined".

So I'm looking for someone who has an idea why I get "undefined" here.

This happens only sometimes when ansible-playbook is run in one of our Jenkins pipelines.

When I try to reproduce this with the ansible-playbook run from my workstation, no facts are undefined.

I can't find anything in the target hosts logs, Jenkins logs, or Jenkins host logs.

Brian Coca

May 11, 2021, 9:27:07 AM

to Ansible Project

From the code: you don't have access to stat
/sys/devices/virtual/dmi/id/product_name (or if you do, you cannot
access /sys/devices/virtual/dmi/id/sys_vendor),
and executing dmidecode does not provide this info (not installed,
lack of permissions, etc.).

--
----------
Brian Coca
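
A minimal sketch for checking the two sysfs files named above directly on an affected host. The helper name `read_dmi_values` is hypothetical and is not part of Ansible; it only reports whether each file can be read and what it contains, which may help narrow down whether the files themselves are the problem.

```python
import os

# Directory and file names are the ones mentioned in this thread.
# The helper itself is a hypothetical diagnostic, not Ansible code.
DMI_DIR = "/sys/devices/virtual/dmi/id"
DMI_FILES = ("product_name", "sys_vendor")


def read_dmi_values(dmi_dir=DMI_DIR, names=DMI_FILES):
    """Return {name: value}, using None for files that cannot be read."""
    values = {}
    for name in names:
        path = os.path.join(dmi_dir, name)
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                values[name] = handle.read().strip()
        except OSError:
            # Missing file or insufficient permissions.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_dmi_values().items():
        print(name, "->", "UNREADABLE" if value is None else repr(value))
```

Running it as the same user that gathers facts shows whether either file is unreadable at that moment.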

cl...@netsandbox.de

May 11, 2021, 9:50:20 AM

to Ansible Project

As stated before, we gather facts as the root user, so we have access to /sys/devices/virtual/dmi/id/{product_name,sys_vendor}.

dmidecode is also installed, but it is not used because the paths mentioned above are accessible.

Also, this happens only from time to time on one or two of our hosts (we have 1300 hosts).
On each ansible-playbook run, a different one or two hosts show up with an undefined ansible_system_vendor fact;
sometimes ansible_product_name is undefined as well.
Sometimes an ansible-playbook run finishes with no undefined Ansible facts.

And as also stated before, this happened after updating our Ansible from 2.3.3 to 2.10.8.
We never ever saw this problem with Ansible 2.3.3, which was running fine for years.

Brian Coca

May 11, 2021, 10:54:43 AM

to Ansible Project

2.3.3 didn't do timeouts correctly; that might be the reason you are
seeing this now, but you should also get a warning about it.

--
----------
Brian Coca

cl...@netsandbox.de

May 11, 2021, 11:30:41 AM

to Ansible Project

I just created the following test with a faked lsblk command, because this is called in

https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/facts/hardware/linux.py#L400

* created a /usr/local/bin/lsblk bash script containing "sleep 10"

* checked that my faked lsblk command is used: "which lsblk" returns /usr/local/bin/lsblk

* set "gather_timeout = 1" in ansible.cfg

* ran "time ansible testhost -b -m setup"; this took 12 seconds, no warning shown

* ran "time ansible-playbook facts.yaml -b -l testhost" (facts.yaml is a playbook which just gathers facts); this took 12 seconds, no warning shown

I'm sure that my faked lsblk command is used, because when I change the sleep from 10 to 20,
the ansible and ansible-playbook runs take 22 instead of the previous 12 seconds.

I would expect a warning from the above ansible and ansible-playbook runs, but nothing is shown.
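
The first two steps of this test (creating the slow stand-in command and confirming that it shadows the real one) can be sketched roughly as below. The function names `make_fake_command` and `resolves_to` are hypothetical, and the sketch writes a /bin/sh script rather than a bash one; the target directory has to come before the real command's directory on PATH for the shadowing to work.

```python
import os
import shutil
import stat


def make_fake_command(directory, name="lsblk", sleep_seconds=10):
    """Write an executable shell script called `name` that only sleeps."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("#!/bin/sh\nsleep {}\n".format(int(sleep_seconds)))
    # Add the execute bits so the script can be found and run via PATH.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def resolves_to(name, expected_path, search_path=None):
    """True if `name` resolves to `expected_path`, similar to `which`."""
    found = shutil.which(name, path=search_path)
    if found is None:
        return False
    return os.path.realpath(found) == os.path.realpath(expected_path)
```

For the setup described above, this would be called as `make_fake_command("/usr/local/bin")` on the test host, followed by `resolves_to("lsblk", "/usr/local/bin/lsblk")`.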

Brian Coca

May 12, 2021, 11:46:54 AM

to Ansible Project

probably related to this then https://github.com/ansible/ansible/issues/74657

--
----------
Brian Coca

cl...@netsandbox.de

May 26, 2021, 6:09:00 AM

to Ansible Project

I have applied your patch from https://github.com/ansible/ansible/issues/74657#issuecomment-841457582

and besides the fake lsblk I also created a fake udevadm (sleep 10), because this is called in _udevadm_uuid (https://github.com/ansible/ansible/blob/953aa26286db433c3509785e24f89f6616233841/lib/ansible/module_utils/facts/hardware/linux.py#L440-L463)

which is called in get_mount_info (https://github.com/ansible/ansible/blob/953aa26286db433c3509785e24f89f6616233841/lib/ansible/module_utils/facts/hardware/linux.py#L516-L525)

and then re-ran the steps from my comment above, and even then I don't get a timeout warning.

cl...@netsandbox.de

May 26, 2021, 6:23:43 AM

to Ansible Project

I think https://github.com/ansible/ansible/blob/953aa26286db433c3509785e24f89f6616233841/lib/ansible/module_utils/facts/hardware/linux.py#L584-L588 is missing a warning call:

self.module.warn("Timed out while attempting to get extra information.")
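
To illustrate the general pattern being asked for (warn when a timeout is swallowed, rather than returning silently), here is a small sketch. It is not the Ansible source: `GatherTimeout`, `FakeModule`, and `get_extra_info` are hypothetical stand-ins, and only the warning text is taken from the line above.

```python
class GatherTimeout(Exception):
    """Hypothetical stand-in for a fact-gathering timeout error."""


class FakeModule:
    """Hypothetical stand-in for a module object; it records warnings."""

    def __init__(self):
        self.warnings = []

    def warn(self, message):
        self.warnings.append(message)


def get_extra_info(module, fetch):
    """Call fetch(); on timeout, emit a warning instead of failing silently."""
    try:
        return fetch()
    except GatherTimeout:
        # Without this call the caller would get an empty result
        # and no hint that a timeout occurred.
        module.warn("Timed out while attempting to get extra information.")
        return {}
```

With a warning in place, a run that hits the timeout would at least point at the step that was cut short.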

cl...@netsandbox.de

Jun 8, 2021, 2:57:20 AM

to Ansible Project

I have applied the changes from https://github.com/ansible/ansible/pull/74714 and https://github.com/ansible/ansible/pull/74885

and now see the same KeyError as in https://github.com/ansible/ansible/pull/74714#pullrequestreview-673126168
which seems to be fixed in https://github.com/ansible/ansible/pull/74791

Thanks, Brian, for helping to debug this.

Brian Coca

Jun 8, 2021, 4:16:46 AM

to Ansible Project

Expected: the first two are meant to show better errors and allow for
debugging, while the last one fixes concurrency issues with threads
for modules that call run_command.

----------
Brian Coca

cl...@netsandbox.de

Jun 8, 2021, 4:33:48 AM

to Ansible Project

One last question regarding this:

https://github.com/ansible/ansible/pull/74791 is currently labeled with affects_2.12, so do we have to wait for 2.12, or will this fix be backported to 2.11 and 2.10?
