Gathering Facts hangs when using become


Chris Thro

Jan 13, 2020, 6:46:56 PM
to Ansible Project
Out of hundreds of hosts, we have one host that always hangs while gathering facts when become is used.

To reproduce, create a playbook containing only:
- hosts: all

Then run the playbook with the -b flag.

The last thing reported in the log is "Escalation succeeded". 
No matter how long I wait, it never returns to the prompt, and two processes are left running on the remote host: one as my username and one as root.

There is no nfs (I have seen other issues where facts hang on nfs).

This is a RHEL5 host with Python 2.7 installed. 1/4 of our hosts are RHEL5 with Python 2.7 installed, and none of the others have this issue.

Anyone see this before? Any ideas how to troubleshoot it further?
I have tried the highest verbosity but there doesn't appear to be any helpful information; compared it to successful runs and nothing is different.

Jean-Yves LENHOF

Jan 14, 2020, 1:01:04 AM
to Chris Thro, Ansible Project
Hi,

Problems I have already encountered:
- stale NFS mounts
- rpm database corruption
- locks on LVM

Regards,

Jy
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Kai Stian Olstad

Jan 14, 2020, 1:47:20 AM
to ansible...@googlegroups.com
On 14.01.2020 00:46, Chris Thro wrote:
> Out of hundreds of hosts we have one host always hangs when gathering
> facts
> when using become.
>
> Create a playbook with the following:
> - hosts: all
>
> call the playbook with the -b flag
>
> The last thing reported in the log is "Escalation succeeded".
> No matter how long I wait it never returns the prompt and there are two
> processes running on the remote host. One as my username and one as
> root.
>
> There is no nfs (I have seen other issues where facts hang on nfs).

It's almost always I/O related: mounts, LVM, filesystem...


> Anyone see this before? Any ideas how to troubleshoot it further?
> I have tried the highest verbosity but there doesn't appear to be any
> helpful information; compared it to successful runs and nothing is
> different.

It happens sometimes, and the easiest way (IMHO) to find where it hangs is
to debug the setup module:
https://docs.ansible.com/ansible/latest/dev_guide/debugging.html#debugging-remote

Then I run the module manually as described in the link with strace to
identify where it hangs.

--
Kai Stian Olstad

Chris Thro

Jan 14, 2020, 6:53:30 PM
to Ansible Project
Thank you. I did the strace and it shows that it is just repeating the same two lines over and over again.
select(7, [4 6], [], [4 6], {1, 0})     = 0 (Timeout)
wait4(29548, 0x7fff6a145c84, WNOHANG, NULL) = 0

When I checked the file descriptors named in the select call, I got the following:
lsof -p 10984 -ad 4,6
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
python2.7 10984 root    4r  FIFO    0,7      0t0 819132068 pipe
python2.7 10984 root    6r  FIFO    0,7      0t0 819132069 pipe
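
For readers unfamiliar with this pattern: the two repeating strace lines are what a parent process produces while it waits for a child that writes nothing and never exits. It calls select on two read pipes with a 1 second timeout, then makes a non-blocking wait4 on the child. The sketch below reproduces that loop in Python. It is only an illustration, not the Ansible code itself; the function name, the poll limit, and the assumption that descriptors 4 and 6 are the child's stdout and stderr are all hypothetical.

```python
import os
import select
import subprocess


def wait_like_strace(cmd, max_timeouts=3, poll_seconds=1.0):
    """Wait on a child's output pipes with select() plus a non-blocking poll.

    Returns (timeouts, gave_up). gave_up is True when the child was still
    running after max_timeouts silent intervals; it is then killed.
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    open_pipes = [child.stdout, child.stderr]
    timeouts = 0
    while open_pipes and timeouts < max_timeouts:
        # Corresponds to: select(7, [4 6], [], [4 6], {1, 0})
        readable, _, _ = select.select(open_pipes, [], open_pipes, poll_seconds)
        if not readable:
            timeouts += 1  # "= 0 (Timeout)": nothing arrived on either pipe
            # poll() is a non-blocking wait, like wait4(pid, ..., WNOHANG) = 0
            if child.poll() is not None:
                break
            continue
        for pipe in readable:
            if not os.read(pipe.fileno(), 4096):
                open_pipes.remove(pipe)  # empty read means the pipe hit EOF
    gave_up = child.poll() is None and timeouts >= max_timeouts
    if gave_up:
        child.kill()
    child.wait()
    child.stdout.close()
    child.stderr.close()
    return timeouts, gave_up
```

A child that stays silent makes every select call time out, which matches the trace above. So the process to investigate is the child (PID 29548 in the wait4 line), not the looping parent.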


Kai Stian Olstad

Jan 15, 2020, 9:40:25 AM
to ansible...@googlegroups.com
What happens right before it goes into this loop is probably the interesting part, and it can identify what it is trying to access.
If not, you probably need to add print statements to the Python code to identify where it hangs.
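
As a rough illustration of the print-statement approach, the sketch below wraps a function so that a line is written and flushed before the call starts and again after it returns. If the call never returns, the last START line names the culprit. The names traced and run_command are hypothetical stand-ins, not part of the setup module.

```python
import functools
import sys
import time


def traced(func, log=sys.stderr):
    """Wrap func so each call is logged before it starts and after it returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.write("START %s args=%r\n" % (func.__name__, args))
        log.flush()  # flush first, so the line survives a call that never returns
        started = time.time()
        result = func(*args, **kwargs)
        log.write("END   %s after %.2fs\n" % (func.__name__, time.time() - started))
        log.flush()
        return result
    return wrapper


# Usage sketch: run_command is a hypothetical helper that executes a command.
# run_command = traced(run_command)
```

A START line with no matching END line marks the call that hung.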


--
Kai Stian Olstad

Chris Thro

Jan 15, 2020, 6:31:16 PM
to Ansible Project
Thank you for the print idea. I was able to trace it to the following command:

/usr/bin/facter --puppet --json


Looks like this version of facter doesn't like the --puppet option. I will probably have to look into upgrading it.

Thanks again.
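
One way to confirm a finding like this on the affected host is to run the suspect command under a deadline and see whether it exits. The sketch below is a minimal probe that avoids Python 3 only features, so it should also run on Python 2.7. The helper name, the 30 second limit, and the comparison against plain `facter --json` are assumptions for illustration, not something from the thread.

```python
import os
import subprocess
import time


def finishes_within(cmd, seconds, poll_interval=0.1):
    """Return True if cmd exits within `seconds`, False if it had to be killed.

    Returns None when the command cannot be started (for example, not installed).
    Output is discarded so a full pipe can never be mistaken for a hang.
    """
    with open(os.devnull, "wb") as devnull:
        try:
            child = subprocess.Popen(cmd, stdout=devnull, stderr=devnull)
        except OSError:
            return None
        deadline = time.time() + seconds
        while child.poll() is None:
            if time.time() >= deadline:
                child.kill()
                child.wait()
                return False
            time.sleep(poll_interval)
        return True


if __name__ == "__main__":
    # The second command is the one identified in the thread; the first is an
    # assumed baseline without the --puppet option.
    for cmd in (["/usr/bin/facter", "--json"],
                ["/usr/bin/facter", "--puppet", "--json"]):
        print("%s -> %s" % (" ".join(cmd), finishes_within(cmd, 30)))
```

If only the `--puppet` variant returns False, that supports the conclusion that this option is what stalls fact gathering on this host.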