Gathering Facts "hanging" for a single host


Drew Decker

Dec 6, 2013, 3:08:06 PM
to ansible...@googlegroups.com
I have a host that fails at gathering facts when I run a playbook against it. Unfortunately, it does not error out; it just hangs:

# ansible-playbook -i test playbooks/get_release_versions/site.yml -f 10 -vvvv

PLAY [Grabs all release versions for auditing purposes] ***********************

GATHERING FACTS ***************************************************************
<psl-qan1recsp01> ESTABLISH CONNECTION FOR USER: ddecker
<psl-qan1recsp01> EXEC ['ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=ddecker', '-o', 'ConnectTimeout=10', 'testhost01', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-1386355046.1-188203869680440 && chmod a+rx $HOME/.ansible/tmp/ansible-1386355046.1-188203869680440 && echo $HOME/.ansible/tmp/ansible-1386355046.1-188203869680440'"]
<psl-qan1recsp01> REMOTE_MODULE setup
<psl-qan1recsp01> PUT /tmp/tmpEvKhjU TO /home/ddecker/.ansible/tmp/ansible-1386355046.1-188203869680440/setup
<psl-qan1recsp01> EXEC ['ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=ddecker', '-o', 'ConnectTimeout=10', 'testhost01', '/bin/sh -c \'dzdo -k && dzdo -H -S -p "[sudo via ansible, key=eeszsiqmrnksgtzkolmtvmhddnepwrbh] password: " -u root /bin/sh -c \'"\'"\'/usr/bin/python /home/ddecker/.ansible/tmp/ansible-1386355046.1-188203869680440/setup; rm -rf /home/ddecker/.ansible/tmp/ansible-1386355046.1-188203869680440/ >/dev/null 2>&1\'"\'"\'\'']
^CTraceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 268, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/bin/ansible-playbook", line 208, in main
    pb.run()
  File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 262, in run
    if not self._run_play(play):
  File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 505, in _run_play
    self._do_setup_step(play)
  File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 452, in _do_setup_step
    accelerate=play.accelerate, accelerate_port=play.accelerate_port,
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 968, in run
    results = [ self._executor(h, None) for h in hosts ]
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 382, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 471, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 659, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/usr/lib/python2.6/site-packages/ansible/runner/action_plugins/normal.py", line 54, in run
    return self.runner._execute_module(conn, tmp, module_name, module_args, inject=inject, complex_args=complex_args)
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 348, in _execute_module
    res = self._low_level_exec_command(conn, cmd, tmp, sudoable=sudoable)
  File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 708, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable)
  File "/usr/lib/python2.6/site-packages/ansible/runner/connection_plugins/ssh.py", line 219, in exec_command
    rfd, wfd, efd = select.select([p.stdout, p.stderr], [], [p.stdout, p.stderr], 1)
KeyboardInterrupt

In the run above, I waited more than 10 minutes and it was still hanging. If I remove the host, the play proceeds and completes on all other hosts. Additionally, if I add gather_facts: False to my site.yml for the playbook, it completes with no issue:

# ansible-playbook -i test playbooks/get_release_versions/site.yml -f 10

PLAY [Grabs all release versions for auditing purposes] ***********************

TASK: [Get release version] ***************************************************
changed: [testhost01]

PLAY RECAP ********************************************************************
testhost01            : ok=1    changed=1    unreachable=0    failed=0

Does anyone know why this happens, or what I can do to debug it further and find the cause? When I run ansible all -u ddecker -m setup, it processes all hosts except this one, so the problem must be something on this host that the fact gathering touches.

Thanks,
Drew

James Tanner

Dec 6, 2013, 3:14:31 PM
to ansible...@googlegroups.com
--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Let's narrow this down further.

1) Run ansible-playbook with ANSIBLE_KEEP_REMOTE_FILES=1 and -vvvv
2) Check the debug output and find the path of the file named "setup" that is sent to the remote host.
3) Go to the remote machine and run the script by hand with python <filepath>

If the script hangs there, then we know the setup script itself is having an issue. If not, there may be a connection issue.
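
Step 3 can also be scripted so that a hang does not tie up your terminal. The following is a minimal sketch, not part of Ansible: the helper name completes_within is hypothetical, and it assumes Python 3 is available on the machine where you run it. It starts a command, polls it for a given number of seconds, and reports whether it finished. A process blocked inside the kernel may not respond to the kill signal, so the sketch deliberately does not wait for it after giving up.

```python
import subprocess
import sys
import time


def completes_within(cmd, timeout_s, poll_interval_s=0.05):
    """Return True if cmd exits within timeout_s seconds, otherwise False."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # the process has exited
            return True
        time.sleep(poll_interval_s)
    # Best effort only: a process stuck in an uninterruptible kernel call
    # may ignore the signal, so we do not wait() on it afterwards.
    proc.kill()
    return False


# Hypothetical usage, with the path of the kept "setup" file filled in:
#   completes_within([sys.executable, "/path/to/setup"], timeout_s=60)
```

A False result after a generous timeout points at the script itself rather than at the connection.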

Drew Decker

Dec 6, 2013, 4:31:45 PM
to ansible...@googlegroups.com

Yup - it hangs on the client:

cd /home/ddecker/.ansible/tmp/ansible-1386364299.05-209026778523282

# python setup

^Cc^C^C^C^C^Z^Z^Z^Z

Michael DeHaan

Dec 6, 2013, 4:40:22 PM
to ansible...@googlegroups.com
OS and ansible version?

If you press Control-C there, what do you get in the traceback when running locally?


--
Michael DeHaan <mic...@ansibleworks.com>
CTO, AnsibleWorks, Inc.
http://www.ansibleworks.com/

Drew Decker

Dec 6, 2013, 4:44:00 PM
to ansible...@googlegroups.com
Control-C also just hangs (as you can see from the output in my previous message), so there is no traceback at all. If you mean the traceback on the machine where I run Ansible (which shows no output or failure until I interrupt it), that is the KeyboardInterrupt traceback in my first message.

OS (of Client):  RHEL 6.4 x86_64
Ansible Version: 1.3.4

Thanks!

Michael DeHaan

Dec 6, 2013, 5:00:32 PM
to ansible...@googlegroups.com
This looks like accelerate not supporting sudo with password just yet? 

 I agree it should error at least...

-- Michael

James Tanner

Dec 6, 2013, 5:04:00 PM
to ansible...@googlegroups.com
No, I think one of the fact functions is stuck waiting for input. We just need to narrow it down.

Drew, in the setup script on the remote system there is an __init__ method that calls all the major groups of facts:

    def __init__(self):
        self.facts = {}
        self.get_platform_facts()
        self.get_distribution_facts()
        self.get_cmdline()
        self.get_public_ssh_host_keys()
        self.get_selinux_facts()
        self.get_pkg_mgr_facts()
        self.get_lsb_facts()
        self.get_date_time_facts()
        self.get_user_facts()
        self.get_local_facts()
        self.get_env_facts()

Comment out all of those self.get_* calls and see if the script completes. If it does, then uncomment the calls one by one until the script hangs again.
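
An alternative to commenting calls in and out is to announce each step before it runs, so that the last name printed identifies the call that never returned. A minimal sketch of that idea follows. The run_steps helper and the stand-in step functions are hypothetical and are not part of the setup module; in the real script you would pass the bound self.get_* methods instead.

```python
import sys
import time


def run_steps(steps):
    """Run (name, func) pairs in order and return [(name, seconds), ...].

    Each name is printed and flushed before its function is called, so if a
    function hangs, the last "START" line shows which one it was.
    """
    timings = []
    for name, func in steps:
        sys.stdout.write("START %s\n" % name)
        sys.stdout.flush()
        started = time.monotonic()
        func()
        timings.append((name, time.monotonic() - started))
    return timings


# Stand-in steps for illustration only.
def fake_platform_facts():
    pass


def fake_env_facts():
    pass


results = run_steps([
    ("get_platform_facts", fake_platform_facts),
    ("get_env_facts", fake_env_facts),
])
```

The returned timings also show which steps are merely slow rather than hung.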

Drew Decker

Dec 6, 2013, 5:18:44 PM
to ansible...@googlegroups.com
Guys,

It still hangs. However, I tried running it with Python's trace module:

python -m trace --trace ./setup
re.py(142):     return _compile(pattern, flags).search(string)
 --- modulename: re, funcname: _compile
re.py(231):     cachekey = (type(key[0]),) + key
re.py(232):     p = _cache.get(cachekey)
re.py(233):     if p is not None:
re.py(234):         return p
setup(696):                 if m:
setup(694):             for folder in os.listdir(sysdir):
setup(695):                 m = re.search("(" + diskname + "\d+)", folder)
 --- modulename: re, funcname: search
re.py(142):     return _compile(pattern, flags).search(string)
 --- modulename: re, funcname: _compile
re.py(231):     cachekey = (type(key[0]),) + key
re.py(232):     p = _cache.get(cachekey)
re.py(233):     if p is not None:
re.py(234):         return p
setup(696):                 if m:
setup(694):             for folder in os.listdir(sysdir):
setup(707):             d['rotational'] = get_file_content(sysdir + "/queue/rotational")
 --- modulename: setup, funcname: get_file_content
setup(2064):     data = default
setup(2065):     if os.path.exists(path) and os.access(path, os.R_OK):
 --- modulename: genericpath, funcname: exists
genericpath.py(17):     try:
genericpath.py(18):         st = os.stat(path)
genericpath.py(19):     except os.error:
genericpath.py(20):         return False
setup(2069):     return data
setup(708):             d['scheduler_mode'] = ""
setup(709):             scheduler = get_file_content(sysdir + "/queue/scheduler")
 --- modulename: setup, funcname: get_file_content
setup(2064):     data = default
setup(2065):     if os.path.exists(path) and os.access(path, os.R_OK):
 --- modulename: genericpath, funcname: exists
genericpath.py(17):     try:
genericpath.py(18):         st = os.stat(path)
genericpath.py(21):     return True
setup(2066):         data = open(path).read().strip()

Let me know if I can offer any more tests that may help with debugging this issue.

Thanks
Drew
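
For readers unfamiliar with this kind of output: the trace module in the Python standard library prints each source line just before it executes, so when a program hangs, the last line printed is the statement that did not return. The same tracing can be applied to a single function from Python code. This is a minimal sketch in Python 3; read_first_line is a hypothetical stand-in for the code under suspicion.

```python
import os
import trace


def read_first_line(path):
    """Hypothetical stand-in for the code being investigated."""
    with open(path) as handle:
        return handle.readline()


# trace=1 prints each line as it executes; count=0 skips coverage counting.
tracer = trace.Trace(trace=1, count=0)
result = tracer.runfunc(read_first_line, os.devnull)
```

Tracing one function keeps the output far shorter than tracing a whole script from the command line.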

Drew Decker

Dec 12, 2013, 7:21:31 PM
to ansible...@googlegroups.com
Is anyone able to give me some pointers that would let me debug this issue further?

-- 
Drew Decker

James Tanner

Dec 12, 2013, 8:16:12 PM
to ansible...@googlegroups.com
It seems that Python is hanging when trying to read the scheduler file for one of your disks. Add a print statement before line 2066:

print "PATH:", path
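
In context, the change would look roughly like the sketch below. The function body is reconstructed from the trace output earlier in the thread (lines 2064 to 2069 of the setup script). The default=None signature is an assumption, and the sketch uses Python 3 syntax and writes to stderr, whereas the Python 2.6 script on the host would use the print statement form given in this message.

```python
import os
import sys


def get_file_content(path, default=None):
    """Return the stripped contents of path, or default if it is unreadable."""
    data = default
    if os.path.exists(path) and os.access(path, os.R_OK):
        # Debug line: print and flush the path *before* the read, so the
        # last path shown is the one whose read never returned.
        sys.stderr.write("PATH: %s\n" % path)
        sys.stderr.flush()
        with open(path) as handle:
            data = handle.read().strip()
    return data
```

Flushing matters here: with buffered output, the path that triggers the hang might never reach the terminal.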

Drew Decker

Dec 12, 2013, 10:11:12 PM
to ansible...@googlegroups.com, James Tanner
Here's where it dies:

PATH: /sys/block/../devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.1/host14/target14:0:0/14:0:0:0/block/sr0/queue/rotational
PATH: /sys/block/../devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.1/host14/target14:0:0/14:0:0:0/block/sr0/queue/scheduler
PATH: /sys/block/../devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.1/host14/target14:0:0/14:0:0:0/block/sr0/size
PATH: /sys/block/../devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.1/host14/target14:0:0/14:0:0:0/block/sr0/queue/hw_sector_size
PATH: /sys/block/../devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdl/removable
PATH: /sys/block/../devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdl/queue/scheduler
^C^C^C^C^C^C^C^C^C^C^C^C   <------- Dies right after the previous line

-- 
Drew Decker

James Tanner

Dec 12, 2013, 10:26:17 PM
to ansible...@googlegroups.com
Are you able to cat that file?

jtanner@u1304:~$ cat /sys/block/../devices/pci0000:00/0000:00:06.0/virtio2/block/vda/queue/scheduler
noop deadline [cfq]
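
For reference, the bracketed entry in that output is the scheduler currently in use for the device. A minimal sketch of extracting it follows. The active_scheduler helper is hypothetical and is not claimed to match how the setup module parses this file.

```python
import re


def active_scheduler(contents):
    """Return the bracketed scheduler name, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


print(active_scheduler("noop deadline [cfq]"))  # prints: cfq
```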

Drew Decker

Dec 12, 2013, 11:37:15 PM
to ansible...@googlegroups.com, James Tanner, ansible...@googlegroups.com
Nope - can’t cat it at all.  I can ls it, and see the following:

# ls -l /sys/block/../devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdl/queue/scheduler
-rw-r--r-- 1 root root 4096 Sep 11 02:26 /sys/block/../devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdl/queue/scheduler

Is there something I can do to map this device to a physical device to know what the cause could be?

-- 
Drew Decker

James Tanner

Dec 12, 2013, 11:41:36 PM
to ansible...@googlegroups.com
The device name in that path is "sdl", so the device is /dev/sdl.

If you can’t read the scheduler file, you may have a broken device or a buggy driver. You should check your dmesg output and the syslog for any obvious errors. Beyond that, you need to work with the relevant OS and hardware vendors to sort out why things are hanging.
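
Deriving the device node from one of the paths printed earlier can be sketched as follows. The device_from_sysfs_path helper is hypothetical. It assumes the path contains a block/<name> component, as the paths in this thread do, and that the device node lives under /dev.

```python
def device_from_sysfs_path(path):
    """Return '/dev/<name>' for the component following the last 'block'."""
    parts = [part for part in path.split("/") if part]
    # Search from the end: these paths start with /sys/block/.., and the
    # device name follows the *last* 'block' component.
    for index in range(len(parts) - 2, -1, -1):
        if parts[index] == "block":
            return "/dev/" + parts[index + 1]
    return None


example = (
    "/sys/block/../devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host5/"
    "rport-5:0-0/target5:0:0/5:0:0:2/block/sdl/queue/scheduler"
)
print(device_from_sysfs_path(example))  # prints: /dev/sdl
```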

Drew Decker

Dec 13, 2013, 1:02:48 AM
to James Tanner, ansible...@googlegroups.com
James,

Thanks for the input. I tested this on another system of the same model (Dell R810) that runs the same apps (possibly with BIOS and firmware updates already installed), and the setup script ran just fine. The failing system might just need some firmware updates, so I'll get on that and post the results. Thanks for helping me sort out whether this was an Ansible issue or a server issue; it clearly appears to be a server issue.

Thanks again!
-- 
Drew Decker

Drew Decker

Dec 13, 2013, 8:22:39 AM
to James Tanner, ansible...@googlegroups.com
This morning I applied some firmware updates and rebooted the problem server. Once the server came up, I tested the "setup" script again, and this time it ran to completion.

Thanks for the help in troubleshooting!

-- 
Drew Decker

James Tanner

Dec 13, 2013, 10:23:33 AM
to ansible...@googlegroups.com
No problem. I learned something too ...

python -m trace --trace