apt: upgrade=dist play hangs


Bob Tanner

Jan 13, 2015, 2:13:36 PM
to ansible...@googlegroups.com
I pulled the latest code from git:

$ git rev-parse HEAD
f1fdddb640e1dfcd8fdc930031db267947d8cb70

and now the following play hangs

- name: package upgrade (apt)
  apt: upgrade=dist
  sudo: yes

All computers have minimal load. Logs on the remote computer show:

ansible-apt: Invoked with dpkg_options=force-confdef,force-confold upgrade=None force=False package=None purge=False state=present update_cache=True default_release=None cache_valid_time=3600 deb=None install_recommends=True

The computer running ansible shows:

<www.X.com> ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO www.X.com
<www.X.com> REMOTE_MODULE apt upgrade=dist
<www.X.com> EXEC /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && echo $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644'
<www.X.com> PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpgwRxMi TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt
<www.Xl.com> EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=XXXXXX] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-XXXXXXX; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt; rm -rf /home/rte/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/ >/dev/null 2>&1'"'"''

What is odd is that the remote computer shows no python process running and no ssh connections, so I think the command has completed on the remote host and it is the ansible control computer that is hung, or slow, waiting to process things.

Here is the process list I see on the ansible host:

 501 82141 31032   0 12:05PM ttys004    0:00.80 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
 501 82151 82141   0 12:06PM ttys004    0:00.04 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
 501 82154 82141   0 12:06PM ttys004    0:09.39 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
 501 82155 82141   0 12:06PM ttys004    0:00.00 (python)
 501 82156 82141   0 12:06PM ttys004    0:00.00 (python)

PID 82154 continues to accumulate CPU time, so I do not think it's hung.

I've rebooted the www.X.com host and still the ansible control computer hangs on this play.

I'm looking for tips on how to debug this problem.

Thanks.


Bob Tanner

Jan 13, 2015, 2:19:13 PM
to ansible...@googlegroups.com

If I kill 82154 the ansible control computer moves past the "hung" www.X.com host and continues with the rest of the plays in the playbook. 

Bob Tanner

Jan 13, 2015, 8:22:02 PM
to ansible...@googlegroups.com
This play hangs now too:

- name: package update (apt)
  apt: update_cache=yes cache_valid_time=3600
  sudo: yes
  always_run: yes

Verbose run shows this on multiple hosts:

<host.domain.com> ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO host.domain.com

<host.domain.com> REMOTE_MODULE apt update_cache=yes cache_valid_time=3600

<host.domain.com> EXEC /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && echo $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320'

<host.domain.com> PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpGH6ekd TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt

<host.domain.com> EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=X] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-X; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt; rm -rf /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/ >/dev/null 2>&1'"'"''


Then it just hangs.

Brian Coca

Jan 14, 2015, 7:36:16 AM
to ansible...@googlegroups.com
what happens when you manually run 'apt-get update' on the same machine?


--
Brian Coca

Bob Tanner

Jan 14, 2015, 12:41:18 PM
to ansible...@googlegroups.com
I should have posted that info in the first place. Thanks for asking. Things work fine when I run apt-get update from the command line.

Brian Coca

Jan 14, 2015, 1:03:27 PM
to ansible...@googlegroups.com
ok, so what happens when you run "apt-get dist-upgrade"? Both of these
commands are what ansible is running for you with those arguments to
the module.



--
Brian Coca

Bob Tanner

Jan 14, 2015, 6:07:55 PM
to ansible...@googlegroups.com
Works as expected.

Oddly, if I use --limit "www.X.com" it also works.

When I'm -not- using --limit I do see this in the logs of the host that hangs when the "REMOTE_MODULE apt update_cache=yes cache_valid_time=3600" is run:

Jan 14 17:02:05 www sudo: pam_unix(sudo:auth): authentication failure; logname=ansible-user uid=1001 euid=0 tty=/dev/pts/3 ruser=ansible-user rhost=  user=ansible-user

But earlier in the logs I see

Jan 14 16:57:43 www ansible-setup: Invoked with filter=* fact_path=/etc/ansible/facts.d


So I know that sudo is working for the "REMOTE_MODULE setup" step.

I'm really at a loss on why this is happening!
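
For anyone chasing the same symptom, a minimal sketch of scanning a host's auth log for these sudo failures follows. The regular expression is modelled only on the single pam_unix line quoted above, and the function name find_sudo_failures is made up for illustration; other syslog formats may need a different pattern.

```python
import re

# Pattern modelled on the one pam_unix line quoted in this thread (an assumption:
# other distributions or syslog configurations may format the line differently).
SUDO_FAILURE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\ssudo:\s"
    r"pam_unix\(sudo:auth\):\sauthentication failure;.*\bruser=(?P<ruser>\S*)"
)


def find_sudo_failures(lines):
    """Return (timestamp, host, ruser) for every sudo authentication failure line."""
    failures = []
    for line in lines:
        match = SUDO_FAILURE.search(line)
        if match:
            failures.append((match["timestamp"], match["host"], match["ruser"]))
    return failures


# The two log lines quoted in this thread.
sample = [
    "Jan 14 16:57:43 www ansible-setup: Invoked with filter=* fact_path=/etc/ansible/facts.d",
    "Jan 14 17:02:05 www sudo: pam_unix(sudo:auth): authentication failure; "
    "logname=ansible-user uid=1001 euid=0 tty=/dev/pts/3 ruser=ansible-user rhost=  user=ansible-user",
]

for failure in find_sudo_failures(sample):
    print(failure)
```

Comparing the failure timestamps against the order in which the playbook reached each host is one way to see which host the failures start after.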

Bob Tanner

Jan 14, 2015, 9:27:10 PM
to ansible...@googlegroups.com
I think I found the problem and I think it's a bug.

I have an inventory file that looks like this:


Once ansible hits www.one-off.com, every www.domainX.com host after it gets a sudo authentication failure. If I comment out www.one-off.com, everything works as expected (no hangs).

What's different about www.one-off.com? It has a different (one-off) sudo password.

host_vars/www.one-off.com.yml

---
ansible_ssh_pass: different-sudo-password
ansible_sudo_pass: different-sudo-password

ONCE www.one-off.com has been played, it seems the sudo password from the command line is lost.

$ ansible-playbook --ask-pass --ask-sudo -i web web-servers.yml
SSH password:
sudo password [defaults to SSH password]:

The passwords I am prompted for (above) are what I call the defaults, but www.one-off.com gets a different sudo password from its host_vars file, and any host listed after www.one-off.com then fails (it hangs and is unusable).
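
To make the suspected behaviour concrete, here is a small Python model of it. This is not ansible's code: both functions, the host names www.domain1.com and www.domain2.com, and the password strings are hypothetical, and the "shared state" mechanism is only an assumption that happens to reproduce the symptom described above.

```python
def run_with_shared_state(hosts, prompted_pass, host_vars):
    """Suspected buggy behaviour: one mutable sudo password shared by all hosts."""
    current_pass = prompted_pass
    used = {}
    for host in hosts:
        override = host_vars.get(host, {}).get("ansible_sudo_pass")
        if override is not None:
            # The override replaces the shared value and is never reset,
            # so every later host inherits it.
            current_pass = override
        used[host] = current_pass
    return used


def run_with_per_host_lookup(hosts, prompted_pass, host_vars):
    """Expected behaviour: each host falls back to the prompted password."""
    return {
        host: host_vars.get(host, {}).get("ansible_sudo_pass", prompted_pass)
        for host in hosts
    }


# Hypothetical inventory order and passwords, for illustration only.
hosts = ["www.domain1.com", "www.one-off.com", "www.domain2.com"]
host_vars = {"www.one-off.com": {"ansible_sudo_pass": "different-sudo-password"}}

buggy = run_with_shared_state(hosts, "default-password", host_vars)
expected = run_with_per_host_lookup(hosts, "default-password", host_vars)

for host in hosts:
    print(host, "buggy:", buggy[host], "| expected:", expected[host])
```

In the first model www.domain2.com ends up with the one-off password, which matches the authentication failures I see on every host listed after www.one-off.com.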

Brian Coca

Jan 15, 2015, 8:28:55 AM
to ansible...@googlegroups.com
seems like this issue:
https://github.com/ansible/ansible/issues/9823



--
Brian Coca

Bob Tanner

Jan 15, 2015, 9:35:47 AM
to ansible...@googlegroups.com
Yes, that's the issue.

The follow-ups on that issue are not clear to me.

Is ansible behaving as expected? Meaning, if you set 'ansible_sudo_pass' in a host_vars file, is it set from that point forward for every later host?

Brian Coca

Jan 15, 2015, 10:01:19 AM
to ansible...@googlegroups.com
no, it's a bug



--
Brian Coca