Ansible hangs on package installation for hours before connection to host is lost


Pankaj Basnal

Sep 10, 2020, 11:29:25 AM
to Ansible Project
Hi,
I'm new to Ansible, and I'm facing an issue when executing our playbook.

Setup:
* 10 hosts on Azure
* All hosts are identical and newly created
* ansible.cfg:
    [defaults]
    host_key_checking = False

* The portion of the playbook that installs packages:

- name: Set Initscripts to be Installed
  set_fact:
    packages_to_install: "{{ packages_to_install | default([]) + [ 'initscripts' ] }}"
  tags:
    - common
  when: ansible_distribution_major_version is version_compare('7', '<')

- name: Set Java to be Installed
  set_fact:
    packages_to_install: "{{ packages_to_install | default([]) + [ jboss_java_pkg_name ] }}"
  when: install_java|bool
  tags:
    - common

- name: Set Unzip to be Installed
  set_fact:
    packages_to_install: "{{ packages_to_install | default([]) + [ 'unzip' ] }}"
  tags:
    - common

- name: Install packages
  package:
    name: "{{ packages_to_install }}"
    state: present
  register: packages_result
  retries: 5
  until: packages_result is success

At the package installation step, Ansible freezes without any log output for about 2 hours and then finally fails with an SSH connection error saying the host is unreachable.

Logs of the run:
* 9/8/2020 8:18:08 PM : "download_only": false,
* 9/8/2020 8:18:20 PM : 17403 1599576500.81986: _low_level_execute_command(): executing: /bin/sh -c 'rm -f -r /home/wbuser/.ansible/tmp/ansible-tmp-1599576402.5-17403-228811267706642/ > /dev/null 2>&1 && sleep 0'
* 9/8/2020 8:18:20 PM : >>><<<
* 9/8/2020 8:18:20 PM : 17403 1599576500.86101: attempt loop complete, returning result
* 9/8/2020 8:18:20 PM : 17403 1599576500.86111: dumping result to json
* 9/8/2020 8:18:20 PM : "invocation": {
* 9/8/2020 8:18:20 PM : "update_cache": false,
* 9/8/2020 8:18:20 PM : "unzip-6.0-21.el7.x86_64 providing unzip is already installed",
* 9/8/2020 10:28:42 PM : 17410 1599584322.50669: stderr chunk (state=3):
* 9/8/2020 10:28:42 PM : >>>Shared connection to 52.225.188.39 closed.
* 9/8/2020 10:28:42 PM : <<<
* 9/8/2020 10:28:42 PM : 17410 1599584322.50769: stderr chunk (state=3):
* 9/8/2020 10:28:42 PM : >>><<<
* 9/8/2020 10:28:42 PM : 17410 1599584322.50780: stdout chunk (state=3):
* 9/8/2020 10:28:42 PM : >>><<<
* 9/8/2020 10:28:42 PM : <52.225.188.39> (255, '', 'Shared connection to 52.225.188.39 closed.\r\n')
* 9/8/2020 10:28:42 PM : 17410 1599584322.50832: _low_level_execute_command(): starting
* 9/8/2020 10:28:42 PM : 17410 1599584322.50843: _low_level_execute_command(): executing: /bin/sh -c 'rm -f -r /home/wbuser/.ansible/tmp/ansible-tmp-1599576402.86-17410-193336400580842/ > /dev/null 2>&1 && sleep 0'
* 9/8/2020 10:28:42 PM : <52.225.188.39> ESTABLISH SSH CONNECTION FOR USER: wbuser
* 9/8/2020 10:28:42 PM : <52.225.188.39> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=33000 -o 'IdentityFile="/tmp/AnsibleFiles/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="wbuser"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ControlPath=/root/.ansible/cp/cccd982d88 52.225.188.39 '/bin/sh -c '"'"'rm -f -r /home/wbuser/.ansible/tmp/ansible-tmp-1599576402.86-17410-193336400580842/ > /dev/null 2>&1 && sleep 0'"'"''
* 9/8/2020 10:28:43 PM : 17410 1599584323.14653: stderr chunk (state=2):
* 9/8/2020 10:28:43 PM : >>><<<
* 9/8/2020 10:28:43 PM : 17410 1599584323.14667: stdout chunk (state=2):
* 9/8/2020 10:28:43 PM : >>><<<
* 9/8/2020 10:28:43 PM : <52.225.188.39> (0, '', '')
* 9/8/2020 10:28:43 PM : 17410 1599584323.14703: _low_level_execute_command() done: rc=0, stdout=, stderr=
* 9/8/2020 10:28:43 PM : 17410 1599584323.14742: _execute() done
* 9/8/2020 10:28:43 PM : 17410 1599584323.14745: dumping result to json
* 9/8/2020 10:28:43 PM : 17410 1599584323.14750: done dumping result, returning
* 9/8/2020 10:28:43 PM : 17410 1599584323.14777: done running TaskExecutor() for JbossEAP-26/TASK: jboss_common_role : Install packages [000d3ae7-f205-69c8-6cc1-0000000000c9]
* 9/8/2020 10:28:43 PM : 17410 1599584323.14799: sending task result for task 000d3ae7-f205-69c8-6cc1-0000000000c9
* 9/8/2020 10:28:43 PM : 17410 1599584323.15331: done sending task result for task 000d3ae7-f205-69c8-6cc1-0000000000c9
* 9/8/2020 10:28:43 PM : 17410 1599584323.15339: WORKER PROCESS EXITING
* 9/8/2020 10:28:43 PM : fatal: [JbossEAP-26]: UNREACHABLE! => {
* 9/8/2020 10:28:43 PM : "changed": false,
* 9/8/2020 10:28:43 PM : "msg": "Failed to connect to the host via ssh: Shared connection to 52.225.188.39 closed.",
* 9/8/2020 10:28:43 PM : "unreachable": true
* 9/8/2020 10:28:43 PM : }

Can anyone help me understand why this happens and what measures I can take to prevent it?

Stefan Hornburg (Racke)

Sep 10, 2020, 11:49:55 AM
to ansible...@googlegroups.com
There is really no point in retrying the package installation. Better to look at the output when it fails first.

Also your until: condition compares a dict with a variable named success. So it will never return a true
value.
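To look at the output first, one option is to let the play continue after a failure and print the registered result. This will not shorten the two-hour hang itself, but it keeps whatever the module returned. A minimal sketch, assuming Ansible 2.7 or later (for ignore_unreachable). Note that the success/succeeded test only checks the "failed" key, and the UNREACHABLE result in the log above carries "unreachable": true but no "failed" key, so it is tested separately:

```yaml
# Sketch: let the play continue after a failure or a lost connection,
# then print whatever the package module returned.
# ignore_unreachable needs Ansible 2.7 or later.
- name: Install packages
  package:
    name: "{{ packages_to_install }}"
    state: present
  register: packages_result
  ignore_errors: true
  ignore_unreachable: true

# "is success" only checks the 'failed' key, and an UNREACHABLE result has
# no 'failed' key, so test for unreachable explicitly as well.
- name: Show the registered result when the install did not succeed
  debug:
    var: packages_result
  when: packages_result is failed or packages_result is unreachable
```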

Regards
Racke


--
Ecommerce and Linux consulting + Perl and web application programming.
Debian and Sympa administration. Provisioning with Ansible.


Dick Visser

Sep 10, 2020, 11:52:08 AM
to ansible...@googlegroups.com
On Thu, 10 Sep 2020 at 17:29, Pankaj Basnal <pankajb...@gmail.com> wrote:
> - name: Set Initscripts to be Installed
>   set_fact:
>     packages_to_install: "{{ packages_to_install | default([]) + [ 'initscripts' ] }}"

IIRC this should not work - it will get into an endless loop.

This may or may not be your problem.
It occurs elsewhere below as well.
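For comparison, a sketch of where self-reference does and does not loop, assuming standard Ansible templating: set_fact renders the expression with the variable's current value and then stores the result, so the accumulation pattern quoted above terminates; the recursion error typically shows up when a variable defined under vars: refers to itself. demo_packages is a hypothetical name used only for the counter-example:

```yaml
# Works: set_fact renders the right-hand side with the current value of
# packages_to_install and then stores the new list, so nothing recurses.
- name: Set Unzip to be Installed
  set_fact:
    packages_to_install: "{{ packages_to_install | default([]) + ['unzip'] }}"

- name: Show the accumulated list
  debug:
    var: packages_to_install

# Counter-example (demo_packages is a hypothetical name): a task-level vars:
# entry that refers to itself is re-templated lazily and typically fails with
# "recursive loop detected in template string".
- name: Do not do this
  debug:
    msg: "{{ demo_packages }}"
  vars:
    demo_packages: "{{ demo_packages | default([]) + ['unzip'] }}"
```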

Pankaj Basnal

Sep 10, 2020, 12:37:42 PM
to Ansible Project
This playbook works with fewer hosts; as the number of hosts increases, so does the probability of this error. A few runs with fewer hosts have completed successfully, though I don't have logs for them.
I'll take your suggestions and try without until and retries. But if the problem were the dictionary check, it would have failed for all the hosts, whereas it failed for only 1 host and completed for the other 9.

I can attach the complete log file if that helps.
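Since the failures seem to grow with the number of hosts, one way to test that is to run the play in smaller batches with serial (independently of this, Ansible runs at most "forks" hosts in parallel, 5 by default). A sketch in which the group name jboss_hosts and the batch size are hypothetical; jboss_common_role is the role name that appears in the log:

```yaml
# Sketch: configure the hosts in batches instead of all at once.
# The group name and batch size are hypothetical placeholders.
- hosts: jboss_hosts
  serial: 3            # try 1, 3, 5, ... and compare failure rates
  roles:
    - jboss_common_role
```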

Prasad Shetty

Sep 10, 2020, 1:28:48 PM
to ansible...@googlegroups.com
There are multiple ways to improve performance. I have noticed several times that the yum Ansible module slows things down.

Try using pipelining or Mitogen.

Please refer to the URL below:
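Pipelining can also be enabled per inventory group instead of in ansible.cfg. A sketch in group_vars form; the file path is hypothetical, and the SSH keepalive options are an extra assumption not mentioned in this thread, aimed at sessions that sit idle while yum runs and might be dropped by a NAT or load-balancer idle timeout on the network path:

```yaml
# group_vars/all.yml (hypothetical path; any inventory group file works)

# Pipelining: run modules over the existing SSH session instead of copying
# them to a remote temp file first. If become/sudo is used, 'requiretty'
# must not be enforced in sudoers on the managed hosts.
ansible_pipelining: true

# Assumption, not from this thread: client-side SSH keepalives, so an idle
# session (while yum runs) is less likely to be dropped by a NAT or load
# balancer idle timeout, and a dead session is noticed after about
# 30 s x 10 = 5 minutes rather than after hours.
ansible_ssh_common_args: "-o ServerAliveInterval=30 -o ServerAliveCountMax=10"
```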





Stefan Hornburg (Racke)

Sep 11, 2020, 4:50:52 AM
to ansible...@googlegroups.com
On 9/10/20 6:37 PM, Pankaj Basnal wrote:
> But if the problem is with the dictionary check then it would have failed for all the hosts. But it failed for only 1 host
> and completed for the other 9.
>
Yes, that is correct: success is a built-in test in Ansible.

But at any rate, when the package installation fails, that's a problem with the host (e.g. lack of resources,
installation problems), which needs to be addressed instead of retrying the installation.
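In that spirit, a host that is short on resources can be made to fail early with a readable message instead of partway through the yum transaction. A minimal sketch, assuming facts are gathered; the 256 MB and 1 GiB thresholds are hypothetical:

```yaml
# Sketch: stop early, with a readable message, if a host looks too small
# for the yum transaction. Requires gathered facts; thresholds are hypothetical.
- name: Check free memory and root filesystem space before installing
  assert:
    that:
      - "ansible_memory_mb.nocache.free | int >= 256"
      - "(ansible_mounts | selectattr('mount', 'equalto', '/') | map(attribute='size_available') | first | int) >= 1073741824"
    fail_msg: "{{ inventory_hostname }} has too little free memory or space on /"
```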

Regards
Racke


Pankaj Basnal

Sep 29, 2020, 3:06:03 AM
to Ansible Project

I logged in to the VMs while Ansible was stuck, and here are my findings:
1. I was able to log in from the same container using the same SSH keys.
2. The packages had been installed successfully on the host machines, yet Ansible never received a result from those hosts.

I'm using the redhat-cop/ansible-role-jboss-common repo for setting up the host machines.

Are there any specific logs I should look for?
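These findings (packages installed, host still reachable, but no result returned) would be consistent with the SSH session being lost while the module was still running, although nothing in the thread confirms it. One way to avoid holding a single session open for the whole yum run is to start the task asynchronously and poll for it. A sketch with hypothetical async and poll values; it assumes an Ansible version in which the package action passes async through to the underlying yum module (on these EL7 hosts, yum: can be used directly otherwise). Retries and until are left out, following Racke's advice:

```yaml
# Sketch: start the install as a background job on the host and check on it
# periodically, so no single SSH session has to stay open for the whole run.
# async/poll values are hypothetical; retries/until are left out on purpose.
- name: Install packages
  package:
    name: "{{ packages_to_install }}"
    state: present
  async: 1800   # the job is stopped and the task fails after 1800 s
  poll: 15      # check the job status every 15 seconds
  register: packages_result
  tags:
    - common
```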

Pankaj Basnal

Sep 29, 2020, 3:06:55 AM
to Ansible Project
Any reason why this would get stuck in an endless loop?


I'm using the redhat-cop/ansible-role-jboss-common repo for setting up the host machine. 
On Thursday, September 10, 2020 at 9:22:08 PM UTC+5:30 dick....@geant.org wrote:

Dick Visser

Sep 29, 2020, 3:31:38 AM
to ansible...@googlegroups.com
I would try getting help from the authors of the redhat-cop repository first, if only to make sure things can actually work (this is not known).



--
Sent from a mobile device - please excuse the brevity, spelling and punctuation.