Ansible fails to connect to newly provisioned EC2 instance on 3rd task after successfully running the first 2 tasks

ddrak...@gmail.com

Nov 13, 2017, 1:16:25 PM

to Ansible Project

I have a playbook that creates EC2 instances and adds them to an in memory group using the add_host module. I am then able to connect to the in memory group and perform two successful commands before a third fails.

I am seeing this problem when just running the same file module to create directories. I have something like this in my main playbook (ec2hosts is the in-memory group created after provisioning):

- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  name: try the setup
  tasks:
    - name: Get EC2 facts
      ec2_metadata_facts:
      register: ec2_facts

    - name: import configure role
      import_role:
        name: configure
      vars:
        efs_ids: "{{ efs_id }}"

The configure role is very simple:

- name: Make the aws credentials directory
  file:
    state: directory
    path: ~/.aws

- name: Make the hi directory
  file:
    state: directory
    path: ~/.hi

- name: Make a temp directory
  file:
    state: directory
    path: ~/.temp

- name: Make a bar directory
  file:
    state: directory
    path: ~/.bar

And this fails at the Make a temp directory task. The failed output with -vvv looks like:

<35.160.185.188> (0, '', "Warning: Permanently added '35.160.185.188' (ECDSA) to the list of known hosts.\r\n")
<35.160.185.188> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<35.160.185.188> SSH: EXEC ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i AnsibleTest.pem -o 'IdentityFile="[omitted_full_path]/AnsibleTest.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -tt 35.160.185.188 '/bin/sh -c '"'"'/usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/file.py; rm -rf "/home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/" > /dev/null 2>&1 && sleep 0'"'"''
<35.160.185.188> (255, '', 'ssh_exchange_identification: read: Connection reset by peer\r\n')
fatal: [35.160.185.188]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: read: Connection reset by peer\r\n",
    "unreachable": true
}

I am using the following ssh_args in my ansible.cfg for the playbook: 

ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i "AnsibleTest.pem" 

Does anyone know what's happening here? This seems pretty weird and I'm stuck. 

Thanks!

ddrak...@gmail.com

Nov 16, 2017, 11:35:18 AM

to Ansible Project

I eventually fixed this by adding a retries parameter to the [ssh_connection] section of my ansible.cfg, so that it now looks like this:

[ssh_connection]
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
retries = 10

Pretty lame! If anyone finds a better solution please let me know...

Matt Martz

Nov 16, 2017, 11:38:21 AM

to ansible...@googlegroups.com

Perhaps the instance was not actually up and accessible yet. Did you use `wait_for` or `wait_for_connection` after creating the instance to wait and ensure they are up and accessible before moving on?
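
As a rough illustration of that suggestion, a `wait_for_connection` task could go at the top of the play that targets the in-memory group. This is only a sketch: the `delay`, `sleep`, and `timeout` values are arbitrary assumptions, and nothing in this thread confirms that it prevents the reset seen on the third task.

```yaml
- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  name: try the setup
  tasks:
    # Assumed values; tune them for your AMI and instance type.
    - name: Wait until the new instances accept Ansible connections
      wait_for_connection:
        delay: 10      # seconds to wait before the first attempt
        sleep: 5       # seconds between attempts
        timeout: 300   # give up after this many seconds

    - name: Get EC2 facts
      ec2_metadata_facts:
      register: ec2_facts
```

Since `gather_facts` is already false in the original play, the wait task would be the first thing Ansible runs against each new host.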

--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-project+unsubscribe@googlegroups.com.
To post to this group, send email to ansible-project@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/2f0e10a9-82d2-4af9-9c1d-e0ad2e64fdeb%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Matt Martz
@sivel
sivel.net

ddrak...@gmail.com

Dec 8, 2017, 2:42:48 PM

to Ansible Project

Yes, I use the wait parameter in the ec2 module. And the weirdest thing is that two of the tasks work before the third fails, so the connection is up and working and then just stops working. I've seen this when uploading files as well, with messages like "the sftp file transfer mechanism failed", but the retries work and come to the rescue.

From where I sit, I pretty much don't see a way to do any of this EC2 stuff without the retries and am surprised no one else has seen this.
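
For context, the `wait` parameter of the `ec2` module is documented as waiting for the instance to reach the requested state, which is not the same as sshd being ready to serve connections. A commonly used pattern is to follow the launch with a `wait_for` on port 22 before calling `add_host`. The sketch below assumes the provisioning play (not shown in this thread) registers the `ec2` result as `ec2_result`; that variable name and the timing values are hypothetical.

```yaml
# These tasks assume an earlier ec2 task in the same localhost play ran with
# "register: ec2_result" (a hypothetical variable name).
- name: Wait for sshd to answer on each new instance
  wait_for:
    host: "{{ item.public_ip }}"
    port: 22
    search_regex: OpenSSH   # wait for the SSH banner, not just an open port
    delay: 10               # assumed value
    timeout: 320            # assumed value
  with_items: "{{ ec2_result.instances }}"

- name: Add the new instances to the in-memory group
  add_host:
    name: "{{ item.public_ip }}"
    groups: ec2hosts
  with_items: "{{ ec2_result.instances }}"
```

A banner check like this would not catch sshd going away again later in the boot, so keeping `retries` in ansible.cfg alongside it may still be reasonable.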

