Ansible fails to connect to newly provisioned EC2 on 3rd task after successful running first 2 tasks


ddrak...@gmail.com

Nov 13, 2017, 1:16:25 PM
to Ansible Project
I have a playbook that creates EC2 instances and adds them to an in-memory group using the add_host module. I can then connect to the in-memory group and run two tasks successfully before the third one fails.

I see this problem even when just running the same file module to create directories. My main playbook has something like this (ec2hosts is the in-memory group created after provisioning):

- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  name: try the setup
  tasks:
    - name: Get EC2 facts
      ec2_metadata_facts:
      register: ec2_facts
    - name: import configure role
      import_role:
        name: configure
      vars:
        efs_ids: "{{ efs_id }}"

The configure role is very simple:

- name: Make the aws credentials directory
  file:
    state: directory
    path: ~/.aws
- name: Make the hi directory
  file:
    state: directory
    path: ~/.hi
- name: Make a temp directory
  file:
    state: directory
    path: ~/.temp
- name: Make a bar directory
  file:
    state: directory
    path: ~/.bar



This fails at the "Make a temp directory" task. The failed output with -vvv looks like this:

<35.160.185.188> (0, '', "Warning: Permanently added '35.160.185.188' (ECDSA) to the list of known hosts.\r\n")
<35.160.185.188> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<35.160.185.188> SSH: EXEC ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i AnsibleTest.pem -o 'IdentityFile="[omitted_full_path]/AnsibleTest.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -tt 35.160.185.188 '/bin/sh -c '"'"'/usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/file.py; rm -rf "/home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/" > /dev/null 2>&1 && sleep 0'"'"''
<35.160.185.188> (255, '', 'ssh_exchange_identification: read: Connection reset by peer\r\n')
fatal: [35.160.185.188]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: read: Connection reset by peer\r\n",
    "unreachable": true
}

I am using the following ssh_args in the ansible.cfg for this playbook:
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i "AnsibleTest.pem"

Does anyone know what's happening here? This seems pretty weird and I'm stuck.
Thanks!

ddrak...@gmail.com

Nov 16, 2017, 11:35:18 AM
to Ansible Project
I eventually fixed this by adding a retries parameter to the [ssh_connection] section of my ansible.cfg, so it now looks like this:

[ssh_connection]
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
retries = 10

Pretty lame! If anyone finds a better solution, please let me know.

Matt Martz

Nov 16, 2017, 11:38:21 AM
to ansible...@googlegroups.com
Perhaps the instance was not actually up and accessible yet.  Did you use `wait_for` or `wait_for_connection` after creating the instance to wait and ensure they are up and accessible before moving on?
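A minimal sketch of the `wait_for_connection` suggestion, applied to the play from the original post. It assumes Ansible 2.3 or later (where `wait_for_connection` is available) and keeps `gather_facts: false` so that nothing tries to connect before the wait task runs. The `delay` and `timeout` values are illustrative guesses, not tuned numbers.

```yaml
- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  name: try the setup
  tasks:
    # Repeatedly opens a connection and runs a trivial module on each host
    # until it succeeds; the host fails if that never happens within `timeout`.
    - name: Wait until the host accepts Ansible connections
      wait_for_connection:
        delay: 10     # seconds before the first attempt (illustrative value)
        timeout: 300  # overall limit in seconds (illustrative value)

    - name: Get EC2 facts
      ec2_metadata_facts:
      register: ec2_facts
```

Because the wait uses Ansible's own transport and configuration, it tests the same SSH path the later tasks will use rather than just an open TCP port.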

--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-project+unsubscribe@googlegroups.com.
To post to this group, send email to ansible-project@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/2f0e10a9-82d2-4af9-9c1d-e0ad2e64fdeb%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Matt Martz
@sivel
sivel.net

ddrak...@gmail.com

Dec 8, 2017, 2:42:48 PM
to Ansible Project
Yes, I use the wait parameter in the ec2 module. The weirdest thing is that two tasks work before the third fails, so the connection is up and working and then just stops working. I've also seen this when uploading files, with messages like "the sftp file transfer mechanism failed", but the retries kick in and come to the rescue.

From where I sit, I don't see a way to do any of this EC2 work without the retries, and I'm surprised no one else has run into it.
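For comparison, `wait: yes` on the legacy `ec2` module waits for the instance to reach the running state, which does not guarantee that sshd is already accepting connections. The sketch below adds an explicit `wait_for` on port 22, run from the control machine, between `add_host` and the next play. The key pair name, AMI ID, instance type, region, and security group are hypothetical placeholders. This does not by itself explain why the first two tasks succeeded and the third failed; it only rules out the "not up yet" cause raised above.

```yaml
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Launch instances
      ec2:
        key_name: AnsibleTest        # hypothetical key pair name
        instance_type: t2.micro      # hypothetical instance type
        image: ami-xxxxxxxx          # hypothetical AMI ID
        region: us-west-2            # hypothetical region
        group: my-ssh-group          # hypothetical security group allowing port 22
        count: 1
        wait: yes                    # waits for the "running" state only
      register: ec2

    - name: Add the new instances to the in-memory group
      add_host:
        name: "{{ item.public_ip }}"
        groups: ec2hosts
      with_items: "{{ ec2.instances }}"

    - name: Wait until sshd sends its banner on port 22
      wait_for:
        host: "{{ item.public_ip }}"
        port: 22
        search_regex: OpenSSH        # match the SSH banner, not just an open port
        delay: 10                    # illustrative value
        timeout: 320                 # illustrative value
      with_items: "{{ ec2.instances }}"
```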