Can't run playbook against multiple nodes using ec2.py and AWS tags

alexande...@paradoxplaza.com

Jan 22, 2016, 10:33:29 AM
to Ansible Project
Hello,

I'm using the Amazon EC2 dynamic inventory script (ec2.py). I've created some instances on EC2 and given all of them a custom tag called Purpose with a value like NodePurpose, and instance names Node1, Node2, etc.
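For reference, ec2.py turns each tag into an inventory group named tag_<key>_<value>, so the relevant chunk of ./ec2.py --list output looks roughly like this (hostnames made up for illustration):

{
  "tag_Purpose_NodePurpose": [
    "ec2-52-0-0-1.compute-1.amazonaws.com",
    "ec2-52-0-0-2.compute-1.amazonaws.com"
  ]
}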

I've written a playbook with:

---
- hosts: tag_Purpose_NodePurpose

so that the playbook targets all of the nodes described above. When more than one node is up, I get seemingly random SSH errors from some or all of the nodes, with this message:

ERROR! failed to transfer file to /home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup:
 
sftp> put /tmp/tmpIwd28M /home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup
OpenSSH_6.7p1 Debian-5+deb8u1, OpenSSL 1.0.1k 8 Jan 2015
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 10523
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 4
debug2: Remote version: 3
debug2: Server supports extension "posix-rename@openssh.com" revision 1
debug2: Server supports extension "statvfs@openssh.com" revision 2
debug2: Server supports extension "fstatvfs@openssh.com" revision 2
debug2: Server supports extension "hardlink@openssh.com" revision 1
debug2: Server supports extension "fsync@openssh.com" revision 1
debug3: Sent message fd 5 T:16 I:1
debug3: SSH_FXP_REALPATH . -> /home/ubuntu size 0
debug3: Looking up /tmp/tmpIwd28M
debug3: Sent message fd 5 T:17 I:2
debug3: Received stat reply T:101 I:2
debug1: Couldn't stat remote file: No such file or directory
debug3: Sent message SSH2_FXP_OPEN I:3 P:/home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup
remote open("/home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup"): No such file or directory
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 0

It seems like Ansible is looking, on the other instances too, for a temp file that only exists on one instance, and of course it isn't there. This has happened with every version since 1.7, and with various versions of the ec2.py script (I've updated it from time to time).

Anyone have any idea what this is about and how to solve it?

Alexander

emmanuel...@menlosecurity.com

Jan 22, 2016, 12:10:57 PM
to Ansible Project
Well...
Do your nodes all use the same SSH key?
Is ubuntu your remote_user?
Is ~ubuntu/.ansible writable/readable?
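A quick sanity check, assuming ubuntu is the remote user and ec2.py is executable in your working directory (adjust to taste):

ansible tag_Purpose_NodePurpose -i ec2.py -u ubuntu -m ping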

Better yet, can you show us your playbook and the task it is failing on?
Best,
--
E

alexande...@paradoxplaza.com

Jan 25, 2016, 4:00:49 AM
to Ansible Project
The nodes have the same SSH key, and the user is correct. In fact, when I run the playbook with --limit and apply it to one node at a time, it works correctly. It's only when I target a group of nodes via a custom EC2 tag that this happens.
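(For what it's worth, the single-node runs were along these lines; the playbook name and the tag_Name_Node1 pattern are just illustrative:)

ansible-playbook -i ./ec2.py myplay.yml --limit tag_Name_Node1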

Also, this has happened with several playbooks, and always during the "setup" (fact gathering) task, before any of my own tasks run. To demonstrate, I wrote this ultra-simple playbook:

---
- hosts: tag_Purpose_NodePurpose
  gather_facts: True
  become: yes
  tasks:
    - debug: msg="This is just a demonstration"
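I run it with roughly this command (demo.yml is just what I call the file above; your ec2.py path may differ):

ansible-playbook -i ./ec2.py demo.yml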

It still happens: I got essentially the same output as in my first message (some details redacted), with one node failing during the setup task with the same sftp error.
I also ran the same playbook ONLY on the node that failed above, and it worked correctly.  

alexande...@paradoxplaza.com

Jan 25, 2016, 7:42:37 AM
to Ansible Project
I believe I figured out the problem. I had set my control_path to this value in ansible.cfg:

[ssh_connection]
control_path = %(directory)s/%%r

That pattern doesn't include %h (i.e. the hostname), so connections to different hosts shared and overwrote each other's control sockets (by default, Ansible re-uses SSH connections via OpenSSH's ControlMaster/ControlPersist multiplexing, so the mux master established for one node was being picked up for the others).

I tried both adding %h to the control path and disabling multiplexing entirely (ControlMaster=no), and both fixed the problem.

So, kids, include at least %h, %p and %r in the control path, as the man page suggests, so that connections don't get mixed up.
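For reference, this is roughly what I ended up with in ansible.cfg (this control_path mirrors Ansible's default; the exact socket name doesn't matter as long as those tokens are in it):

[ssh_connection]
# one control socket per host/port/user, so nodes no longer share a mux master
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
# alternatively, turning multiplexing off entirely also worked for me:
# ssh_args = -o ControlMaster=no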