The short version:
I'm running tests against Fedora 22 Alpha Atomic Host and Server, and am seeing various connection failures depending on the ssh_connection settings. They all look like the failures reported against openssh-5.3 with the ControlPersist backport. I played around with various values and found a workaround: removing the ControlPersist value, so that
ssh_args = -o ControlMaster=auto
is set in ~/.ansible.cfg, allows everything to work (as long as pipelining is also left off, see below). The question is why?
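For reference, a minimal sketch of how the workaround sits in ~/.ansible.cfg. The section headers are an assumption based on the stock /etc/ansible/ansible.cfg layout (host_key_checking under [defaults], ssh_args under [ssh_connection]); only the two settings themselves come from the tests below.

```ini
[defaults]
host_key_checking = False

[ssh_connection]
# Setting ssh_args replaces the built-in default, so ControlPersist=60s
# is no longer passed to ssh. Section placement is assumed, see above.
ssh_args = -o ControlMaster=auto
```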
The long version:
I went through the various options mentioned in the thread in a local ansible.cfg: disabling pipelining, switching to scp, and clearing ssh_args entirely. Only the last option worked, so I dug further.
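Those variations would look roughly like this in the [ssh_connection] section of a local ansible.cfg, tried one at a time. This is only a sketch: the option names are the standard Ansible ones, but the exact values shown are an assumption about how each change is expressed, not a copy of the file that was used.

```ini
[ssh_connection]
# 1. Disable pipelining.
pipelining = False

# 2. Transfer files with scp instead of sftp (assumed value).
#scp_if_ssh = True

# 3. Clear ssh_args entirely (assumed to be an empty value, so neither
#    ControlMaster nor ControlPersist is passed to ssh).
#ssh_args =
```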
The ansible version is:
[fedora@atomic-master ~]$ rpm -q ansible
ansible-1.8.4-1.fc22.noarch
OpenSSH versions on each system:
192.168.122.10 | success | rc=0 >>
openssh-6.7p1-11.fc22.x86_64
192.168.122.11 | success | rc=0 >>
openssh-6.7p1-10.fc22.x86_64
No user-based ssh settings, all values default from /etc/ansible (p1-11 succeeds, p1-10 fails):
<192.168.122.10> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
fatal: [192.168.122.10] => failed to transfer file to /home/fedora/.ansible/tmp/ansible-tmp-1427209312.33-30225738744475/setup:
Couldn't read packet: Connection reset by peer
<192.168.122.11> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
User-based ~/.ansible.cfg, pipelining enabled (p1-11 succeeds, p1-10 fails):
host_key_checking = False
#ssh_args = -o ControlMaster=auto
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=irpxmyfkxjqtjkyqbfmyvogzyeygovsm] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-irpxmyfkxjqtjkyqbfmyvogzyeygovsm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hdearqulmxcbkpjxjlgwpyiiapebebju] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-hdearqulmxcbkpjxjlgwpyiiapebebju; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
User-based ~/.ansible.cfg, pipelining enabled with ControlPersist removed (p1-11 succeeds, p1-10 fails):
host_key_checking = False
ssh_args = -o ControlMaster=auto
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gbcczsohsqeiuyzxezohqbikibenkeun] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gbcczsohsqeiuyzxezohqbikibenkeun; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=xjcccpqinmchipiubhsmjwdjvgmytbww] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-xjcccpqinmchipiubhsmjwdjvgmytbww; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
User-based ~/.ansible.cfg, both ControlPersist and pipelining removed (p1-11 succeeds, p1-10 succeeds):
host_key_checking = False
ssh_args = -o ControlMaster=auto
<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dyqtvojqcsjschscszupwxnithfwjhqs] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-dyqtvojqcsjschscszupwxnithfwjhqs; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gixgtyagbeoctcldxprcqrjwuxdhscvr] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gixgtyagbeoctcldxprcqrjwuxdhscvr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
Manually testing the ControlPersist value on the command line works as expected for both hosts; the master connection times out after 60s:
[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -o ControlMaster=auto -o ControlPersist=60s -S Test_Master_Socket fed...@192.168.122.11 echo Hello
[fedora@atomic-master ansible-atomic]$ ps -fu `whoami` | grep "[s]sh.*Test_Master_Socket"
fedora 1015 1 0 11:29 ? 00:00:00 ssh: Test_Master_Socket [mux]
[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -S Test_Master_Socket -O check 192.168.122.11
Master running (pid=1015)
Bottom line: I'm out of troubleshooting steps, I'm not sure what the impact of the workaround is, and I think someone with more depth should take a look. I'm cc'ing ansible-devel because I wasn't sure which forum was right for this sort of issue. Hopefully this was clear!
Cheers,
-Matt M