Reliably wait_for SSH connection to go up


anatoly techtonik

Apr 8, 2014, 11:26:15 PM
to ansible...@googlegroups.com
---
- hosts: '{{ host }}'
  gather_facts: no
  tasks:
  - name: reboot machine
    command: /sbin/reboot

  - name: waiting for machine to come back
    local_action: wait_for host={{ ansible_ssh_host }} port={{ ansible_ssh_port }} delay=30 timeout=180 state=started

  - command: echo "X"


This fails with:


$ ansible-playbook tasks/reboot-and-wait.yml -e "host=node"

PLAY [node] **************************************************************

TASK: [reboot machine] ********************************************************
changed: [node]

TASK: [waiting for machine to come back] **************************************
ok: [node]

TASK: [command echo "X"] ******************************************************
No handlers could be found for logger "paramiko.transport"
fatal: [node] => {'msg': 'FAILED: Error reading SSH protocol banner', 'failed': True}

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/reboot-and-wait.retry

svnserver                  : ok=2    changed=1    unreachable=1    failed=0


I can add an additional delay, but I want to connect as soon as possible. Is there a way to check not only that the port is open, but that the service on the remote end is also up?
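
One possible way to check the service rather than only the port, as a sketch: the wait_for module has a search_regex parameter (documented as added in Ansible 1.4) that matches text read from the socket connection. The version below assumes the remote server is OpenSSH, so that its protocol banner contains the string "OpenSSH", and it reuses the host, port, delay and timeout values from the playbook above. It has not been tested against the setup described in this thread.

```yaml
---
- hosts: '{{ host }}'
  gather_facts: no
  tasks:
  - name: reboot machine
    command: /sbin/reboot

  # Assumption: the server is OpenSSH, so its banner looks like
  # "SSH-2.0-OpenSSH_...". With search_regex set, wait_for reads from the
  # socket and succeeds only once matching text arrives, instead of
  # succeeding as soon as the port accepts a connection.
  - name: waiting for sshd to send its banner
    local_action: wait_for host={{ ansible_ssh_host }} port={{ ansible_ssh_port }} delay=30 timeout=180 search_regex=OpenSSH

  - command: echo "X"
```

Because the task returns as soon as the banner is seen, timeout can be raised well above 180 without slowing down the normal case; it only bounds the worst case.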

anatoly techtonik

Apr 8, 2014, 11:49:56 PM
to ansible...@googlegroups.com
I also tried this one - doesn't work.

  - name: waiting for machine to come back
    local_action: wait_for host={{ ansible_ssh_host }} port={{ ansible_ssh_port }} delay=30 timeout=180

  #- pause: seconds=5

  - command: echo "X"
    register: status
    ignore_errors: True
    until: status != "X"
    retries: 5
    delay: 10

Same error: {'msg': 'FAILED: Error reading SSH protocol banner', 'failed': True}

Michael DeHaan

Apr 9, 2014, 10:12:43 AM
to ansible...@googlegroups.com
Sounds like the port was open but the SSH daemon (sshd) wasn't ready to receive traffic yet.

I'd suggest inserting a small call to "pause: seconds=5" after the wait_for and seeing if that resolves the issue.

It usually does.





anatoly techtonik

Apr 9, 2014, 11:44:30 AM
to ansible...@googlegroups.com
Port is open, but the SSH daemon is not ready to receive traffic.

The pause: seconds=5 task was already there and doesn't help.
Increasing it to 30 sometimes helps and sometimes doesn't, depending
on the load on the virtual machine. I hoped there was a way to set
a flexible wait time, from 30 seconds up to 15 minutes. Setting the pause
to 15 minutes will annoy humans, and 30 seconds is likely to
break automation at some point.



--
anatoly t.

Scott Anderson

Apr 9, 2014, 1:51:57 PM
to ansible...@googlegroups.com
I'm having exactly this problem today. The script already takes 2 minutes to run, so I am loath to add more time to a wait, and at some point that will be insufficient again.

It would be nice to have a wait that specifically checks that an SSH connection works, instead of depending on a port check only.

-scott

James Martin

Apr 9, 2014, 6:51:43 PM
to ansible...@googlegroups.com
Since wait_for isn't working for you, might I suggest, as a hacky workaround, using a do-until loop?

docs:   http://docs.ansible.com/playbooks_loops.html#do-until-loops

Basically you would define a local action that attempts to ssh to the inventory_hostname and run echo foo.  If it returns successfully, it should mean that ssh is fully ready.  It will keep trying until it gets a return code that indicates success, or until it has used up its retries (20 attempts at a 10 second interval in the example).

Here's how it would look:

- hosts: 10.42.0.6
  gather_facts: no
  tasks:
  - local_action: command ssh vagrant@{{ inventory_hostname }} echo foo
    register: result
    until: result.rc == 0
    retries: 20
    delay: 10
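
A possible variant of the loop above, as a sketch only: passing the standard OpenSSH client options BatchMode and ConnectTimeout keeps each attempt from hanging on a password prompt or a slow connect. It assumes key-based authentication and a host key that the control machine already trusts; the task name and the 5 second connect timeout are illustrative choices, not values from this thread.

```yaml
- hosts: 10.42.0.6
  gather_facts: no
  tasks:
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # and ConnectTimeout=5 bounds how long each connection attempt may take.
  # Assumption: key-based login as the vagrant user works once sshd is up.
  - name: wait until an ssh login actually succeeds
    local_action: command ssh -o BatchMode=yes -o ConnectTimeout=5 vagrant@{{ inventory_hostname }} echo foo
    register: result
    until: result.rc == 0
    retries: 20
    delay: 10
```

With these values the loop gives up after roughly 20 attempts of up to 5 seconds each plus 10 seconds between attempts, so retries and delay can be tuned to cover the longest boot time expected.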




Michael DeHaan

Apr 15, 2014, 4:47:44 PM
to ansible...@googlegroups.com
I really don't like the idea of executing ssh directly from inside Ansible, because it wouldn't work with any of the --ask-pass options (though it should be fine with ssh-agent).

Anyway, I haven't seen this myself. Having SSH not functional 30 seconds after the port opens seems quite odd to me, to be honest.

