restart service, check if port is ready to accept connections, and then move to next host


Sameer Modak

Oct 31, 2023, 10:08:09 AM
to Ansible Project
We restart a service, then need to check whether it is ready to accept connections, because it takes time to come up. Only once we are sure it is listening on its port should we move to the next host; until then we must not move on, because we can only afford to have one service down at a time.

Is there any shorthand or Ansible-native way to handle this with an Ansible module?


code:

- name: Restart zookeeper followers
  throttle: 1
  any_errors_fatal: true
  shell: |
    systemctl restart {{ zookeeper_service_name }}
    timeout 22 sh -c 'until nc -z localhost {{ zookeeper_server_port }}; do sleep 1; done'
  when: not zkmode.stdout_lines is search('leader')



Will McDonald

Oct 31, 2023, 10:23:31 AM
to ansible...@googlegroups.com


Sameer Modak

Oct 31, 2023, 1:54:28 PM
to Ansible Project
Hello Will,

I have used throttle, so that part is sorted. But I don't think wait_for works here. For example:

task 1: restart  <--- by the end of this task, every host has already been restarted, one by one
task 2: wait_for <--- this will fail if the port does not come up, but that's no use, because all the restarts have already been triggered

We just want one task that restarts the service, checks it, and aborts the play if the check fails; that's it. We got that result, but by using the shell module.
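For illustration, the two-task shape I mean looks roughly like this (the module arguments here are an assumption on my part; the variables are from my earlier snippet):

- name: Restart zookeeper            # throttle makes the restarts run one host at a time
  throttle: 1
  ansible.builtin.service:
    name: "{{ zookeeper_service_name }}"
    state: restarted

- name: Wait for the client port     # but this task only starts after EVERY host has restarted
  ansible.builtin.wait_for:
    port: "{{ zookeeper_server_port }}"
    timeout: 60

Under the default linear strategy, the first task finishes on all hosts before the second one runs, so a failed wait_for comes too late to stop the remaining restarts.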

Will McDonald

Oct 31, 2023, 7:58:40 PM
to ansible...@googlegroups.com
I don't entirely understand your approach, constraints or end-to-end requirements here, but trying to read between the lines...

1. You have a cluster of zookeeper nodes (presumably 2n+1 so 3, 5 or more nodes)
2. You want to do a rolling restart of these nodes 1 at a time, wait for the node to come back up, check it's functioning, and if that doesn't work, fail the run
3. With your existing approach you can limit the restart of a service using throttle at the task level, but then don't know how to handle failure in a subsequent task
4. You don't think wait_for will work because you only throttle on the restart task

(Essentially you want your condition "has the service restarted successfully" to be in the task itself.)

Again some thoughts that might help you work through this...

1. Any reason you couldn't just use serial at the playbook level? If so, what is it?
2. If you must use throttle rather than serial, consider using it in a block along with a failed_when (see the sketch after this list)
3. Try to avoid shell in favour of builtin modules like service; it'll save you longer-term pain

Read through the links I posted earlier and explain what might stop you using the documented approach.

This post from Vladimir on Superuser might be useful too: https://superuser.com/questions/1664197/ansible-keyword-throttle (loads of other 2n+1 rolling update/restart examples out there too: https://stackoverflow.com/questions/62378317/ansible-rolling-restart-multi-cluster-environment)
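Roughly, point 2 would look like this (an untested sketch; the variable names are carried over from your earlier snippet, and I'm using wait_for for the availability check):

- name: Restart and verify a follower
  when: not zkmode.stdout_lines is search('leader')
  any_errors_fatal: true
  throttle: 1   # NB: I believe throttle is inherited by each task in the block
                # individually rather than serializing the block as a unit, so
                # test this carefully (see the Superuser link above)
  block:
    - name: Restart zookeeper
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted

    - name: Wait for the client port to accept connections
      ansible.builtin.wait_for:
        port: "{{ zookeeper_server_port }}"
        timeout: 60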




Will McDonald

Oct 31, 2023, 8:03:14 PM
to ansible...@googlegroups.com
Edit: s/along with a failed_when/along with wait_for/

Sameer Modak

Nov 1, 2023, 6:09:54 AM
to Ansible Project
Let me try with block and serial and get back to you

Sameer Modak

Nov 3, 2023, 9:22:18 AM
to Ansible Project
Hello Will,

I tried it with block and serial; it does not work. It says a block can't have serial:

tasks:

  - name: block check
    block:
      - name: run this shell
        shell: 'systemctl restart "{{ zookeeper_service_name }}"'

      - name: debug
        debug:
          msg: "running my task"

      - name: now run this task
        shell: timeout -k 3 1m sh -c 'until nc -zv localhost {{ hostvars[inventory_hostname].zk_port }}; do sleep 1; done'

    when:
      - not zkmode is search('leader')

    serial: 1    # <-- this is the line Ansible rejects



Will McDonald

Nov 3, 2023, 9:50:43 AM
to ansible...@googlegroups.com
I think you've misunderstood what I suggested. (Or I've explained it poorly.)

If you use serial, you wouldn't need a block necessarily as you'd be executing over the inventory hosts one-at-a-time.

If you insist on sticking with throttle, try it with a block in order to group your service restart and service availability check.

I strongly suggest going and taking the time to read the rolling update example that's already documented, understanding it, and then thinking about how to apply it to what you're trying to achieve.

 

Sameer Modak

Nov 3, 2023, 11:00:21 AM
to Ansible Project
Ok, my requirement is exactly the same.

Exactly the same.

A list of tasks needs to be run one by one, on a single host at a time.

Todd Lewis

Nov 3, 2023, 11:30:13 AM
to ansible...@googlegroups.com, uto...@gmail.com
That's correct; serial is not a task or block keyword. It's a playbook keyword.
- name: One host at a time
  hosts: ducks_in_a_row
  serial: 1
  max_fail_percentage: 0
  tasks:
    - task1
    - task2
    - task3
Read up on serial and max_fail_percentage. Blocks don't come into it.
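Made concrete for the zookeeper case, that shape would be something like the following (a sketch, not tested; the group name and timeout are assumptions, and the variables are carried over from earlier in the thread):

- name: Rolling zookeeper restart, one host at a time
  hosts: zookeepers                    # hypothetical inventory group
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Restart the follower's service
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted
      when: not zkmode.stdout_lines is search('leader')

    - name: Wait until the client port accepts connections
      ansible.builtin.wait_for:
        port: "{{ zookeeper_server_port }}"
        timeout: 60
      when: not zkmode.stdout_lines is search('leader')

With serial: 1 the whole task list finishes on one host before the next host starts, and max_fail_percentage: 0 aborts the play on the first failure, which is exactly the "only one service down at a time" requirement.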

Sameer Modak

Nov 9, 2023, 11:29:28 AM
to Ansible Project
Hello Todd,

I tried serial and it works, but my problem is that serial works at the playbook level, so when I write import_playbook inside include_tasks: zookeeper.yaml, it fails, saying you can't import a playbook inside a task.
Now, how do I do it then?

Ok, so let me explain how I am running this. Basically I have created a role, prometheus, which you can find below in my personal public repo. The role has its usual main.yml, which includes the tasks, and I have created Restartandcheck.yml, which I am unable to use because of the import_playbook error if I put it in the zookeeper.yml file.


Zdenek Pyszko

Nov 16, 2023, 9:40:02 AM
to Ansible Project
Hello Sameer,
my two cents here, as I had a quick look at your repo.
I would suggest refactoring the repo to use roles.
You have three different playbooks referenced in main.yml which are doing more or less the same job.
Create a role 'enable prometheus' that is dynamic enough to make decisions based on input variables (zookeeper, Kafka, ...),
and one tiny role to restart the services (if needed).
Outcome: a single playbook, one prometheus role, one service-mgmt (restart) role, no repeated code (DRY: don't repeat yourself), and everything re-usable. A sketch of what I mean follows.
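In sketch form, the layout could look like this (the file name, the service_restart role, and the vars are only illustrative; prometheus is your existing role):

site.yml
---
- hosts: zookeepers                  # illustrative group name
  serial: 1
  roles:
    - role: prometheus               # decides what to enable from input vars
      vars:
        prometheus_target: zookeeper
    - role: service_restart          # restart + port check, reusable per service
      vars:
        service_name: "{{ zookeeper_service_name }}"
        service_port: "{{ zookeeper_server_port }}"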

On Thursday, November 9, 2023 at 17:29:28 UTC+1, Sameer Modak wrote:

Sameer Modak

Nov 19, 2023, 2:54:59 AM
to Ansible Project
Thanks a lot, Zdenek.

I got it now; I have taken your comments on board and converted this into something closer.

Can you please spot whether there is room for improvement?

Zdenek Pyszko

Nov 24, 2023, 8:03:53 AM
to Ansible Project
Yeah, getting better :)
Have a look at the diff of my fork to see what could work:
https://github.com/sameergithub5/prometheusrole/pull/1

It is still pretty raw, but it contains the idea.

On Sunday, November 19, 2023 at 8:54:59 UTC+1, Sameer Modak wrote:

Evan Hisey

Nov 25, 2023, 1:49:00 PM
to ansible...@googlegroups.com
Zdenek-
Quick question on your pull request; I'm possibly missing the obvious. I see you use loop_control to set the outer loop variable on the roles. My understanding is that the roles would be a different namespace for the loops, so they would not interfere with the {{ item }} of the controlling loop. Was this for clarity of control, or am I missing something about a namespace conflict?

Zdenek Pyszko

Nov 27, 2023, 3:14:15 AM
to Ansible Project
Hi Evan,
the loop_control part actually came from Sameer; I just kept it, as I didn't want to bring in another level of complexity.
But in general I use loop_control pretty often, especially in deeper structures and to enforce readability, e.g.:

* input vars structure
---
hypervisors:
  hypervisor_1:
    vms:
      - name: vm_1
        state: stopped
      - name: vm_2
        state: started
  hypervisor_2:
    vms:
      - name: vm_3
        state: started


* main.yml
---
- name: Loop over hypervisors
  include_tasks: vm_action.yml
  loop: "{{ hypervisors  | dict2items }}"
  loop_control:
loop_var: hypervisor


* vm_action.yml
---
- name: Do action on VM
  debug:
    msg: "VM {{ vm.name }} on hypervisor {{ hypervisor.key }} is in state {{ vm.state }}"
  delegate_to: "{{ hypervisor.key }}"
  loop: "{{ hypervisor.value.vms }}"
  loop_control:
    loop_var: vm

On Saturday, November 25, 2023 at 19:49:00 UTC+1, Evan Hisey wrote:

Zdenek Pyszko

Nov 27, 2023, 3:42:07 AM
to Ansible Project
Regarding the interference topic, this is how looping over the role without loop_var could look. We can all see it's gonna be KABOOM :) Both the outer loop and the role's inner loop use the default item variable, so the inner loop clobbers the outer one.


* main.yml
---
- name: Loop over hypervisors
  include_role:
    name: vm_action
  loop: "{{ hypervisors | dict2items }}"


* roles/vm_action/tasks/main.yml
---
- name: Do role action on VM  #CRASH
  debug:
    msg: "VM {{ item.name }} on hypervisor {{ item.key }} is in state {{ item.state }}" 
  delegate_to: "{{ item.key }}"
  loop: "{{ item.value.vms }}"



On Monday, November 27, 2023 at 9:14:15 UTC+1, Zdenek Pyszko wrote: