restart service, check if port is ready to accept connections, and then move to next host


Sameer Modak

Oct 31, 2023, 10:08:09 AM
to Ansible Project
We restart a service, then need to check whether it is ready to accept connections, because it takes time to come up. Only once we are sure it is listening on its port should we move to the next host; until then we must not move on, because we can only afford to have one service down at a time.

Is there any shorthand or Ansible-native way to handle this with an Ansible module?


code:

- name: Restart zookeeper followers
  throttle: 1
  any_errors_fatal: true
  shell: |
    systemctl restart {{ zookeeper_service_name }}
    timeout 22 sh -c 'until nc -z localhost {{ zookeeper_server_port }}; do sleep 1; done'
  when: not zkmode.stdout_lines is search('leader')



Will McDonald

Oct 31, 2023, 10:23:31 AM
to ansible...@googlegroups.com


Sameer Modak

Oct 31, 2023, 1:54:28 PM
to Ansible Project
Hello Will,

I have used throttle, so that part is sorted. But I don't think wait_for works here. For example:

task 1: restart  <--- by the end of this task, every host has already been restarted, one by one
task 2: wait_for <--- this will fail if the port does not come up, but that's no use, because all the restarts have already been triggered

We just want one task that restarts the service, checks it, and aborts the play if the check fails; that's it. We got that result, but by using the shell module.
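For illustration, the two-task shape I mean looks roughly like this (the module arguments here are an assumption on my part; the variables are from my earlier snippet):

- name: Restart zookeeper            # throttle makes the restarts run one host at a time
  throttle: 1
  ansible.builtin.service:
    name: "{{ zookeeper_service_name }}"
    state: restarted

- name: Wait for the client port     # but this task only starts after EVERY host has restarted
  ansible.builtin.wait_for:
    port: "{{ zookeeper_server_port }}"
    timeout: 60

Under the default linear strategy, the first task finishes on all hosts before the second one runs, so a failed wait_for comes too late to stop the remaining restarts.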

Will McDonald

Oct 31, 2023, 7:58:40 PM
to ansible...@googlegroups.com
I don't entirely understand your approach, constraints or end-to-end requirements here, but trying to read between the lines...

1. You have a cluster of zookeeper nodes (presumably 2n+1 so 3, 5 or more nodes)
2. You want to do a rolling restart of these nodes 1 at a time, wait for the node to come back up, check it's functioning, and if that doesn't work, fail the run
3. With your existing approach you can limit the restart of a service using throttle at the task level, but then don't know how to handle failure in a subsequent task
4. You don't think wait_for will work because you only throttle on the restart task

(Essentially you want your condition "has the service restarted successfully" to be in the task itself.)

Again some thoughts that might help you work through this...

1. Any reason you couldn't just use serial at the playbook level? If so, what is it?
2. If you must use throttle rather than serial, consider using it in a block along with a failed_when (see the sketch after this list)
3. Try to avoid shell in favour of builtin modules like service; it'll save you longer-term pain

Read through the links I posted earlier and explain what might stop you using the documented approach.

This post from Vladimir on Superuser might be useful too: https://superuser.com/questions/1664197/ansible-keyword-throttle (loads of other 2n+1 rolling update/restart examples out there too: https://stackoverflow.com/questions/62378317/ansible-rolling-restart-multi-cluster-environment)
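Roughly, point 2 would look like this (an untested sketch; the variable names are carried over from your earlier snippet, and I'm using wait_for for the availability check):

- name: Restart and verify a follower
  when: not zkmode.stdout_lines is search('leader')
  any_errors_fatal: true
  throttle: 1   # NB: I believe throttle is inherited by each task in the block
                # individually rather than serializing the block as a unit, so
                # test this carefully (see the Superuser link above)
  block:
    - name: Restart zookeeper
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted

    - name: Wait for the client port to accept connections
      ansible.builtin.wait_for:
        port: "{{ zookeeper_server_port }}"
        timeout: 60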




Will McDonald

Oct 31, 2023, 8:03:14 PM
to ansible...@googlegroups.com
Edit: s/along with a failed_when/along with wait_for/

Sameer Modak

Nov 1, 2023, 6:09:54 AM
to Ansible Project
Let me try with block and serial and get back to you

Sameer Modak

Nov 3, 2023, 9:22:18 AM
to Ansible Project
Hello Will,

I tried it with block and serial; it does not work. It says a block can't have serial:

tasks:

  - name: block check
    block:
      - name: run this shell
        shell: 'systemctl restart "{{ zookeeper_service_name }}"'

      - name: debug
        debug:
          msg: "running my task"

      - name: now run this task
        shell: timeout -k 3 1m sh -c 'until nc -zv localhost {{ hostvars[inventory_hostname].zk_port }}; do sleep 1; done'

    when:
      - not zkmode is search('leader')

    serial: 1    # <-- this is the line Ansible rejects



Will McDonald

Nov 3, 2023, 9:50:43 AM
to ansible...@googlegroups.com
I think you've misunderstood what I suggested. (Or I've explained it poorly.)

If you use serial, you wouldn't need a block necessarily as you'd be executing over the inventory hosts one-at-a-time.

If you insist on sticking with throttle, try it with a block in order to group your service restart and service availability check.

I strongly suggest going and taking the time to read the rolling update example that's already documented, understanding it, and then thinking about how to apply it to what you're trying to achieve.

 

Sameer Modak

Nov 3, 2023, 11:00:21 AM
to Ansible Project
Ok, my requirement is exactly the same.

Exactly the same.

A list of tasks needs to be run one by one, on a single host at a time.

Todd Lewis

Nov 3, 2023, 11:30:13 AM
to ansible...@googlegroups.com, uto...@gmail.com
That's correct; serial is not a task or block keyword. It's a playbook keyword.
- name: One host at a time
  hosts: ducks_in_a_row
  serial: 1
  max_fail_percentage: 0
  tasks:
    - task1
    - task2
    - task3
Read up on serial and max_fail_percentage. Blocks don't come into it.
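Made concrete for the zookeeper case, that shape would be something like the following (a sketch, not tested; the group name and timeout are assumptions, and the variables are carried over from earlier in the thread):

- name: Rolling zookeeper restart, one host at a time
  hosts: zookeepers                    # hypothetical inventory group
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Restart the follower's service
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted
      when: not zkmode.stdout_lines is search('leader')

    - name: Wait until the client port accepts connections
      ansible.builtin.wait_for:
        port: "{{ zookeeper_server_port }}"
        timeout: 60
      when: not zkmode.stdout_lines is search('leader')

With serial: 1 the whole task list finishes on one host before the next host starts, and max_fail_percentage: 0 aborts the play on the first failure, which is exactly the "only one service down at a time" requirement.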

Sameer Modak

Nov 9, 2023, 11:29:28 AM
to Ansible Project
Hello Todd,

I tried serial and it works, but my problem is that serial works at the playbook level, so when I write import_playbook inside include_tasks: zookeeper.yaml, it fails, saying you can't import a playbook inside a task.
Now, how do I do it then?

Ok, so let me explain how I am running this. Basically I have created a role, prometheus, which you can find below in my personal public repo. The role has its usual main.yml, which includes the tasks, and I have created Restartandcheck.yml, which I am unable to use because of the import_playbook error if I put it in the zookeeper.yml file.


Zdenek Pyszko

Nov 16, 2023, 9:40:02 AM
to Ansible Project
Hello Sameer,
my two cents here, as I had a quick look at your repo.
I would suggest refactoring the repo to use roles.
You have three different playbooks referenced in main.yml which are doing more or less the same job.
Create a role 'enable prometheus' that is dynamic enough to make decisions based on input variables (zookeeper, Kafka, ...),
and one tiny role to restart the services (if needed).
Outcome: a single playbook, one prometheus role, one service-mgmt (restart) role, no repeated code (DRY: don't repeat yourself), and everything re-usable. A sketch of what I mean follows.
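In sketch form, the layout could look like this (the file name, the service_restart role, and the vars are only illustrative; prometheus is your existing role):

site.yml
---
- hosts: zookeepers                  # illustrative group name
  serial: 1
  roles:
    - role: prometheus               # decides what to enable from input vars
      vars:
        prometheus_target: zookeeper
    - role: service_restart          # restart + port check, reusable per service
      vars:
        service_name: "{{ zookeeper_service_name }}"
        service_port: "{{ zookeeper_server_port }}"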

On Thursday, November 9, 2023 at 17:29:28 UTC+1, Sameer Modak wrote:

Sameer Modak

Nov 19, 2023, 2:54:59 AM
to Ansible Project
Thanks a lot, Zdenek.

I got it now; I have taken your comments on board and converted this into something closer.

Can you please spot whether there is room for improvement?

Zdenek Pyszko

Nov 24, 2023, 8:03:53 AM
to Ansible Project
Yeah, getting better :)
Have a look at the diff of my fork to see what could work:
https://github.com/sameergithub5/prometheusrole/pull/1

It is still pretty raw, but it contains the idea.

On Sunday, November 19, 2023 at 8:54:59 UTC+1, Sameer Modak wrote:

Evan Hisey

Nov 25, 2023, 1:49:00 PM
to ansible...@googlegroups.com
Zdenek-
Quick question on your pull request; I'm possibly missing the obvious. I see you use loop_control to set the outer loop variable on the roles. My understanding is that the roles would be a different namespace for the loops, so they would not interfere with the {{ item }} of the controlling loop. Was this for clarity of control, or am I missing something about a namespace conflict?

Zdenek Pyszko

Nov 27, 2023, 3:14:15 AM
to Ansible Project
Hi Evan,
the loop_control part actually came from Sameer; I just kept it, as I didn't want to bring in another level of complexity.
But in general I use loop_control pretty often, especially in deeper structures and to enforce readability, e.g.:

* input vars structure
---
hypervisors:
  hypervisor_1:
    vms:
      - name: vm_1
        state: stopped
      - name: vm_2
        state: started
  hypervisor_2:
    vms:
      - name: vm_3
        state: started


* main.yml
---
- name: Loop over hypervisors
  include_tasks: vm_action.yml
  loop: "{{ hypervisors  | dict2items }}"
  loop_control:
loop_var: hypervisor


* vm_action.yml
---
- name: Do action on VM
  debug:
    msg: "VM {{ vm.name }} on hypervisor {{ hypervisor.key }} is in state {{ vm.state }}"
  delegate_to: "{{ hypervisor.key }}"
  loop: "{{ hypervisor.value.vms }}"
  loop_control:
    loop_var: vm

On Saturday, November 25, 2023 at 19:49:00 UTC+1, Evan Hisey wrote:

Zdenek Pyszko

Nov 27, 2023, 3:42:07 AM
to Ansible Project
Regarding the interference topic, this is how looping over the role without loop_var could look. We can all see it's gonna be KABOOM :) Both the outer loop and the role's inner loop use the default item variable, so the inner loop clobbers the outer one.


* main.yml
---
- name: Loop over hypervisors
  include_role:
    name: vm_action
  loop: "{{ hypervisors | dict2items }}"


* roles/vm_action/tasks/main.yml
---
- name: Do role action on VM  #CRASH
  debug:
    msg: "VM {{ item.name }} on hypervisor {{ item.key }} is in state {{ item.state }}" 
  delegate_to: "{{ item.key }}"
  loop: "{{ item.value.vms }}"



On Monday, November 27, 2023 at 9:14:15 UTC+1, Zdenek Pyszko wrote: