- name: restart zookeeper one by one on follower first and ensure all is good
  throttle: 1
  service:
    name: 'confluent-zookeeper'
    state: restarted
  when: not zkmode.stdout_lines is search('leader')
- name: check follower zookeeper are up and running
  shell: 'systemctl status confluent-zookeeper -l| grep -i error || systemctl status confluent-zookeeper | grep failed'
  register: zkstatus
  failed_when: zkstatus.rc == 0
In this case, the first task is executed on all hosts even when there is an error in the logs. I want the run to fail as soon as an error appears and not continue on the next server.
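One possible way to get the stop-at-first-failure behaviour described above, offered as a sketch rather than a tested solution: `throttle: 1` only limits how many hosts run a task at the same time, so a failure on one host does not keep the task from running on the remaining hosts. Setting `serial: 1` and `any_errors_fatal: true` on the play makes Ansible take one host through the whole play before starting the next and abort for all hosts on the first failure. The group name `zookeeper` is an assumption, and the follower-before-leader ordering is not handled here.

```yaml
# Sketch only: `zookeeper` is an assumed inventory group name.
- hosts: zookeeper
  become: yes
  serial: 1               # take one host through the whole play at a time
  any_errors_fatal: true  # abort the play for every host on the first failure
  tasks:
    - name: restart zookeeper
      ansible.builtin.service:
        name: confluent-zookeeper
        state: restarted

    - name: fail if the unit did not come back up
      # `systemctl is-active` exits non-zero when the unit is not active
      ansible.builtin.command: systemctl is-active confluent-zookeeper
      changed_when: false
```

With `serial: 1` the hosts are processed in inventory order, so listing the followers before the leader in the inventory is one way to control the sequence.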
- name: Get running processes list from remote host
  shell: "ps -efw | grep -e zookeeper.properties | grep -v grep |awk '{print $2}'"
  register: runningzkprc
- name: Kill running processes
  throttle: 1
  # ignore_errors: yes
  shell: |
    kill -9 "{{ runningzkprc.stdout_lines[0] }}"
    sleep 3
    sleep 3
    systemctl start confluent-zookeeper510
    systemctl status confluent-zookeeper510 -l| grep -i error
  register: zkstart
  failed_when: zkstart.rc != 0
  when: not zkmode.stdout_lines is search('leader')
- name: Kill zookeeper processes and restart service
  ansible.builtin.shell: |
    if pkill --signal 9 -f zookeeper.properties ; then
      sleep 6
      systemctl start confluent-zookeeper510
    fi
    systemctl status confluent-zookeeper510
  register: zkstart
--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/7b8e12de-8c99-4f85-ba9a-618f7308cdc9n%40googlegroups.com.
-- Todd
fatal: [kafka-3]: FAILED! => {"changed": true, "cmd": "if pkill --signal 9 -f zookeeper.properties ; then\nsystemctl start confluent-zookeeper510\nelse\nexit 1\nfi\nsystemctl status confluent-zookeeper510\n", "delta": "0:00:00.034111", "end": "2023-07-12 10:31:58.344951", "failed_when_result": true, "msg": "non-zero return code", "rc": -9, "start": "2023-07-12 10:31:58.310840", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [kafka-4]: FAILED! => {"changed": true, "cmd": "if pkill --signal 9 -f zookeeper.properties ; then\nsystemctl start confluent-zookeeper510\nelse\nexit 1\nfi\nsystemctl status confluent-zookeeper510\n", "delta": "0:00:00.032830", "end": "2023-07-12 10:31:59.744091", "failed_when_result": true, "msg": "non-zero return code", "rc": -9, "start": "2023-07-12 10:31:59.711261", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
===========================================================================
code:
- name: Kill running java processes and start newly configured systemd
  throttle: 1
  # ignore_errors: yes
  any_errors_fatal: true
  shell: |
    if [ pkill --signal 9 -f zookeeper.properties ] ; then
      systemctl start confluent-zookeeper510; sleep 2; echo "stat" | nc localhost 2181;st="$?" systemctl status confluent-zookeeper510 -l | grep -v ERRROR;et="$?"
      [[ $st -eq 0 && $et -eq 0 ]] && exit 0 || exit 1
    else
      exit 1
    fi
  register: zkstart
  failed_when: zkstart.rc != 0
  when: not zkmode.stdout_lines is search('leader')
- name: Kill running java processes and start newly configured systemd
  throttle: 1
  # ignore_errors: yes
  any_errors_fatal: true
  shell: |
    if pkill --signal 9 -f zookeeper.properties ; then
      sleep 6
      systemctl start confluent-zookeeper510
    fi
    systemctl status confluent-zookeeper510
  register: zkstart
  failed_when: zkstart.rc != 0
  when: not zkmode.stdout_lines is search('leader')
Hi Todd, I tried to run the script with the shell module, but it fails on kafka-3 and kafka-4 with rc -9 and the message "non-zero return code". The full error output and the task code are quoted above.
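A likely cause of `rc: -9` with empty output and a run time of a few hundredths of a second, offered as an explanation rather than a confirmed diagnosis: `pkill -f` matches against the full command line, and the shell that Ansible starts to run the script carries the text `zookeeper.properties` in its own command line. `pkill` excludes itself from the match but not its parent shell, so the shell is killed with SIGKILL before it reaches `systemctl`. A common workaround is to write the pattern as `[z]ookeeper.properties`: the regular expression still matches the Java process, but the literal text in the script no longer matches itself. The sketch below is Todd's task with only the pattern changed.

```yaml
# The [z] bracket keeps the pattern from matching the shell that runs this script.
- name: Kill zookeeper processes and restart service
  ansible.builtin.shell: |
    if pkill --signal 9 -f '[z]ookeeper.properties' ; then
      sleep 6
      systemctl start confluent-zookeeper510
    fi
    systemctl status confluent-zookeeper510
  register: zkstart
```

The explanatory comment is placed above the task on purpose: a comment inside the `|` block would become part of the shell's command line.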
Secondly, since it is impossible to explore an entire tool like Ansible, I am just asking: is it even possible to do the above with Ansible modules in a single task? That is, find the process ID of the process and kill it, and if the process does not exist, do not proceed with the other hosts either.
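It probably cannot be done in one task with modules alone, but a short sequence of tasks gets close. The following is a sketch under assumptions: ZooKeeper listens on port 2181 (as the `nc localhost 2181` checks in this thread suggest), `pgrep` is available on the hosts, and the task names and the `zkpids` variable are made up for illustration.

```yaml
- name: Find the unmanaged zookeeper process
  ansible.builtin.command: pgrep -f '[z]ookeeper.properties'
  register: zkpids          # hypothetical variable name
  changed_when: false
  any_errors_fatal: true    # pgrep exits 1 when nothing matches; this stops all hosts

- name: Kill the processes that were found
  ansible.builtin.command: "kill -9 {{ zkpids.stdout_lines | join(' ') }}"

- name: Start zookeeper from systemd
  ansible.builtin.systemd:
    name: confluent-zookeeper510
    state: started

- name: Wait until zookeeper accepts connections
  ansible.builtin.wait_for:
    port: 2181
    timeout: 30
```

For each host to complete all four steps before the next host starts, the play would need `serial: 1`; `throttle: 1` only serializes a single task.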
-- Todd
if pkill --signal 9 -f zookeeper.properties ; then
  sleep 2;systemctl start confluent-zookeeper510; sleep 2; echo "stat" | nc localhost 2181;st="$?"
  systemctl status confluent-zookeeper510 -l | grep -v ERROR;et="$?"
  [[ $st -eq 0 && $et -eq 0 ]] && exit 0 || exit 1
else
  exit 1
fi
---
- hosts: all
  become: yes
  tasks:
    - name: create log dirs specific to zookeeper, server and schema-registry
      file:
        path: "{{ item }}"
        state: directory
      loop:
        - /var/log/zookeeper
        - /var/log/kafka
        - /var/log/schema-registry
    - name: copy service files to /usr/lib/systemd/system
      template:
        src: "{{ item.src }}"
        dest: "{{ item.dest }}"
      loop:
        - {src: 'confluent-zookeeper.service.j2', dest: '/usr/lib/systemd/system/confluent-zookeeper.service'}
        - {src: 'confluent-kafka.service.j2', dest: '/usr/lib/systemd/system/confluent-kafka.service'}
        - {src: 'confluent-schema-registry.service.j2', dest: '/usr/lib/systemd/system/confluent-schema-registry.service'}
    - name: systemd reload
      systemd:
        daemon_reload: true
    - name: check who is existing zookeeper leader
      shell: 'echo stat | nc localhost 2181 | grep Mode'
      register: zkmode
    - name: get broker id
      shell: |
        export brkid=$(ps -ef | grep -i server.properties | grep -v grep | awk '{print $NF}')
        grep broker.id ${brkid} | awk -F'=' '{print $2}'
      register: brokerid
    - name: get controller id
      shell: "echo dump | nc localhost 2181 | grep -A 2 -i controller | grep -i brokers | awk -F '/' '{print $NF}'"
      register: controllerid
    - name: copy zookeeper, schema-registry and kafka service check files on servers
      template:
        src: check.j2
        dest: '/tmp/check{{ item }}.sh'
        # quoted octal string; an unquoted 551 would be read as a decimal number
        mode: '0551'
      loop:
        - 'zookeeper'
        - 'schema-registry'
        - 'kafka'
    - name: Kill running unmanaged java processes for zookeeper and schema-registry then start newly configured systemd processes
      throttle: 1
      # ignore_errors: yes
      any_errors_fatal: true
      shell: "/bin/bash /tmp/check{{ item }}.sh"
      register: followerstat
      failed_when: followerstat.rc != 0
      when: not zkmode.stdout_lines is search('leader')
      loop:
        - 'zookeeper'
        - 'schema-registry'
    - name: As all followers are up now repeat to kill running java cp and start systemd for leader
      any_errors_fatal: true
      shell: "/bin/bash /tmp/check{{ item }}.sh"
      register: leadeprocstat
      failed_when: leadeprocstat.rc != 0
      when: zkmode.stdout_lines is search('leader')
      loop:
        - 'zookeeper'
        - 'schema-registry'
    - name: Kill running unmanaged java processes for broker and start broker from systemd for followers
      throttle: 1
      any_errors_fatal: true
      shell: "/bin/bash /tmp/checkkafka.sh"
      register: broprocstat
      failed_when: broprocstat.rc != 0
      when: (brokerid.stdout_lines[0] | int) != (controllerid.stdout_lines[0] | int)
    - name: Kill running unmanaged java processes for broker and start broker from systemd for leader
      throttle: 1
      any_errors_fatal: true
      shell: "/bin/bash /tmp/checkkafka.sh"
      register: broprocstat
      failed_when: broprocstat.rc != 0
      when: (brokerid.stdout_lines[0] | int) == (controllerid.stdout_lines[0] | int)
If this is the most accurate way, then I think people who are looking for a similar solution can use this for reference.