Restart service only if it's successful

Sameer Modak

Jul 10, 2023, 3:17:04 PM
to Ansible Project
Hello team,

I am trying to restart the ZooKeeper service on all 3 nodes sequentially. We want to restart ZooKeeper one node at a time, and the restart should run on the 2nd server only if the restart on the 1st server was successful. How do I achieve this in one task?

The task should fail if the service is not properly restarted.

  - name: restart zookeeper one by one on follower first and ensure all is good
    throttle: 1
    service:
      name: 'confluent-zookeeper'
      state: restarted
    when: not zkmode.stdout_lines is search('leader')

  - name: check follower zookeeper are up and running
    shell: 'systemctl status confluent-zookeeper -l| grep -i error || systemctl status confluent-zookeeper | grep  failed'
    register: zkstatus
    failed_when: zkstatus.rc == 0


In this case, the first task is executed on all hosts even when there are errors in the logs. I want the run to fail as soon as an error occurs and not continue on the next server.
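
A minimal sketch of one way to get this behaviour, assuming the ZooKeeper hosts are in an inventory group named zookeeper (a hypothetical name) and the unit is called confluent-zookeeper as in the task above. throttle: 1 only limits how many hosts run a task at the same time; the task still runs on every remaining host even if an earlier host failed. serial: 1 at play level runs the whole play on one host before moving to the next, and any_errors_fatal: true should stop the play for the remaining hosts on the first failure. The leader/follower condition from the original task is left out for brevity.

```yaml
- name: Restart ZooKeeper one host at a time
  hosts: zookeeper            # hypothetical inventory group
  become: true
  serial: 1                   # run the whole play on one host before the next
  any_errors_fatal: true      # first failure ends the play for all hosts
  tasks:
    - name: Restart the service
      ansible.builtin.service:
        name: confluent-zookeeper
        state: restarted

    - name: Wait for the client port to accept connections
      ansible.builtin.wait_for:
        port: 2181
        timeout: 60           # arbitrary value, adjust to the environment

    - name: Collect service states
      ansible.builtin.service_facts:

    - name: Fail unless systemd reports the unit as running
      ansible.builtin.assert:
        that:
          - ansible_facts.services['confluent-zookeeper.service'].state == 'running'
```

Because the checks are separate tasks inside a serial play, a host that fails any of them prevents the restart from starting on the next host.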




Sameer Modak

Jul 11, 2023, 10:26:27 AM
to Ansible Project
I have used the shell module in combination with failed_when to achieve this.

  - name: Get running processes list from remote host
    shell: "ps -efw | grep -e zookeeper.properties | grep -v grep |awk '{print $2}'"
    register: runningzkprc

  - name: Kill running processes
    throttle: 1
#    ignore_errors: yes
    shell: |
      kill -9 "{{ runningzkprc.stdout_lines[0] }}"
      sleep 3
      sleep 3
      systemctl start confluent-zookeeper510
      systemctl status confluent-zookeeper510 -l| grep -i error
    register: zkstart
    failed_when: zkstart.rc != 0
    when: not zkmode.stdout_lines is search('leader')


Todd Lewis

Jul 11, 2023, 12:29:59 PM
to ansible...@googlegroups.com, uto...@gmail.com
Regrettably, pgrep and pkill seem widely unknown.
- name: Kill zookeeper processes and restart service
  ansible.builtin.shell: |
    if pkill --signal 9 -f zookeeper.properties ; then
       sleep 6
       systemctl start confluent-zookeeper510
    fi
    systemctl status confluent-zookeeper510
  register: zkstart
--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/7b8e12de-8c99-4f85-ba9a-618f7308cdc9n%40googlegroups.com.

-- 
Todd

Sameer Modak

Jul 12, 2023, 3:59:58 AM
to Ansible Project
First of all, thanks a lot for introducing me to the pkill approach; otherwise I would have written one more task to register the process ID.

Many thanks, Todd.

Secondly, since it is impossible to explore an entire tool like Ansible, I will just ask: is it even possible to do the above using an Ansible module in a single task?
For example, find the process ID of a process and kill it, and if it doesn't exist, do not proceed with the other hosts either.

Sameer Modak

Jul 12, 2023, 6:53:06 AM
to Ansible Project
Hi Todd,

I tried to run the script with the shell module, but it fails with the error below:

fatal: [kafka-3]: FAILED! => {"changed": true, "cmd": "if pkill --signal 9 -f zookeeper.properties ; then\nsystemctl start confluent-zookeeper510\nelse\nexit 1\nfi\nsystemctl status confluent-zookeeper510\n", "delta": "0:00:00.034111", "end": "2023-07-12 10:31:58.344951", "failed_when_result": true, "msg": "non-zero return code", "rc": -9, "start": "2023-07-12 10:31:58.310840", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

fatal: [kafka-4]: FAILED! => {"changed": true, "cmd": "if pkill --signal 9 -f zookeeper.properties ; then\nsystemctl start confluent-zookeeper510\nelse\nexit 1\nfi\nsystemctl status confluent-zookeeper510\n", "delta": "0:00:00.032830", "end": "2023-07-12 10:31:59.744091", "failed_when_result": true, "msg": "non-zero return code", "rc": -9, "start": "2023-07-12 10:31:59.711261", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

===========================================================================

code:

  - name: Kill running java processes and start newly configured systemd
    throttle: 1
#    ignore_errors: yes
    any_errors_fatal: true
    shell: |
      if [ pkill --signal 9 -f zookeeper.properties ] ; then
      systemctl start confluent-zookeeper510; sleep 2; echo "stat" | nc localhost 2181;st="$?" systemctl status confluent-zookeeper510 -l | grep -v ERRROR;et="$?"
      [[ $st -eq 0 && $et -eq 0 ]] && exit 0 || exit 1
      else
      exit 1
      fi
    register: zkstart
    failed_when: zkstart.rc != 0
    when: not zkmode.stdout_lines is search('leader')


Sameer Modak

Jul 12, 2023, 6:57:54 AM
to Ansible Project
In fact, I tried the code you pasted, and it still gives the same error.

  - name: Kill running java processes and start newly configured systemd
    throttle: 1
#    ignore_errors: yes
    any_errors_fatal: true
    shell: |
      if pkill --signal 9 -f zookeeper.properties ; then
        sleep 6
        systemctl start confluent-zookeeper510
      fi
      systemctl status confluent-zookeeper510
    register: zkstart
    failed_when: zkstart.rc != 0
    when: not zkmode.stdout_lines is search('leader')


Dick Visser

Jul 12, 2023, 7:31:27 AM
to ansible...@googlegroups.com
Hi,


On Wed, 12 Jul 2023 at 12:53, Sameer Modak <sameer.m...@gmail.com> wrote:
> if [ pkill --signal 9 -f zookeeper.properties ] ; then
> systemctl start confluent-zookeeper510; sleep 2; echo "stat" | nc localhost 2181;st="$?" systemctl status confluent-zookeeper510 -l | grep -v ERRROR;et="$?"

I feel we're going down a rabbit hole trying to fight ill-designed systemd units with shell hacks (which include typos? ERRROR instead of ERROR).
My approach would be to make sure the systemd unit is doing what it should do, and then rely on that to do its job....
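
As a sketch of what relying on the unit could look like, assuming the unit name confluent-zookeeper510 used earlier in the thread and that nc is installed on the hosts. The retries and delay values are arbitrary placeholders. The health check reuses ZooKeeper's "stat" command on port 2181, and the task fails if no Mode line is returned after the last retry.

```yaml
- name: Restart ZooKeeper through its systemd unit
  ansible.builtin.systemd:
    name: confluent-zookeeper510
    state: restarted

- name: Poll the "stat" command until ZooKeeper reports its mode
  ansible.builtin.shell: echo stat | nc localhost 2181 | grep Mode
  register: zkstat
  changed_when: false       # a read-only check never changes the host
  retries: 5                # arbitrary, tune to how long startup takes
  delay: 3
  until: zkstat.rc == 0
```

Polling with until replaces the fixed sleep calls and keeps the exit-status handling in Ansible rather than in hand-written shell.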

 

Stefan Hornburg (Racke)

Jul 12, 2023, 7:35:10 AM
to ansible...@googlegroups.com
On 12/07/2023 13:30, Dick Visser wrote:
> I feel we're going down a rabbit hole trying to fight ill designed systemd units with shell hacks (which include typos? ERRROR instead of ERROR).
> My approach would be to make sure the systemd unit is doing what it should do, and then rely on that to do its job....

Certainly!

Regards

         Racke



--
Automation expert - Ansible and friends
Linux administrator & Debian maintainer
Perl Dancer & conference hopper

Todd Lewis

Jul 12, 2023, 8:07:40 AM
to ansible...@googlegroups.com, uto...@gmail.com
Maybe you should be using the systemctl action "reload-or-restart"?

Killing service processes out from under systemd should not be a thing.

And as Dick Visser said, we're straying from Ansible issues.

To your other question:

> Secondly, since it is impossible to explore an entire tool like Ansible, I will just ask: is it even possible to do the above using an Ansible module in a single task?
> For example, find the process ID of a process and kill it, and if it doesn't exist, do not proceed with the other hosts either.

I know of no ansible module (besides "ansible.builtin.shell") that would let you do all that in a single task. But again, unless the service is broken or the units are basically wrong, you shouldn't need to do that. The service manager should be able to handle that for you, thus, a single task should suffice.
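
A sketch of such a single task, assuming the unit name confluent-zookeeper510 from earlier in the thread. The command module is used here because reload-or-restart is a systemctl verb; whether a failed start is reported through the exit status depends on how the unit is written.

```yaml
- name: Reload the unit if it supports reloading, otherwise restart it
  ansible.builtin.command: systemctl reload-or-restart confluent-zookeeper510
  throttle: 1               # one host at a time, as in the original tasks
  changed_when: true        # the command always acts on the service
```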
--
Todd

Sameer Modak

Jul 12, 2023, 10:47:09 AM
to Ansible Project
I get it. But this is a specific case where my current process is not managed by systemd, so I had to use pkill anyway.

The issue now is that the shell module with if/else is not working correctly for me: it returns rc -9. If I copy the same shell script to a file and run it with the shell module as bash /tmp/zkproc.sh, it works.

The reason I am posting here is that multiple commands in the shell module are not working as expected. Below is the shell script. I had to use grep -v ERROR;et="$?" because sometimes the process says it is running but the log has errors that can't be ignored.


if pkill --signal 9 -f zookeeper.properties ; then
       sleep 2;systemctl start confluent-zookeeper510; sleep 2; echo "stat" | nc localhost 2181;st="$?"
       systemctl status confluent-zookeeper510 -l | grep -v ERROR;et="$?"
       [[ $st -eq 0 && $et -eq 0 ]] && exit 0 || exit 1
else
       exit 1
fi
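
One likely explanation for the rc of -9, which nobody in the thread confirms: the shell module hands the script text to a shell as a command-line argument, so that shell's own command line contains the string zookeeper.properties. pkill -f matches against full command lines, so it can kill the very shell that is running the task, and Ansible reports the signal as a negative return code. Running bash /tmp/zkproc.sh avoids this because the pattern is no longer on the command line. A sketch of a common workaround, using a bracket expression so that the pattern does not match its own text:

```yaml
# The bracket expression still matches a literal dot in the target process,
# but the pattern text itself no longer contains the string it searches for,
# so the shell started by the module is not matched.
- name: Kill unmanaged ZooKeeper processes, then start the systemd unit
  ansible.builtin.shell: |
    if pkill --signal 9 -f 'zookeeper[.]properties'; then
      sleep 6
      systemctl start confluent-zookeeper510
    fi
    systemctl is-active confluent-zookeeper510
  register: zkstart
  throttle: 1
```

The explanatory comment is kept outside the script body on purpose, since any text inside the block also ends up on the shell's command line.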




Sameer Modak

Jul 13, 2023, 2:51:40 PM
to Ansible Project
Hello Todd/Dick/Stefan,

This is how I did it. Do you think any of the tasks below could be handled differently (more accurately), or is this the best way?

---
- hosts: all
  become: yes
  tasks:

  - name: create log dirs specific to zookeeper, server and schema-registry
    file:
      path: "{{ item }}"
      state: directory
    loop:
      - /var/log/zookeeper
      - /var/log/kafka
      - /var/log/schema-registry

  - name: copy service files to /usr/lib/systemd/system
    template:
      src: "{{ item.src }}"
      dest: "{{ item.dest }}"
    loop:
      - {src: 'confluent-zookeeper.service.j2', dest: '/usr/lib/systemd/system/confluent-zookeeper.service'}
      - {src: 'confluent-kafka.service.j2', dest: '/usr/lib/systemd/system/confluent-kafka.service'}
      - {src: 'confluent-schema-registry.service.j2', dest: '/usr/lib/systemd/system/confluent-schema-registry.service'}

  - name: systemd reload
    systemd:
      daemon_reload: true

  - name: check who is existing zookeeper leader
    shell: 'echo stat | nc localhost 2181 | grep Mode'
    register: zkmode

  - name: get broker id
    shell: |
      export brkid=$(ps -ef | grep -i server.properties | grep -v grep  | awk '{print $NF}')
      grep broker.id ${brkid} | awk -F'=' '{print $2}'
    register: brokerid

  - name: get controller id
    shell: "echo dump | nc localhost 2181 | grep -A 2 -i controller | grep -i brokers | awk -F '/' '{print $NF}'"
    register: controllerid

  - name: copy zookeeper, schema-registry and kafka service check files on servers
    template:
      src: check.j2
      dest: '/tmp/check{{ item }}.sh'
      mode: '0551'  # quoted octal; an unquoted 551 is read as a decimal number
    loop:
      - 'zookeeper'
      - 'schema-registry'
      - 'kafka'


  - name: Kill running unmanaged java processes for zookeeper and schema-registry then start newly configured systemd processes
    throttle: 1
#    ignore_errors: yes
    any_errors_fatal: true
    shell: "/bin/bash /tmp/check{{ item }}.sh"
    register: followerstat
    failed_when: followerstat.rc != 0
    when: not zkmode.stdout_lines is search('leader')
    loop:
      - 'zookeeper'
      - 'schema-registry'

  - name: As all followers are up now repeat to kill running java cp and start systemd for leader
    any_errors_fatal: true
    shell: "/bin/bash /tmp/check{{ item }}.sh"
    register: leadeprocstat
    failed_when: leadeprocstat.rc != 0
    when: zkmode.stdout_lines is search('leader')
    loop:
      - 'zookeeper'
      - 'schema-registry'

  - name: Kill running unmanaged java processes for broker and start broker from systemd for followers
    throttle: 1
    any_errors_fatal: true
    shell: "/bin/bash /tmp/checkkafka.sh"
    register: broprocstat
    failed_when: broprocstat.rc != 0
    when: (brokerid.stdout_lines[0] | int) != (controllerid.stdout_lines[0] | int)

  - name: Kill running unmanaged java processes for broker and start broker from systemd for leader
    throttle: 1
    any_errors_fatal: true
    shell: "/bin/bash /tmp/checkkafka.sh"
    register: broprocstat
    failed_when: broprocstat.rc != 0
    when: (brokerid.stdout_lines[0] | int) == (controllerid.stdout_lines[0] | int)


If this is the most accurate way, then I think people looking for a similar solution can use this for reference.

Todd Lewis

Jul 13, 2023, 3:35:55 PM
to ansible...@googlegroups.com, uto...@gmail.com
Any reason you don't want to follow https://docs.confluent.io/ansible/current/overview.html ("Ansible Playbooks for Confluent Platform")?

It's not clear how you got zookeeper and friends installed on these hosts without the benefit of a service manager. In any case, I'm surprised you're putting the systemd unit files elsewhere from /etc/systemd/system. And I don't see you enabling or starting those services.

Be sure to run ansible-lint and consider taking its recommendations.

Otherwise, without spinning up unmanaged instances myself, I don't have anything else to comment on.
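
A minimal sketch of the two points above (unit file location, and enabling and starting the service), assuming the template name confluent-zookeeper.service.j2 from the playbook in the previous message. The file mode is an assumption.

```yaml
- name: Install the unit file under /etc/systemd/system
  ansible.builtin.template:
    src: confluent-zookeeper.service.j2
    dest: /etc/systemd/system/confluent-zookeeper.service
    mode: "0644"              # assumed mode for a unit file

- name: Reload systemd, then enable and start the service
  ansible.builtin.systemd:
    name: confluent-zookeeper
    daemon_reload: true       # pick up the new or changed unit file
    enabled: true             # start at boot
    state: started
```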
--
Todd

Sameer Modak

Jul 14, 2023, 4:20:26 AM
to Ansible Project
Hi Todd,
Good morning!


The reason I didn't follow the Confluent playbook is that we had to develop our own for customized services. Of course, I can take hints from it.

Second, the reason I didn't use the systemd module is that the playbook was getting bigger. But you made a good point: I forgot to enable the service (by the way, that is done in the shell script I run to check the services; it is all in one).


I appreciate your suggestion of ansible-lint. I knew about it but never thought of using it; it seems now is the time.

Last thing: how can I show my appreciation for your replies and time? Please let me know. Your responses encourage people to use Ansible more and more.

It is hard to remember syntax and shorthand tricks when you haven't used a tool for 3 months, but your replies made it easy.

Todd Lewis

Jul 14, 2023, 7:11:30 AM
to ansible...@googlegroups.com, uto...@gmail.com
The cp-ansible project is a bit overwhelming, but it's written by the people who know the software the best. At least reading through it is instructive, not only from a Confluent standpoint, but as an example of how a relatively large Ansible project can be structured.

As for "how do I appreciate your replies/time please let me know" — ironically, the best way to show your appreciation for people who answer questions on forums like this is to ask interesting questions! Such people are here because they learn from the real-world problems actual users run into, and the challenge of producing solutions to those problems makes them better at their craft.

Good luck with your project.
--
Todd