Selenium hub shuts down when it receives a SIGTERM signal from supervisord


Malarvizhi Ganesan

Feb 16, 2021, 11:14:20 AM
to Selenium Users
Hi All, 

We have a selenium-grid setup in OpenShift and run it on a daily basis.
4-6 times a month (out of roughly 30 test cycles, at most 6 times), the selenium-hub pod restarts, which leads to a few lost sessions and causes test case failures.

1. The pod is configured with the restart policy "on-failure", so the hub pod restarts on failure (no issue here).
2. The reason for the hub shutdown that leads to the restart is that it receives a SIGTERM signal telling it to shut down.
Log details from hub: 
          Trapped SIGTERM/SIGINT/x so shutting down supervisord
           WARN received SIGTERM indicating exit request
           INFO waiting for selenium-hub to die
           INFO stopped: selenium-hub (terminated by SIGTERM)
           Shutdown complete

Which process sends the SIGTERM signal to the hub, and why?
What could be the possible reasons for the Selenium hub receiving a SIGTERM signal?

Diego Molina

Mar 10, 2021, 8:36:17 AM
to Selenium Users
Hi,

Something is killing the pod, which ends up killing the container. I would check the pod logs and overall cluster activity.

Malarvizhi Ganesan

Mar 11, 2021, 12:15:14 AM
to Selenium Users
Hi Diego Molina,

Thanks for replying. We fixed this issue, but it took some time to come back and update the thread here.
You are right :), something is killing the pod.

What is killing the pod?

We have readiness & liveness probes (health checks) defined for selenium-hub with a timeout of 5 secs and a failureThreshold (retries) of 3 (the default). Sometimes the liveness probe did not respond within the given 5 secs for all 3 retries, which tells OpenShift to take care of the unhealthy selenium-hub pod by restarting it (OpenShift deployment/deployment-config) or moving it to the not-ready state (OpenShift pod).

What is the fix: Increased the probe timeout from 5 to 30 secs and the failureThreshold from 3 to 5, as recommended; that fixed the issue.
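
For reference, here is a minimal sketch of what the adjusted probes on the selenium-hub container look like in the deployment config after the change. The /wd/hub/status path, port 4444, and the HTTP GET check are assumptions based on a typical docker-selenium hub; adjust them to whatever your health check actually hits:

readinessProbe:
  httpGet:
    path: /wd/hub/status    # assumed status endpoint of the hub
    port: 4444               # assumed hub port
  timeoutSeconds: 30         # was 5
  failureThreshold: 5        # was 3 (default)
livenessProbe:
  httpGet:
    path: /wd/hub/status
    port: 4444
  timeoutSeconds: 30         # was 5
  failureThreshold: 5        # was 3 (default)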

If you look at the fix, it's quite simple, but analyzing this issue was a really great experience. Once again, thank you Diego Molina.

A few suggestions if you are running your tests in a containerized environment (to make the analysis easier if you encounter any failures):
1. Set the hub & chrome-node log level to "FINE" or "ALL" (default: INFO); see the sketch after this list.
2. Collect & store all pod logs (hub, chrome & test pod) once execution has completed.
3. Collect & store all pod event logs once execution has completed.
4. Collect & store the status of all pods once execution has completed.
5. Add readiness & liveness probes for both hub & node.
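
For point 1, a minimal sketch of how the log level can be passed to the hub and node containers, assuming the docker-selenium images where extra server options go through the SE_OPTS environment variable; the exact flag depends on your Grid version, so treat the value below as an assumption:

env:
  - name: SE_OPTS                # docker-selenium passes extra server options via SE_OPTS
    value: "--log-level FINE"    # assumes Grid 4; older Grid 3 images use e.g. -debug instead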

⇜Krishnan Mahadevan⇝

Mar 11, 2021, 1:44:08 AM
to Selenium Users
Malarvizhi,

Thank you so much for taking the time to come back and close the thread with a detailed resolution and steps that helped you! Appreciate it.

Thanks & Regards
Krishnan Mahadevan

"All the desirable things in life are either illegal, expensive, fattening or in love with someone else!"
My Scribblings @ http://wakened-cognition.blogspot.com/
My Technical Scribblings @ https://rationaleemotions.com/

