Hi Diego Molina,
Thanks for replying. We fixed this issue, but it took some time to post the update here.
You were right :), something was killing the pod.
What is killing the pod?
We have readiness & liveness probes (health checks) defined for selenium-hub with a timeout of 5 secs & a failureThreshold (retries) of 3 (the default). Sometimes the liveness probe didn't respond within the given 5 secs on all 3 attempts, which tells OpenShift to deal with the unhealthy selenium-hub pod by restarting it (openshift deployment/deployment-config) or moving it to the not-ready state (openshift pod).
What is the fix? Increasing the probes' timeout from 5 to 30 secs & failureThreshold from 3 to 5, as recommended in
fixed the issue.
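
For anyone who wants to see the change spelled out, here is a minimal sketch of the probe settings before and after, built as a patch-style document in Python. The timeout and failureThreshold values are the ones described above; the probe path (/wd/hub/status), port (4444), periodSeconds (10, the Kubernetes default) and the container name are assumptions for illustration and may not match your deployment.

```python
import json


def make_probe(timeout_seconds, failure_threshold,
               path="/wd/hub/status", port=4444, period_seconds=10):
    """Build an HTTP probe spec as a plain dict.

    path, port and period_seconds are assumed values, not taken from the post.
    """
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "timeoutSeconds": timeout_seconds,
        "failureThreshold": failure_threshold,
    }


def probe_patch(timeout_seconds, failure_threshold, container="selenium-hub"):
    """Wrap the same probe settings as liveness and readiness probes
    for one container (the container name is an assumption)."""
    probe = make_probe(timeout_seconds, failure_threshold)
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "livenessProbe": probe,
                            "readinessProbe": probe,
                        }
                    ]
                }
            }
        }
    }


# Settings that led to the pod being killed: 5 s timeout, 3 failures.
before = probe_patch(timeout_seconds=5, failure_threshold=3)
# Settings after the fix: 30 s timeout, 5 failures.
after = probe_patch(timeout_seconds=30, failure_threshold=5)

print(json.dumps(after, indent=2))
```

The printed JSON is only meant to show which fields changed; how you apply it (editing the deployment config by hand, a template, or a patch command) depends on your setup.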
The fix itself is very simple, but analyzing this issue was a really great experience. Once again, thank you, Diego Molina.
A few suggestions if you are running your tests in a containerized env (to make the analysis easier if you encounter any failures):
1. Set the hub & chrome-node log level to "FINE" or "ALL" (default: INFO)
2. Collect & store all pod logs (hub, chrome & test pod) once execution has completed.
3. Collect & store the event logs of all pods once execution has completed.
4. Collect & store the status of all pods once execution has completed.
5. Add readiness & liveness probes for both the hub & the nodes
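
Suggestions 2 to 4 can be scripted. Below is a rough sketch in Python that builds the `oc` commands for pod logs, events and pod status and writes their output to files. The namespace, pod names and output file names are hypothetical placeholders, and it assumes the `oc` CLI is installed and logged in wherever `collect` is actually run.

```python
import subprocess
from pathlib import Path


def build_commands(pods, namespace):
    """Map an output file name to the oc command whose output it should hold."""
    commands = {}
    for pod in pods:
        # Logs of every container in the pod (hub, chrome node, test pod).
        commands[f"{pod}.log"] = [
            "oc", "logs", pod, "-n", namespace, "--all-containers=true",
        ]
    # Event log for the namespace, oldest first.
    commands["events.txt"] = [
        "oc", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp",
    ]
    # Status of all pods (phase, restarts, node).
    commands["pod-status.txt"] = [
        "oc", "get", "pods", "-n", namespace, "-o", "wide",
    ]
    return commands


def collect(commands, out_dir, run=subprocess.run):
    """Run each command and store stdout plus stderr under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, argv in commands.items():
        result = run(argv, capture_output=True, text=True)
        (out / filename).write_text(result.stdout + result.stderr)


# Hypothetical pod names and namespace, for illustration only.
commands = build_commands(
    pods=["selenium-hub-1-abcde", "chrome-node-1-fghij", "test-runner-1-klmno"],
    namespace="selenium-tests",
)

# Dry run: show what would be executed without calling oc.
for filename, argv in commands.items():
    print(filename, "<-", " ".join(argv))
```

Calling `collect(commands, "artifacts")` at the end of a test run would then write one file per pod log plus the events and status files. Keeping the command list separate from the execution step makes it easy to check the commands first, or to swap `oc` for `kubectl`.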