| We updated the Kubernetes plugin from version 1.14.3 to 1.14.9. On 1.14.3 everything ran smoothly, but after the upgrade to 1.14.9, when Jenkins is under heavy load (starting/running ~100 jobs/pods), builds get stuck right after pod start or after the Git checkout (the first step of our test pipelines). We have a step timeout of 30 minutes, and the stuck jobs could not be killed; they hung with the following log:
> git checkout -f 073a008e8e8fdad44c3d637a8b9e5995277724ae
Cancelling nested steps due to timeout
Body did not finish within grace period; terminating with extreme prejudice
Only a hard kill via POST BUILD_URL/kill stopped the build. Interestingly, some of those builds did fail on their own, with an error asking us to increase the max connections to the k8s API:
Caused: java.io.IOException: Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API
We increased the setting gradually up to 60000 (sic!) and it did not solve the issue at all. Other Jenkins instances running under lighter load show no sign of the issue. Could this be related to the expiring Kubernetes clients introduced in 1.14.5? Downgrading to version 1.14.3 resolved all the issues. Please let me know if any additional information should be provided. |
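
For anyone who needs to clear stuck builds in the same way, the hard kill mentioned above (POST BUILD_URL/kill) can be scripted. The following is only a minimal sketch, not part of the original setup: the function names, the example URL, and the use of HTTP basic auth with a Jenkins user API token are assumptions, and your instance may require different authentication or a CSRF crumb.

```python
import base64
import urllib.request


def build_kill_request(build_url: str, user: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to BUILD_URL/kill.

    Assumption: the Jenkins instance accepts HTTP basic auth with a
    user API token. The helper name is hypothetical.
    """
    # Normalise the trailing slash so we never produce ".../42//kill".
    url = build_url.rstrip("/") + "/kill"
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint only needs the POST itself
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


def hard_kill(build_url: str, user: str, api_token: str, timeout: float = 30.0) -> int:
    """Send the hard-kill request and return the HTTP status code."""
    request = build_kill_request(build_url, user, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example usage (hypothetical URL and credentials, not executed here):
# hard_kill("https://jenkins.example.com/job/test-pipeline/42/", "admin", "my-api-token")
```

This is a last resort: it bypasses normal step cleanup, so it only makes sense once the regular abort and the step timeout have already failed, as in the log above.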