[JIRA] (JENKINS-56735) Builds hanging after pod start in version 1.14.9

karolgil.kg@gmail.com (JIRA)

Mar 25, 2019, 5:21:02 AM
to jenkinsc...@googlegroups.com
Karol Gil created an issue
 
Jenkins / Bug JENKINS-56735
Builds hanging after pod start in version 1.14.9
Issue Type: Bug
Assignee: Carlos Sanchez
Components: kubernetes-plugin
Created: 2019-03-25 09:20
Environment: EKS version 1.11.8
Jenkins version 2.164.1
Kubernetes Plugin version 1.14.9
Java version 8

Jenkins installation from Helm chart

Jenkins running with following JVM args:
 -Dkubernetes.websocket.ping.interval=30000
 -Dkubernetes.websocket.timeout=10000
These were introduced because of recurring socket timeouts, and they did resolve those timeouts.
Labels: kuberenetes-plugin kubernetes plugin jenkins
Priority: Major
Reporter: Karol Gil

We updated the Kubernetes plugin from version 1.14.3 to version 1.14.9. On 1.14.3 everything ran smoothly, but after the upgrade to 1.14.9, when Jenkins is under heavy load (starting/running ~100 jobs/pods), we observe builds getting stuck right after pod start or after Git checkout (the first step of our test pipelines).

We have a step timeout of 30 minutes, but the stuck jobs could not be killed by it and remained stuck with the following log:
 

 > git checkout -f 073a008e8e8fdad44c3d637a8b9e5995277724ae
Cancelling nested steps due to timeout
Body did not finish within grace period; terminating with extreme prejudice

Only a hard kill via POST BUILD_URL/kill stopped the build.
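
For anyone needing to script that hard kill, here is a minimal sketch of how the POST could be prepared in Python. The function name, the example URL and the credentials are hypothetical, and authenticating with HTTP Basic auth and a user API token is an assumption; depending on the instance's CSRF settings a crumb header may also be needed.

```python
import base64
import urllib.request


def build_kill_request(build_url: str, user: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <build_url>/kill.

    build_url is the URL of a single build, e.g. the value of BUILD_URL.
    user and api_token are assumed to be Jenkins credentials for Basic auth.
    """
    kill_url = build_url.rstrip("/") + "/kill"
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        kill_url,
        data=b"",  # empty body; the endpoint only needs the POST itself
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    req = build_kill_request(
        "https://jenkins.example.com/job/my-pipeline/42/", "admin", "my-api-token"
    )
    print(req.get_method(), req.full_url)
    # To actually send the request:
    # urllib.request.urlopen(req, timeout=30)
```

The request is only constructed here, so it can be inspected before anything is sent to a live Jenkins.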
Interestingly, some of those builds did fail, with a message asking us to increase the max connections to the k8s API:

Caused: java.io.IOException: Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API

We increased the setting gradually up to 60000 (sic!) and it did not solve the issue in any way. On other Jenkins instances running under lighter load there is no sign of the issue whatsoever. Could it be related to the expiring Kubernetes clients introduced in 1.14.5?
Downgrading to version 1.14.3 solved all the issues.

Please let me know if any additional information should be provided.


karolgil.kg@gmail.com (JIRA)

Mar 25, 2019, 7:35:03 AM
to jenkinsc...@googlegroups.com
Karol Gil updated an issue
Change By: Karol Gil
We updated the Kubernetes plugin from version 1.14.3 to version 1.14.9. On 1.14.3 everything ran smoothly, but after the upgrade to 1.14.9, when Jenkins is under heavy load (starting/running ~100 jobs/pods), we observe builds getting stuck right after pod start or after Git checkout (the first step of our test pipelines).


We have a step timeout of 30 minutes, but the stuck jobs could not be killed by it and remained stuck with the following log:
 
{code:java}

> git checkout -f 073a008e8e8fdad44c3d637a8b9e5995277724ae
Cancelling nested steps due to timeout
Body did not finish within grace period; terminating with extreme prejudice
{code}


Only a hard kill via {{POST BUILD_URL/kill}} stopped the build.
Interestingly, some of those builds did fail, with a message asking us to increase the max connections to the k8s API:

{code:java}

Caused: java.io.IOException: Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API
{code}


We increased the setting gradually up to 60000 (sic!) and it did not solve the issue in any way. On other Jenkins instances running under lighter load there is no sign of the issue whatsoever. Could it be related to the expiring Kubernetes clients introduced in 1.14.5?

I'm guessing that may be the case, because after a Jenkins restart everything works smoothly for some time (usually ~24 hours, but we have had shorter time frames as well) and then everything gets stuck. After killing all jobs and restarting Jenkins everything went back to normal, again only for a limited time.
Downgrading to version 1.14.3 solved all the issues.

Please let me know if any additional information should be provided.

jglick@cloudbees.com (JIRA)

Jun 3, 2019, 12:19:02 PM
to jenkinsc...@googlegroups.com

karolgil.kg@gmail.com (JIRA)

Jun 3, 2019, 1:56:02 PM
to jenkinsc...@googlegroups.com

We thought so, and we tested the fixed version 1.15.5 last week. Unfortunately, we still see a lot of

Interrupted while waiting for websocket connection, you should increase the Max connections to Kubernetes API

issues in logs. We reverted to 1.14.9 again and everything works smoothly once again.

jglick@cloudbees.com (JIRA)

Jun 12, 2019, 3:34:06 PM
to jenkinsc...@googlegroups.com
Jesse Glick updated an issue
 
Change By: Jesse Glick
Labels: jenkins kuberenetes-plugin kubernetes plugin

jglick@cloudbees.com (JIRA)

Jul 16, 2019, 3:43:26 PM
to jenkinsc...@googlegroups.com
Jesse Glick assigned an issue to Unassigned
Change By: Jesse Glick
Assignee: Carlos Sanchez