[JIRA] (JENKINS-59652) [kubernetes plugin] Protect Jenkins slave pods from eviction


jonathan.pigree@gmail.com (JIRA)

unread,
Oct 4, 2019, 2:35:03 AM10/4/19
to jenkinsc...@googlegroups.com
Jonathan Pigrée created an issue
 
Jenkins / Improvement JENKINS-59652
[kubernetes plugin] Protect Jenkins slave pods from eviction
Issue Type: Improvement
Assignee: Carlos Sanchez
Components: kubernetes-plugin
Created: 2019-10-04 06:34
Environment: GKE cluster master and node pools version: 1.14
Cluster autoscaler activated
Jenkins master LTS installed with official Helm chart (1.1.24)
Kubernetes plugin: 1.19.0
Priority: Minor
Reporter: Jonathan Pigrée

I have had a sporadic bug occurring on my Jenkins installation for months now:

{noformat}
java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF
{noformat}

 

I believe this was already reported in the threads below, and I understand that it is caused by an HTTP 500 returned by the Kubernetes API:

- https://issues.jenkins-ci.org/browse/JENKINS-39844
- https://stackoverflow.com/questions/50949718/kubernetes-gke-error-dialing-backend-eof-on-random-exec-command

 

However, after further investigation, I am now sure that the bug occurs only when the cluster autoscaler is on, and more precisely when the autoscaler scales down while a Jenkins build is running. It may be an edge case.

 

I tried to set the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "false" on my Jenkins slave pods, but it didn't protect them. So I am now trying to set up a PodDisruptionBudget for each of my slave pods to protect them from eviction.
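For reference, the annotation was applied roughly like this in the pod template YAML passed to the kubernetes plugin (container name and image here are illustrative placeholders, not my exact configuration):

```yaml
# Illustrative podTemplate YAML; container details are placeholders.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Hint to the cluster autoscaler not to evict this pod on scale-down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: jnlp
    image: jenkins/jnlp-slave:latest
```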

 

However, when I pass the PDB in the podTemplate YAML, it is simply ignored.
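What I had in mind is a PodDisruptionBudget along these lines (the name and label selector are hypothetical and would need to match the actual labels on the slave pods):

```yaml
# Hypothetical PodDisruptionBudget to block voluntary eviction of slave pods.
# The "jenkins: slave" selector is an assumption; it must match the pod labels.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: jenkins-slave-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      jenkins: slave
```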

 

How can I protect my Jenkins slave pods from eviction?

 
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)


siegfried.kiermayer@sap.com (JIRA)

unread,
Oct 15, 2019, 6:11:03 AM10/15/19
to jenkinsc...@googlegroups.com
Sigi Kiermayer commented on Improvement JENKINS-59652
 
Re: [kubernetes plugin] Protect Jenkins slave pods from eviction

We are also running Jenkins on GKE. At least we don't have issues with a running Jenkins slave getting 'moved' when the cluster scales down, but we purposefully created a node pool only for Jenkins slaves and sized it so that one Jenkins slave uses one node. With autoscaling it is relatively quick, but you can and should also keep one node running idle.

 

One thing to be aware of: we added the PDB to make sure Jenkins is not killed/moved, etc., but we removed it again. When GKE does maintenance, a PDB only delays the eviction of a pod by one hour, which makes the whole process much slower, as GKE will wait an hour for every pod with a PDB.
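A minimal sketch of the one-slave-per-node setup described above, assuming a dedicated GKE node pool for Jenkins slaves (the pool name, taint key, image, and resource sizes are all illustrative assumptions, not values from our actual setup):

```yaml
# Illustrative pod template for a dedicated Jenkins slave node pool.
# Pool name, taint, image, and resource requests are assumptions.
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: jenkins-slaves   # dedicated pool
  tolerations:
  - key: dedicated          # matching taint keeps other workloads off the pool
    value: jenkins
    effect: NoSchedule
  containers:
  - name: jnlp
    image: jenkins/jnlp-slave:latest
    resources:
      requests:             # sized so one slave fills one node
        cpu: "3500m"
        memory: "12Gi"
```

Sizing the requests to nearly fill a node means the scheduler places exactly one slave per node, so scaling down a node never disrupts another running build.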
