Creating Jenkins slaves using kubernetes-plugin that restart on node failures


Cooper99

Aug 29, 2017, 11:44:53 AM
to Jenkins Users
I am new to Jenkins, so this may be a simple question. I am using the kubernetes-plugin to dynamically create Jenkins slaves. One thing I have noticed is that if the node a slave pod is running on gets deleted, the slave pod is not restarted. I am not sure whether this is a configuration error on my part or just the way the plugin works. Based on this article: https://www.infoq.com/articles/scaling-docker-kubernetes-v1, it seems that having the slaves restart when a node goes down would always be desirable.
I am using Jenkins 2.66 and kubernetes-plugin 0.11 on Kubernetes 1.6.2. 

Thanks.

Carlos Sanchez

Aug 29, 2017, 11:50:08 AM
to Jenkins Users
It doesn't restart the agents because the build fails as soon as the agent crashes, so there is no point in restarting them.

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/4a04bd53-d927-406a-b8ba-6e346a5ece9b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Cooper99

Aug 29, 2017, 3:34:21 PM
to Jenkins Users
Hi Carlos,

Thanks for the prompt reply.
What I have seen is that when the node is deleted, the slave pod doesn't crash; it is simply deleted. The Jenkins master then just sits there waiting for the slave to return, with the following output:
Cannot contact default-6b0e4a2d33a: java.io.IOException: remote file operation failed: /home/jenkins/workspace/installer/Run_Installer at hudson.remoting.Channel@1925c5c0:JNLP4-connect connection from 192.168.3.18/192.168.3.18:46497: hudson.remoting.ChannelClosedException: channel is already closed

Art.



Sam Beckwith III

Oct 18, 2017, 4:15:51 AM
to Jenkins Users
Howdy, Cooper99!

I am encountering this as well, sir. If I may ask, how did you resolve this?

You understand this, but others may not: this is a tricky situation. Without a way to fail the job when it disconnects from the node, we end up in an endlessly suspended/waiting state. Wrapping the job in a timeout is undesirable because our task may take a variable amount of time to complete, so a job that is already doomed by the disconnect will sit unnecessarily until the timeout expires.

It is far more desirable to fail fast.
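
As a rough illustration of the fail-fast idea (a sketch only, not something the kubernetes-plugin provides), an external watchdog could decide whether to abort a build from two pieces of Jenkins JSON: the build's own status and the status of its agent. The function name should_abort is hypothetical. The sketch assumes the build JSON exposes a "building" flag and the agent JSON an "offline" flag, and that a deleted pod may leave no agent record at all, which is represented here as None.

```python
from typing import Optional


def should_abort(build: dict, agent: Optional[dict]) -> bool:
    """Return True when a still-running build has lost its agent.

    build: parsed JSON describing the build (assumed to carry "building").
    agent: parsed JSON describing the agent (assumed to carry "offline"),
           or None if the agent record no longer exists.
    """
    if not build.get("building", False):
        # The build already finished on its own; nothing to abort.
        return False
    if agent is None:
        # The agent record is gone, e.g. the pod was deleted with its node.
        return True
    # The agent still exists but may be reported as disconnected.
    return bool(agent.get("offline", False))


if __name__ == "__main__":
    # A running build whose agent has vanished should be aborted right away.
    print(should_abort({"building": True}, None))
```

A watchdog built around this check would still need to fetch both documents and request the abort itself; those parts depend on the Jenkins setup and are left out here.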

-Sam

Art Baldini

Oct 23, 2017, 8:32:34 PM
to jenkins...@googlegroups.com
Hi Sam,
We are still trying to come up with the best workaround. Currently we kick off the build/job and return immediately. Then we have another job that monitors the status of the first job. Definitely not an ideal situation, but it works for now.
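
The monitoring side of a workaround like this could look roughly like the sketch below. It is an assumption-laden illustration, not the setup described in this thread: wait_for_build and its parameters are hypothetical names, and fetch_build stands for whatever call returns the first job's build status as a dict with "building" and "result" keys (for example, a wrapper around the build's JSON status page).

```python
import time
from typing import Callable, Optional


def wait_for_build(
    fetch_build: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a build until it reports a result or the poll budget runs out.

    Returns the result string (e.g. "SUCCESS" or "FAILURE"), or None if
    the build never finished within max_polls checks.
    """
    for attempt in range(max_polls):
        info = fetch_build()
        finished = not info.get("building", False)
        if finished and info.get("result") is not None:
            return info["result"]
        # Wait before the next check, but not after the final one.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return None


if __name__ == "__main__":
    # Simulated status sequence: still building, then failed.
    statuses = iter([
        {"building": True, "result": None},
        {"building": False, "result": "FAILURE"},
    ])
    print(wait_for_build(lambda: next(statuses), poll_seconds=0))
```

Passing the fetch and sleep functions in as arguments keeps the polling logic testable without a live Jenkins master. A None return means the monitor gave up, which the monitoring job can treat as a failure of the first job.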
