kubernetes-plugin could use Jobs instead of bare Pods


daniel.s...@gebit.de

unread,
Nov 22, 2018, 10:04:26 AM11/22/18
to Jenkins Developers
Hi all,

I am working on a kubernetes cluster for Jenkins. I quickly found Carlos Sanchez' kubernetes-plugin and started using and adapting it.
The plugin creates "bare" Pods directly in kubernetes, watches them, and finally deletes them after the agent process inside the Pod has terminated.
I think there are several issues with this:
a) bare Pods do not get rescheduled in failover cases, for example after a node crash
b) the kubernetes-plugin is responsible for deleting such a bare Pod (because Pods are designed to never terminate on their own). If Jenkins (or the node it's running on)
crashes while terminating the agent, the Pod might never actually get deleted.
c) the kubernetes-plugin deletes the Pod immediately after the build inside the container is finished. If something else happens inside the Pod after
the build (e.g. cleanup or syncing), it has no chance to complete.

While researching, I found out that kubernetes provides a "Job" object that runs a specified number of Pods to successful completion.
Using a kubernetes Job to schedule builds onto the cluster seems to solve all of my aforementioned issues:
a) a Job ensures that the specified number of completed runs is achieved, even in case of failover
b) and c) a Job automatically deletes all associated Pods when it is deleted (and finished Jobs themselves can be auto-deleted with
the new alpha feature TTLAfterFinished in k8s v1.12). This frees the kubernetes-plugin from caring for the Pods after build completion.
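To make the idea concrete, a minimal Job manifest for an agent could look roughly like this (the names, image, and threshold values here are placeholders, not anything the plugin actually generates; ttlSecondsAfterFinished needs the alpha TTLAfterFinished feature gate in v1.12):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: jenkins-agent-job        # hypothetical name
spec:
  completions: 1                 # run the agent Pod once to successful completion
  backoffLimit: 3                # recreate the Pod up to 3 times on failure (covers node crashes)
  ttlSecondsAfterFinished: 120   # auto-delete the finished Job and its Pods (alpha, v1.12+)
  template:
    spec:
      restartPolicy: Never       # let the Job controller create a new Pod instead of restarting in place
      containers:
      - name: jnlp
        image: jenkins/jnlp-slave:latest   # placeholder agent image
```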

Has anyone experience with kubernetes Jobs? Am I missing something obvious?

@Carlos Sanchez: If not, I would try to implement this in your kubernetes-plugin. Would you be interested in getting this back into your
mainline? Then I would fork the github repo and start directly off master.

Regards,
Daniel

Carlos Sanchez

unread,
Nov 22, 2018, 10:42:04 AM11/22/18
to jenkin...@googlegroups.com
It's not using Jobs because they did not exist when the plugin was created, and yes, it would be a good addition to the plugin.

Note that I don't think it would fix a), as Jenkins expects the agent to keep state. I.e. if the agent clones a git repo and then crashes later on, Jenkins expects the agent to still have the git repo cloned after the agent restarts.
 


daniel.s...@gebit.de

unread,
Nov 22, 2018, 11:49:53 AM11/22/18
to Jenkins Developers
Nice, I will start off your master then.

And yes, a) might actually not be fixed by using Jobs. I will continue researching and report back.

Regards,
Daniel

Olblak

unread,
Nov 26, 2018, 3:58:12 AM11/26/18
to jenkin...@googlegroups.com
Note that I don't think it would fix a), as Jenkins expects the agent to keep state. I.e. if the agent clones a git repo and then crashes later on, Jenkins expects the agent to still have the git repo cloned after the agent restarts.

Why not use a persistent volume to keep the state? Like an emptyDir?



Carlos Sanchez

unread,
Nov 26, 2018, 4:15:05 AM11/26/18
to jenkin...@googlegroups.com
On Mon, Nov 26, 2018 at 10:08 AM Olblak <m...@olblak.com> wrote:
Note that I don't think it would fix a), as Jenkins expects the agent to keep state. I.e. if the agent clones a git repo and then crashes later on, Jenkins expects the agent to still have the git repo cloned after the agent restarts.

Why not use a persistent volume to keep the state? Like an emptyDir?

If the Pod terminates, the emptyDir is no longer available. And using a persistent volume backed by block storage brings other problems, like volume attachment and rescheduling across different AZs, ...
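For reference, the two options differ in the Pod spec roughly as follows (the claim name is hypothetical):

```yaml
# Option 1 - emptyDir: lives only as long as the Pod; its contents are lost
# when the Pod is deleted or rescheduled to another node
volumes:
- name: workspace
  emptyDir: {}
```

```yaml
# Option 2 - persistentVolumeClaim: the data survives the Pod, but the backing
# volume must be attached to whichever node the replacement Pod lands on,
# which can fail when the Pod is rescheduled into a different AZ
volumes:
- name: workspace
  persistentVolumeClaim:
    claimName: jenkins-agent-workspace   # hypothetical claim
```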
 

daniel.s...@gebit.de

unread,
Nov 27, 2018, 3:48:45 AM11/27/18
to Jenkins Developers
Actually, I am using a persistent volume to keep the state of each agent's working directory (and rsync it before and after each build). But if Kubernetes automatically rescheduled an agent Pod (after a node crash, for example), runtime state such as tokens and identity would be lost, and the Jenkins master wouldn't recognize the agent trying to connect. So using a Job would actually complicate things in this case.

I also found an easier way to achieve points b) and c) from my original post, without rewriting half the plugin for Jobs. If I configure the kubernetes-plugin's PodRetention to "Always", the plugin no longer bothers with the Pods after starting them. Then I only need Kubernetes to automatically clean up completed Pods (not Jobs, as suggested before). There is a flag in the controller-manager called "terminated-pod-gc-threshold" (and it's not alpha, unlike the TTLAfterFinished feature mentioned before for Jobs). Setting this flag to a reasonably small number gives me exactly the behavior I want.
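To sketch that setup: the flag goes on the kube-controller-manager command line (the threshold value here is just an example; what counts as "reasonably small" depends on the cluster):

```shell
# Once more than N Pods in a terminated (Succeeded/Failed) state exist
# cluster-wide, the Pod garbage collector deletes the oldest ones.
kube-controller-manager --terminated-pod-gc-threshold=50
```

Note this threshold is cluster-wide, so it also affects terminated Pods that have nothing to do with Jenkins.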