[JIRA] (JENKINS-42422) Add support for directory caching in pod jobs


electroma@gmail.com (JIRA)

Mar 1, 2017, 6:30:02 PM
Roman Safronov created an issue
 
Jenkins / New Feature JENKINS-42422
Add support for directory caching in pod jobs
Issue Type: New Feature
Assignee: Carlos Sanchez
Components: kubernetes-plugin
Created: 2017/Mar/01 11:29 PM
Priority: Minor
Reporter: Roman Safronov

It would be great to be able to "cache" directories between job executions.
In some cases this greatly speeds up job execution.
Similar to Travis: https://docs.travis-ci.com/user/caching/.

Right now I achieve this with a persistent volume and a pipeline function doing explicit (re)store steps.

What could be added to kubernetes-plugin:
- an option to manage caches (specifically, dropping the cache for a specific job)
- a DSL construct like podTemplate(... cachePaths: ["A", "B/C"])
- a default strategy for cache management (shared NFS-backed volume or a provisioned PVC per job)
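
As an illustration only, the proposed DSL might look like the following sketch. The cachePaths parameter is hypothetical and does not exist in the plugin; the image and paths are examples.

```groovy
// Hypothetical sketch -- 'cachePaths' is the *proposed* parameter, not an existing option
podTemplate(label: 'maven-cached',
    cachePaths: ['/root/.m2/repository', '/home/jenkins/.npm'],  // proposed, not implemented
    containers: [
        containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')
    ]) {
    node('maven-cached') {
        container('maven') {
            // the plugin would restore the cached paths before this step
            sh 'mvn -B package'
            // ...and save them back after the build
        }
    }
}
```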

This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)

jenkins-ci@carlossanchez.eu (JIRA)

Mar 1, 2017, 10:57:01 PM
Carlos Sanchez commented on New Feature JENKINS-42422
 
Re: Add support for directory caching in pod jobs

You can use persistent volumes in the pod, isn't that enough?
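
For reference, the persistent-volume workaround looks roughly like this sketch: a pre-created PVC mounted through the podTemplate volumes parameter. The claim name and mount path are examples, and the PVC must already exist in the namespace.

```groovy
// Sketch: mount a manually pre-created PVC named 'maven-repo-cache' (name is an example)
podTemplate(label: 'maven',
    containers: [
        containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')
    ],
    volumes: [
        persistentVolumeClaim(mountPath: '/root/.m2/repository',
                              claimName: 'maven-repo-cache', readOnly: false)
    ]) {
    node('maven') {
        container('maven') {
            sh 'mvn -B -Dmaven.repo.local=/root/.m2/repository package'
        }
    }
}
```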

electroma@gmail.com (JIRA)

Mar 2, 2017, 11:58:02 AM

As I stated in the description, a PV is too basic:

  • There is no option to provision a PV automatically (i.e. create an EBS volume dedicated to this job)
  • I need to create a separate script/job to drop caches
  • It's better to use backup/restore instead of direct file manipulation on the remote mount

I like Travis' approach.

zachl44@gmail.com (JIRA)

Mar 15, 2017, 8:26:01 PM

A Travis-like approach would be awesome. PVs are somewhat clunky in this case imo.


jenkins-ci@carlossanchez.eu (JIRA)

May 25, 2018, 5:16:02 AM

This is easier now with the yaml syntax, as you can create PVCs on demand.
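
For example, the yaml parameter lets a pod template carry a full pod spec. This sketch mounts a pre-existing PVC named build-cache (names are examples); whether the claim itself can be created on demand is debated in the replies below.

```groovy
// Sketch: yaml syntax mounting an existing PVC named 'build-cache' (example name)
podTemplate(label: 'yaml-cache', yaml: '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3-jdk-8
    command: ['cat']
    tty: true
    volumeMounts:
    - name: cache
      mountPath: /root/.m2/repository
  volumes:
  - name: cache
    persistentVolumeClaim:
      claimName: build-cache
''') {
    node('yaml-cache') {
        container('maven') {
            sh 'mvn -B package'
        }
    }
}
```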

jalkjaer@liveintent.com (JIRA)

Jun 19, 2018, 8:44:05 AM
J Alkjaer edited a comment on New Feature JENKINS-42422
[~csanchez]: Can you provide an example of how PVCs can be created on demand in the YAML syntax? I see no reference in the K8s docs on how to do this. PVCs are described there as objects with their own separate lifecycle - PVs can be dynamically generated based on the storageClass when a PVC is created, but that is of course not the same.

StatefulSets seem to be the only way to dynamically create PVCs, as those have {{volumeClaimTemplates}}.

Even if the YAML field can include a PVC object, it would still need some mechanism to dynamically assign a name that can be referenced in the pod's volumes section.

rudi@chiarito.com (JIRA)

Jan 9, 2019, 6:19:01 PM
R C commented on New Feature JENKINS-42422

The only sane way to implement this is through StatefulSets, especially if you have more than one agent (who doesn't run Jenkins like that?).

An example scenario:

You're currently running three agents. A fourth one needs to be launched. As the last comment says, there's no easy way to create a unique volume name that can also be referenced by the new pod. StatefulSets do that: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#writing-to-stable-storage

Assuming a StatefulSet named agent-XYZ with a volumeClaimTemplate named workspace, the plugin would need to scale the StatefulSet to 4. Then Kubernetes will take care of creating pod agent-XYZ-3 and PVC workspace-agent-XYZ-3.
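
Sketched as a manifest, using the names from this comment (image, storage size, and mount path are placeholders):

```yaml
# Sketch: StatefulSet 'agent-XYZ' with a volumeClaimTemplate named 'workspace'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent-XYZ
spec:
  serviceName: agent-XYZ
  replicas: 4                 # scaling to 4 creates pod agent-XYZ-3
  selector:
    matchLabels:
      app: agent-XYZ
  template:
    metadata:
      labels:
        app: agent-XYZ
    spec:
      containers:
      - name: jnlp
        image: jenkins/jnlp-slave
        volumeMounts:
        - name: workspace
          mountPath: /home/jenkins/agent
  volumeClaimTemplates:
  - metadata:
      name: workspace         # Kubernetes names the PVC workspace-agent-XYZ-<ordinal>
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```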

Later, when the cluster is idle, the plugin can scale the SS down to 1. Kubernetes will terminate pods 3, 2 and 1, in that order, leaving 0 still up. The volumes are KEPT, by design, which is probably what you want if they're a cache (I do, too). When the SS later scales up again, the volumes are already around.

This is all nice and takes some complexity away from the plugin, but there's some extra work to be done.

I haven't looked at the code, but I assume that the plugin treats each pod it creates in isolation, independent of each other (is that correct?). With StatefulSets, you probably want to fingerprint the pod definition so that identical ones are mapped to the same set (XYZ above would be such a hash). Then the plugin needs to track how many pods are needed for each fingerprint and scale the StatefulSets accordingly.

The other problem is passing the unique JENKINS_SECRET to the pod. I don't think StatefulSets allow per-pod secrets/variables. So, either secrets need to be turned off (bad) or they need to be delivered out-of-band, e.g. in an init container.

 

 


jenkins-ci@carlossanchez.eu (JIRA)

Jan 10, 2019, 3:29:02 AM

StatefulSets can't be used for agents if you want isolation and not long-running agents. If job 1 ends before job 3, you can't scale down the SS or you'd kill the pod for job 3.

rudi@chiarito.com (JIRA)

Jan 30, 2019, 8:34:02 PM
R C commented on New Feature JENKINS-42422

[Comment almost lost thanks to JIRA]

Right, there's that, too, but I would be OK if jobs were scheduled starting from the lowest index available. In our case, with a hot cache (which is what we'd like to achieve), an agent's longest run wouldn't be orders of magnitude worse than the average or best case. That is still a net win, even with the occasional inefficiency here and there, like job 1 being idle every once in a while.

I came across someone else's [experience with stateful agents|https://hiya.com/blog/2017/10/02/kubernetes-base-jenkins-stateful-agents/]. We don't spend 20 minutes doing setup work like they do, but there are pipelines where installing dependencies takes longer than the actual tests.

Is the current design of the plugin too reliant on creating pods manually? The least invasive way might be for the scaling to be handled by a HorizontalPodAutoscaler (it should work with StatefulSets). Then the plugin would pick available slaves in alphabetical order. Is there much left to the Kubernetes plugin, at that point?

 

jenkins-ci@carlossanchez.eu (JIRA)

Jan 31, 2019, 3:52:46 AM

I wrote about using a ReplicationController for agents like 4 years ago, but Jenkins needs to be able to kill the jobs that have finished, not a random one. https://www.infoq.com/articles/scaling-docker-kubernetes-v1
At that point you could just have a pool of agents in k8s instead of dynamic allocation and use the swarm plugin to register them automatically.

About the original issue description: I would use a shared filesystem that supports multiple mounts for writing (NFS, Gluster, ...). That would be easier for developers, while being harder to set up. Doing something to support other mount-once volumes will be harder. There could be an option to copy things from a directory before the build and back after the build, but that's already implemented with stash()/unstash().
I'm looking for concrete ideas on what to do, but I would likely not implement them myself as I have limited time.
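
The shared-filesystem approach above can be sketched with the plugin's nfsVolume type; the server address and export path here are placeholders.

```groovy
// Sketch: shared NFS-backed cache mounted into every agent pod
// (server address and export path are placeholders, not real endpoints)
podTemplate(label: 'shared-cache',
    containers: [
        containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')
    ],
    volumes: [
        nfsVolume(mountPath: '/cache',
                  serverAddress: 'nfs.example.com',
                  serverPath: '/exports/jenkins-cache',
                  readOnly: false)
    ]) {
    node('shared-cache') {
        container('maven') {
            // all concurrent builds see the same /cache mount
            sh 'mvn -B -Dmaven.repo.local=/cache/m2 package'
        }
    }
}
```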


jglick@cloudbees.com (JIRA)

Jul 16, 2019, 3:43:39 PM
Jesse Glick assigned an issue to Unassigned
 
Change By: Jesse Glick
Assignee: Carlos Sanchez

syaramada-c@scrippsnetworks.com (JIRA)

Oct 9, 2019, 1:05:04 AM
suryatej yaramada commented on New Feature JENKINS-42422
 
Re: Add support for directory caching in pod jobs

Can we achieve this by using EFS, and if so, how? We already mount the Jenkins master's /var/lib/jenkins, which runs on EC2, onto EFS. We use the kubernetes plugin to start dynamic agents and run our maven/gradle/npm jobs, but those jobs take extra time because nothing is cached, so every build downloads from Artifactory. I tried mounting the /root/.m2 directory, but the problem is that another job cannot start until this job finishes. I would like to know if there is any workaround for this.

Really appreciated.

Multi-Attach error for volume "pvc-4e98f6b9-ea38-11e9-aa7d-02dbf42e9b46" Volume is already used by pod(s) sandbox-surya-maven-cache-3-8hk2s-n2v0v-l4f9h
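
For what it's worth, that Multi-Attach error is characteristic of a ReadWriteOnce volume (such as EBS), which only one node can attach at a time. EFS supports ReadWriteMany, so a claim along these lines can be mounted by several agent pods at once. This is a sketch assuming an EFS-backed StorageClass is installed in the cluster; the class name "efs" and the size are examples.

```yaml
# Sketch: a ReadWriteMany claim (assumes an EFS-backed StorageClass named 'efs' exists)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: maven-cache
spec:
  accessModes:
  - ReadWriteMany          # multiple pods/nodes can mount simultaneously, unlike EBS RWO
  storageClassName: efs
  resources:
    requests:
      storage: 20Gi
```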


faure.tristan@gmail.com (JIRA)

Mar 26, 2020, 3:32:03 AM

I also have the same issue: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jenkinsci-users/X61BK83LHLU/hztGcbkvBgAJ

I don't really know Travis, but I assume it is equivalent to https://docs.gitlab.com/ee/ci/caching/

About Carlos Sanchez's comments:

  • we did not find how to create a PVC in the yaml syntax either; it was always overridden when we tried
  • stash / unstash or archiveArtifacts could help us, but how can we configure it to store in a PV? How long is the data kept if there is no unstash?

faure.tristan@gmail.com (JIRA)

Mar 26, 2020, 5:32:12 AM
Tristan FAURE edited a comment on New Feature JENKINS-42422
I don't really know Travis, but I assume it is equivalent to [https://docs.gitlab.com/ee/ci/caching/]

About [~csanchez]'s comments:
* we did not find how to create a PVC in the yaml syntax either; it was always overridden when we tried
* stash / unstash or archiveArtifacts could help us, but how can we configure it to store in a PV? How long is the data kept if there is no unstash?
** +edit:+ as we want data available across jobs, stash/unstash does not seem relevant