"checkout scm" in multibranch/pipeline

Nigel Magnay

Apr 6, 2016, 10:39:28 AM

to jenkin...@googlegroups.com

I'm happily using the GitHub org folders with multibranch projects, which are very cool.

One issue we have is that our git project is getting pretty big, and our inbound network connection is not particularly fast.

This means that each fresh build, which spins up a Docker container to run in, tends to spend 15 minutes redownloading from GitHub when it hits the first pipeline step, "checkout scm".

Now I suppose I could probably build a container image with a baseline git repo in it - but that feels like maintenance.

It occurred to me that the change detection logic is already storing a clone of the repo (in JENKINS_HOME/jobs/.../branches/master/workspace@script).

Is there a way to persuade "checkout scm" to pull from that reference copy first? This would save a lot of external network traffic for us.

Mark Waite

Apr 6, 2016, 11:19:50 AM

to jenkin...@googlegroups.com

Jesse Glick has noted that the Mercurial plugin is able to copy a repository from the master using the Jenkins master-to-slave transport. He suggested that the same facility would help the git plugin as well. Unfortunately, I'm not aware of anyone working to add it to the git plugin.

Mark Waite

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/CAPYP83TnKhhBmmYNX1U_F1rQ4YsBrMTkb8f95Q1ox5VQbdvwJA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Nigel Magnay

Apr 6, 2016, 11:34:21 AM

to jenkin...@googlegroups.com

Happy to have a stab at it if someone can point me in the right direction. I couldn't see anything obvious in the hg plugin; is it something within the "checkout scm" workflow step?


Mark Waite

Apr 6, 2016, 2:25:58 PM

to jenkin...@googlegroups.com

I think Cache.java is a good place to start:

https://github.com/jenkinsci/mercurial-plugin/blob/master/src/main/java/hudson/plugins/mercurial/Cache.java

Mark Waite


Jesse Glick

Apr 7, 2016, 12:22:10 PM

to Jenkins Dev

On Wed, Apr 6, 2016 at 10:39 AM, Nigel Magnay <nigel....@gmail.com> wrote:
> I could probably build a container image with a baseline git
> repo in it

Or keep a persistent workspace that the container mounts, so you can
`git clean -fdx` (or the equivalent `GitSCM` extension) and just do a
small update.
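
A minimal sketch of that clean-and-update cycle with plain git, assuming a workspace directory that survives between builds. The temporary directories, file names and the `demo` identity are hypothetical stand-ins for the real remote and the mounted volume; in Jenkins the `GitSCM` checkout would do the equivalent work.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway directories stand in for GitHub and for the persistent volume.
demo=$(mktemp -d)
git init -q "$demo/upstream"
echo one > "$demo/upstream/file.txt"
git -C "$demo/upstream" add file.txt
git -C "$demo/upstream" commit -q -m "first"

# First build: the expensive full clone happens once.
git clone -q "file://$demo/upstream" "$demo/workspace"

# Meanwhile a build leaves junk behind and upstream gains a commit.
touch "$demo/workspace/build-output.tmp"
echo two >> "$demo/upstream/file.txt"
git -C "$demo/upstream" commit -q -am "second"

# Later build in the same workspace: wipe untracked files, then fetch
# only the new objects and move to the updated branch tip.
git -C "$demo/workspace" clean -fdx
git -C "$demo/workspace" fetch -q origin
git -C "$demo/workspace" reset -q --hard '@{upstream}'
```

Only the second commit crosses the "network" on the later build; the bulk of the history is already in the workspace.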

> It occurred to me that the change detection logic is already storing a clone
> of the repo, (in JENKINS_HOME/jobs/.../branches/master/workspace@script).

Not once I fix JENKINS-33273.

Note that the Git plugin *does* have a caching facility, but currently
it is internal to `GitSCMSource` and used only for branch scanning
(for the plain Git SCM source—not used for GitHub). This could perhaps
be extended to support caching as a user-level feature, as in the
Mercurial plugin. In principle `git-bundle` could be used to transfer
incremental updates to the agent. (The other use case supported by
such a feature is agents with no network connection other than to the
master, though this is probably unusual.)
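
To illustrate the `git-bundle` idea, here is a rough sketch using only the git command line, not the plugin's code. It assumes the master keeps a cache repository and ships bundle files to the agent by some other channel; the directory names, the `cache` remote namespace and the `demo` identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# "cache" plays the repository held on the master; "agent" is the build side.
git init -q "$demo/cache"
echo one > "$demo/cache/file.txt"
git -C "$demo/cache" add file.txt
git -C "$demo/cache" commit -q -m "first"
branch=$(git -C "$demo/cache" symbolic-ref --short HEAD)
base=$(git -C "$demo/cache" rev-parse HEAD)

# Seed the agent once with a full bundle of the branch.
git -C "$demo/cache" bundle create "$demo/full.bundle" "$branch"
git init -q "$demo/agent"
git -C "$demo/agent" fetch -q "$demo/full.bundle" \
    "refs/heads/$branch:refs/remotes/cache/$branch"

# New work arrives in the cache.
echo two >> "$demo/cache/file.txt"
git -C "$demo/cache" commit -q -am "second"

# Incremental bundle: only the commits after $base, which the agent already has.
git -C "$demo/cache" bundle create "$demo/incr.bundle" "$base..$branch"
git -C "$demo/agent" fetch -q "$demo/incr.bundle" \
    "refs/heads/$branch:refs/remotes/cache/$branch"
```

The incremental bundle records `$base` as a prerequisite, so it can only be applied on an agent that already holds that commit.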

Nigel Magnay

Apr 7, 2016, 12:46:30 PM

to jenkin...@googlegroups.com

On Thu, Apr 7, 2016 at 5:22 PM, Jesse Glick <jgl...@cloudbees.com> wrote:

> On Wed, Apr 6, 2016 at 10:39 AM, Nigel Magnay <nigel....@gmail.com> wrote:
> > I could probably build a container image with a baseline git
> > repo in it
>
> Or keep a persistent workspace that the container mounts, so you can
> `git clean -fdx` (or the equivalent `GitSCM` extension) and just do a
> small update.

Hmm - sounds tricky. I'm running a Docker cloud (on Triton), so I never know which physical machine it's going to be on, and multiple builds are often running concurrently. I could 'mount' in the sense of a file share I guess.

> > It occurred to me that the change detection logic is already storing a clone
> > of the repo, (in JENKINS_HOME/jobs/.../branches/master/workspace@script).
>
> Not once I fix JENKINS-33273.
>
> Note that the Git plugin *does* have a caching facility, but currently
> it is internal to `GitSCMSource` and used only for branch scanning
> (for the plain Git SCM source, not used for GitHub). This could perhaps
> be extended to support caching as a user-level feature, as in the
> Mercurial plugin. In principle `git-bundle` could be used to transfer
> incremental updates to the agent. (The other use case supported by
> such a feature is agents with no network connection other than to the
> master, though this is probably unusual.)

I was thinking of something relatively simple (rather than playing with bundles), as the semantics of "checkout scm" are quite nice (check me out the version of the source I'm meant to be building). The idea is to translate that from a "git checkout https://github/blah @ revision" into "git fetch into a (per-job) cache, then rewrite the checkout to use the cache rather than GitHub". I also like the transparency of not having to change Jenkinsfiles to make it work. Exposing the (cached) git repository is probably little more than exposing the directory over HTTP, at a push.

I might take a look at the Git SCM source and see if I can hack something up.
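
Roughly what I have in mind, expressed as plain git commands rather than plugin code. This is only a sketch: the `job-cache.git` name, the temporary directories and the `demo` identity are hypothetical, and a `file://` URL stands in for whatever transport would actually expose the cache to the build.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Stand-in for the GitHub repository.
git init -q "$demo/upstream"
echo one > "$demo/upstream/file.txt"
git -C "$demo/upstream" add file.txt
git -C "$demo/upstream" commit -q -m "first"

# One-time setup: a per-job cache kept as a bare mirror.
git clone -q --mirror "file://$demo/upstream" "$demo/job-cache.git"

# Upstream moves on.
echo two >> "$demo/upstream/file.txt"
git -C "$demo/upstream" commit -q -am "second"

# Before a build: refresh the cache. This is the only step that talks to
# the slow remote, and it transfers just the new objects.
git -C "$demo/job-cache.git" fetch -q --prune origin
rev=$(git -C "$demo/job-cache.git" rev-parse HEAD)

# The build clones from the cache instead of the remote and checks out
# exactly the revision it was asked to build.
git clone -q --no-checkout "file://$demo/job-cache.git" "$demo/build"
git -C "$demo/build" checkout -q --detach "$rev"
```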

Jesse Glick

Apr 7, 2016, 1:33:17 PM

to Jenkins Dev

On Thu, Apr 7, 2016 at 12:46 PM, Nigel Magnay <nigel....@gmail.com> wrote:
> I was thinking of something relatively simple (rather than playing with
> bundles)

Well, bundles actually *are* pretty simple. Not terribly efficient, if
your master-to-slave channel is slow, but the code would be simple. At
least the Mercurial implementation fits on a page. I do not have
reason to believe Git would be much different, though I am no expert
on the details.

> I like
> the transparency too of not having to change Jenkinsfiles to make it work.

Well any proper change would be transparent to the user, it is just a
question of the right implementation.

> Exposing the (cached) git repository is probably little more than exposing
> the directory for HTTP at a push.

Well, you *could* use the `git-server` plugin to serve cache repos
over HTTP(S) or SSH to agents, and that would probably be easy enough
to prototype. But for production usage (i.e., convincing Mark to merge
a PR), you get into a lot of complications over security, reverse
proxies, network partitions, etc.

> I might take a look at the Git SCM source and see if I can hack something
> up.

Note that while the current cache code lives in `GitSCMSource`, to be
repurposed this way it would need to be moved into `GitSCM` proper,
and would be equally applicable to non-multibranch calls.

Björn Pedersen

Apr 8, 2016, 7:46:52 AM

to Jenkins Developers

Hi,
Would a local (e.g. on your Jenkins master) reference repository help? Then most objects are resolved from that, and only very fresh changes need to be fetched from master. See the advanced git clone option (`--reference`).

See e.g. http://randyfay.com/content/reference-cache-repositories-speed-clones-git-clone-reference for a basic overview with plain git.
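
With plain git, a reference clone looks roughly like the sketch below. The temporary directories, the `reference.git` name and the `demo` identity are hypothetical, and the `file://` URL stands in for the real remote.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so the demo commit works on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Stand-in for the remote repository.
git init -q "$demo/upstream"
echo hello > "$demo/upstream/README"
git -C "$demo/upstream" add README
git -C "$demo/upstream" commit -q -m "initial"

# Local reference repository, created once and refreshed occasionally.
git clone -q --mirror "file://$demo/upstream" "$demo/reference.git"

# The build clone borrows objects from the reference and only downloads
# what the reference does not already contain.
git clone -q --reference "$demo/reference.git" \
    "file://$demo/upstream" "$demo/workspace"

# The link to the reference is recorded in the alternates file.
cat "$demo/workspace/.git/objects/info/alternates"
```

The clone stays dependent on the reference repository: if the reference is deleted or pruned, the workspace can lose objects it was borrowing.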

Björn

Jesse Glick

Apr 8, 2016, 9:58:16 AM

to Jenkins Dev

On Fri, Apr 8, 2016 at 7:46 AM, 'Björn Pedersen' via Jenkins
Developers <jenkin...@googlegroups.com> wrote:
> Would a local (e.g. on your Jenkins master) reference repository help? Then
> most objects are resolved from that, and only very fresh changes need to be
> fetched from master. See the advanced git clone option [--reference].

An interesting idea, though not by itself sufficient to deal with
builds run on an agent, especially in the Docker case; you would still
need to have a reference repository on a persistent volume mountable
by the container (perhaps one such cache for each Docker computer).

Jesse Glick

Apr 8, 2016, 10:02:18 AM

to Jenkins Dev

For reference, the Mercurial plugin when configured to use the cache
option automatically uses the relink extension to maintain hard links
between the cache and the workspace, when possible, and can be
configured to use the sharing extension as well to reduce disk usage.
