Central Git Checkout


qbd...@vt.edu

Dec 16, 2013, 5:06:22 PM
to jenkins...@googlegroups.com
Has anyone considered a modification to the git plugin (or other SCM plugins) to reduce the total number of checkouts physically required on a slave machine?

We run 6 multi-configuration projects (debug and release configurations) on a handful of computers. Each configuration requires one code checkout (2.1 GB). That's 2.1*2*6 = 25.2 GB on each slave machine, for the source code only. The master build machine requires another 12.6 GB, because it checks out additional copies *outside* the scope of build directories (no one knows why). Forget about leaving "Advanced Project Options" > "Restrict where this project can be run" unchecked, because if the build is started on a different computer each time, the code has to be checked out at the top level on each of them (potentially adding the 12.6 GB to each slave machine, if the load is shared).
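
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched as follows. The figures are the ones from this post; reading the master's 12.6 as "one extra copy per project" is my assumption (it happens to match 2.1*6), and the function names are made up for illustration.

```python
# Back-of-the-envelope disk usage for the setup described in this post.
CHECKOUT_GB = 2.1         # size of one code checkout
CONFIGS_PER_PROJECT = 2   # debug and release
PROJECTS = 6


def slave_usage_gb():
    """Source-only disk usage on one slave: one checkout per configuration."""
    return round(CHECKOUT_GB * CONFIGS_PER_PROJECT * PROJECTS, 1)


def top_level_usage_gb():
    """Extra top-level copies, ASSUMING one extra copy per project."""
    return round(CHECKOUT_GB * PROJECTS, 1)


print("per slave:", slave_usage_gb(), "GB")
print("extra top-level copies:", top_level_usage_gb(), "GB")
```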

Perhaps these issues have already been addressed; we run Jenkins version 1.524.

Suggestions:
a) Live with redundant code checkouts
b) Stop using the git plugin, and have my shell script handle code checkout
c) Rewrite the git plugin, or add an option, to allow one checkout per project on a slave. Technically this is the minimum number of checkouts allowed, because each project is allowed to access a unique sha1 hash.
d) Add an option to the Jenkins core to disregard the top-level code checkouts (which I gather are done for some kind of consistency check...we can live without that check I think).
e) I don't know Java and don't have much free time, otherwise I would write a completely new git plugin.

Thanks for any suggestions, especially if I am missing something obvious here.

Gergely Nagy

Dec 16, 2013, 5:59:50 PM
to jenkins...@googlegroups.com
Not sure how having the exact same "checkout" multiple times is useful (most of my jobs modify the workspace, so I need to isolate them from each other -> separate workspace == separate "checkouts").

OTOH, I was also interested to save on disk space when it comes to large git repos cloned multiple times. Note that I mean "clone" not "checkout".

So, an easy/safe enough solution to reduce waste of duplicate clones:
1) have 1 proper clone of the big repo on each slave, just by creating a dumb "mirror" job with no build steps, only to keep the clone up to date
2) the actual jobs use the "Reference" repo option: although the main clone location can still be the central server, you point to the workspace of the mirror job as a reference (see --reference in https://www.kernel.org/pub/software/scm/git/docs/git-clone.html, and the Advanced git plugin option).
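
Outside of Jenkins, step 2 boils down to something like the sketch below. It only builds and prints the command line; the URL and the paths are made-up placeholders, and what the git plugin actually runs under the hood may differ.

```python
import shlex
import subprocess


def reference_clone_cmd(remote_url, mirror_dir, dest_dir):
    """Build a `git clone --reference` command line.

    Objects already present in mirror_dir are borrowed from it instead of
    being downloaded and stored again in dest_dir.
    """
    return ["git", "clone", "--reference", mirror_dir, remote_url, dest_dir]


def run(cmd):
    """Run the command; raises CalledProcessError if git fails."""
    subprocess.run(cmd, check=True)


# Hypothetical values, for illustration only.
cmd = reference_clone_cmd(
    "git@example.com:big/repo.git",       # central server (placeholder URL)
    "/var/jenkins/workspace/mirror-job",  # workspace of the mirror job
    "/var/jenkins/workspace/worker-job",  # workspace of the actual job
)
print(shlex.join(cmd))
# run(cmd)  # uncomment to actually clone
```

One thing to keep in mind: a clone made this way depends on the objects in the reference repo, so the mirror workspace should not be deleted while the worker clones are still in use.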

My results using this are quite dramatic: the repo (.git) sizes under the worker jobs are just a few MB, even though the actual repo is 1.5 GB+ in my case.
Also, the "clone" for each worker job takes just a fraction of the time.

Now I'd like Jenkins (or a plugin?) to provide a setup like this out-of-the-box, eliminating the manual fiddling (I needed to add site-wide variables overridden for each slave, etc.). Anyway, it's good enough for me now.

Of course the actual "checkout" sizes are the same, but I see no safe and straightforward way around that; that's just the nature of building/testing code.
So, not sure if this helps but good luck anyway.
Greg



--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

David Gayman

Dec 16, 2013, 8:16:55 PM
to jenkins...@googlegroups.com
Greg,

Thank you for responding, I am somewhat new to git and had no idea about the --reference option. Your solution looks like basically the best way to do this.

I still wonder why there are no options out-of-the-box to:

1) Set up only one cloned repository per project, and
2) Refuse to check out repositories at the top level (on the machine which is conducting the build)

Or perhaps there are options to do this and I just don't know where they are. It seems like the setup you and I have should be the default! The only reason I can figure is that, for generality's sake, each build should have a unique clone of the repo.

Thanks.

*Also, my terminology is too loose - I meant "clone" not "checkout".


 



Gergely Nagy

Dec 16, 2013, 9:16:29 PM
to jenkins...@googlegroups.com
> I still wonder why there are no options out-of-the-box to:
Probably no one needed it badly enough to do it or to pay for it :)

It seems to me that in the Git world, several smaller repos are preferred instead of a big monolithic one.

Not sure about you, but in our case the big repo was inherited from a subversion setup. When switching to Git, we kept the centralised workflow and its big repo with lots of obsolete branches, but since then there has been a natural tendency to keep new things in separate repos.
While this story may be common in the corporate world, it's probably less so in the more modern open source/startup/smaller agile projects, which I guess are more of a driving force for Jenkins.
Also, while the problem has annoyed me for a long time, it hasn't been a blocker, so I hadn't even tried this workaround until recently.
The other reason may be that this would basically involve auto-creating jobs (or something equivalent) on each slave, managed from a central location. At first glance this differs from the standard ways plugins extend Jenkins.
Anyway, if I had time I'd have a go at building a plugin, unless anyone has started already?

teilo

Dec 16, 2013, 11:28:00 PM
to jenkins...@googlegroups.com
Sounds like you want a "shallow clone" rather than a full one. You have had this option in the git plugin for about a year.
That way you won't have all of the history in the clone, so less disk space and network traffic are required.
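
On the plain git command line, a shallow clone is roughly what the sketch below builds. It only assembles and prints the command; the URL, the destination and the default depth of 1 are placeholders I picked for illustration, and the plugin's own invocation may look different.

```python
import shlex


def shallow_clone_cmd(remote_url, dest_dir, depth=1):
    """Build a `git clone --depth N` command line.

    --depth truncates history to the given number of commits, so the
    clone carries far less history than a full one.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", "--depth", str(depth), remote_url, dest_dir]


# Hypothetical values, for illustration only.
print(shlex.join(shallow_clone_cmd("git@example.com:big/repo.git", "workspace")))
```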

David Gayman

Dec 17, 2013, 3:01:24 AM
to jenkins...@googlegroups.com
Nice, I wasn't aware of shallow or reference clones. This could be helpful.

One more note: The Jenkins server is set up as a web server to the outside world. Our build machines are not exposed as servers. For security reasons no source code is allowed on the Jenkins server, but it would be preferable to have this machine be in charge of running all builds. This makes sense to me from an organizational standpoint, and the builds can be pretty resource-intensive, so why not consolidate the build processes to run on the server?

In this specific case, jenkins breaks. My server cannot both run all builds, and not clone the repository. (I work around this fine, it is just a convenience item for now, but doesn't make sense to me).



David Gayman

Dec 17, 2013, 3:05:36 AM
to jenkins...@googlegroups.com
Sure, and I see how using a lot of small git repos, as opposed to one large repo, could be beneficial. But established code bases are often not organized that way.

Mark Waite

Dec 17, 2013, 4:20:47 AM
to jenkins...@googlegroups.com
I may have misunderstood your comment about not wanting to clone the source code onto the Jenkins server.  If so, my apologies.  I assumed you meant that you don't want the project source code to ever arrive on the central Jenkins server.

One way to do that might be to constrain the jobs to never run on the master node (with "Restrict where this project can be run" set to "!master")

Another technique might be to use remote polling.  The central Jenkins server can poll for changes on the repository without having a complete copy of the repository if you are using Git plugin 2.0 (the current release), command line Git implementation (the default), and you do not require include regions or exclude regions.

In the Git plugin 2.0, the option to require a cloned repo on the master server is listed as "Force polling using workspace".  By default, the plugin uses "remote polling" instead of polling using a workspace.  The remote polling does not require a local workspace on the master Jenkins server, but it does require that you are using the Git command line implementation (rather than JGit) and it requires that you are not using include regions, exclude users or exclude regions.

Mark Waite



David Gayman

Dec 19, 2013, 7:22:51 PM
to jenkins...@googlegroups.com
Mark, correct, I want no source code on the Jenkins server. Currently I use "Restrict where this project can be run" to force the build to be run from one specific computer. Your logic would also work, the only issue being that if the build is started one day on slave A and the next day on slave B, a fresh clone of the repository must be performed at the top level, potentially for every build. Jenkins was designed to work this way, sharing the load between many machines, which I'm sure is fine for small repositories but is cumbersome for larger ones.

Thank you for pointing out the remote polling option in the Git plugin 2.0. I was using an earlier version and wasn't aware of this. That's exactly the option I was looking for.

...Also are there security issues with this setup that I need to be aware of?

Mark Waite

Dec 19, 2013, 7:51:31 PM
to jenkins...@googlegroups.com
Remote polling executes on the central server.  It does not require a copy of the source code on the central server, but it does require read access to the source code, since the "git ls-remote" command is used to perform the remote polling.
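
As a rough illustration of what that kind of polling amounts to (a sketch only, not the plugin's actual code; the function names and the sample SHAs are made up), the idea is to ask the remote for its branch heads and compare against the last commit that was built:

```python
import subprocess


def fetch_ls_remote(remote_url):
    """Return the raw output of `git ls-remote --heads <url>`.

    Needs read access to the repository, but no local clone.
    """
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote_url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ls_remote(output):
    """Parse `<sha><whitespace><ref>` lines into a {ref: sha} dict."""
    refs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            sha, ref = parts
            refs[ref] = sha
    return refs


def branch_changed(output, branch, last_built_sha):
    """True if the branch head on the remote differs from the last build."""
    head = parse_ls_remote(output).get("refs/heads/" + branch)
    return head is not None and head != last_built_sha


# Sample output with made-up SHAs, so nothing here touches the network.
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/release\n"
)
print(branch_changed(sample, "master", "0" * 40))
```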

If you're unwilling to have the source code on your central server, you're probably also unwilling to have read access to the source code from the central server.  I haven't tried using remote polling on a machine which is not the central server.  I suspect that won't work.

Jenkins does allow you to add credentials to the server and then use those credentials from within a job definition.  I don't think there is a requirement that the Jenkins user generally needs read access to the git repository, just that the Jenkins process needs read access when using the credential you defined in Jenkins.

Mark Waite