Hi friends, is it possible to clone an 18 GB repository in Jenkins?


bandi pavankumar reddy

Jul 19, 2014, 2:50:52 AM
to jenkins...@googlegroups.com
Hi friends, I am trying to clone an 18 GB repository in Jenkins, and it fails with a timeout error. I already set the timeout to 60 minutes, but in the 13th minute it still reports a timeout error. Any suggestions? Is this possible or not?

abhinavn

Jul 19, 2014, 4:27:32 AM
to jenkins...@googlegroups.com
This is not related to Jenkins. To clone a large git repository, you need to use a shallow clone.

Mark Waite

Jul 19, 2014, 8:59:11 AM
to jenkins...@googlegroups.com
I've seen several challenges with large git repositories, and several ways to handle those challenges.
  1. Use a reference repository to reduce the amount of data transferred during the "fetch".  Accept that you may need to periodically update that local reference repository with the most recent changes from the central repository.  Refer to http://randyfay.com/content/reference-cache-repositories-speed-clones-git-clone-reference
  2. Use a shallow clone to reduce the amount of data transferred during the "fetch".  Accept that a shallow clone into the Jenkins workspace does not carry all the history with it.  Read http://stackoverflow.com/questions/6941889/is-git-clone-depth-1-shallow-clone-more-useful-than-it-makes-out for more information.
  3. Use a sparse checkout to reduce the amount of data created during the "checkout".  Accept that your Jenkins job must manage the definition of the subset of directories updated by the "checkout".  Add the "Sparse Checkout paths" entry under "Additional Behaviours" in your Jenkins job definition and list the directories to check out.
A combination of 1 and 3 or 2 and 3 usually gives the fastest checkout.  At work, we have a painfully large 7 GB repository.  We use a reference repository and a sparse checkout to make the checkout of the small portion we require very fast (on the order of seconds rather than minutes).
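For example, the command-line equivalent of combining 1 and 3 looks roughly like this (the URL and paths here are illustrative, not from this thread):

    # one-time setup of a local mirror to serve as the reference repository
    git clone --mirror https://example.com/big-repo.git /var/cache/big-repo.git
    # refresh the reference repository periodically (e.g. from a cron job)
    git --git-dir=/var/cache/big-repo.git fetch --all

    # clone into the workspace, borrowing objects from the reference repository
    git clone --reference /var/cache/big-repo.git --no-checkout https://example.com/big-repo.git ws
    cd ws
    # restrict the checkout to the directories the build actually needs
    git config core.sparsecheckout true
    echo "src/needed-module/" >> .git/info/sparse-checkout
    git checkout master

In the git plugin, the equivalents are the "Path of the reference repo to use during clone" field under "Advanced clone behaviours" and the "Sparse Checkout paths" behaviour.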

If you require a full and complete checkout of an 18 GB repository on a system with slow discs, the current Jenkins git plugin probably won't do it for you, due to one of the problems described in https://issues.jenkins-ci.org/browse/JENKINS-20387 .  The "git checkout" command in git-client-plugin has a 10 minute timeout which can only be adjusted by setting a property on the java command line.  If you need to perform a "checkout" which takes longer than 10 minutes to complete, then the git-client-plugin currently requires that you set that property.  That may be changed in a coming release of the git-client-plugin so that the timeout configured in the git plugin UI applies to "checkout", not just to "fetch".
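For example, if I recall the git-client-plugin internals correctly (verify the property name against the plugin source for your version), the checkout timeout can be raised by passing a system property when starting Jenkins:

    java -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=60 -jar jenkins.war

where 60 is the new timeout in minutes.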

Mark Waite





--
Thanks!
Mark Waite

Russ Tremain

Jul 19, 2014, 10:08:54 AM
to jenkins...@googlegroups.com
Whenever you clone a git repository, git copies all of the objects from the original repo into the ".git" directory of the clone, along with a clear-text working copy of the branch you are cloning.

This is a major problem with git scalability, and there are various workarounds, including shallow clones.
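For scale, a shallow clone is a single command (the URL is an example):

    git clone --depth 1 https://example.com/big-repo.git

That fetches only the objects reachable from the tip commit, so the ".git" directory stays close to the size of a single checkout instead of the full history.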

One that I like is to import your git repository into Perforce via the Git Fusion add-on, and then break the repository up into multiple smaller git repositories using Perforce client views (we call this "slice and dice").

Of course, once your source is in Perforce, you can always just use the Perforce client views directly in Jenkins, and avoid copying the Git meta directory altogether. This is much faster for a build checkout.

Another advantage of using Perforce to store your git repositories is scalability: you can replicate your git repos via Perforce replicas and edge servers. Perforce stores the git objects unmolested as binary objects, so you can always recreate the original repository anywhere you have a replica. (We do this at Perforce - we use a "build farm" replica exclusively for build checkouts.)

I gave a webinar on the "slice and dice" technique a while back:

http://www.perforce.com/resources/presentations/webinars/dev-talk-avoid-git-bloat-submodule-hell

I also did a tutorial on setting up and exploring Git Fusion:

http://www.perforce.com/blog/130702/using-git-api-perforce-part-1

cheers,
-Russ

At 11:50 PM -0700 7/18/14, bandi pavankumar reddy wrote:
> Hi friends, I am trying to clone an 18 GB repository in Jenkins, and it fails with a timeout error. I already set the timeout to 60 minutes, but in the 13th minute it still reports a timeout error. Any suggestions? Is this possible or not?

bandi pavankumar reddy

Jul 19, 2014, 2:46:58 PM
to jenkins...@googlegroups.com
I configured the job correctly, with a 60 minute timeout. The build starts OK, but after 13 to 14 minutes it gives a timeout error. Some files do fetch and clone: of my 16 files in total, 7 cloned within 13 minutes, but 6 files were still pending when the timeout error appeared. I configured the git repository with the advanced clone option and set the timeout there. I also tried a shallow clone. Here is the error:

Fetching upstream changes from g...@100.00.1000.0:Uuuu_P_SOURCE.git 
> C:\Program Files (x86)\Git\bin\git.exe fetch --tags --progress g...@100.00.1000.0:Uuuu_P_SOURCE.git +refs/heads/*:refs/remotes/origin/* 
> C:\Program Files (x86)\Git\bin\git.exe rev-parse "origin/master^{commit}" 
Checking out Revision a9f4bc55deac7ceb3ef93799ca15a83cea955242 (origin/master) 
> C:\Program Files (x86)\Git\bin\git.exe config core.sparsecheckout 
> C:\Program Files (x86)\Git\bin\git.exe checkout -f a9f4bc55deac7ceb3ef93799ca15a83cea955242 
ERROR: Timeout after 10 minutes 
FATAL: Could not checkout null with start point a9f4bc55deac7ceb3ef93799ca15a83cea955242 
hudson.plugins.git.GitException: Could not checkout null with start point a9f4bc55deac7ceb3ef93799ca15a83cea955242 

Mark Waite

Jul 19, 2014, 3:08:04 PM
to jenkins...@googlegroups.com
The error "Timeout after 10 minutes" in your output shows that you're seeing https://issues.jenkins-ci.org/browse/JENKINS-23476 .  The plugin assumes (incorrectly in your case) that a checkout operation (which is entirely local disc I/O) will not need more than 10 minutes.  Your repository is so large, or your disc I/O is so slow that you can't checkout within the 10 minute timeout period.

You can either use sparse checkout within the git plugin to reduce the time required to perform a checkout, or you can switch from using the git plugin to perform your checkout yourself as a build step.  Performing the checkout yourself as a build step is more complicated and more error prone, but does not have the timeout limit.  Unfortunately, it also makes Jenkins less useful, because you can no longer see source code changes as part of the job, and you can no longer poll for changes.
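If you do take the build-step route, a minimal sketch of such a step follows, as an "Execute Windows batch command" (the URL below is a placeholder, not the real repository):

    rem hypothetical batch build step replacing the git plugin checkout
    rem first build: shallow clone into a subdirectory of the workspace
    if not exist repo git clone --depth 1 https://example.com/big-repo.git repo
    cd repo
    rem later builds: update the existing clone instead of re-cloning
    git fetch origin
    git checkout -f origin/master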

Mark Waite





--
Thanks!
Mark Waite

bandi pavankumar reddy

Jul 23, 2014, 11:24:44 AM
to jenkins...@googlegroups.com, Mark Waite
Hi Mark, please suggest something; I am still getting this error even though I set the timeout.
Is the problem with Jenkins or with git?

I configured the job correctly, with a 60 minute timeout. The build starts OK, but after 13 to 14 minutes it gives a timeout error. Some files do fetch and clone: of my 16 files in total, 7 cloned within 13 minutes, but 6 files were still pending when the timeout error appeared. I configured the git repository with the advanced clone option and set the timeout there. I also tried a shallow clone. Here is the error:

Fetching upstream changes from g...@100.00.1000.0:Uuuu_P_SOURCE.git 
> C:\Program Files (x86)\Git\bin\git.exe fetch --tags --progress g...@100.00.1000.0:Uuuu_P_SOURCE.git +refs/heads/*:refs/remotes/origin/* 
> C:\Program Files (x86)\Git\bin\git.exe rev-parse "origin/master^{commit}" 
Checking out Revision a9f4bc55deac7ceb3ef93799ca15a83cea955242 (origin/master) 
> C:\Program Files (x86)\Git\bin\git.exe config core.sparsecheckout 
> C:\Program Files (x86)\Git\bin\git.exe checkout -f a9f4bc55deac7ceb3ef93799ca15a83cea955242 
ERROR: Timeout after 10 minutes 
FATAL: Could not checkout null with start point a9f4bc55deac7ceb3ef93799ca15a83cea955242 
hudson.plugins.git.GitException: Could not checkout null with start point a9f4bc55deac7ceb3ef93799ca15a83cea955242 


Mark Waite

Jul 23, 2014, 1:03:30 PM
to jenkins...@googlegroups.com
Please read my earlier reply on this same thread.  The answer is the same, since this is the same failure.

The problem is in the git client plugin or in the combination of the git client plugin, your very large repository, your slow file system, and/or your unwillingness to use "sparse checkout" to reduce the size of the content to be checked out to the working directory.

Mark Waite

Rob Mandeville

Jul 23, 2014, 2:54:18 PM
to jenkins...@googlegroups.com

Okay, thinking laterally here.

 

Let’s assume that the Git plugin timeout is hardcoded to 10 minutes.  Now, all you have to do is bring the checkout under 10 minutes.  This may well be solvable.

 

First, if you haven’t done it already, use shallow checkouts.  The Git plugin has had that since 1.1.23 (September 2012).  If you’re building, you don’t need the histories, just the current version.  If that doesn’t help…

 

Get a sysadmin and profile the pulls.  Is your Git server maxing out on CPU or (more likely) disk I/O?  Is your client?

 

If your server is maxing out, you need to either beef up your server or reduce the load.  Increasing disk speed is between you and your sysadmins, assuming that you own the server.  Reducing load?  Try one or more of these:

 

- Stop polling.  If you can use GitLab, there’s a plugin to have GitLab push to Jenkins.  If you have a dozen polling projects, this will reduce load big-time.  There may be other Git push solutions for other Git servers; I don’t know.

- The last time I had checkouts take over 10 minutes (on a proprietary system, not Git), the problem was that nightly builds kicked off all at once and tried to pull 60 branches of the code at once.  Solution?  Use the https://wiki.jenkins-ci.org/display/JENKINS/Throttle+Concurrent+Builds+Plugin, make pulling from source control into its own step, and only allow 3-5 simultaneous pulls.

- If you have to poll, set up your polling schedule with ‘H’ notation (see the help for the polling schedule on your job) to spread the polling around; there is an example after this list.

- Compress the binaries you have in Git.  That can’t all be source, can it?

- Better yet, put the binaries into something like Artifactory and have the build job pull them down after getting the actual source.
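To illustrate the ‘H’ notation mentioned above, a polling schedule of

    H/15 * * * *

still polls every 15 minutes, but at a per-job hashed minute within each window, so a dozen identically configured jobs won’t all hit the Git server at minutes 0, 15, 30, and 45.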

 

If your network is maxing out, try one or more of these (some assume that you own your Git server hardware and network—if you’re running off of GitHub, some of these won’t work).

 

- Put your build machines (and thus your Jenkins slaves) on the same subnet as the Git server, whether or not the Jenkins server is there as well.  If that’s impossible, at least get it to the same site (so it’s all LAN, no WAN).

- Replicate the Git server on the subnet your build hosts are on.  Git is built to be distributed.

- If you can’t put your build farm near your source farm, at least get a Jenkins slave over on the same network as the Git server.  Give it a job that polls Git.  Rather than actually performing the build, have it compress the sources into a giant Zip file, archive that, then kick off a downstream job (that runs on your local build farm) that unzips the artifact and does the build and test run.  You may need plugins to do this right.  The upstream job will still be able to tell you the changes made to the source, and point you to the downstream job with the actual results.
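For instance, the upstream job’s packaging step could be as simple as this (names are illustrative):

    # hypothetical shell step: package the checked-out sources without git metadata
    zip -r sources.zip . -x '.git/*'

Archive sources.zip, and have the downstream job copy it (the Copy Artifact plugin can do this) before unzipping and building.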

 

If the server and network are fine, but your build box is maxed out on I/O writes, you’re going to have to beef up your hardware (or run fewer builds at once, if you run multiple builds on one host).  Get faster drives and/or get a RAID controller for your builds and put it into some sort of striping mode for faster writes.  If you just keep your sources and builds on the RAID (having more permanent things like the OS and your compilers on another drive/RAID), you probably don’t have to have that RAID actually be redundant.  If a drive blows, you lose your current build, swap out another drive, and try again.

 

--Rob





Mark Waite

Jul 23, 2014, 3:10:45 PM
to jenkins...@googlegroups.com
The lateral thinking is much appreciated!  Thanks for the suggestions.  I've embedded comments into the items.


On Wed, Jul 23, 2014 at 12:54 PM, Rob Mandeville <rmand...@dekaresearch.com> wrote:

Okay, thinking laterally here.

 

Let’s assume that the Git plugin timeout is hardcoded to 10 minutes.  Now, all you have to do is bring the checkout under 10 minutes.  This may well be solvable.

 


The git plugin updates the working directory in a two-step process, with different interactions and different timeouts applicable to each step.

  1. Fetch remote changes into the local repository ("git fetch") - the timeout can be adjusted by the user from the job definition page ("Additional Behaviours", "Advanced clone behaviours").  This step is network intensive and requires work from the central git server, the network, and the local server.
  2. Check out the working directory from the local repository ("git checkout") - the timeout cannot be adjusted by the user without passing a property argument to the Java virtual machine.  This step is disc intensive and only requires work from the local server and the local file system.  No network operations are involved in checkout.
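In command-line terms (matching the console log earlier in this thread), the two steps are roughly:

    git fetch --tags --progress <repository-url> +refs/heads/*:refs/remotes/origin/*
    git checkout -f <commit>

Only the first command touches the network; the second is pure local disc I/O, and it is the one governed by the fixed 10 minute timeout.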
 

First, if you haven’t done it already, use shallow checkouts.  The Git plugin has had that since 1.1.23 (September 2012).  If you’re building, you don’t need the histories, just the current version.  If that doesn’t help…

 

Get a sysadmin and profile the pulls.  Is your Git server maxing out on CPU or (more likely) disk I/O?  Is your client?

 


I don't think that's their issue, since the fetch phase is not the source of the timeout.
 

If your server is maxing out, you need to either beef up your server or reduce the load.  Increasing disk speed is between you and your sysadmins, assuming that you own the server.  Reducing load?  Try one or more of these:

 

- Stop polling.  If you can use GitLab, there’s a plugin to have GitLab push to Jenkins.  If you have a dozen polling projects, this will reduce load big-time.  There may be other Git push solutions for other Git servers; I don’t know.


That's good advice, but I don't think central server load is their issue, since the timeout happens in the "checkout" phase, not the "fetch" phase.
 

- The last time I had checkouts take over 10 minutes (on a proprietary system, not Git), the problem was that nightly builds kicked off all at once and tried to pull 60 branches of the code at once.  Solution?  Use the https://wiki.jenkins-ci.org/display/JENKINS/Throttle+Concurrent+Builds+Plugin, make pulling from source control into its own step, and only allow 3-5 simultaneous pulls.

- If you have to poll, set up your polling schedule with ‘H’ notation (see the help for the polling schedule on your job) to spread the polling around.

- Compress the binaries you have in Git.  That can’t all be source, can it?


That's excellent advice.  If their "checkout" phase is copying large binaries from the local git repository to the local git working directory, and if they can reduce or eliminate those copies, it may increase the speed of the checkout. 
 

- Better yet, put the binaries into something like Artifactory and have the build job pull them down after getting the actual source.


Also excellent advice; it may substantially reduce checkout time by shifting the "get the large binaries" activity from the fetch and checkout phases to a build-time copy from Artifactory or a local cached copy.
 


Mark Waite



--
Thanks!
Mark Waite

bandi pavankumar reddy

Jul 24, 2014, 7:44:56 AM
to jenkins...@googlegroups.com
Hi Mark, cloning the 20 GB repository is not possible in my Jenkins, but I cloned the repository into a "workspace" folder on the local D drive through the bash prompt. In the Jenkins slave's Remote FS root I entered that directory, "D:/workspace", and then created a job under that slave, giving the job the same name as the git repository and pointing it at the 20 GB repository's git URL. After that, the build fetches changes easily and succeeds.
Is this way OK, or will it create any issues?
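Roughly, the commands I ran from the bash prompt were these (the real repository URL is omitted; <repository-url> and <job-name> are placeholders):

    mkdir -p /d/workspace
    cd /d/workspace
    git clone <repository-url> <job-name>

With the slave's Remote FS root set to D:/workspace and the job named after that directory, the job's workspace lands on the existing clone, so the plugin only has to fetch incremental changes.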

Mark Waite

Jul 24, 2014, 8:28:10 AM
to jenkins...@googlegroups.com
See http://ingorichter.blogspot.com/2012/02/jenkins-change-workspaces-and-build.html for a blog post which shows how to do that.

Mark Waite

bandi pavankumar reddy

Jul 25, 2014, 6:08:34 AM
to jenkins...@googlegroups.com
Dear Sir,
Greetings.

Hi Mark sir, I am Pavan. Thank you very much for all your help; I will never forget it.

Thanks & regards,
B. Pavankumar Reddy
917418801319

Adam Westhusing

Jul 29, 2014, 2:31:36 PM
to jenkins...@googlegroups.com
Not sure if this was already suggested, but when configuring the Git repository in your project you can set the timeout.

Look at "Additional Behaviors" and add "Advanced Clone Behaviors".  There you can set "Timeout (in minutes) for clone and fetch operation" to whatever you need.

Mark Waite

Jul 29, 2014, 2:57:06 PM
to jenkins...@googlegroups.com
Thanks.  Yes, Pavan is aware of the timeout value available on clone.  He adjusted that.  His timeout was happening during checkout, a phase which is entirely on the local disk, and which currently does not have a way to adjust the timeout.

Mark Waite