Git plugin fetching too much data


Marco Sacchetto

Nov 3, 2015, 4:47:20 AM
to Jenkins Users
Hi,

I am trying to clone a repository inside a Jenkins job. The clone itself works, but it downloads far too much data, which is a problem because the site is on low bandwidth and the developers want the workspace wiped out on each run.
The repository requires authentication, and I only need the master branch. If I run the clone manually from a console, it downloads around 3.5 MB of data. When the repository needs credentials, the plugin seems to switch automatically to git init + git fetch, and that fetch downloads around 100 MB of data from the repository for nothing. I then set the refspec setting to "+refs/heads/master:refs/remotes/origin/master". If I run git fetch manually from a console with that refspec, once again I get only the 3.5 MB of data that I need.
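Roughly, the manual sequence I mean is the following. This is only a sketch: a throwaway local repository (with a placeholder identity) stands in for the real authenticated remote, and the side branch stands in for where the unwanted data lives.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Stand-in remote: a small master plus a side branch
# (imagine the unwanted 100 MB living on the side branch).
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
git -C src -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m "small master"
git -C src branch heavy

# init + fetch with the narrow refspec: only master is brought over,
# the heavy branch is never fetched.
git init -q work
git -C work fetch -q "$tmp/src" \
    "+refs/heads/master:refs/remotes/origin/master"
git -C work branch -r
```

With the narrow refspec, `branch -r` in the new workspace shows only origin/master; origin/heavy does not exist there.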

The Jenkins Git plugin at this point behaves very strangely. What I see in the log is that it first runs:

"/usr/bin/git -c core.askpass=true fetch --tags --progress ***repository url here*** +refs/heads/*:refs/remotes/origin/* # timeout=30"

and only after that is done does it finally run

"/usr/bin/git -c core.askpass=true fetch --tags --progress ***repository url here*** +refs/heads/master:refs/remotes/origin/master # timeout=30"

This means that, since the workspace is deleted every time, I still have to wait for the full 100 MB of data to be downloaded again on every run of the job. Is there a reason for this behaviour? And is there a way to inhibit it, or to force the git plugin to use clone instead of fetch?
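For comparison, the manual clone that gives me the small download is effectively a single-branch clone. Sketched here against a throwaway local repository in place of the real authenticated URL:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Throwaway stand-in for the real remote, with an extra branch.
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
git -C src -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m init
git -C src branch other

# --single-branch restricts both the refs and the history fetched
# to the one branch named by --branch.
git clone -q --branch master --single-branch "$tmp/src" workspace
git -C workspace branch -r
```

The resulting workspace tracks only origin/master; the other branch is never set up as a remote-tracking ref.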

Mark Waite

Nov 3, 2015, 7:11:47 AM
to Jenkins Users
There is no way to force the plugin to use clone instead of fetch.  Even if there were, it would likely have the same problem, since clone is often described as "init + fetch".

You could reduce the amount of data transferred by using a shallow clone.  That's one of the checkboxes in the "Additional Behaviours" section.
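In plain git terms, the shallow option corresponds to `--depth`. A minimal local illustration (a throwaway repository with a placeholder identity stands in for your real URL; the `file://` form is needed so the local clone actually honours `--depth`):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Throwaway repository with a few commits of history.
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  git -C src -c user.email=me@example.com -c user.name=me \
      commit -q --allow-empty -m "commit $i"
done

# --depth 1 transfers only the most recent commit, not the full history.
git clone -q --depth 1 "file://$tmp/src" shallow
git -C shallow rev-list --count HEAD
```

The shallow clone ends up with a history depth of 1 regardless of how many commits the source has.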

You could reduce the amount of data transferred by retaining a reference copy of the repository on each slave agent.  That's one of the checkboxes in the "Additional Behaviours" section.
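Under the hood, that behaviour maps onto git's `--reference` option: the new clone borrows objects from a locally kept cache through the alternates mechanism instead of transferring them again. A rough sketch with stand-in local paths:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Throwaway stand-in for the real remote.
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
git -C src -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m init

# One full cache kept on the agent, refreshed occasionally.
git clone -q --mirror "$tmp/src" cache.git

# New workspace clones borrow objects from the cache via alternates,
# so they don't transfer the whole history again.
git clone -q --reference "$tmp/cache.git" "$tmp/src" workspace
cat workspace/.git/objects/info/alternates
```

The alternates file in the new clone points at the cache's object store, which is what saves the transfer.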

You can reduce the directories checked out with a sparse checkout, if you only need a subset of the directory tree.  That doesn't reduce the amount of data transferred, but reduces the time needed to perform the checkout.
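For reference, the raw git recipe for a sparse checkout (the `core.sparseCheckout` setting plus a pattern file, as git works at the time of writing) looks roughly like this, against a throwaway repository:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Throwaway repository with two top-level directories.
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
mkdir -p src/docs src/code
echo d > src/docs/readme.txt
echo c > src/code/main.c
git -C src add .
git -C src -c user.email=me@example.com -c user.name=me \
    commit -q -m init

# Clone without checking out, then restrict the working tree to docs/.
git clone -q --no-checkout "$tmp/src" sparse
cd sparse
git config core.sparseCheckout true
echo "docs/" > .git/info/sparse-checkout
git checkout -q master
ls
```

All the objects are still transferred; only the working tree is limited to the listed paths, which is why this helps checkout time but not bandwidth.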

I'm not sure why it runs the fetch with the full refspec initially.  That seems like a bug, but it would need more investigation, and you probably want to reduce the amount of data transferred now.  For an immediate reduction, use a shallow clone.

Mark Waite


Marco Sacchetto

Nov 3, 2015, 8:12:33 AM
to Jenkins Users

> There is no way to force the plugin to use clone instead of fetch.  Even if there were, it would likely have the same problem, since clone is often described as "init + fetch".


That was meant as a way to work around what seems to be a bug in the plugin. I'm not sure it would have changed anything, but it would at least have been worth a try.
 
> You could reduce the amount of data transferred by using a shallow clone.  That's one of the checkboxes in the "Additional Behaviours" section.


I know about shallow clones and we use them extensively, but unfortunately it's useless here. The problem is that the offending big files live on a different branch from the one I need, and with that full fetch Jenkins downloads all of the branches, which is why it gets so slow. So in this case a shallow clone would have no effect.
 
> You could reduce the amount of data transferred by retaining a reference copy of the repository on each slave agent.  That's one of the checkboxes in the "Additional Behaviours" section.

Yes, I know about that as well, but we have hundreds of projects, so it could become hard to manage - besides risking becoming a big toll on disk space availability, which is often the very reason we wipe out workspaces.

> I'm not sure why it is running the fetch with the full ref spec initially.  That seems like a bug.  However, it would need more investigation, and you probably want to reduce the amount of data transferred now.  To reduce the amount of data transferred immediately, use a shallow clone.


I'll wait to see if anybody else comes up with more ideas; if not, I guess I'll file a bug against the plugin.
Thanks for your time!