why does multibranch pipeline fetch branch source 3 times?


Tim Black

Dec 16, 2019, 9:37:07 PM
to Jenkins Users
Is there ANY multibranch pipeline configuration that would allow me to:
* place a Jenkinsfile at single BranchSource repo root, and
* perform A SINGLE FETCH of this repo, full stop, and
* fetch --tags in this single fetch, and
* all of the above works when either the "WipeWorkspace" or "CleanBeforeCheckout" trait is set, so that the initial fetch (tags and all) is preserved without having to fetch --tags again
??

I have a multibranch pipeline project configured with a single BranchSource pointing at my repo containing Jenkinsfile. I have set the following BranchSource traits in config.xml:

    <jenkins.plugins.git.traits.BranchDiscoveryTrait/>
    <jenkins.plugins.git.traits.TagDiscoveryTrait/>
    <jenkins.plugins.git.traits.SubmoduleOptionTrait>
      <extension class="hudson.plugins.git.extensions.impl.SubmoduleOption">
        <disableSubmodules>false</disableSubmodules>
        <recursiveSubmodules>true</recursiveSubmodules>
        <trackingSubmodules>false</trackingSubmodules>
        <reference></reference>
        <parentCredentials>false</parentCredentials>
        <shallow>false</shallow>
      </extension>
    </jenkins.plugins.git.traits.SubmoduleOptionTrait>
    <jenkins.plugins.git.traits.CleanBeforeCheckoutTrait>
      <extension class="hudson.plugins.git.extensions.impl.CleanBeforeCheckout"/>
    </jenkins.plugins.git.traits.CleanBeforeCheckoutTrait>

I am trying to coerce the project to fetch the repo ONCE to obtain everything my pipeline needs. (I have also tried this with "WipeWorkspaceTrait" and I get the same problem.)

What is happening is that I can configure the project to fetch tags but it's meaningless if I have either "WipeWorkspaceTrait" or "CleanBeforeCheckoutTrait" set. This is because both of these delete the tags in the working tree. The first fetch is obviously there for grabbing the Jenkinsfile, but I don't understand why it needs to wipe/clean AFTER that. Why do the "WipeWorkspaceTrait" or "CleanBeforeCheckoutTrait" have to be implemented AFTER the initial fetch?


Mark Waite

Dec 16, 2019, 9:49:55 PM
to Jenkins Users
As far as I know, there isn't a way to avoid multiple fetches with the current git plugin and git client plugin implementation.

It should be feasible to eventually remove at least one of the duplicate fetches so long as the job has configured the checkout option to use the same refspec in the initial fetch as is used in the checkout.  Refer to "Honor refspec on initial clone" at  https://plugins.jenkins.io/git#clone-extensions .  Unfortunately, that is a "feasible" idea but not an implemented idea.  The duplicate fetch is performed in all job types, even Freestyle.  Thus, it may be even later than the cases you're trying to handle with multibranch pipeline.

Reference repositories, narrow refspecs, and shallow clone are the current alternatives to reduce the clone time and disk space for a git workspace. Refer to https://www.slideshare.net/markewaite/git-for-jenkins-faster-and-better for slides that I presented at Jenkins World 2019 on those alternatives. Refer to https://support.cloudbees.com/hc/en-us/articles/115001728812-Using-a-Git-reference-repository for a deeper dive into the technique. Refer to https://youtu.be/jBGFjFc6Jf8 and https://youtu.be/TsWkZLLU-s4?t=139 for older video descriptions of the techniques.



--
Thanks!
Mark Waite

Tim Black

Dec 17, 2019, 9:45:40 AM
to Jenkins Users
Thanks for the info Mark! I'm curious what are the "use cases that might break" if I enabled "Honor refspec on initial clone" which you mention in your Live Demo video around 7:55. I would guess that my team's use case, which is performing branch-specific multibranch pipeline builds that do not need to know about other branches, is the use case that could very much benefit from customizing the refspec to fetch only the branch a particular branch pipeline project cares about. (If this use case doesn't benefit I can't imagine one that would.) 

Looking in my multibranch pipeline job "BranchSource" config after adding "Advanced clone behaviours" I can check "Honor refspec on initial clone". I am assuming here it is critical for me to additionally set "Specify ref specs" behavior at the same time. BTW, do you know, can I use ${BRANCH_NAME} env var in the refspec, e.g. will this work for a mb pipeline refspec? 

    +refs/heads/${BRANCH_NAME}:refs/remotes/@{remote}/${BRANCH_NAME}

Seems like this should be a built-in option for mb pipeline configs. Anyway, I will experiment with this and see how much time savings we get. Let me know if there's anything else I should know about this or if I'm making any wrong assertions above.

..and to my original question, I suppose this means there's really no way to achieve a true "single fetch per build, tags and all", without removing either the "WipeWorkspaceTrait" or the "CleanBeforeCheckoutTrait", correct? I'm actually ok with it doing multiple fetches as long as it preserves the things (tags) it fetched initially. I don't understand why the implementation/timing of these traits is clobbering the tags I fetched in the initial clone.

Which is to say.. Why is there the distinction between the "initial fetch" and "the checkout"? I think there's a lot going on in the plugin-background here I don't understand. Can you point me to some docs that explain these concepts?

My team is interested in performing "clean checkouts" each build but perhaps we should be less paranoid and remove the above Traits (and maybe use a reference repo as well.) 

Thanks again.

Mark Waite

Dec 17, 2019, 10:54:13 AM
to Jenkins Users
On Tue, Dec 17, 2019 at 7:45 AM Tim Black <timb...@gmail.com> wrote:
> Thanks for the info Mark! I'm curious what are the "use cases that might break" if I enabled "Honor refspec on initial clone" which you mention in your Live Demo video around 7:55.

As an example, the Git plugin and the Git client plugin use the contents of their own repositories as part of their automated tests.  Those automated tests assumed that the history of all branches was available in the workspace repository.  Other use cases include automated merge from one branch to another.  Without both branches, an automated merge won't work.
 
> I would guess that my team's use case, which is performing branch-specific multibranch pipeline builds that do not need to know about other branches, is the use case that could very much benefit from customizing the refspec to fetch only the branch a particular branch pipeline project cares about. (If this use case doesn't benefit I can't imagine one that would.)

Certain branch sources in multibranch pipeline will automatically configure a narrow refspec that specifically includes only the branch being built. I don't recall exactly which, but I believe it is the GitHub, Bitbucket, Gitea, and GitLab branch sources. The plain git branch source does not configure a narrow refspec if I recall correctly.
 

> Looking in my multibranch pipeline job "BranchSource" config after adding "Advanced clone behaviours" I can check "Honor refspec on initial clone". I am assuming here it is critical for me to additionally set "Specify ref specs" behavior at the same time. BTW, do you know, can I use ${BRANCH_NAME} env var in the refspec, e.g. will this work for a mb pipeline refspec?
>
>     +refs/heads/${BRANCH_NAME}:refs/remotes/@{remote}/${BRANCH_NAME}


I believe that (or a variant of it) will work.  I use it very frequently in my jenkins-bugs repository where I have a branch per bug check.  Significantly faster to clone a single branch from that repository than to clone the entire repository.
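
For reference, a rough sketch of how the two behaviours ("Specify ref specs" plus "Advanced clone behaviours" with "Honor refspec on initial clone" checked) might look among the BranchSource traits in config.xml. The element and field names are assumed from the way the git plugin usually serializes these traits, and whether `${BRANCH_NAME}` is expanded inside a trait refspec is not confirmed here, so treat this as something to experiment with and compare against what the UI writes for your plugin version:

```xml
<!-- "Specify ref specs": fetch only the branch being built.
     Expansion of ${BRANCH_NAME} here is an assumption, not verified. -->
<jenkins.plugins.git.traits.RefSpecsTrait>
  <templates>
    <jenkins.plugins.git.traits.RefSpecsTrait_-RefSpecTemplate>
      <value>+refs/heads/${BRANCH_NAME}:refs/remotes/@{remote}/${BRANCH_NAME}</value>
    </jenkins.plugins.git.traits.RefSpecsTrait_-RefSpecTemplate>
  </templates>
</jenkins.plugins.git.traits.RefSpecsTrait>
<!-- "Advanced clone behaviours" with "Honor refspec on initial clone" checked. -->
<jenkins.plugins.git.traits.CloneOptionTrait>
  <extension class="hudson.plugins.git.extensions.impl.CloneOption">
    <shallow>false</shallow>
    <noTags>false</noTags>
    <reference></reference>
    <depth>0</depth>
    <honorRefspec>true</honorRefspec>
  </extension>
</jenkins.plugins.git.traits.CloneOptionTrait>
```

The `@{remote}` placeholder is replaced by the remote name (origin by default), so only the branch part needs to vary per job.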
 
> Seems like this should be a built-in option for mb pipeline configs. Anyway, I will experiment with this and see how much time savings we get. Let me know if there's anything else I should know about this or if I'm making any wrong assertions above.
>
> ..and to my original question, I suppose this means there's really no way to achieve a true "single fetch per build, tags and all", without removing either the "WipeWorkspaceTrait" or the "CleanBeforeCheckoutTrait", correct? I'm actually ok with it doing multiple fetches as long as it preserves the things (tags) it fetched initially. I don't understand why the implementation/timing of these traits is clobbering the tags I fetched in the initial clone.


The WipeWorkspaceTrait means that the entire repository is removed from the workspace for each build. It guarantees that everything must be fetched again. You only want the CleanBeforeCheckoutTrait so that it will retain the existing repository but assure that the working files in the repository are clean.
 
> Which is to say.. Why is there the distinction between the "initial fetch" and "the checkout"? I think there's a lot going on in the plugin-background here I don't understand. Can you point me to some docs that explain these concepts?




--
Thanks!
Mark Waite

Tim Black

Dec 17, 2019, 11:23:15 AM
to Jenkins Users
Understood. Note that, even with "CleanBeforeCheckout" I have to "re fetch tags" because my initial checkout, the one that "Discover Tags" causes, is cleaned up afterwards. This is very counter-intuitive, because there's no high level description of what's going on, or why, or what all the words ("Fetch", "Checkout"..) mean.

I have empirically determined that "CleanBeforeCheckout" really means "Clean after the initial checkout/fetch/clone, the one whose sole purpose in life is to scan for Jenkinsfile changes, but before the subsequent second (and maybe third) checkout/fetch/clone operation".

It appears that the former, initial checkout/fetch/clone has controls in BranchSource behaviors to get tags, or do not get tags, etc.. but the subsequent checkout/fetch/clone operations are still a complete mystery to me. Do I have control over when and how those are going to occur? E.g. how can I make the subsequent checkout/fetch/clone operations use `--tags` instead of `--no-tags`?



Björn Pedersen

Dec 18, 2019, 5:06:28 AM
to Jenkins Users


On Tuesday, December 17, 2019 at 17:23:15 UTC+1, Tim Black wrote:
> Understood. Note that, even with "CleanBeforeCheckout" I have to "re fetch tags" because my initial checkout, the one that "Discover Tags" causes, is cleaned up afterwards. This is very counter-intuitive, because there's no high level description of what's going on, or why, or what all the words ("Fetch", "Checkout"..) mean.
>
> I have empirically determined that "CleanBeforeCheckout" really means "Clean after the initial checkout/fetch/clone, the one whose sole purpose in life is to scan for Jenkinsfile changes, but before the subsequent second (and maybe third) checkout/fetch/clone operation".
>
> It appears that the former, initial checkout/fetch/clone has controls in BranchSource behaviors to get tags, or do not get tags, etc.. but the subsequent checkout/fetch/clone operations are still a complete mystery to me. Do I have control over when and how those are going to occur? E.g. how can I make the subsequent checkout/fetch/clone operations use `--tags` instead of `--no-tags`?

Yes, that behaviour is a feature of all scm-driven pipelines:

  1. First, fetch the Jenkinsfile on the master(!) solely to determine what to actually do.
  2. Create a workspace (depending on what is defined in the file, this may be on a completely different host or a throw-away workspace).
  3. Fetch your repo into this workspace (scm checkout ....).
You can try to configure the first checkout to use a sparse checkout.

For multibranch pipelines, an additional fetch is done before all this to see which branches have changes that need a build.
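
On the `--tags` question quoted above: one thing that may be worth trying (it is not confirmed anywhere in this thread) is adding the "Advanced clone behaviours" trait to the BranchSource with "Fetch tags" left checked, so that the workspace fetch in step 3 also asks for tags. A sketch of the corresponding config.xml fragment follows; the element and field names are assumed from how the git plugin usually serializes that trait, so please compare it with what the UI generates for your plugin version:

```xml
<!-- "Advanced clone behaviours": noTags=false corresponds to "Fetch tags" being checked. -->
<jenkins.plugins.git.traits.CloneOptionTrait>
  <extension class="hudson.plugins.git.extensions.impl.CloneOption">
    <shallow>false</shallow>
    <noTags>false</noTags>
    <reference></reference>
    <depth>0</depth>
    <honorRefspec>false</honorRefspec>
  </extension>
</jenkins.plugins.git.traits.CloneOptionTrait>
```

If it takes effect, the `git fetch` line in the build console log should no longer show `--no-tags`. If the trait is already present in your config, only the `noTags` value should need to change.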

Björn

Luca Milanesio

Dec 18, 2019, 5:20:31 AM
to 'Björn Pedersen' via Jenkins Users, Luca Milanesio
I actually suffered a lot from the multiple fetches with the Gerrit Code Review plugin, which is an SCM source with support for multibranch pipelines.
The high number of fetches was triggering throttling on the Gerrit side and thus it was taking *hours* to discover the new branches/changes.

I am interested in understanding the shortcuts to get rid of those multiple fetches :-)

Luca.