Is Jenkins doing a git pull between each matrix element?

Brian Moffat

Aug 4, 2014, 12:11:02 PM
to jenkins...@googlegroups.com
I have a matrix project to build different C# projects/solutions.  There is one git repository for all solutions.

The matrix job is configured for Git to check out the master branch to a local directory, and then invokes a downstream project for each matrix element (solution).  That downstream project has no Git configuration, and simply builds the solution using MSBuild.

What I am observing (looking at the console output for the Matrix build job instance) is the matrix job performs the git checkout, which takes about 4 minutes.  It then says "Triggering <matrix element>", but that downstream project doesn't show up in the build list for another 4-5 minutes.  Based on noticing that the content of the checkout directory is changing (total size differences) during the period between when it says "Triggering ..." and when that downstream job actually starts, my guess is that another git checkout is occurring behind the scenes.

Obviously, this adds a significant amount of time to the total length of the matrix build.  The builds of each solution take under 1 minute.  So for the 7 solutions, I would expect the length of time to be 4+(7*1) or roughly 11 minutes, but with the 4-5 minute delay between each matrix element, it takes (4*7) + (7*1) or roughly 30 minutes.  Our CI/CD processes are maturing, and the lengthy delay between check-in and deploy to staging is now slowing us down.
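
Spelled out, the estimate above looks like the following. This is only a back-of-the-envelope sketch in Python using the figures from this post (a 4-minute checkout, a 1-minute build, 7 solutions); the function names are made up for illustration, and the model ignores the matrix parent's own initial checkout. Note that the per-element formula evaluates to 35 minutes, which is in the same ballpark as the "roughly 30" observed.

```python
# Figures from the post; names are illustrative only.
CHECKOUT_MIN = 4   # observed duration of one git checkout, in minutes
BUILD_MIN = 1      # upper bound for building one solution, in minutes
SOLUTIONS = 7      # number of matrix elements


def single_checkout_total(checkout_min: int, build_min: int, solutions: int) -> int:
    """One checkout up front, then every solution is built in turn."""
    return checkout_min + solutions * build_min


def per_element_checkout_total(checkout_min: int, build_min: int, solutions: int) -> int:
    """A checkout-sized delay before every solution, plus the build itself."""
    return solutions * checkout_min + solutions * build_min


if __name__ == "__main__":
    print("expected:", single_checkout_total(CHECKOUT_MIN, BUILD_MIN, SOLUTIONS), "minutes")
    print("observed model:", per_element_checkout_total(CHECKOUT_MIN, BUILD_MIN, SOLUTIONS), "minutes")
```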

Can anyone provide me guidance on what is going on behind the scenes (nothing in any Jenkins log that I can find), or tell me how I can configure the matrix job "properly" to not do this?

This has run fast, sometimes - but I think perhaps it stopped working after updating plugins, or may somehow be related to the plugins installed - I just don't know where to start to look.

The git section of the matrix job has:
- Additional Behaviors : Checkout to a subdirectory: C:\Build\Source\WebPortal\Master\Dev
- Advanced sub-module behavior: Recursively update submodules (checked)
- Prune stale remote-tracking branches (not sure this is necessary?)

Here is the console log for the matrix job:
Started by user <me>
[EnvInject] - Loading node environment variables.
Building in workspace C:\Tools\Jenkins\jobs\BuildAllWebPortals\workspace
 > C:\Tools\Git\cmd\git.exe rev-parse --is-inside-work-tree
Fetching changes from the remote Git repository
 > C:\Tools\Git\cmd\git.exe config remote.origin.url <our git repository url>
Pruning obsolete local branches
Fetching upstream changes from <our git repository url>
 > C:\Tools\Git\cmd\git.exe --version
 > C:\Tools\Git\cmd\git.exe fetch --tags --progress <our git repository url> +refs/heads/master:refs/remotes/origin/master --prune
 > C:\Tools\Git\cmd\git.exe rev-parse "origin/master^{commit}"
Checking out Revision b18aa8cff0aa9da9166f740877845cef27b612ff (origin/master)
 > C:\Tools\Git\cmd\git.exe config core.sparsecheckout
 > C:\Tools\Git\cmd\git.exe checkout -f b18aa8cff0aa9da9166f740877845cef27b612ff
 > C:\Tools\Git\cmd\git.exe rev-list b18aa8cff0aa9da9166f740877845cef27b612ff
 > C:\Tools\Git\cmd\git.exe remote
 > C:\Tools\Git\cmd\git.exe submodule init
 > C:\Tools\Git\cmd\git.exe submodule sync
 > C:\Tools\Git\cmd\git.exe config --get remote.origin.url
 > C:\Tools\Git\cmd\git.exe submodule update --init --recursive
Email was triggered for: Before Build
Sending email for trigger: Before Build
Sending email to: <distribution list>
Triggering MobileASI
MobileASI completed with result SUCCESS
Triggering Furniture
Furniture completed with result SUCCESS
Triggering RetailPortal
RetailPortal completed with result SUCCESS
Triggering MobileLeons
MobileLeons completed with result SUCCESS
Triggering Leons
Leons completed with result SUCCESS
Triggering MobileVCF
MobileVCF completed with result SUCCESS
Triggering RetailTool
RetailTool completed with result SUCCESS
No emails were triggered.
Finished: SUCCESS

Mark Waite

Aug 4, 2014, 1:41:08 PM
to jenkins...@googlegroups.com
On Mon, Aug 4, 2014 at 10:11 AM, Brian Moffat <crd...@gmail.com> wrote:
> I have a matrix project to build different C# projects/solutions.  There is one git repository for all solutions.
>
> The matrix job is configured for Git to check out the master branch to a local directory, and then invokes a downstream project for each matrix element (solution).  That downstream project has no Git configuration, and simply builds the solution using MSBuild.
>
> What I am observing (looking at the console output for the Matrix build job instance) is the matrix job performs the git checkout, which takes about 4 minutes.  It then says "Triggering <matrix element>", but that downstream project doesn't show up in the build list for another 4-5 minutes.  Based on noticing that the content of the checkout directory is changing (total size differences) during the period between when it says "Triggering ..." and when that downstream job actually starts, my guess is that another git checkout is occurring behind the scenes.

Your guess is correct.  Each instance gets a full copy of the repository.

If your repository is large, you could add the "Additional behaviours" for "Advanced clone options" and use a reference repository on the build machine(s).  For example, if you created a bare repository copy of your central git repo on each build machine at c:\bare\your-git-repo.git, then you could list c:\bare\your-git-repo.git as the reference repository location.  Git will then reference the contents of that bare repository and use that to reduce the "fetch" time.
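
As a rough illustration of what a reference repository amounts to on the command line (a sketch only; the git plugin issues its own commands, and this is not taken from its source), the two steps are creating the bare copy once and then cloning with `--reference`. The remote URL and workspace path below are placeholders, and the helper names are made up; only the bare repository path comes from the paragraph above.

```python
import subprocess
from typing import List

# Placeholder values; substitute your own. Only REFERENCE_PATH comes from the text above.
REMOTE_URL = "https://example.com/your-git-repo.git"   # hypothetical URL
REFERENCE_PATH = r"c:\bare\your-git-repo.git"
WORKSPACE = r"c:\some\workspace"                        # hypothetical path


def bare_copy_command(remote_url: str, reference_path: str) -> List[str]:
    """Command that creates the bare copy once per build machine."""
    return ["git", "clone", "--bare", remote_url, reference_path]


def reference_clone_command(remote_url: str, reference_path: str, workspace: str) -> List[str]:
    """Command that clones while borrowing objects from the local bare copy."""
    return ["git", "clone", "--reference", reference_path, remote_url, workspace]


def run(command: List[str], dry_run: bool = True) -> None:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run(bare_copy_command(REMOTE_URL, REFERENCE_PATH))
    run(reference_clone_command(REMOTE_URL, REFERENCE_PATH, WORKSPACE))
```

The bare copy needs an occasional `git fetch` of its own to stay useful; the further it falls behind the central repository, the more each job has to download again.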

That won't reduce the "checkout" time, just the "fetch" time.  If you also need to reduce the checkout time, then you might consider using "Sparse checkout" to check out only those subdirectories which are actually required for your build.
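
For reference, outside Jenkins a sparse checkout is driven by a pattern file at `.git/info/sparse-checkout`, with one path pattern per line. The sketch below only generates that file; the directory names in the usage line are hypothetical, since the repository layout isn't shown in this thread, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Iterable


def sparse_patterns(directories: Iterable[str]) -> str:
    """Turn directory names into root-anchored sparse-checkout patterns."""
    lines = []
    for directory in directories:
        # Normalise Windows separators and trim stray slashes.
        cleaned = directory.replace("\\", "/").strip("/")
        if cleaned:
            lines.append(f"/{cleaned}/")
    return "\n".join(lines) + "\n"


def write_sparse_file(repo_root: Path, directories: Iterable[str]) -> Path:
    """Write the patterns to .git/info/sparse-checkout under repo_root."""
    target = repo_root / ".git" / "info" / "sparse-checkout"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(sparse_patterns(directories), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical directories; replace with the ones your build needs.
    print(sparse_patterns(["Furniture", "Shared\\Lib"]), end="")
```

With plain git, the file takes effect after `git config core.sparseCheckout true` followed by `git read-tree -mu HEAD`. When the Jenkins behaviour is used, the plugin is expected to manage this for you, so the file is shown here only to make the mechanism concrete.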

Another alternative would be to implement "git-work-dir" support in the git plugin, and submit a pull request for it.  Refer to https://issues.jenkins-ci.org/browse/JENKINS-23594 for an enhancement request asking for that support.  That enhancement request seems to match your use case quite well.

--
Thanks!
Mark Waite

Brian Moffat

Aug 4, 2014, 2:24:14 PM
to jenkins...@googlegroups.com
Thanks Mark.
I have a couple of follow-up questions, if you don't mind.

1.  I've seen it work - meaning that the matrix project would do one checkout and the downstream projects would run one after the other with no lengthy pause between.  So I'm left feeling like there should be a way to "get back to that" without the suggestions you made - since I've not modified the project configs since seeing it work.  It may be that at the time it was an "unintended feature" that got removed by a plug-in update, but that's pure conjecture on my part.

2.  Why does the matrix job always do a "get a full copy" of the repository before invoking the downstream job?  If that's what I wanted, I would have the downstream job do the pull.  To me it looks like a bug, not an enhancement request.  What am I missing?

-Brian

Mark Waite

Aug 4, 2014, 2:47:01 PM
to jenkins...@googlegroups.com
On Mon, Aug 4, 2014 at 12:24 PM, Brian Moffat <crd...@gmail.com> wrote:
> Thanks Mark.
> I have a couple of follow-up questions, if you don't mind.
>
> 1.  I've seen it work - meaning that the matrix project would do one checkout and the downstream projects would run one after the other with no lengthy pause between.  So I'm left feeling like there should be a way to "get back to that" without the suggestions you made - since I've not modified the project configs since seeing it work.  It may be that at the time it was an "unintended feature" that got removed by a plug-in update, but that's pure conjecture on my part.

I'm reasonably sure that it has always behaved that way.  Each job in a matrix (multi-configuration) project may be on a separate machine, so each job performs a "git fetch" followed by a "git checkout".  However, if you can find versions of the git plugin and the git client plugin which behave the way you want, you could compare the source code of the current plugin and the version behaving the way you want to see if you can identify the cause of the behavioral change.
 
> 2.  Why does the matrix job always do a "get a full copy" of the repository before invoking the downstream job?  If that's what I wanted, I would have the downstream job do the pull.  To me it looks like a bug, not an enhancement request.  What am I missing?

I believe that each job in the matrix job needs an independent copy of the repository so that it can work independently of the other jobs in the matrix.



--
Thanks!
Mark Waite

Brian Moffat

Aug 4, 2014, 5:02:22 PM
to jenkins...@googlegroups.com
Hi Mark,
Ok - I can see that my use-case is not universal and is likely on the "very simple" end of the spectrum.  I am currently not using distributed builds. My builds of multiple portals use the same code base and are serialized - therefore one checkout is all that is needed to build all of them.  In the short term, I'm thinking of doing the following - please tell me if this is an abomination in the Jenkins world.

I will create a new project that is not a matrix job and is triggered by an SCM change.  All it will do is the git checkout, and call a single downstream project, which will be the existing matrix project (without a trigger) that has the git section deleted and simply invokes the downstream build project iterating through the matrix elements.  My assumption is that since the SCM-triggered project will only do one checkout (explicitly or implicitly - since it is not a matrix project), and since the downstream projects do not have any git configurations, they will not do any checkouts either.

Feels like a kludge, but I think the result will be a complete build in 11 minutes as opposed to 30.

Do you agree?

Thanks,
Brian

Mark Waite

Aug 4, 2014, 10:43:07 PM
to jenkins...@googlegroups.com
On Mon, Aug 4, 2014 at 3:02 PM, Brian Moffat <crd...@gmail.com> wrote:
> Hi Mark,
> Ok - I can see that my use-case is not universal and is likely on the "very simple" end of the spectrum.  I am currently not using distributed builds. My builds of multiple portals use the same code base and are serialized - therefore one checkout is all that is needed to build all of them.  In the short term, I'm thinking of doing the following - please tell me if this is an abomination in the Jenkins world.

Not an abomination, but it seems complicated to me. 
 
> I will create a new project that is not a matrix job and is triggered by an SCM change.  All it will do is the git checkout, and call a single downstream project, which will be the existing matrix project (without a trigger) that has the git section deleted and simply invokes the downstream build project iterating through the matrix elements.  My assumption is that since the SCM-triggered project will only do one checkout (explicitly or implicitly - since it is not a matrix project), and since the downstream projects do not have any git configurations, they will not do any checkouts either.

I'm not sure what benefit you gain from a matrix job in that case.  Since the builds are serialized, and you are able to use a single project which iterates through the matrix elements one at a time, why not just create a single freestyle project which calls a script that iterates through the projects and builds each one?

Using a starter job to do the checkout, then a matrix job to build the configurations seems like an interesting way to iterate through the configurations, but aren't the configurations already expressed in the source code?  If so, you could write a script (MSBuild, Perl, Python, Java, etc.) which iterates through those configurations and builds each of them.
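
Such a script could be as small as the sketch below. It is only an illustration under several assumptions: the solution names are the ones in the console log earlier in this thread, the source root is the checkout directory from the first post, and the `<name>\<name>.sln` layout, the `Release` configuration, and `msbuild` being on the PATH are all guesses.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import Callable, Dict, List

# Solution names as they appear in the matrix job's console log.
SOLUTIONS = ["MobileASI", "Furniture", "RetailPortal", "MobileLeons",
             "Leons", "MobileVCF", "RetailTool"]

# Checkout directory from the first post.
SOURCE_ROOT = PureWindowsPath(r"C:\Build\Source\WebPortal\Master\Dev")


def msbuild_command(solution: str, configuration: str = "Release") -> List[str]:
    """Build the MSBuild command line for one solution (layout is assumed)."""
    solution_file = SOURCE_ROOT / solution / f"{solution}.sln"
    return ["msbuild", str(solution_file), f"/p:Configuration={configuration}"]


def build_all(runner: Callable[[List[str]], int] = subprocess.call) -> Dict[str, int]:
    """Build every solution in turn and collect the exit codes."""
    results: Dict[str, int] = {}
    for solution in SOLUTIONS:
        print(f"Building {solution}")
        results[solution] = runner(msbuild_command(solution))
    return results


if __name__ == "__main__":
    exit_codes = build_all()
    failed = [name for name, code in exit_codes.items() if code != 0]
    raise SystemExit(1 if failed else 0)
```

The loop deliberately keeps going after a failure so that one broken solution does not hide the status of the others; the process still exits non-zero so that Jenkins marks the build as failed.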



--
Thanks!
Mark Waite

Brian Moffat

Aug 5, 2014, 2:16:49 PM
to jenkins...@googlegroups.com
Hi Mark,
The advantage of using my kludge as opposed to using a script to iterate through the builds is that Jenkins has all this great messaging (hipchat / email) and log capture that I would have to "re-implement". And I have a monitoring/reporting infrastructure in place that is based on the current matrix-based process - that too would need to change.  Certainly not insurmountable, but extra work.  I do appreciate your input and will make use of it in designing the evolution of our CI/CD process as our development environment evolves.

Thanks!
Brian

Les Mikesell

Aug 5, 2014, 2:29:30 PM
to jenkinsci-users
On Tue, Aug 5, 2014 at 1:16 PM, Brian Moffat <crd...@gmail.com> wrote:
> Hi Mark,
> The advantage of using my kludge as opposed to using a script to iterate
> through the builds is that Jenkins has all this great messaging (hipchat /
> email) and log capture that I would have to "re-implement". And I have a
> monitoring/reporting infrastructure in place that is based on the current
> matrix-based process - that too would need to change. Certainly not
> insurmountable, but extra work. I do appreciate your input and will make
> use of it in designing the evolution of our CI/CD process as our development
> environment evolves.
>

Keep in mind that you do have the option of throwing resources at it
to speed the jobs up - that is, add slave nodes that would do
independent checkouts and run the builds in parallel. If you are
bothered by a few extra minutes in a serialized run, that seems like
the direction to think in for a real speedup.

--
Les Mikesell
lesmi...@gmail.com

Brian Moffat

Aug 5, 2014, 5:49:19 PM
to jenkins...@googlegroups.com
Hi Les,
In my case, I'm not bothered by the actual build time, which is under 1 minute per solution.  The problem is that the matrix build does a full repository pull before each matrix element, which in my case is completely unnecessary and adds 30+ minutes to the build time.  Slave servers running the builds in parallel would each still need to do the pull, but that would reduce the total from 30+ to around 6 minutes (if I had 7 or 8 slaves).  That seems like a high price to pay when I can use a single server and get it to 11 minutes total.  Add to that a near-term future with a decreasing number of solutions (ultimately 1), and the need to parallelize (and the matrix project itself) goes away.
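
To put rough numbers on that comparison, here is a simple model, offered as an assumption rather than a measurement: every slave does its own 4-minute pull plus a 1-minute build, and the solutions are spread evenly across the slaves. The function name is illustrative.

```python
import math


def parallel_minutes(solutions: int, slaves: int,
                     checkout_min: int = 4, build_min: int = 1) -> int:
    """Wall-clock estimate when each slave pulls and builds its share in turn."""
    rounds = math.ceil(solutions / slaves)  # solutions handled per slave, worst case
    return rounds * (checkout_min + build_min)


if __name__ == "__main__":
    for slaves in (1, 2, 7, 8):
        print(slaves, "slave(s):", parallel_minutes(7, slaves), "minutes")
```

With 7 or 8 slaves the model gives about 5 minutes, close to the "around 6" above; with a single node it gives 35, the serialized case.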

The great thing about Jenkins, though, is that it provides a lot of flexibility in how to achieve your needs, and provides a lot of useful features.