More complex git workflows


HUSSEIN KADIRI

May 10, 2021, 7:29:46 PM
to go...@googlegroups.com
Hi,
Sometimes a git repo is so large that a plain git clone is neither efficient nor fast, and one has to use a reference repo. For all the parts that require git (config repository, git material, etc.), is it possible to have more complex git workflows, such as using a reference repo or a shallow clone, since a simple git clone is not always possible?

Marques Lee

May 10, 2021, 9:23:51 PM
to go...@googlegroups.com
Git materials support shallow clone. I think you need to expand the advanced tab to see the option. It’s of course in cruise-config.xml as well as the various pipelines-as-config syntaxes.

The workspace isn’t recloned every time either. If it exists on disk, it gets updated via fetch.
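
As a rough illustration of that clone-once, fetch-afterwards behaviour, here is a minimal sketch in plain git. It is not GoCD's actual implementation; the `sync_workspace` helper, the paths and the throwaway demo repository are all assumptions made for the example.

```shell
#!/bin/sh
set -eu

# sync_workspace URL DIR (hypothetical helper): shallow-clone on the first
# run, then reuse the directory and only fetch on later runs.
sync_workspace() {
    url=$1
    dir=$2
    if [ -d "$dir/.git" ]; then
        # Workspace already on disk: transfer only what changed.
        git -C "$dir" fetch --quiet origin
    else
        # First run: a shallow clone keeps the initial download small.
        git clone --quiet --depth 1 "$url" "$dir"
    fi
}

# Demo against a throwaway local repository standing in for the real remote.
demo=$(mktemp -d)
upstream="$demo/upstream"
workspace="$demo/workspace"

git init --quiet "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/main

commit() {
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
        commit --quiet --allow-empty -m "$1"
}

commit "first"
commit "second"
# file:// is used because git ignores --depth for plain local paths.
sync_workspace "file://$upstream" "$workspace"   # first run: shallow clone
commit "third"
sync_workspace "file://$upstream" "$workspace"   # later run: fetch only
```

The second call finds the existing workspace and fetches only the new commit instead of cloning again.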


--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/CAFD%2B7Dm2Ah3rAMzkt8j9pM_LoAKOn%3DgK0KMqNW-7dm8FuQOpyA%40mail.gmail.com.

HUSSEIN KADIRI

May 10, 2021, 9:25:33 PM
to go...@googlegroups.com
My repo is too big to be cloned on its own. It needs a reference repo.

Is there a way to configure a reference repo?


Marques Lee

May 10, 2021, 9:31:24 PM
to go...@googlegroups.com
Hmm. Well, I suppose you could git clone --bare --mirror to a known location and then:

1) create git materials with file system paths for the url instead of http/ssh
2) run a cron to keep the ref repo updated

Would that work?
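
A minimal sketch of that setup, assuming throwaway paths under a temp directory: `origin_repo` stands in for the real http/ssh remote, `mirror` for the shared known location, and `workspace` for an agent's checkout. The cron schedule in the comment is likewise only an example.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
origin_repo="$demo/origin"          # stand-in for the real remote
mirror="$demo/mirror.git"           # the shared "known location"
workspace="$demo/agent-workspace"   # what an agent would check out

# Build a tiny stand-in remote with a single commit.
git init --quiet "$origin_repo"
git -C "$origin_repo" symbolic-ref HEAD refs/heads/main
git -C "$origin_repo" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial"

# One-time setup: --mirror implies --bare and copies every ref.
git clone --quiet --mirror "$origin_repo" "$mirror"

# What the cron job would run to keep the mirror current, e.g.
#   */15 * * * * git -C /path/to/mirror.git remote update --prune
git -C "$mirror" remote update --prune >/dev/null

# What the git material amounts to on an agent: a clone from a
# file system path instead of an http/ssh URL.
git clone --quiet "$mirror" "$workspace"
```

The material URL is then simply the path to the mirror, so it has to be visible at the same location on every machine that clones from it.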

Sriram Narayanan

May 10, 2021, 9:36:09 PM
to go...@googlegroups.com
That’s what I do for my multi gig checkouts, in fact. 

HUSSEIN KADIRI

May 10, 2021, 9:37:13 PM
to go...@googlegroups.com
I have elastic agents so the cron route would not be feasible. 

I mount a reference repo as a K8s PVC.

I want to do a git clone --reference <path to my mounted reference repo> <url>.

Can the git material support, or be modified to accept, reference repo paths?

This is kind of a deal breaker if it can't.


Sriram Narayanan

May 10, 2021, 9:39:26 PM
to go...@googlegroups.com
What does your reference repo contain? Is the content going to be used for the build? 

Ram

HUSSEIN KADIRI

May 10, 2021, 9:43:36 PM
to go...@googlegroups.com
The reference repo is a recent copy (1 day old) of the repo. The content would be used for the build.

A build would clone from the reference repo, then fetch origin. This way the delta (1 day) is not too big. 

Cloning from scratch takes a long time because of the size of the repo. 
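
In plain git, that flow can be sketched with `git clone --reference`. This is only an illustration with throwaway local repositories; `origin_repo`, `reference` and `build` are placeholder names for the real remote, the mounted day-old copy and the build checkout.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
origin_repo="$demo/origin"      # stand-in for the real remote
reference="$demo/reference"     # stand-in for the mounted, day-old copy
build="$demo/build"             # the checkout used by the build

git init --quiet "$origin_repo"
git -C "$origin_repo" symbolic-ref HEAD refs/heads/main

commit() {
    git -C "$origin_repo" -c user.name=demo -c user.email=demo@example.com \
        commit --quiet --allow-empty -m "$1"
}

commit "day 0"
# Yesterday's snapshot of the repository.
git clone --quiet "$origin_repo" "$reference"
# Upstream moves on after the snapshot was taken.
commit "day 1"

# Clone from the real URL, borrowing every object the reference repo
# already has, so only the one-day delta is transferred.
git clone --quiet --reference "$reference" "file://$origin_repo" "$build"
```

The resulting clone keeps pointing at the reference repo's object store, so the reference must stay available for as long as the build checkout is used. `git clone` also has a `--dissociate` option that copies the borrowed objects to remove that dependency.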


Marques Lee

May 10, 2021, 9:43:45 PM
to go...@googlegroups.com
Hmm, unfortunately I don’t believe the git material in GoCD has native support for reference repos.

Could the server not run a cron to do git fetch --all on the bare repo to keep it updated?

Then all agents would pick up new changes so long as the volume holding the bare repo is mounted to agents.

Otherwise, yeah, GoCD may not be a good fit for you until we build in support for --reference, if that’s a deal breaker.

Marques Lee

May 10, 2021, 9:45:54 PM
to go...@googlegroups.com
Basically, the agents would be doing

git clone /local/path/to/bare/repo

in the scenario I was supposing.

HUSSEIN KADIRI

May 10, 2021, 9:48:48 PM
to go...@googlegroups.com
Hmm, we set up GoCD on Kubernetes, so the server is a Kubernetes deployment.

We are on GKE. Yes, we can mount a volume, but GKE PVCs are only ReadWriteOnce (one pod mounts the volume for reading and writing) or ReadOnlyMany (multiple pods mount the volume read-only).

Your recommendation requires a ReadWriteMany setup, which is not possible in GKE.

Marques Lee

May 10, 2021, 9:53:57 PM
to go...@googlegroups.com
Pardon my lack of familiarity with GKE, my question may be extremely naive: if there’s only one process that needs to write to the mount (the cron that performs the fetch to keep the code updated), then by would the agents need read/write on the volume? Isn’t read-only enough to clone for a pipeline run?

Marques Lee

May 10, 2021, 9:55:21 PM
to go...@googlegroups.com
Typo — “why would the agents need read/write” was what I meant

HUSSEIN KADIRI

May 10, 2021, 10:01:34 PM
to go-cd
You're right, the agents won't need read/write. The volume is what needs to be set up as read/write. See doc
Write - would be the "cron" updating the repo
Read - would be the agents "cloning"
Reading and writing can't happen simultaneously.

HUSSEIN KADIRI

May 10, 2021, 10:06:02 PM
to go-cd
Let me explore the local clone option some more
$ git clone /local/path/to/bare/repo

Thanks for the tip

Marques Lee

May 10, 2021, 10:12:55 PM
to go...@googlegroups.com
Ah, I see. Probably would need some weird workaround to get this working then, which may not be worth your time/effort.

I think for us to support --reference, it would be straightforward but not exactly trivial either. From what I recall, since I was working in that area late last year, it would require a modest code change, but in a critical system, so we’d need to be extra careful validating it to make sure we don’t introduce regressions.

That said, since this is a dealbreaker for you, thanks for giving GoCD a try anyway. I might try adding support for that flag, but it won’t be overnight and likely not in a time frame that could work for you. Optional, but if you have any feedback about GoCD itself, we would certainly be interested in hearing it.

-Marques

Marques Lee

May 10, 2021, 10:12:57 PM
to go...@googlegroups.com
Ok, fair enough :). Let us know if we can help in any way.

HUSSEIN KADIRI

May 10, 2021, 10:28:46 PM
to go...@googlegroups.com
Thank you Marques, I really appreciate your help and support. You've been very responsive.
I'm looking for an alternative to Jenkins. Unfortunately we have really large repos so most existing CI tools don't work out of the box because of the clone issues.
I really like GoCD and have enjoyed creating a POC for it. I don't want to drop it just yet. I'll continue to look for a workaround.

Thanks again

--
Hussein Kadiri

Marques Lee

May 10, 2021, 10:36:45 PM
to go...@googlegroups.com
Thank you Hussein for that, glad to help.

You may have seen this article here: 

I think this person works around the limitations on ReadWriteMany by creating separate PVs for read and write backed by the same NFS disk. Not sure if that’s helpful or not, or worth the extra infrastructure.

Sriram Narayanan

May 10, 2021, 11:49:41 PM
to go...@googlegroups.com
Is creating a new PV and provisioning that a feasible option?

That's what I do on my Illumos build zones for some use cases: I use a ZFS snapshot, mount it into a zone, and run specific build experiments. It saves me a lot of time since I can avoid cloning gigabytes of code, even from the same bare-metal server.

HUSSEIN KADIRI

May 11, 2021, 8:10:31 PM
to go-cd
Ok circling back to this. 
Thanks for the suggestion about using a local repo Marques.
TLDR: it worked!

Long form:
I created a bogus local repo (as a GKE PVC). I mounted the PVC on both the server and the agent as ReadOnlyMany (the only way to get a PVC mounted on multiple pods).

With this, I configured the pipeline git material to point to the bogus local repo. This lets me bypass the restriction that a pipeline must have some sort of material.
Since the same disk is also mounted on the agent, I was able to clone the bogus repo. Now I've successfully bypassed the material restriction of the pipeline, and the CI builds on the agents do not fail!

Now for the fun part: how do I get my actual large repo onto the agent?
So I dug through the Jenkins source code to see how they handle reference repos.

This is what they do (which is what I'm doing as a job within the pipeline):
$ git init <repo>
$ cd <repo>
# Borrow objects from the reference repo instead of downloading them again
$ echo "/path/to/reference/repo/.git/objects" > .git/objects/info/alternates
$ git config remote.origin.url <repo url, e.g. https://github.com/gocd/somerepo>
$ git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
# Fetch only what is missing from the reference repo
$ git fetch --no-tags --progress <repo url, e.g. https://github.com/gocd/somerepo> +refs/heads/master:refs/remotes/origin/master
$ git checkout master
# Now I have a checked-out repo and can run my builds!
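
For anyone who wants to try the alternates approach without a large repo, here is a self-contained sketch of the same steps against throwaway local repositories. The directory names and the `main` branch are assumptions for the demo, and it fetches through the configured `origin` remote rather than passing the URL and refspec explicitly. `git count-objects -v` is used at the end to confirm that git sees the alternate object store.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
origin_repo="$demo/origin"      # stand-in for the real remote
reference="$demo/reference"     # stand-in for the mounted reference repo
build="$demo/build"             # the repo assembled by the pipeline job

git init --quiet "$origin_repo"
git -C "$origin_repo" symbolic-ref HEAD refs/heads/main

commit() {
    git -C "$origin_repo" -c user.name=demo -c user.email=demo@example.com \
        commit --quiet --allow-empty -m "$1"
}

commit "old history"
git clone --quiet "$origin_repo" "$reference"   # slightly stale snapshot
commit "new work"                               # the delta to fetch

git init --quiet "$build"
cd "$build"
# Point this repo at the reference repo's object store.
echo "$reference/.git/objects" > .git/objects/info/alternates
git config remote.origin.url "file://$origin_repo"
git config --add remote.origin.fetch "+refs/heads/main:refs/remotes/origin/main"
git fetch --quiet --no-tags origin
git checkout --quiet main

# Lists one "alternate: <path>" line per borrowed object store.
alternates=$(git count-objects -v | grep '^alternate:')
echo "$alternates"
```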

You may ask: instead of using a bogus repo, why not just mount the actual reference repo on the server and agent?
Well, the reference repo needs to be updated (daily). Since the reference repo is mounted as ReadOnlyMany, it can't be updated while it is in use.
An update entails:
- Creating a new PV/PVC
- Updating the pod template to use the new PV/PVC
- Deleting the old PV/PVC

The agents are elastic (a build creates a new pod), so every new pod gets the latest pod template with the latest PVC.

The issue is with updating the PVC for the server. The server is a long-running pod, and the PV/PVC cannot be swapped while it is running.
It needs to be recreated, which is quite disruptive to service. So I decided to mount a PV that does not change frequently, hence the bogus repo.


IMHO, the condition that a pipeline needs to have a material is too restrictive. There are a limited number of available materials, and they don't cover all CI use cases. I think GoCD should either have materials that support a wider range of CI use cases OR make materials optional for pipelines.

Thanks again for the suggestion, Marques!

Marques Lee

May 11, 2021, 8:33:34 PM
to go...@googlegroups.com
I’m so glad you were able to get that working — Congrats!

Yes, GoCD doesn’t cover all possible material types, but tries to get most use cases.

The reason we have a material requirement is that in most cases, pipelines need to build something (e.g., source code) and also need to track updates (e.g., an SCM repo or an upstream build pipeline) to know if a build is necessary. For custom cases, we support pluggable SCM materials. However, in your case, it would not have been fun to author a custom material plugin just to get your basic build working.

That said, I think it might be worthwhile looking into supporting --reference, as it might be generally useful anyway and would probably spare you this workaround :).

I’ll talk to @aravindsv about it and see what he thinks.

Glad you’re up and running!
