Understanding git-path Plugin Material Fetching

Jason Smyth

Dec 28, 2024, 7:46:15 PM
to go-cd
Hi everyone,

I am starting with the git-path plugin and I am having trouble understanding how it should be configured to ensure the files end up where I expect them when the agent fetches them.

I am working with a pipeline that uses the following materials (whittled down to what I understand to be the relevant bits):

    materials:
      App.Trunk:
        plugin_configuration:
          id: "git-path"
        options:
          path: Trunk
        destination: source/Project/App/Trunk
      App.Documents.Spec:
        plugin_configuration:
          id: "git-path"
        options:
          path: Documents/Spec
        destination: source/Project/App/Documents/Spec

The intention was that the contents of App$/Trunk should be placed in source/Project/App/Trunk and the contents of App$/Documents/Spec should be placed in source/Project/App/Documents/Spec. Instead, the plugin seems to be fetching the entire repo into each of the destinations. Is this the expected behaviour?

If so, are there any guidelines for how to deal with multiple git-path materials that need to poll different paths in a single repo, while ensuring that the relative paths remain intact on the agent at job run time?

Things that I need to consider:
  • App.Trunk and App.Documents.Spec are likely to be reused across various pipelines, though not necessarily always together.
  • We probably do not want to configure a custom git-path material for every existing combination of paths.
  • There is a significant amount of cross-repository code, so relative paths both inside and across repositories can be relevant. (I.e., for any given file, the right version of that file needs to be downloaded to "./Project/Repo/path/to/file".)

I'm thinking I will need to pull the git-path materials into a separate location, then copy the relevant files to the expected location in the first (few) task(s) of the job. (E.g., fetch them into ./git-path/<materialName>, then copy "./git-path/App.Trunk/Trunk" to "./source/Project/App/Trunk".) This feels incredibly hacky, though. Are there any cleaner options?
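To sketch what I mean (the ./git-path staging directory name is just an example, and the fetched layout is simulated here):

```shell
#!/bin/sh
# Sketch of the copy-based workaround. Each git-path material would be
# fetched into ./git-path/<materialName>; we simulate that layout, then
# copy only the sub-path the material is actually meant to represent.
set -eu

# Simulate what the agent would have after fetching the App.Trunk material.
mkdir -p git-path/App.Trunk/Trunk
echo "example content" > git-path/App.Trunk/Trunk/file.txt

# First task of the job: move the relevant subtree into the expected layout.
mkdir -p source/Project/App
cp -R git-path/App.Trunk/Trunk source/Project/App/Trunk
```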

Any feedback or advice is appreciated.

Cheers,
Jason Smyth

Chad Wilson

Dec 29, 2024, 1:33:12 AM
to go...@googlegroups.com
Hiya Jason

On Sun, Dec 29, 2024 at 8:46 AM Jason Smyth <jsm...@taqauto.com> wrote:
> Hi everyone,
>
> I am starting with the git-path plugin and I am having trouble understanding how it should be configured to ensure the files end up where I expect them when the agent fetches them.
>
> I am working with a pipeline that uses the following materials (whittled down to what I understand to be the relevant bits):
>
>     materials:
>       App.Trunk:
>         plugin_configuration:
>           id: "git-path"
>         options:
>           path: Trunk
>         destination: source/Project/App/Trunk
>       App.Documents.Spec:
>         plugin_configuration:
>           id: "git-path"
>         options:
>           path: Documents/Spec
>         destination: source/Project/App/Documents/Spec
>
> The intention was that the contents of App$/Trunk should be placed in source/Project/App/Trunk and the contents of App$/Documents/Spec should be placed in source/Project/App/Documents/Spec. Instead, the plugin seems to be fetching the entire repo into each of the destinations. Is this the expected behaviour?

Yes, it is the expected behaviour. The plugin clones the entire repo, so everything outside the configured paths is also present, at the versions current as of the commit selected for the specific `path` (really a git pathspec). To my knowledge there is no native git way to fetch only part of a file system tree in the way you suggest (a git ref represents the state of the entire repo at a given commit, independent of any file system knowledge), so the only alternative for the plugin's implementation would be file-system-level hijinks to remove the paths that were not requested. From a practical perspective, that would likely mean removing the ability to use all possible git pathspecs (documented here) and allowing only simple path prefixes, which would itself be difficult to validate without diving into a pathspec parser.

Basically, the git-path plugin lets you mitigate excessive triggering and reinterpret up-to-dateness for a subset of a repo (as opposed to the allowlist/denylist approach, which has other problems) - but it doesn't introduce a fuller concept of fetching only part of the git repo. The resulting clone is still a fully functional git working directory and repository at a given commit, which it would cease to be after any non-git-native hijinks. This is somewhat discussed at https://github.com/TWChennai/gocd-git-path-material-plugin?tab=readme-ov-file#stale-data-in-agent-repository-clones and is why the language/examples focus on how the plugin monitors for changes.

The other problem with this approach is that if a commit changes the contents of both "Trunk" and "Documents/Spec", the independent materials could detect that single commit at different times, due to the way material polling works. A triggered build may kick off with only the changes for "Trunk" and the previous ref for "Documents/Spec" (or vice versa). If these paths are not sufficiently independent, modelling them as separate materials is likely to hurt rather than help.

> If so, are there any guidelines for how to deal with multiple git-path materials that need to poll different paths in a single repo, while ensuring that the relative paths remain intact on the agent at job run time?
> Things that I need to consider:
>   • App.Trunk and App.Documents.Spec are likely to be reused across various pipelines, though not necessarily always together.
>   • We probably do not want to configure a custom git-path material for every existing combination of paths.
>   • There is a significant amount of cross-repository code, so relative paths both inside and across repositories can be relevant. (I.e., for any given file, the right version of that file needs to be downloaded to "./Project/Repo/path/to/file".)

The only sensible option, IMHO, is to use a single material off the same "wider" repo for both paths (violating your second requirement):

    path: "Trunk, Documents/Spec"

If you use something non-YAML to generate your config repo contents, you could conceivably generate this programmatically.
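Spelled out against your original config, the combined material might look something like this (an untested sketch; the material name and destination are your choice):

    materials:
      App:
        plugin_configuration:
          id: "git-path"
        options:
          path: "Trunk, Documents/Spec"
        destination: source/Project/App

With a single destination, both paths keep their natural relative positions within the clone, which also sidesteps the relative-path problem for paths inside the same repo.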

I'm not sure I understand what "cross-repository code" means, but you could perhaps consider shifting some of that responsibility to git submodules - although I don't personally like them, due to their complexity and the way they change developer flow. I also don't know how well submodules work with the git-path plugin specifically.

> I'm thinking I will need to pull the git-path materials into a separate location, then copy the relevant files to the expected location in the first (few) task(s) of the job. (E.g., fetch them into ./git-path/<materialName>, then copy "./git-path/App.Trunk/Trunk" to "./source/Project/App/Trunk".) This feels incredibly hacky, though. Are there any cleaner options?

Personally I don't think what you are trying to do is compatible with the conceptual design goals of either git or GoCD. If these various paths are really independent of one another and cannot be looked at within the scope of the wider repo or organised into a "simple" mono repo structure, should they possibly be independent repositories?

Otherwise you lose GoCD's guarantee that the materials are at mutually consistent ref versions, and git's guarantee that a repo at a given ref is a complete representation of the repo at that ref/sha. The git-path plugin already moves slightly away from the GoCD integrity guarantee by "allowing" different subsets of a repo to be treated as independent materials, pushing more "risk" into the hands of the user - but there's probably a limit to how far you should push that compromise.

But to answer your question: no, there are no cleaner options if you are trying to slice and dice a repository at various versions/refs and assemble it back together. Personally, I would (and historically have) combined paths together, where I still felt the plugin was useful enough to use in its current form.

-Chad
Chad Wilson

Dec 29, 2024, 1:38:32 AM
to go...@googlegroups.com
Sorry, I should have mentioned that I am broadly aware of git sparse-checkout (the closest thing to a git-native approach for this), but I have not looked into it in detail or evaluated whether it could make sense in the context of something like GoCD, or whether it is really only useful to an end user.

This feature was added to git after most of the rework I did on the GoCD git-path plugin, and I haven't looked at whether other build automation tools support using it server side.
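For anyone curious, a minimal sparse-checkout session looks roughly like this (a throwaway demo repo whose directory names mirror the earlier example; git >= 2.25 is assumed, and nothing here is specific to GoCD or the plugin):

```shell
#!/bin/sh
# Demonstrate git sparse-checkout (cone mode) against a throwaway repo.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small upstream repo with three top-level paths.
git init -q upstream
cd upstream
git config user.email demo@example.com
git config user.name "Demo"
mkdir -p Trunk Documents/Spec Other
echo trunk > Trunk/a.txt
echo spec > Documents/Spec/b.txt
echo other > Other/c.txt
git add -A
git commit -qm "initial import"
cd ..

# Clone it, then restrict the working tree to the paths of interest.
git clone -q upstream partial
cd partial
git sparse-checkout init --cone
git sparse-checkout set Trunk Documents/Spec

# Trunk and Documents remain; Other is removed from the working tree,
# though the full history is still present in .git.
ls
```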

-Chad

Jason Smyth

Dec 29, 2024, 4:36:16 PM
to go-cd
Hi Chad,

Thank you for the feedback.

> Personally I don't think what you are trying to do is compatible with the conceptual design goals of either git or GoCD. If these various paths are really independent of one another and cannot be looked at within the scope of the wider repo or organised into a "simple" mono repo structure, should they possibly be independent repositories?

Part of our struggle is that we are moving from a TFVC-based monorepo to a set of slightly more segmented Git repos. TFVC allows checking out any arbitrary section of the file tree, and the above materials are direct translations of the existing TFVC materials into git-path materials. (I.e., the App Git repo is an export of TFVC path "$/Project/path/to/App", and the existing materials pull from "$/Project/path/to/App/Trunk" and "$/Project/path/to/App/Documents/Spec".)

In terms of cross-repository code, what I mean is that if I am working with an app that lives at ./Apps/App1, the project for this code is structured in such a way that it assumes someLib exists at ../../Libs/someLib/bin (relative to my App1 directory). This relative path is accurate within the TFVC monorepo, but with the move to Git, someLib has been moved into a separate Git repo. In order to ensure the builds continue to work without having too much impact on the developer workflow, we need to recreate that same relative pathing at checkout time.
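Concretely, with illustrative repo and material names, the agent-side layout we are trying to recreate would come from destinations along these lines (subject to the whole-repo-clone behaviour discussed above):

    materials:
      App1:
        plugin_configuration:
          id: "git-path"
        options:
          path: Apps/App1
        destination: Project/Apps/App1
      SomeLib:
        plugin_configuration:
          id: "git-path"
        options:
          path: Libs/someLib
        destination: Project/Libs/someLib

so that, from Project/Apps/App1, the relative path ../../Libs/someLib/bin resolves the same way it did in the monorepo.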

Realistically, we probably should work with the development team to come up with some new workflows that are more aligned with Git and code separation principles, but the objective of our current initiative is to modernize our tech stack as cheaply as possible. In practical terms, that means we are primarily looking to recreate our existing workflows using the current version of GoCD (instead of 19.8.0), backed by Git on Azure DevOps rather than TFVC on TFS Server 2013.

I think you are probably right and the only sane way of tackling this for now is to use no more than one git-path material for any given Git repo on a pipeline. I will have to think a bit on how to sanely name the materials to ensure that:
  • When 2 pipelines should share a material, it is easy to give it the same name in both YAML files, and
  • There are no naming conflicts between multiple pipelines that need different slices of any given repo.
Thank you again for taking the time to respond. As always, your insights are appreciated.

Cheers,
Jason


Chad Wilson

Dec 29, 2024, 10:23:30 PM
to go...@googlegroups.com
On Mon, Dec 30, 2024 at 5:36 AM Jason Smyth <jsm...@taqauto.com> wrote:

> In terms of cross-repository code, what I mean is that if I am working with an app that lives at ./Apps/App1, the project for this code is structured in such a way that it assumes someLib exists at ../../Libs/someLib/bin (relative to my App1 directory). This relative path is accurate within the TFVC monorepo, but with the move to Git, someLib has been moved into a separate Git repo. In order to ensure the builds continue to work without having too much impact on the developer workflow, we need to recreate that same relative pathing at checkout time.

Given the situation you are in, and assuming splitting out this part to another repo is desirable for other reasons - you could consider
  • whether git submodules are a reasonable enough "bridge" to your earlier TFVC monorepo (keeping in mind the complexity)
  • modelling it as an artifact dependency instead, where every time those libs change they are uploaded as build artifacts by GoCD, and the pipelines that need them fetch the artifact and place it at the expected ../../Libs/someLib/bin location before the rest of the tasks run. If I recall correctly, artifact destinations in GoCD are pretty flexible about where fetched artifacts go, since the fetch happens after the material clone/fetch
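As a sketch of the artifact-dependency option, the consuming pipeline's first task might be a fetch along these lines (pipeline, stage, and job names here are hypothetical):

    tasks:
      - fetch:
          pipeline: someLib
          stage: build
          job: package
          source: bin
          destination: Libs/someLib

The fetched `bin` directory then lands at Libs/someLib/bin relative to the working directory, where the App projects expect to find it.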

Which solution works best probably depends a bit on the local developer workflow the dev teams want to have, since presumably when working locally they also need a mechanism to get those libs/binaries into the right location, or perhaps to work concurrently on changes to the code that contributes to those libs as well as the code that uses them.

-Chad