am i missing something about artifacts?


Josh

Dec 17, 2024, 12:08:18 AM
to go-cd

we've used gocd for many years and it's a great product.

been having an occasional issue that is increasing as we increase deployment frequency.
sometimes pipelines briefly get "stuck" on old dlls, meaning that a downstream pipeline occasionally ends up packaged with older dlls. it's only between upstream and downstream pipelines, never within the same pipeline. this occurs infrequently, but perhaps as much as 1 out of every 5 builds.

the workaround is to run a script on all the agents that periodically refreshes and rebuilds all the pipelines manually. not sure why this works but it always does.

haven't been able to figure out the cause, i'm wondering if it's a misunderstanding about artifacts, or otherwise misconfigured artifacts?

here's the situation:
  • we have a pipeline template that is used by 8 or 9 pipelines
  • the template (and thus every pipeline) has 4 stages: prep, build, test and package
  • prep stage: doesn't do much, mostly just analysis
  • build stage: pulls code from the repo, builds it, and uploads all of the resulting binaries as gocd artifacts at #{project-name}/build
  • test stage: fetches the artifacts stored in #{project-name}/build, puts them in the local build directory and then runs tests, saves a test artifact (not used in build/packaging)
  • package stage: fetches the artifacts stored in #{project-name}/build, puts them in the local build directory and packages them up
as i say, most of the time it works great, but occasionally output from a previously built upstream pipeline (older version) gets mixed in with a newer downstream build; it still compiles, but when you run something with the mismatched versions it generates a runtime exception.

as i'm describing this, i believe the cause might be that since we have multiple agents, a given agent might not always be scheduled to build every pipeline stage. 

so eg if project2 is downstream from project1:

agentA builds project1.verX
agentB builds project2.verX

[project 2 changes]

agentA builds project2.verY
agentA still has project1.verX binaries locally, so project2.verY gets built against them

then when the binaries get packaged up, you get the version mismatch.

it seems like pipelines should also fetch artifacts from all their upstream dependency pipelines (vs just fetching from earlier stages of the same pipeline, as i described above).

however I'm not certain how to do this with pipeline templates, since we could have multiple upstream pipelines to fetch from.

so i wanted to add an arbitrary # of 'fetch artifact' tasks to a pipeline's build stage, and then pass all its upstream pipelines as parameters... how can i make the pipeline properly fetch all of:
  • zero upstream pipelines
  • one upstream pipeline
  • multiple upstream pipelines
?

Hopefully this makes sense. 

My Idea:
  • Is there a way i can somehow create an 'upstream-pipeline-list' parameter, have each pipeline list its upstreams in CSV fashion, and then have gocd fetch EACH of these upstream pipeline builds prior to actually building the stage?
To me putting #{upstream-pipeline-list} in a single 'fetch artifact' task doesn't seem right, since the task seems to only take one source location, not multiple.

But I misunderstood this before regarding resources, so I figured it was worth asking.

Or maybe there's some other even more obvious thing I'm missing (other than a monorepo; we can't use a monorepo here, at least not presently).  What is the 'GOCD WAY' to handle this properly?

appreciate any assistance

-j


Chad Wilson

Dec 17, 2024, 12:44:24 AM
to go...@googlegroups.com
This does sound broadly like something that GoCD is designed to handle - ensuring consistent and reproducible artifact and/or material inputs. Using the server to mediate (store and fetch) artifacts between stages or pipelines is also intended usage.

To confirm - "local build directory" in your description is inside the normal agent working directory that GoCD creates inside pipelines/ rather than somewhere else on the agent file system?

1) do the DLLs get put/copied/fetched into a location that is inside a Git material repo clone? e.g. <working-dir>/test-repo where "test-repo" is a Git material with an alternate checkout location, or <working-dir> itself if the Git material is cloned directly there
2) if NOT, and they are inside the agent working area but OUTSIDE the clone, does your pipeline that packages the DLLs clean its workspace from previous runs every time it executes, i.e. have you enabled this for the stage?

[screenshot: the stage's "Clean Working Directory" setting]

If "no" to both questions - I possibly know a possibly root cause, as I've seen it myself. :-/

-Chad


Joshua Franta

Dec 17, 2024, 1:16:55 AM
to go...@googlegroups.com

> To confirm - "local build directory" in your description is inside the normal agent working directory that GoCD creates inside pipelines/ rather than somewhere elsewhere on the agent file system?

yes, exactly - the 'fetch artifact' task pulls the binaries back into the agent's working directory for that pipeline (aka the local build directory in my parlance)

1. not sure i understand the intent of this question, but most of these pipelines use svn/subversion, not git (there are maybe 1-2 using git).
perhaps you mean how a given project/pipeline sources dependencies NOT in its own repo/material?
if that's the question, the agents each have a 'pipelines' folder that holds all the agent working directories, so eg:

agent-pipeline-dir
  • project1-working-dir/
  • project2-working-dir/
to avoid monorepo complexity, projects can assume their non-package upstream dependencies live one level up from their working directory.
so each project is either cloned (git) or checked out (svn) into the project/pipeline working directory (the one the agent is configured to use for each pipeline in the template).

2. i just checked and we do NOT have 'clean working directory' set on any stage of these pipelines/templates.
this would only apply to the project1/2-working-directory in my example in #1, yes?
how would this help make sure the upstream binaries were correct? (or maybe it wouldn't and you're just asking to understand, not to suggest)

so at least a No on 2 i think; not sure whether #1 is a no for you.

thx again for ur help


Chad Wilson

Dec 17, 2024, 2:52:28 AM
to go...@googlegroups.com
If you're using subversion and you don't have "clean working directory" checked then the problem I have seen might explain something like this. (I mentioned git because most folks use git, and the git integration by default cleans the clone locations every build using git tools independent of "clean working directory").

If you enable "clean working directory" on these stages what I imagine will probably happen is that "1 in 5" of these builds will now fail at the "test" or "package" stages due to the DLLs being completely missing - rather than stale. That's probably better semantics. But cleaning the working dir might make things slower in other areas of your build so for some folks that's not ideal.

What I suspect could be happening is that the fetch artifact task is actually downloading an "empty artifact" silently, and instead of replacing the previous binaries/dlls your build is perhaps using whatever was there from the last run. is that possible given your agents and build scripting?
  • How big are the artifacts being uploaded?
  • Could you share the layout of the artifacts as uploaded by the "build" stage, and the "fetch" configuration used? 
e.g GoCD itself (we use GoCD to build GoCD) has a stage with artifacts uploaded like so:

[screenshot: artifact upload configuration from GoCD's own build pipeline]

The next stage in the same pipeline (very similar to your setup) does the following fetch:
[screenshot: the corresponding fetch artifact task configuration]

When specified this way (fetch entire artifact directory, not individual file) what the server does is zip up all of the things inside the "dist/zip" folder and send this zip to the agent. The agent then unzips into the working dir.

Why might this be the issue?

Something like I describe above can happen due to a design decision in GoCD which I personally consider a bug and which I have seen (but have never come across being documented - I should probably dig).

If I recall correctly the details of what can happen, it is basically possible for a subsequent stage to trigger and start fetching artifacts before the previous stage's uploaded artifacts have actually been processed properly and are ready to download. I believe when you ask for an artifact directory to be fetched it might be possible for it to just download an empty zip rather than "failing fast" because the requested directory is missing. This is much more likely to happen with large artifacts, slow artifact uploads, or a slow GoCD server/network.

  1. Possible workaround if you want to confirm this is the problem while getting things to "fail fast": I believe if you download an individual specific zip rather than a directory it will fail at the fetch step if the artifact is not there, after retrying. Not always possible if the file/zip name is not deterministic (e.g includes a build number or something)
  2. Possible workaround for the main problem:
    • Only worth doing if you confirm the root cause. Add a "sleep" type of step before the fetch :-( Not 100% reliable unless you make it sleep a lot.
    • Use sequential tasks for build/test/package rather than relying on artifact upload/fetch, if you don't need the intermediary artifacts on the GoCD server for other reasons.

My main reservations/open questions as to whether I understand this and whether it could explain your problem are:
  • you said "it's only between upstream and downstream pipelines, never in the same pipeline." but then you described a set of stages inside a single pipeline for build-->test-->package? What am I missing?
  • you implied the "package" step is affected, and that has the "test" stage in between: build (upload) --> test --> package (fetch from build stage), so I'd normally expect the artifacts to be fully ready to download by the time the package step runs, unless the test stage is incredibly fast.
  • if I've got it all wrong, probably need a fuller description of pipelines/stages/tasks that shows how they manifest :-/ There are many other reasons in pipeline design you could be fetching things wrongly outside the "bug" I refer to above :-)

You can see exactly which pipeline/stage it is fetching the artifacts from in the console, and thus check which versions should have been there. e.g
[go] Fetching artifact [dist/zip] from [installers/4437/dist/latest/dist]

- Chad

Joshua Franta

Dec 17, 2024, 4:27:46 AM
to go...@googlegroups.com, Pracplay Support

chad thanks again for quick response.

TL;DR: I felt I didn't explain this very well and i don't think i've ever done this before but since the support on this forum is very good i recorded myself reading the key parts of this email : https://www.youtube.com/watch?v=m4C0o7u_Iow  
(ignore tl;dr if you find that pretentious or high maint or whatever, any help appreciated)

EMPTY ARTIFACTS

i don't think it's downloading an empty artifact.  typically when it can't find an artifact it will give a 404 error and the stage will fail.
this happens way, way more rarely and it's almost always because the artifact max size setting got hit and it cleared some artifacts.
(i'm guessing this is why you are asking about size; we've carefully tuned our max artifact size so we probably haven't had missing artifacts for almost a year.)
to answer your question, our largest artifact store is about 80M, the smallest ones are around 3-4MB and the mean is probably 20-30M. the disk w/the artifact store has about 700-800GB free tho; we don't clean artifacts until we get to around 200-300GB free. so more than enough to hold several previous revisions of every pipeline's artifacts.

the other reason i don't think this is the issue (also the same reason i don't suspect it's downloading old artifacts) is because
  1. we would get more failed tests (and to a lesser extent, failed packages), because usually if a new test was added but run against old binaries this would cause a test failure.  this never happens
  2. also - the ultimate problem i'm trying to fix here - we never have failed pipelines.  it's just that occasionally a downstream will get packaged with an old upstream that fails at runtime (not in the pipeline, or at least never in these particular pipelines, which don't run code outside of tests)

UPSTREAM ARTIFACTS VS SAME-PIPELINE ARTIFACTS (w/variable upstream pipelines)

i don't think i explained clearly enough the scenario that i think is happening

project2 depends on project1   
(project here is synonymous w/pipelines)

two agents:  agentA and agentB w/ this directory tree:
        agent-working/project1-working
        agent-working/project2-working

assume both agents have built both projects' most recent revisions
(iow both of their 'agent-working' directories are essentially identical)
agentA-project1-version=1
agentA-project2-version=1
agentB-project1-version=1
agentB-project2-version=1

then comes a new commit to project1 (commit-aka-version=2)

  1. project1 commit#2's build stage/job is assigned to agentA by gocd; it builds and uploads its artifacts
  2. the rest of the stages complete by fetching the artifacts from commit#2
  3. project2 gets its second commit, which gets assigned to agentB by gocd
  4. agentB builds it fine, but recall that agentB wasn't involved in project1-commit#2 and so it has only built project1-commit#1
  5. because project2 isn't a stage of project1, it can't fetch project1's build artifacts (unless you untemplate everything, or unless i can figure out how to templatize multiple upstream artifact fetches)
  6. so in this instance, project2 builds but it uses the builds from "../project1-working", which is commit#1
  7. this works in most cases and gets packaged up, but then fails at runtime because it's still got an old build from project1 mixed in with project2's second commit
i'm pretty sure this is what's happening, because when it happens, if i go look at the versions of the dlls, they're the old ones.
that's also why the workaround of re-pulling and refetching the pipelines periodically works, tho this is messy/inefficient and done outside of gocd.

what i want to do to fix this is add extra artifact fetches to project2.
if i can change project2 to always fetch project1's artifacts pre-build, it should work (this is essentially what the localized refetch-and-rebuild-everything hack does).
with templates tho, i have to parameterize which projects are upstream (the "project2 depends on project1 build artifacts" relationship).
if it was just one upstream per pipeline, i know how to do this w/templates and parameters.

however for almost all the pipelines, there are MULTIPLE upstream pipelines whose builds a given downstream pipeline needs.
and not just more than one, it's an unknown number (sometimes there are zero, 1, 2, and one even has 7-8).
how can i parameterize an unknown number of artifact fetches through a template?

if i could do this, then this is what i believe would happen in the above scenario:

  1. project1 commit#2's build stage/job is assigned to agentA by gocd; it builds and uploads its artifacts
  2. the rest of the stages complete by fetching the artifacts from commit#2
  3. project2 gets its second commit, which gets assigned to agentB by gocd
  4. agentB looks at its 'upstream-pipeline-list', sees it has to pull artifacts from 'project1', does so, and gets the correct upstream version
  5. then everything builds and works.
is this possible?  

superficially it seems it's not, but i thought something similar about having different resource requirements per pipeline, and you and some other people explained how to do it in the config.
not sure if it's the same but this seems like it should be possible, and maybe it's not clear through the gocd web pipeline editor.

other solutions i can think of are:

  • (least preferred) stop using pipeline templates, rebuild all pipelines as stand-alone and just put arbitrary #s of 'fetch artifacts' tasks on each pipeline (one for each upstream project as needed)
  • (still bad but better) hack something to mass rsync directories between agents
  • (more controlled but still bad and outside of gocd) using our hack script to rebuild pipelines and just automating it further to run on every agent periodically
  • (still very complicated but at least inside gocd) somehow scheduling/forcing gocd to periodically build all pipelines on all agents (eg so for any given project/pipeline, agentA and agentN all have identical trees)
  • (hacky but more deterministic and closer to "GOCD WAY") trying to put some fixed # of upstreams (upstream1,upstream2,upstream3, etc) into the template and see if it will properly ignore empty parameters
  • ("GOCD WAY" as i understand it) being able to somehow create a single parameter that holds multiple upstream pipelines and have gocd fetch them all before it builds/tests a downstream stage)

pracplay devs

Dec 17, 2024, 4:39:32 AM
to go...@googlegroups.com, Pracplay Support

this is closer to the config option i think would solve it.  not sure if it can already do this:

PROJECT-TWO USING TEMPLATE: MULTI-UPSTREAM-PIPELINE-FETCH-ARTIFACTS
 -PARAMETER: fetch-upstream-list    // for example, fetch-upstream-list:project1,project0

MULTI-UPSTREAM-PIPELINE-FETCH-ARTIFACTS-TEMPLATE FOR JOB:
  FETCH TARGET:  source: #{fetch-upstream-list}/build  dest: artifacts/

// and then when project2's pipeline is started, it would fetch artifacts from each of project1/build and project0/build and dump them in project2's artifacts

Jason Smyth

Dec 20, 2024, 4:58:58 PM
to go-cd
Hi Josh,

First, I think you are on the right track as far as understanding the nature of the issue.

It seems like the process your builds currently follow assumes that there is always a "current enough" version of a project's dependencies present on the agent. Depending on a number of factors (e.g., number of dependencies, number of agents, frequency of builds in upstream vs downstream projects, etc.) this assumption seems to hold true for most of your builds, but proves false often enough that it is causing you grief.

You have also, in my opinion, identified the primary "GoCD way" of addressing the issue: downstream projects should pull the upstream artifacts from the GoCD artifact store using fetch tasks. The issue of "How can I make a template that runs an arbitrary number of fetch tasks?" is rooted in the primary reason we moved away from templates. We found the constraints too limiting.

In addition to the ideas you put forth, here are some others that might work for you:
  • Put X fetch tasks in the template, where X is the largest number of dependencies across all of your projects. For projects that have fewer than X upstream projects, use duplicate values for the extra parameters. (Caveat: I haven't tested this; GoCD may flat out reject it.)
  • Create X+1 templates, where X is the number of unique dependency counts across all of your pipelines (from 0 through the highest count). Assign each pipeline to the appropriate template based on its number of dependencies.
  • Build your own, customized version of the fetch task (as a Shell script, or whatever) that takes a list of 0 or more dependencies as input. Call that script from the template, passing in the appropriate dependencies for each project. The fetch task is essentially just an API call, so you should be able to replicate it without too much trouble. There is a risk here of potentially exposing secrets (the authentication keys required to call the APIs), so you would want to consider how to protect those.
  • Add a "publish" step to your workflow so that every project gets published to some centralized location. Have the agents pull their dependencies from there instead of the local "cache". Depending on how this is handled, there may be risks associated with either versioning or partially completed publications.
  • Add a "publish" step to your workflow that runs on every agent and downloads the latest version to a defined location on the local file system that is outside the agent working directory. This creates a lot of additional jobs for the agents to complete, though. And there is still the issue that a newly minted agent will need a way to get the latest version of everything before it runs its first build.
  • Move to elastic agents. Change the upstream project flows so that they push their updates into the elastic agent template(s) that the downstream projects depend on.
  • Move away from GoCD templates and use one of the config repository plugins. I have no experience with it personally, but I suspect the Groovy plugin may be able to generate a list of pipelines with an arbitrary number of fetch tasks, but that are otherwise identical.
  • Update the part of the existing process that assumes there is a "current enough" version. Change the assumption to "there is _a_ version", and do an explicit update and rebuild. This is probably the quickest and dirtiest solution. I don't recommend it long-term, but, depending on the impact of the current failure rate, it might be worth looking at until you can design and implement something better.
Hope this helps,
Jason Smyth


Joshua Franta

Dec 20, 2024, 5:39:53 PM
to go...@googlegroups.com

jason,

thanks for your reply.

  • > Build your own, customized version of the fetch task (as a Shell script, or whatever) that takes a list of 0 or more dependencies as input. Call that script from the template, passing in the appropriate dependencies for each project. The fetch task is essentially just an API call, so you should be able to replicate it without too much trouble. There is a risk here of potentially exposing secrets (the authentication keys required to call the APIs), so you would want to consider how to protect those.

yes this also occurred to me after i wrote my email.
from the logs, i believe the artifact fetches may be little more than just an http get themselves.
if this works it probably is the best solution.
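
here's a rough sketch of the kind of script i have in mind, assuming each upstream is declared as a dependency material (so gocd exports a GO_DEPENDENCY_LOCATOR_* variable for the exact upstream run) and the CSV list arrives via a template parameter. the job name, artifact path, token auth and the ".zip" directory download are my assumptions, not anything confirmed in this thread:

```python
#!/usr/bin/env python3
"""Sketch only: fetch each upstream pipeline's 'build' artifacts via GoCD's
artifact endpoint (/go/files/<pipeline>/<counter>/<stage>/<counter>/<job>/<path>).
UPSTREAM_PIPELINE_LIST and GOCD_FETCH_TOKEN are hypothetical names that the
template/task would supply; adjust auth, job names and paths to your setup."""
import io
import os
import re
import zipfile
import urllib.request

server = os.environ["GO_SERVER_URL"].rstrip("/")  # e.g. https://gocd.example.com:8154/go
token = os.environ["GOCD_FETCH_TOKEN"]            # personal access token, kept in a secure variable
upstreams = [p.strip() for p in os.environ.get("UPSTREAM_PIPELINE_LIST", "").split(",") if p.strip()]

for pipeline in upstreams:
    # GoCD sets GO_DEPENDENCY_LOCATOR_<MATERIAL> ("pipeline/counter/stage/counter")
    # for each dependency material, i.e. the exact upstream run this build was triggered by.
    key = "GO_DEPENDENCY_LOCATOR_" + re.sub(r"[^A-Z0-9]", "_", pipeline.upper())
    locator = os.environ[key]
    # Appending .zip to an artifact directory path asks the server for a zipped copy.
    # Job name 'build' and artifact dir '<pipeline>/build' are placeholders for whatever the template uploads.
    url = f"{server}/files/{locator}/build/{pipeline}/build.zip"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        payload = resp.read()
    if not payload:
        raise SystemExit(f"empty artifact for {pipeline} ({locator}) - failing fast")
    dest = os.path.join("upstream", pipeline)
    os.makedirs(dest, exist_ok=True)
    zipfile.ZipFile(io.BytesIO(payload)).extractall(dest)
    print(f"fetched {pipeline} artifacts from {locator} into {dest}")
```

the template would then only need one exec task that passes #{upstream-pipeline-list} through as an environment variable, which sidesteps the one-source-per-fetch-task limitation (jason's point about protecting the credentials still applies).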

relevant docs:



  • > Add a "publish" step to your workflow so that every project gets published to some centralized location. Have the agents pull their dependencies from there instead of the local "cache". Depending on how this is handled, there may be risks associated with either versioning or partially completed publications.

our 'package' stage in this template already does this, so might be an option.

> Add a "publish" step to your workflow that runs on every agent and downloads the latest version to a defined location on the local file system that is outside the agent working directory. This creates a lot of additional jobs for the agents to complete, though. And there is still the issue that a newly minted agent will need a way to get the latest version of everything before it runs its first build.

this is sort of what our workaround/hack does, it just forces all the agents to refresh/rebuild everything current.

--

i also want to have somebody look at the source, both to see how it's interpreting fetch parameters in the config file (resources can deal with multiple values in the same config, so perhaps fetch tasks also do this) and because I'd imagine it's not super hard to allow it to take multiple fetches from the same parameter.  this is probably the least-amount-of-code way, but it would require getting a PR approved and then upgrading to the latest gocd.

the quick and dirty way w/the least code and least testing required is probably what i originally suggested and which you also suggested: adding N fetch parameters to the template. but again this depends on it being able to ignore a parameter that is undefined.  if that is the case, it might work for our specific configuration, because i don't think too many single pipelines have more than 3-4 immediate upstream dependencies.

I'm still a little surprised, I guess, that this isn't cleanly handled by templates, because it seems like something everyone would need to be able to do, and the fact that the "normal" way seems a bit janky is out of character for gocd.  however, perhaps it was assumed that if you are packaging every pipeline then you could just restore from your own package (tho this is all external to gocd, whereas fetch seems like it should be able to take its arguments from pipeline template parameters... but nothing's perfect either).

i will try to remember to report back what we find out


pracplay devs

Dec 21, 2024, 7:09:44 AM
to go...@googlegroups.com, sup...@pracplay.com

Probably even more against the "gocd-way", but the reverse might also work.
Rather than fetching arbitrary lists of artifacts in one spot, start from the end list of complete dependencies.
Upstream pipelines can all target a much smaller number of output pipelines.
Then each output pipeline fetches its own dependencies in one shot.

eg:

pipeline-upstream-1: saves artifacts to api1 and monolith as parameters, iow:
  • parameters.upstream1=api1
  • parameters.upstream2=monolith
pipeline-upstream-2:
  • parameters.upstream1=api2
pipeline-upstream-3: saves artifacts to api2 and monolith as params
  • parameters.upstream1=api2
  • template.parameters.upstream2=monolith
pipeline:api1/2:
  • parameters.downstream-fetch1: #{pipeline-name}  ## can't remember if official is #{GOCD_PIPELINE_NAME}
  • template.job: fetch #{downstream-fetch1}
pipeline:monolith: (same)
  • parameters.downstream-fetch1: #{pipeline-name}  ## can't remember if official is #{GOCD_PIPELINE_NAME}
  • template.job: fetch #{downstream-fetch1}




pracplay devs

Dec 21, 2024, 7:22:29 AM
to go...@googlegroups.com, sup...@pracplay.com

upside for this idea:
  • very simple
  • builds on what gocd already does well
possible downsides:
  • does it sometimes make the problem worse? bc it won't guarantee anything about upstreams having correct builds; you're just trusting whatever is uploaded. or more specifically, you're trusting gocd's chain of green pipeline operations.  if the sequence was always correct/green, i think it should work?
  • might be gocd-code-golf: less configuration but depends on a deep understanding of gocd

Jason Smyth

Dec 21, 2024, 6:29:38 PM
to go-cd
Hi Josh,

Assuming your GoCD configuration already handles the up/downstream relationship between projects (i.e., Pipeline2 depends on Pipeline1, so Pipeline1 is included in Pipeline2's material list), I agree with your statement that a customized fetch task is probably the best solution. I think this cuts directly to the heart of the original question: "Is there a way i can somehow create an 'upstream-pipeline-list' parameter, have each pipeline list its upstreams in CSV fashion, and then have gocd fetch EACH of these upstream pipeline builds prior to actually building the stage?"

This solution allows GoCD to continue to handle all of the things it does well, while addressing an apparent incongruency in its template implementation. Namely, I can assign an arbitrary number of upstream pipeline materials to a pipeline that is based on a template, but I cannot adjust the number of fetch tasks to align with the number of upstream pipelines.

I'm not sure I follow the reversal idea from pracplay devs (also Josh?), but I think it can be summarized as "Instead of having a task in a child pipeline that pulls from an arbitrary number of parents, have the parent pipelines push to an arbitrary number of children." If that is the case, it is not a model I would recommend. If the team responsible for App1 decides to switch from Lib1 to Lib2, it should be the App1 pipeline's responsibility to change, pulling in the new dependency in place of the old one. If the dependency tracking model is reversed, when App1 decides to change from Lib1 to Lib2, then Lib1 and Lib2 _both_ have to update their pipelines to account for the change.

Of course, all of this is just my opinion. You have a better understanding of the realities of your organization and will need to pick the solution that works best for you and your team.

Hope this helps,
Jason Smyth

P.S.: You wrote "resources can deal with multiple values in the same config". I played briefly with this concept but was never able to get it to work. Would you be willing to share an example of specifying multiple resources in a single pipeline parameter?

Cheers,
JS

Joshua Franta

Dec 26, 2024, 2:01:33 AM
to go...@googlegroups.com
thanks again for your efforts jason

TL;DR

for those interested, all the stuff about resources i referenced should be in the forum search

i really do like/love gocd.  bc of templates and pipelines tho, i think being able to do fetches for multiple upstreams/downstreams via parameters should be more "native"
my .02 ymmv (my PR is in the mail, the actual mail not the forum mail ;)


Chad Wilson

Dec 26, 2024, 2:38:56 AM
to go...@googlegroups.com
I haven't gone through the thread as it became a bit difficult/time consuming for me to digest, however if I understand the general "gist" of your challenge it's fair to say that the philosophy of GoCD was moving increasingly towards pipelines-as-code for a number of years, rather than templates (which were always going to have limitations).

There's a related discussion at https://github.com/gocd/gocd/issues/5675 which goes into some of why templates-inside-config-repos are not supported - in a sense the pipelines-as-code philosophy via config repo plugins was intended to support much more sophisticated approaches to templating and be the "native" approach that allows necessary flexibility.

This was manifested along the lines of the groovy plugin earlier mentioned, or the jsonnet plugin (note, I have never used the latter, and do not know of its pros/cons other than from a theoretical perspective). The main design goal was to move people away from click-opsing pipelines or groups-of-pipelines/VSMs entirely, which templates implied.

Defining ones own custom templating approach (like gocd templates/parameters) is inherently always going to be more limited than a general purpose templating/scripting language which allows assembling the "pieces" (GoCD domain concepts) in a variety of ways, and allows modelling the "shared/common components" of pipelines or sets-of-pipelines dynamically.

-Chad


Joshua Franta

Dec 26, 2024, 2:55:07 AM
to go...@googlegroups.com

yes, whatever i'm missing about artifacts is very in the weeds.
irrespective of the audience, for oss i usually write posts as if no one will ever respond.

i have written few posts here but i get a response more often than i would generally expect, fwiw.

i looked at groovy briefly but i'm not sure i understand, it's a different syntax that is more compact?
and can it then autogenerate the existing gocd configuration? so then you don't need templates?

if i just generated a config file programmatically, would this also solve the same problems (and probably quite a few others)?
is groovy a dsl for gocd configs?


Chad Wilson

Dec 26, 2024, 3:33:48 AM
to go...@googlegroups.com
Groovy is a generic JVM-based scripting language. In the GoCD case, a groovy-based DSL has been modelled by the plugin that maps to GoCD's internal config repo API required to drive pipeline config from source control.

Since it's a generic scripting language, you can express things not possible within something like YAML or JSON (re-usable entire groups of pipelines or parts of pipelines, loops, arbitrary parameterisation, ability to get some code completion inside a groovy-aware IDE). The groovy plugin specifically has some downsides related to security, since plugins are not sandboxed within GoCD and since it is an arbitrary scripting language - as well as the downsides of the Groovy language itself, or needing to learn another language.

There are a number of examples to give you an idea of what might be possible, e.g one which generates a single pipeline with 'n' jobs: https://github.com/gocd-contrib/gocd-groovy-dsl-config-plugin/blob/master/example/src/main/groovy/build_matrix/build.gocd.groovy

So essentially a plugin like the groovy config repo plugin allows an approach to generating pipelines programmatically with
  • a more manageable, decentralised model than generating GoCD's internal XML format programmatically, with better support for validating the generated pipelines before GoCD ingests the config
  • no need to make GoCD API calls to update pipeline or GoCD configuration XML (as was a common approach earlier for folks who wanted to programmatically generate/manage pipelines)
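
As a very rough illustration of that idea (using the YAML config repo plugin's format rather than the Groovy DSL, and Python purely as the generator - key names are from memory, so verify them against the plugin docs before relying on this), something like the following could stamp out one pipeline per project with exactly as many fetch tasks as it has upstreams:

```python
#!/usr/bin/env python3
"""Sketch: emit a .gocd.yaml file for the YAML config repo plugin, one pipeline
per project, with a fetch task per upstream dependency. The dependency map,
SCM URLs, group/stage/job names and the emitted key names are illustrative."""
import yaml  # PyYAML

# project -> upstream pipelines it needs build artifacts from (illustrative data)
DEPS = {
    "project1": [],
    "project2": ["project1"],
    "monolith": ["project1", "project2"],
}

def fetch_task(upstream):
    # Pull the 'build' artifact directory published by the upstream's build stage
    # into ./upstream/<name> in this job's working directory.
    return {"fetch": {"pipeline": upstream, "stage": "build", "job": "build",
                      "source": "build", "destination": f"upstream/{upstream}"}}

config = {"format_version": 10, "pipelines": {}}  # format_version: whatever your plugin supports
for name, upstreams in DEPS.items():
    # Declaring each upstream as a dependency material keeps GoCD's VSM/scheduling intact.
    materials = {u: {"pipeline": u, "stage": "build"} for u in upstreams}
    materials["code"] = {"svn": f"https://svn.example.com/{name}/trunk"}  # SCM material shape is approximate
    tasks = [fetch_task(u) for u in upstreams] + [{"exec": {"command": "./build.sh"}}]
    config["pipelines"][name] = {
        "group": "apps",
        "materials": materials,
        "stages": [{"build": {"jobs": {"build": {"tasks": tasks}}}}],
    }

with open("pipelines.gocd.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```

The generated file would live in a config repo that GoCD watches, so adding an upstream to a project becomes a one-line change to the dependency map rather than a click-ops exercise.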
-Chad

Joshua Franta

Dec 26, 2024, 3:56:44 AM
to go...@googlegroups.com

DSL-friendly is a great thing for so many use cases, 100.

#ocaml #fsharp #haskell

Josh

Jan 30, 2025, 5:41:44 AM
to go-cd

TL;DR follow-up post: I wasn't missing anything about artifacts; there's no way to point a single 'fetch' at multiple pipelines atm.

Of all the solutions I proposed (as well as other ideas from chad/jason, thx again ;), the simplest was just to integrate our existing pkg mechanism more tightly and have it perform this function.
As mentioned, our last pipeline template stage was already packaging external artifacts that were more easily attainable (the next best option would've been to fetch artifacts directly from gocd, but this would have required re-creating things in our existing packaging tool).

Our existing pkg solution (cdpk) is not open source, but it's a minified run-anywhere clone of alpine apk, with fewer features and no binary lock-in (it's posix, so you can run it on many OSes w/out modification).


We modified the build stage to check a pre-existing template parameter - let's call it 'upstream-pipelines' - and simply use cdpk to do its fetch business and extract locally.
These files then go into a pipelines/<PIPELINE>/upstream/ folder.
Then the build process was modified to find its dependencies in this location.
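
roughly, the new step looks something like this (cdpk's real command names/flags are internal, so treat the invocation below as a placeholder, as is the UPSTREAM_PIPELINES variable carrying the template parameter):

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the extra build-stage step described above."""
import os
import subprocess

upstreams = [p.strip() for p in os.environ.get("UPSTREAM_PIPELINES", "").split(",") if p.strip()]
dest = os.path.join(os.getcwd(), "upstream")  # ends up under pipelines/<PIPELINE>/upstream/
os.makedirs(dest, exist_ok=True)

for pkg in upstreams:
    # Placeholder invocation: fetch the latest published package for this upstream
    # project and extract it under ./upstream/ for the build to pick up.
    subprocess.run(["cdpk", "fetch", "--extract-to", dest, pkg], check=True)
```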

This required only modifying and testing one template stage's script; it took less than an hour to test.
About another hour or two to deploy to all pipelines (as they needed their builds modified slightly).

Overall it just worked.
In the first week the failure rate of the test pipelines about halved.

As I mentioned, this solved a rarely occurring problem, but as our deployment frequency increased it seemed like it was occurring more.
Plus now we have access to more of our packaging tooling directly in gocd, which almost certainly will come in handy.

Thanks for all the assists everyone!

ps always give thanks to chad for his gocd leadership

Jason Smyth

Jan 30, 2025, 3:30:38 PM
to go-cd
Hi Josh,

Thank you for taking the time to share your experience working with GoCD and the solution that ultimately worked for you.

I, for one, appreciate seeing these loops get closed.

Cheers,
Jason