cleaning up GO artifacts


Jason D

Apr 29, 2015, 5:20:20 PM
to go...@googlegroups.com
Due to problems that Go's default artifact cleanup algorithm can potentially cause, we have decided to turn it off and write our own.

Our simple script currently just cycles through the artifacts and deletes all but the latest xx (configurable) for every pipeline.  It's a very simple place to start, and it works well.
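For anyone wanting to try the same approach, the keep-the-latest-xx idea can be sketched like this (a minimal Python sketch, not the script from this thread; the `<root>/<pipeline>/<counter>` layout with numeric counter directories is an assumption based on a default Go install):

```python
import shutil
from pathlib import Path

def clean_pipeline_artifacts(artifacts_root, keep=100, dry_run=True):
    """Keep the newest `keep` runs of every pipeline, delete the rest.

    Assumes the layout artifacts_root/<pipeline>/<counter>/..., where
    <counter> is the numeric pipeline counter used as a directory name.
    """
    deleted = []
    for pipeline in Path(artifacts_root).iterdir():
        if not pipeline.is_dir():
            continue
        # Numeric directory names are pipeline counters; sort newest first.
        runs = sorted(
            (d for d in pipeline.iterdir() if d.is_dir() and d.name.isdigit()),
            key=lambda d: int(d.name),
            reverse=True,
        )
        for run in runs[keep:]:
            deleted.append(run)
            if not dry_run:
                shutil.rmtree(run)
    return deleted
```

Running with `dry_run=True` first and eyeballing the returned list is a cheap safety net before letting it actually delete anything.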

We'd now like to add a bit more sophistication which brings me to the questions I have for you all.

Is there a programmatic way (via the Go API, I assume) to tell whether a particular artifact set was successful or not, given only the information available on the file system for artifacts?  In the next iteration we'd like to tighten it up a bit and keep, for instance, the last xx artifact sets that were the result of a pipeline that completed successfully, or similar.

If others have ideas on how you have managed this issue, I'd be grateful for your feedback.

Thanks.
jason

Carl Reid

May 1, 2015, 12:08:36 PM
to go...@googlegroups.com
I can't help with your question, sorry; however, I am interested in what you have done in terms of writing your own artifact clean-up routine.

We are finding that we have to constantly grow our disk space requirements on the Go server because of the opacity of what artifacts currently exist and the lack of control over artifact growth.

I would be interested to know how your routine works. Could you share it with the community?

Thanks

carl

Jason D

May 11, 2015, 8:57:57 AM
to go...@googlegroups.com
We use Go to run a PowerShell script for a once-a-week cleanup of the artifact dir.  Attached is the script; nothing fancy.  What I would like to do is somehow extend this to use the APIs and determine whether a pipeline finished with success/failure, has a dependency...
 
We currently have the number to keep set to about 300 to be safe.  May eventually work it down to 100 or so.  The thing to be aware of is there may be some pipelines that haven't been run in a while and still have a reference to an artifact set that is really old.  This is why we started high and are working our way down a bit.
 
If you or others have any improvements to offer here I'd appreciate it.
CleanupGoArtifacts.ps1

Carl Reid

May 11, 2015, 12:09:24 PM
to go...@googlegroups.com
I have had a brief look through the script, and for its intended purpose (i.e. removing x number of items, starting from the oldest, from the artifact store) it looks fine.

One question, how does this differ from the default behaviour of the GO artifact clean-up code?

In terms of your question of how you would know which runs of the pipeline have been successful: you can do that with the stage history and job history APIs.

I use these APIs to determine whether a stage has stuck jobs (jobs that are assigned to agents that are not responding) and it works well.

The JSON that comes back from the job API shows the state of the job run, i.e. whether it has passed or not.

The JSON also contains a pipeline_counter field and a stage_counter field, which correlate with the names of the directories under the artifact directories.
This should help you determine whether the jobs within the stage that corresponds to a given folder name were successful or not, and therefore whether the folder should be deleted.

You can easily access the API from Powershell using Invoke-WebRequest.
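As a sketch of that, here is how a page of stage-history JSON could be filtered down to the passed runs (Python rather than PowerShell for brevity; the `stages`, `result`, `pipeline_counter` and `counter` field names follow the ~15.x-era API and may differ between server versions):

```python
import json

def passed_stage_runs(history_json):
    """Given one page of Go stage-history JSON, return the set of
    (pipeline_counter, stage_counter) pairs whose result was Passed.

    These counters correspond to the directory names under
    artifacts/pipelines/<pipeline>/<pipeline_counter>/<stage>/<stage_counter>/.
    """
    history = (json.loads(history_json)
               if isinstance(history_json, str) else history_json)
    return {
        (run["pipeline_counter"], run["counter"])
        for run in history.get("stages", [])
        if run.get("result") == "Passed"
    }
```

The payload itself would come from something like `Invoke-WebRequest` in PowerShell or `requests.get` in Python against the stage history endpoint, paging through with the offsets in the response's pagination block.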

Hope that helps

carl

Jason D

May 12, 2015, 11:59:44 AM
to go...@googlegroups.com
The Go algorithm, afaik, does not take an individual pipeline group into account; it simply starts deleting the oldest artifacts, regardless of whether there is only one left for a pipeline. This has caused several of our deployments to fail, which you don't realize until you try to deploy. The script we are using will keep a set number for each pipeline. It could certainly be made better with some of the suggestions you gave. We'll take another pass at it, time permitting. For now, the simple script is adequate for our purposes. Contributions welcome =)

Jason D

Aug 7, 2015, 11:52:59 AM
to go-cd
We did the work here to make this much more sophisticated.  It ended up being a C# app, though, so we could have a more robust set of unit tests.  In the end we, essentially, did the following:
- relied on an existing app's parsing and dump of Go's API.  This gave us API data in a SQL Server instance that was easily queryable.  It would be nice, in a future version, to rely on the APIs directly
- using said data, queried for latest xx pipeline instances of all pipelines deployed successfully to an environment - added these to the "whitelist"
- followed the material dependencies all the way back to the farthest upstream pipeline - added these to the "whitelist"
- all pipeline artifact sets not added to the whitelist were deleted

In theory, this should leave all artifact sets in an entire pipeline workflow for the latest xx.  There were a few more safety-net rules, but this is the gist of it.  This has been run consistently for a few months with no issues, that we know of, to date.  The first run cleaned up about 50% of the artifacts, which for us was about 250GB, and gave us a lot of breathing room.  An app was necessary, as determining what artifacts could be safely deleted was getting challenging in both time and complexity.
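The whitelist expansion described above is essentially a breadth-first traversal of the upstream dependency graph. A minimal sketch, assuming you have already resolved each instance's upstream instances into a plain mapping (the `upstream_of` structure here is an illustration, not the actual C# app's data model):

```python
from collections import deque

def build_whitelist(latest_good, upstream_of):
    """Expand a set of (pipeline, counter) instances with every upstream
    instance they were built from, following material dependencies.

    `upstream_of` maps (pipeline, counter) -> iterable of upstream
    (pipeline, counter) instances, e.g. resolved from the APIs or from
    a local dump of them.
    """
    whitelist = set(latest_good)
    queue = deque(latest_good)
    while queue:
        instance = queue.popleft()
        for upstream in upstream_of.get(instance, ()):
            if upstream not in whitelist:
                whitelist.add(upstream)
                queue.append(upstream)
    return whitelist

def to_delete(all_instances, whitelist):
    """Everything not reachable from the latest good deployments."""
    return [i for i in all_instances if i not in whitelist]
```

Seeding `latest_good` with the last xx successful production deployments and deleting everything outside the expanded whitelist mirrors the three steps listed above.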

Carl Reid

Aug 7, 2015, 4:44:49 PM
to go-cd

This sounds extremely useful. We are constantly out of artifact disk space, and we see quite a performance hit as Go cleans up old files, as well as the previously discussed issue of pipelines not being able to complete due to missing items.

Is there any possibility of sharing the source code of the application Jason? We could definitely make use of something similar.

Thanks

Carl


--
You received this message because you are subscribed to a topic in the Google Groups "go-cd" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/go-cd/HfOY_74OKhI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to go-cd+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jason D

Aug 13, 2015, 8:48:36 AM
to go-cd
Possibly.  Let's chat offline.  Email me - jd...@rwbaird.com

Rinat Shagisultanov

Aug 25, 2015, 2:34:52 PM
to go-cd
Having traceable dependency logic is very useful. At this point we are using a PS script pretty similar to the one posted. Any chance the C# app code can be shared (a private Git repo or anything else is fine)?

Jason D

Aug 26, 2015, 12:03:44 PM
to go-cd
I'm working on that but have to get approval from some internal folks before doing so.  If I can, I will.  More to come.



Fredrik Wendt

Nov 4, 2015, 11:39:50 AM
to go-cd
I would be very interested in this too. We will either have to write the same thing ourselves, or try to fork yours.

As there's very limited metadata available in GoCD, we're thinking about how to best mark which pipelines to care about, i.e. which ones to start the algorithm from.

/ Fredrik

Aravind SV

Nov 4, 2015, 11:50:19 AM
to go...@googlegroups.com
As you think about it, can you update this post about what kind of metadata would be useful to you? I want GoCD to make that available, and having a starting list will be useful.


Fredrik Wendt

Nov 4, 2015, 11:52:22 AM
to go...@googlegroups.com
(Sorry for spamming, I accidentally hit Send.)

Our use case is this: we collect a set of RPM files, put those in a folder, and run create-repo to create yum channel metadata. Downstream pipelines can then pick up a URL written to a file (a Go artifact: echo "${GO_SERVER_URL}files/${GO_PIPELINE_NAME}/${GO_PIPELINE_COUNTER}/${GO_STAGE_NAME}/${GO_STAGE_COUNTER}/${GO_JOB_NAME}/yum/" > yum-channel.txt) and then use the GoCD server as a yum channel for simple installations etc. Convenient, and also very much what will happen for real installations: we can re-use the same deployment mechanism throughout.

These folders with RPM files are somewhere around 1 GB and as we trigger this build many many times per day, we're running out of disk very fast. We only need to save a few builds, all trackable by looking at (manual) promotion steps. One idea is to mark these promotion jobs with a specific resource (keep_artifacts) and use that as a source.

Would love to hear what others use to "mark" or start their algorithms.

/ Fredrik




--
+46 702 778511

Jason D

Nov 5, 2015, 2:30:14 PM
to go-cd
My earlier response gave a high-level view of what our "algorithm" is when determining what to delete or not.  The data we use, though, took a lot of work to parse and consolidate from the Go APIs.  The parsing was done based on early versions of Go, however (12.x ish).  It may be that there are APIs now that would simplify this approach.  I'm trying to publicize what we did, but am somewhat stalled working through our legal & risk departments.  If I can, I will.

Very simply, if there are/were APIs that give the following, it would make it much easier to work through this process:
  • list of all pipelines currently in xxxx environment (e.g. prod)
    • note: we group our pipelines by our environments (dev, sit, uat, prod)
  • list of the last xx triggers of xxxx pipeline
  • list of all pipelines upstream from xxxx pipeline
    • note: this would likely produce a "tree" of dependencies
With all of these, we are looking for the revision number that corresponds to the artifact folder Go stores these in.  We use this type of info to build a list of what *not* to delete, along with some additional factors.  Basically, we do not delete the latest xx pipelines or any upstream dependencies of those instances.  Most everything else is deleted.
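For illustration, the mapping from a pipeline instance to its on-disk artifact folders (which is what a keep-list ultimately has to contain) looks roughly like this; the <pipeline>/<pipeline_counter>/<stage>/<stage_counter>/<job> layout is assumed from a default Go install:

```python
from pathlib import Path

def artifact_dirs(artifacts_root, pipeline, pipeline_counter):
    """All per-job artifact directories for one pipeline run, laid out as
    <root>/<pipeline>/<pipeline_counter>/<stage>/<stage_counter>/<job>/."""
    run = Path(artifacts_root) / pipeline / str(pipeline_counter)
    return [job
            for stage in run.iterdir() if stage.is_dir()
            for stage_counter in stage.iterdir() if stage_counter.is_dir()
            for job in stage_counter.iterdir() if job.is_dir()]
```

Resolving every whitelisted (pipeline, counter) pair through a helper like this gives the concrete set of directories to spare from deletion.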

Ideally, this would simply be an option in Go, as it is with pretty much every product that has an artifact repository.  That would greatly simplify an issue most anyone will run into eventually.  At the very least there should be a tool that we could run on a scheduled basis to clean things up after the fact.

Again, I'll put it out there if I can but it is a slow slog through our company to get stuff like that to happen.


Fredrik Wendt

Nov 7, 2015, 6:35:36 PM
to go...@googlegroups.com
Thanks.
We would start by looking at pipelines with a specific resource, and whitelist those and everything upstream. Everything else would be deleted, except logs.
Difference: we wouldn't care about the "xx latest runs"; we would keep all builds that have been released to clients until they're no longer supported (after which we'd only keep logs, but not binaries).
/ Fredrik




--
+46 702 778511

Ashwanth Kumar

Nov 13, 2015, 5:38:10 AM
to go...@googlegroups.com
I took Jason's approach and wrote a Java application, gocd-janitor, to clean up artifacts on GoCD. The application was built and tested against 15.1.0; it should also work against newer versions.



Ashwanth Kumar / ashwanthkumar.in

Ketan Padegaonkar

Nov 13, 2015, 5:53:27 AM
to go...@googlegroups.com
This is neat.

Would you mind submitting a PR to get it listed under a new "tools" section over at http://www.go.cd/community/plugins.html so that it's more discoverable?

Fredrik Wendt

Nov 13, 2015, 10:07:33 AM
to go...@googlegroups.com
Looks great.
We'll probably clone this after christmas (some year) and add support for "keep all upstream pipelines' artifacts".
It would be great if the GoCD server had an API call where you could say "wipe out all artifacts (except logs) for: [($pipeline $run), ...]".
Another option would be an API to add metadata to a run, saying "don't remove this run's artifacts, ever", which GoCD would then respect (just like the "Keep artifacts forever" toggle at pipeline level).
/ Fredrik

Gerd Katzenbeisser

Feb 8, 2016, 3:36:19 AM
to go-cd
We are also interested in this feature. This is probably the related issue on github: https://github.com/gocd/gocd/issues/410

Ashwanth Kumar

Feb 8, 2016, 9:51:47 AM
to go...@googlegroups.com
Gerd, while I agree native support in GoCD would be best, let me know if there's any feature request for gocd-janitor that might be useful in the meantime.

Leo Keuken

Nov 22, 2016, 12:08:18 PM
to go-cd
We were facing pretty much the same issue everyone has come across by now, I think: constant alerting on a full disk on the VM where the go-server lives. Though there is a lot to say for managing what is run inside the pipelines and what a typical build saves onto the server (e.g. massive test reports from certain suites), all in all it will slowly fill up, as no build is ever discarded. With many projects, a real headache!

Before looking on groups or the web extensively, I just went to work and reinvented the wheel (of course): I wrote a small shell script to get rid of the old builds that are usually obsolete in a fast-progressing environment. I've put it on github: https://github.com/LeoK80/go-server-cleanup . Feel free to fork it and make it fit your own purposes, or just use it out of the box. The readme will tell you all you need to know about its use.

Defaults:
- go-server artifacts directory '/var/lib/go-server/artifacts/pipelines'
- keep anything younger than 180 days
- never delete from a pipeline when there are 15 or fewer builds present
Parameter input is provided for the retention period and the minimum number of builds to keep.
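Those defaults translate into roughly the following logic (a Python sketch of the same policy, not the shell script itself; the numeric counter directories are an assumption from the default layout):

```python
import shutil
import time
from pathlib import Path

def age_based_cleanup(pipelines_dir, max_age_days=180, min_keep=15, dry_run=True):
    """Delete runs older than max_age_days, but never shrink a pipeline
    below min_keep builds (the script's safety guard)."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for pipeline in Path(pipelines_dir).iterdir():
        if not pipeline.is_dir():
            continue
        # Oldest first, by numeric pipeline counter.
        runs = sorted(
            (d for d in pipeline.iterdir() if d.is_dir() and d.name.isdigit()),
            key=lambda d: int(d.name),
        )
        # Only the runs beyond the minimum-keep floor are candidates.
        deletable = max(len(runs) - min_keep, 0)
        for run in runs[:deletable]:
            if run.stat().st_mtime < cutoff:
                removed.append(run)
                if not dry_run:
                    shutil.rmtree(run)
    return removed
```

Note this keys age off directory mtime, which is simple but can be fooled by anything that touches old directories; the counter-based approaches elsewhere in this thread avoid that.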

Ashwanth Kumar

Nov 22, 2016, 1:20:56 PM
to go...@googlegroups.com
Leo, I took a quick glance at the repo; it's simple and seems to do its job. Thanks for the contribution.

One piece of feedback, though: the approach of deleting anything older than 180 / 100 days works fine if you're doing just CI builds. For cases where you're taking artifacts into various pipelines using Go's artifact dependency, I would recommend giving gocd-janitor a shot. It looks at all the dependencies of each pipeline version and maintains the respective versions accordingly.

I've personally been part of various solutions to this problem over the past 4+ years, and gocd-janitor seems to be the best that has worked out for us so far.



Jay

Nov 22, 2016, 2:19:41 PM
to go-cd, ashwan...@googlemail.com
We recently developed an internal script to clean up and report (via HTML email) any pipeline instances that were cleaned, and how much space was recovered. The gocd-janitor utility was a nice reference for what we wanted. We ended up creating a Python script that only removes "failed" builds and keeps the console.log files around. For us, the logs are still valuable to look at, and they do not take up a considerable amount of space. We also had a more specific definition of what a "failed" build was.

There was also one very obvious space saver that we noticed: removing "orphaned" pipeline artifact folders. This might not occur very frequently, but if you do remove a pipeline config at any point, you will also want to remove the pipeline folder on disk; otherwise it just sits around with no references.
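A sketch of the two space savers described here: deleting a failed run's files while keeping its console.log, and finding orphaned pipeline folders. The function names and the cruise-output/console.log location are illustrative assumptions, not the internal script itself; determining which runs actually failed would come from the APIs discussed earlier in the thread.

```python
from pathlib import Path

def clean_failed_run(run_dir, dry_run=True):
    """Delete a failed run's artifact files but keep every console.log
    (cruise-output/console.log in a default install), since the logs
    stay useful and are small."""
    removed = []
    # Deepest paths first, so files go before their parent directories.
    for path in sorted(Path(run_dir).rglob("*"), reverse=True):
        if path.name == "console.log":
            continue
        if path.is_file():
            removed.append(path)
            if not dry_run:
                path.unlink()
        elif path.is_dir() and not any(path.iterdir()):
            if not dry_run:
                path.rmdir()
    return removed

def orphaned_pipelines(pipelines_dir, configured):
    """Artifact folders whose pipeline no longer exists in the config."""
    return [d for d in Path(pipelines_dir).iterdir()
            if d.is_dir() and d.name not in set(configured)]
```

Comparing the on-disk folder names against the pipeline names in the server config is all the orphan check needs, since Go names artifact folders after their pipelines.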

Leo Keuken

Nov 22, 2016, 3:43:04 PM
to go...@googlegroups.com
Hi Ashwanth, thanks for your feedback. Will most definitely have a good look at the gocd-janitor and take it for some test-spins :)


Carl Reid

Nov 24, 2016, 12:12:48 PM
to go-cd, ashwan...@googlemail.com
This sounds very useful - can you share it with the community?