Keep previous build dependencies


Hugh Acland

May 23, 2017, 5:42:44 PM
to go-cd
Hi

I am building a Node app in a pipeline using "npm install". This fetches all dependencies listed in the package.json file and stores them relative to the project base.

Is there a way to clean up and re-use a pipeline directory so that the build job does not have to run "npm install" each time?

I guess one way would be to have a previous pipeline do the "npm install" and only run it when the package.json is changed, and then have the main pipeline just copy across the whole directory containing dependencies. 

Is there a more canonical way to deal with dependencies? NB, I guess this also applies to other ecosystems and their dependencies, e.g. JAR files for Java projects.

Many thanks
regards
Hugh

Ashwanth Kumar

May 23, 2017, 11:00:53 PM
to go...@googlegroups.com
Today GoCD doesn't know whether build dependencies live inside the project root or not. Some ways I can think of to work around the problem:
  1. For an existing project, we can expose the entire node_modules/ folder as an artifact and fetch it in a later run of the same pipeline. This is very similar to what you've suggested. The disadvantage of this approach is that you'll end up consuming a lot of disk space for artifacts, especially if you have many dependencies, which is probably what prompted this question in the first place. NOT A RECOMMENDED APPROACH.

  2. Unlike other CI tools out there, GoCD today has no notion of a CACHE directory for build pipelines. However, if you maintain a directory (or a separate partition, say /data/go-cache) and create a symlink from the current working directory to /data/go-cache/&lt;pipeline&gt;/&lt;stage&gt;/&lt;job&gt; for each pipeline run, your dependencies will be cached across runs, and you can clear them at any time since the cache will be rebuilt from scratch the next time the build runs. The major advantage of this approach is that you always have a single copy of the dependencies that gets reused across multiple runs.

    This can be automated using an SCM plugin, I guess. Any takers? :)

    PS: This assumes that all go-agents run on the same machine. If they don't, you'll end up creating a CACHE directory for every VM / bare-metal host that runs your go-agents, which may or may not be ideal depending on your infrastructure setup. There are also some consequences you should be aware of if you're running go-agents in a container.
I guess the same analogy applies to any language and its respective type of artifacts.
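The symlink-cache idea in point 2 can be sketched as a small shell script. The GO_PIPELINE_NAME / GO_STAGE_NAME / GO_JOB_NAME variables are ones GoCD exports to each job; the /tmp/go-cache default and the "demo" fallbacks are assumptions so the sketch runs outside GoCD too:

```shell
#!/bin/sh
set -e

# Shared cache root; a real setup might use /data/go-cache instead.
CACHE_ROOT="${CACHE_ROOT:-/tmp/go-cache}"

# GoCD exports GO_PIPELINE_NAME / GO_STAGE_NAME / GO_JOB_NAME to each job;
# the :-demo style fallbacks are only so this sketch runs outside GoCD.
CACHE_DIR="$CACHE_ROOT/${GO_PIPELINE_NAME:-demo}/${GO_STAGE_NAME:-build}/${GO_JOB_NAME:-install}"
mkdir -p "$CACHE_DIR"

# Point the working copy's node_modules at the shared cache, so that
# "npm install" reuses whatever earlier runs already downloaded.
rm -rf node_modules
ln -s "$CACHE_DIR" node_modules

# npm install   # would now populate / reuse the cache
echo "node_modules -> $(readlink node_modules)"
```

Run as a first task in the job, before "npm install"; clearing the cache is just deleting the directory under the cache root.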

--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Ashwanth Kumar / ashwanthkumar.in

Fredrik Wendt

May 25, 2017, 1:23:23 PM
to go-cd, ashwan...@googlemail.com
Hi!

1. This is a fine approach, but you may want to store your artifacts outside the GoCD server. I use Docker with most of my clients, and having a baseline image with the modules/JARs/etc. that many projects use means each project only needs to add a few on top. YMMV of course, and you may want to rebuild these baselines every day or week.

2. A CACHE directory for build pipelines may be A VERY BAD IDEA. This of course depends on what you aim to achieve, but for your builds to be as repeatable as possible, and to avoid producing snowflake builds (builds that cannot be reproduced, where no one knows how they ended up the way they did), you want as clean a slate as possible for each build. You don't want to depend on what a previous run of the same pipeline did (i.e. re-use artifacts from some random cache). Using a CACHE violates this principle.
Jenkins, out of the box when using Maven for instance, happily violates this.
Again, that may of course be valid depending on your specific context and the requirements of your CI. As a rule of thumb for Continuous Integration, I recommend you think twice before violating this principle.
If you're into Continuous Delivery/Deployment and think this is a principle you can break, I'd say you're being unprofessional about your work.

3. If you want to fetch dependencies (which are not GoCD material), and you think it's too slow, there are several options available to you. I've seen Maven builds drop from minutes to < 15 seconds, simply by putting Nexus on a (1000 USD) machine with fast disk and dedicated gigabit network interfaces.
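For npm specifically, pointing builds at a caching proxy like Nexus is a one-line registry setting. A hypothetical project-level .npmrc (the hostname and repository path are made up for illustration; a real Nexus npm proxy repository would supply its own URL):

```ini
# .npmrc in the project root (hostname and repository path are hypothetical)
registry=http://nexus.internal:8081/repository/npm-proxy/
```

With this in place, "npm install" resolves every package through the proxy, so repeated downloads are served from the local network instead of the public registry.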

4. I think there are some great consequences from running your gocd agents in containers, which are thrown away every 24 hours. Primarily these I think:

* you'll learn how to make every build quick (including the first build of the day/on a new agent), where in your pipelines to logically "cache" things - ie which step, how to "cache" it (GoCD artifacts, Nexus, yum repos, Docker images, tar.gz artifacts with pre-packaged dependencies updated only once a day and simply unpacked on builds, ...)
* you're less likely to create snowflake agents (related concepts: phoenix servers, "cattle vs pets") 
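The "tar.gz artifacts with pre-packaged dependencies" idea above can be sketched as a checksum-keyed cache: key the archive on a hash of package.json, so it is only rebuilt when the dependency list changes. Everything here (paths, the stand-in project files) is hypothetical; a real job would already have its own package.json and run npm install on a cache miss:

```shell
#!/bin/sh
set -e

# Where the pre-packaged dependency archives live (hypothetical path).
CACHE_DIR="${CACHE_DIR:-/tmp/dep-cache}"
mkdir -p "$CACHE_DIR"

# Stand-in project files so the sketch is self-contained.
echo '{"name":"demo","dependencies":{}}' > package.json
mkdir -p node_modules
echo cached > node_modules/marker

# Key the archive on the dependency manifest: it only changes when
# package.json changes, so unchanged dependencies mean a cache hit.
KEY=$(sha256sum package.json | cut -c1-16)
ARCHIVE="$CACHE_DIR/deps-$KEY.tar.gz"

if [ -f "$ARCHIVE" ]; then
    echo "cache hit: unpacking $ARCHIVE"
    rm -rf node_modules
    tar -xzf "$ARCHIVE"
else
    echo "cache miss: npm install would run here, then we archive the result"
    tar -czf "$ARCHIVE" node_modules
fi
```

The same pattern works for any ecosystem: hash the manifest (pom.xml, Gemfile.lock, ...), and store one archive per hash.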

Whatever you end up with - please share your solution and what you learned (and how you think about it), so we can all learn! :-)

/ Fredrik