How exactly does stash / unstash work AND how to use the same node in different stages


Christian Ditscher

Jun 30, 2016, 12:03:19 PM
to Jenkins Users
Hello,

I was wondering how exactly the stash and unstash commands work. When are the files transferred?
  • Does stash transfer the selected files to the master, and does unstash load them from the master onto a slave?
  • Or does stash attach a label to the files, which are then transferred between the nodes when unstashed?
  • Or something else?
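As I understand the Pipeline implementation (the thread itself never settles this), the first option is the closest: stash packs the matching files on the agent into a compressed archive and sends it to the master, which keeps it with that build's records; unstash sends the archive back to whichever node runs the step and extracts it into the current workspace. Stashes are normally discarded when the build finishes. The Python sketch below only models that flow with local directories. `stash`, `unstash`, `controller_dir` and the workspace layout are hypothetical stand-ins, not Jenkins APIs, and the include pattern uses Python's fnmatch rather than Jenkins' Ant-style globs.

```python
import fnmatch
import os
import tarfile
import tempfile

# Toy model only: "controller_dir" stands in for the master's per-build
# storage, and each "workspace" is a plain local directory (hypothetical).

def stash(name, workspace, includes, controller_dir):
    """Pack files under `workspace` whose relative path matches the glob
    `includes` into <controller_dir>/<name>.tar.gz (fnmatch, not Ant syntax)."""
    archive_path = os.path.join(controller_dir, name + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(workspace):
            for fname in files:
                full = os.path.join(root, fname)
                rel = os.path.relpath(full, workspace)
                if fnmatch.fnmatch(rel, includes):
                    tar.add(full, arcname=rel)
    return archive_path

def unstash(name, workspace, controller_dir):
    """Extract the named archive from 'controller' storage into `workspace`."""
    archive_path = os.path.join(controller_dir, name + ".tar.gz")
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction where supported
            tar.extractall(workspace, filter="data")
        else:
            tar.extractall(workspace)

def demo():
    """Stash a .jar on 'agent A', unstash it on 'agent B'; list B's files."""
    with tempfile.TemporaryDirectory() as master, \
         tempfile.TemporaryDirectory() as agent_a, \
         tempfile.TemporaryDirectory() as agent_b:
        os.makedirs(os.path.join(agent_a, "target"))
        with open(os.path.join(agent_a, "target", "app.jar"), "w") as f:
            f.write("compiled bytes")
        with open(os.path.join(agent_a, "README.txt"), "w") as f:
            f.write("not stashed")
        stash("app", agent_a, "*.jar", master)
        unstash("app", agent_b, master)
        return sorted(os.path.relpath(os.path.join(r, n), agent_b)
                      for r, _d, fs in os.walk(agent_b) for n in fs)
```

Only the file matching the include pattern reaches the second workspace; everything else stays behind on the first node.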


I am also wondering how I can minimize the amount of data transferred during the build.

One idea I have is the following (example): I have 10 nodes that should be used to build and run tests, and some other nodes that do something else. To minimize the number of files transferred, I'd like to check out the sources only once:

  • check out and build on one of the 10 nodes,
  • do something with the build result on another node (stash -> unstash),
  • do something with the sources that are already on the first node (without checking them out again).

If another build is started while the first one is still running, one of the remaining nodes is used for it.

How could I achieve that? Is this procedure useful?


If I don't use the procedure described above, I think I have to stash all the sources and then unstash them when I need them again. If I check them out from source control again instead, the sources might already have changed. Is this correct?
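The concern in the last paragraph holds whenever a later stage checks out a moving branch again: it can pick up commits that landed after the first checkout. One alternative to stashing the whole tree is to record the exact revision from the first checkout and check out that revision in later stages. The Python sketch below models both cases with an in-memory repository; `FakeRepo` and `run_build` are hypothetical. Re-checking out a pinned revision still pulls data from the SCM server, so it addresses consistency, not the bandwidth concern.

```python
class FakeRepo:
    """Hypothetical stand-in for a source repository with a moving branch."""

    def __init__(self):
        self.commits = {"a1": "version 1"}  # revision id -> tree contents
        self.head = "a1"                     # current tip of the branch

    def push(self, rev, contents):
        """Someone else commits while the build is still running."""
        self.commits[rev] = contents
        self.head = rev

    def checkout(self, rev=None):
        """Return the tree at `rev`, or at the current branch tip if None."""
        return self.commits[rev if rev is not None else self.head]


def run_build(repo, pin_revision):
    """Stage 1 checks out and records the revision; stage 2 checks out again.
    Returns True if both stages saw the same sources."""
    stage1_rev = repo.head
    stage1_tree = repo.checkout(stage1_rev)
    repo.push("b2", "version 2")  # a new commit lands between the stages
    stage2_tree = repo.checkout(stage1_rev if pin_revision else None)
    return stage1_tree == stage2_tree
```

With `pin_revision=False` the second stage sees "version 2"; with `pin_revision=True` both stages work on "version 1".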


Thanks!
Chris

Baptiste Mathus

Jun 30, 2016, 12:16:29 PM
to jenkins...@googlegroups.com

For your second question, is this roughly what you have in mind?
https://github.com/batmat/jez-jobs/blob/47d63519b5232d6cd0c57e149b6cd57032c0d9a0/resources/naive-parallel.groovy#L7L11

Here I "prebuild" the sources, which also triggers the download of the dependencies, then stash the result and unstash it in parallel afterwards.
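The prepare-once, stash, then unstash-in-parallel pattern can be modelled in Python as below. Directories stand in for agents and a thread pool stands in for parallel branches; `make_stash`, `unstash_into` and `parallel_demo` are hypothetical names, not Jenkins steps. The point of the model is the accounting: the archive is built once, but every branch receives its own full copy, so the data leaving the master grows linearly with the number of branches.

```python
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_stash(src_dir, archive_path):
    """Pack the prebuilt tree (sources plus downloaded deps) once."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, entry), arcname=entry)

def unstash_into(archive_path, workspace):
    """One parallel branch: extract a full copy; return bytes it received."""
    os.makedirs(workspace, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(workspace, filter="data")
        else:
            tar.extractall(workspace)
    return os.path.getsize(archive_path)

def parallel_demo(branches=6):
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "prebuilt")
        os.makedirs(os.path.join(src, "deps"))
        with open(os.path.join(src, "deps", "lib.jar"), "wb") as f:
            f.write(os.urandom(10_000))  # stands in for downloaded dependencies
        archive = os.path.join(root, "prebuilt.tar.gz")
        make_stash(src, archive)
        workspaces = [os.path.join(root, f"node{i}") for i in range(branches)]
        with ThreadPoolExecutor(max_workers=branches) as pool:
            received = list(pool.map(lambda ws: unstash_into(archive, ws),
                                     workspaces))
        all_present = all(os.path.exists(os.path.join(ws, "deps", "lib.jar"))
                          for ws in workspaces)
        return all_present, sum(received), os.path.getsize(archive)
```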

My 2 cents

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/0cbb89ee-8e95-47e7-8eac-58a424527b59%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Björn Pedersen

Jul 1, 2016, 4:05:35 AM
to Jenkins Users

Christian Ditscher

Jul 4, 2016, 8:35:50 AM
to Jenkins Users
Hi,

@Baptiste: Your suggestion sounds to me more like my last one: checking out, stashing, and then unstashing again. (This could be a solution if it were clear how exactly stash/unstash works. What is transferred? Where are the stashed files located/saved?) In general I would like to keep the amount of data that has to be transferred over network connections to a minimum. Therefore I would prefer to transfer the needed files only once and then reuse them (something like what Björn suggested).


@Björn: The External Workspace Manager plugin sounds really interesting and would be a great way to reduce data transfer. Sadly, it is not a "proven" plugin yet, so I cannot use it in my application at the moment. I will have to wait until it is stable.

Stephen Connolly

Jul 4, 2016, 11:50:15 AM
to jenkins...@googlegroups.com
I haven't looked into this closely... it being mostly Jesse's baby... but:

The point of using stash and unstash is to let Jenkins be smarter about the files. If Jenkins knows you are stashing and unstashing, it can be smarter about what it does... e.g. using local copies where possible, etc.

You should also pay attention to parallel builds and interleaved scheduling... which could leave an external workspace in an invalid state.
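The interleaved-scheduling hazard can be shown with a deterministic toy model (plain Python, not Jenkins code; `build`, `run_tests_stage`, `interleaved` and the directory layout are hypothetical). Two builds share one external workspace, build #2 overwrites build #1's output before #1's test stage runs, and #1 ends up testing the wrong thing. Giving each build its own directory avoids this particular clash.

```python
import os
import tempfile

def build(workspace, build_number):
    """Stage 1: write this build's output into the workspace."""
    with open(os.path.join(workspace, "output.txt"), "w") as f:
        f.write(f"built by #{build_number}")

def run_tests_stage(workspace):
    """Stage 2: read whatever output is in the workspace right now."""
    with open(os.path.join(workspace, "output.txt")) as f:
        return f.read()

def interleaved(shared):
    """Builds #1 and #2 interleave: #1 builds, #2 builds, then #1 tests.
    Returns what build #1 actually tested."""
    with tempfile.TemporaryDirectory() as root:
        def ws_for(n):
            path = root if shared else os.path.join(root, f"build-{n}")
            os.makedirs(path, exist_ok=True)
            return path
        build(ws_for(1), 1)
        build(ws_for(2), 2)
        return run_tests_stage(ws_for(1))
```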

So I would encourage you to just use stash and unstash, and to file concrete bugs or RFEs against the current implementations (which I'd guess are naïve ones) outlining how less naïve implementations could work.

If you build yourself a layer of hacks using external workspaces then:
  1. You will not be able to switch as easily to stash/unstash when they are improved
  2. You will not be a data point towards improving them
Now, I can understand if you have pressing needs that force you away from stash/unstash (for example, if you are creating DVD ISO images).

Most people should not fall into that use case, and trading concurrent-build safety for the speed-up of zero-copy is likely not worth it... after all, building the right thing slowly will always trump building the wrong thing fast.

So, if it were me... I would use stash/unstash for now (unless I was dealing with 1 GB+ of stuff)... I would concentrate on ensuring that I stash/unstash only what is needed (as that makes the pipeline better, not just the stash/unstash faster)... and I would file RFEs for some improvements around stash/unstash:
  • stash/unstash implementation that reuses the file in-place if it has the same checksum as the stashed version
  • stash/unstash implementation that copies from local disk if a local workspace has the same checksum as the stashed version
  • unstash implementation that creates read-only links rather than copies
  • node implementation that allows for affinity to the stash
  • etc
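The first two RFEs in this list come down to comparing checksums before copying. Below is a minimal sketch of that decision logic, assuming both the stash and the workspace are plain local directories; `checksum_unstash` and `checksum_demo` are hypothetical, and this is not how Jenkins implements unstash. A file is copied only when it is missing from the workspace or its SHA-256 differs. In a real implementation the stored checksums, rather than the files, would be what crosses the network.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def checksum_unstash(stash_dir, workspace):
    """Copy stashed files into `workspace` unless an identical file is
    already there. Returns (copied, reused) as sorted relative paths."""
    copied, reused = [], []
    for root, _dirs, files in os.walk(stash_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, stash_dir)
            dst = os.path.join(workspace, rel)
            if os.path.exists(dst) and sha256_of(dst) == sha256_of(src):
                reused.append(rel)        # same checksum: reuse in place
            else:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)    # missing or changed: transfer it
                copied.append(rel)
    return sorted(copied), sorted(reused)

def checksum_demo():
    with tempfile.TemporaryDirectory() as stash_dir, \
         tempfile.TemporaryDirectory() as ws:
        for name, data in [("a.txt", "same"), ("b.txt", "new")]:
            with open(os.path.join(stash_dir, name), "w") as f:
                f.write(data)
        with open(os.path.join(ws, "a.txt"), "w") as f:
            f.write("same")  # already present from an earlier stage
        with open(os.path.join(ws, "b.txt"), "w") as f:
            f.write("old")   # stale copy that must be replaced
        return checksum_unstash(stash_dir, ws)
```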
-Stephen

Vincent Brouillet

Nov 21, 2016, 6:47:09 PM
to Jenkins Users
Stephen,

We build a Docker image in the first build stage and use it to spin up a temporary environment for automated testing with Docker Compose on Jenkins nodes.
We rely on docker save and stash:
  • docker save --output="./my-image.tar" image_name
  • stash includes: 'my-image.tar', name: 'project-image'
On the nodes (6 of them, in parallel) we unstash it. The Docker image is 1.9 GB, and it takes 4 minutes to unstash.
We thought of using a private Docker repo as an artifact manager, but that adds one more dependency. A Docker repo is faster than stash, though.
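To put these numbers in perspective, here is a back-of-envelope calculation. It assumes decimal units (1 GB = 1000 MB), that every parallel branch pulls its own full copy from the master, and that the 4 minutes is the wall-clock time for all six unstashes (the post does not say which). The function name is hypothetical. For what it's worth, the current stash step documentation notes that stash/unstash are designed for small files and suggests an external repository manager for large transfers.

```python
def unstash_throughput(image_gb, nodes, minutes):
    """Back-of-envelope: each parallel unstash pulls a full copy from the master.
    Returns (total MB sent, aggregate MB/s, per-node MB/s, aggregate Mbit/s)."""
    total_mb = image_gb * 1000 * nodes          # decimal units: 1 GB = 1000 MB
    seconds = minutes * 60
    aggregate_mb_s = total_mb / seconds          # master's outbound rate
    per_node_mb_s = image_gb * 1000 / seconds    # what each node receives
    return total_mb, aggregate_mb_s, per_node_mb_s, aggregate_mb_s * 8

total, agg, per_node, mbit = unstash_throughput(1.9, 6, 4)
print(f"{total:.0f} MB sent, {agg:.1f} MB/s aggregate, "
      f"{per_node:.1f} MB/s per node, ~{mbit:.0f} Mbit/s")
```

Under those assumptions that is about 11.4 GB leaving the master per build at roughly 47.5 MB/s (around 380 Mbit/s), so the master's network link or disk may well be the limiting factor rather than any single node.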

What are your thoughts on passing Docker images to nodes?

Baptiste Mathus

Nov 23, 2016, 4:33:00 PM
to jenkins...@googlegroups.com
Hi Vincent,

Well, I think Stephen already answered that, and I guess you didn't file an RFE as he suggested?
Passing Docker images around to work around stash speed is a hack, since you're forced to change your pipeline script when, logically, all you want is to stash/unstash some build data.

But here I'm not even sure what you're doing. If you're building a Docker image that will later be delivered to customers or something, and which you're then going to test somewhere else, then IMO you want a Docker registry, because docker save + stash/unstash is a waste compared to pushing/pulling only the modified layers of the image you're building. And if you write your Dockerfile in a layer-friendly way, for example, the data that needs to be transferred could go down from hundreds of MB to only a few MB of changes pushed around each time...
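The registry argument can be illustrated with a toy model of content-addressed layers: a registry stores each layer under its SHA-256 digest and skips layers it already has, whereas docker save + stash ships the full tarball on every build. This is a sketch of the idea only, not Docker's actual push protocol; `ToyRegistry`, `saved_tar_size` and the layer sizes are hypothetical.

```python
import hashlib

def digest(layer):
    """Content address of a layer, in the 'sha256:<hex>' style registries use."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()

class ToyRegistry:
    """Hypothetical model: layers stored by digest; known layers are skipped."""

    def __init__(self):
        self.blobs = {}

    def push(self, layers):
        """Return how many bytes actually had to be uploaded."""
        uploaded = 0
        for layer in layers:
            d = digest(layer)
            if d not in self.blobs:  # only unknown layers cross the network
                self.blobs[d] = layer
                uploaded += len(layer)
        return uploaded

def saved_tar_size(layers):
    """docker save + stash model: the whole image is shipped every build."""
    return sum(len(layer) for layer in layers)

# Sizes scaled down: read every 1,000 bytes as roughly 1 MB.
base_os = b"\x00" * 300_000
dependencies = b"\x01" * 100_000
build1 = [base_os, dependencies, b"app v1" * 1_000]
build2 = [base_os, dependencies, b"app v2" * 1_000]

registry = ToyRegistry()
first_push = registry.push(build1)
second_push = registry.push(build2)
print(first_push, second_push, saved_tar_size(build2))
```

Ordering the Dockerfile so that rarely changing steps (base image, dependency installation) come before frequently changing ones (application code) is what keeps most layers identical between builds, so only the small top layer has to move.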

But to answer all that, we probably need to know more about your use case and setup.

Cheers
