Efficiently copying artifacts

Simon Richter

Apr 24, 2015, 8:02:52 PM
to jenkins...@googlegroups.com
Hi,

I have a project that outputs a few large files (compiled DLL and static
library) as well as a few hundred header files as artifacts for use by
the next project in the dependency chain. Copying these in and out of
workspaces takes quite a long time, and the network link is not even
near capacity, so presumably handling of multiple small files is not
really efficient.

Can this be optimized somehow, e.g. by packing and unpacking the files
for transfer? Manual inspection of artifacts is secondary, I think.

Simon

Matthew...@diamond.ac.uk

Apr 27, 2015, 4:27:43 AM
to jenkins...@googlegroups.com
Are you using "Archive Artifacts" in the upstream job, and the "Copy Artifact" plugin in the downstream job? This is the standard method.
If so, maybe the upstream job should produce a single zip file, which the downstream job can get and unzip.
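
A minimal sketch of that pack-and-unpack idea, using tar with gzip rather than zip. The function names and directory arguments are hypothetical placeholders, not anything Jenkins provides; the upstream job would archive the single file and the downstream job would fetch it with Copy Artifact before unpacking.

```shell
#!/bin/sh
# Hypothetical helpers; all paths passed in are placeholders.

# Upstream: pack a whole directory tree into one compressed tarball.
# Keep the archive outside src_dir so tar does not try to pack itself.
pack_artifacts() {
    src_dir=$1      # directory holding the headers and libraries
    archive=$2      # output file, e.g. artifacts.tar.gz
    tar -czf "$archive" -C "$src_dir" .
}

# Downstream: unpack the tarball into a target directory.
unpack_artifacts() {
    archive=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    tar -xzf "$archive" -C "$dest_dir"
}
```

With this, only one file crosses the wire per build, so per-file transfer overhead no longer scales with the number of headers.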
Matthew

Maciej Jaros

Apr 27, 2015, 4:40:09 AM
to jenkins...@googlegroups.com
Simon Richter (2015-04-25 02:02):
If some of the files remain unchanged between builds, it can be more
efficient NOT to pack them. You could, for example, create a repository
(SVN) for the artifacts; instead of copying all files you would simply
run `svn update` and fetch only the changed ones. Another option would
be rsync for synchronisation, but that might not work as well as SVN
would.
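
A rough sketch of the rsync variant, assuming rsync is installed on both ends. The function name and both arguments are hypothetical; the destination may be a local directory or a remote `user@host:path`.

```shell
#!/bin/sh
# Hypothetical helper: mirror src_dir into dest, sending only changed files.
sync_artifacts() {
    src_dir=$1
    dest=$2
    # -a recurses and preserves timestamps and permissions, which lets rsync
    # skip files whose size and modification time are unchanged.
    # --delete removes files from dest that no longer exist in src_dir.
    rsync -a --delete "$src_dir"/ "$dest"/
}
```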

Regards,
Nux.

Matthew...@diamond.ac.uk

Apr 27, 2015, 4:43:16 AM
to jenkins...@googlegroups.com
Note that in Jenkins, copying files directly from another workspace is an anti-pattern.

Matt Stave

Apr 29, 2015, 10:02:09 AM
to jenkins...@googlegroups.com, Matthew...@diamond.ac.uk
I found that the standard method is quite slow compared to scp. So I use the standard method only for a few small files, plus one file containing GUIDs for fingerprinting, and for the big ones I do something like

scp -v ${WORKSPACE}/bigfile.tar.gz user@jenkins_host_name:path_to_jenkins_root/jobs/${JOB_NAME}/builds/${BUILD_ID}/archive/ 2>&1 | tail -n 5

I think there's a ${JENKINS_HOME} or something for the path on the master. That copies a 2-3 GB file in roughly 40 seconds instead of something like 4 minutes. A fix went in recently, for some Maven plugin I think, where the master was polling the slave with too many requests to send over the next packet while copying files to the master. Fixing that sped things up a ton, so perhaps another fix is coming for how other files are transferred.

Since "big" can sometimes be > 8GB, it would choke the normal archiver, which uses tar under the covers, or at least it did. In any case, packing the files myself is much faster, since pigz is multicore aware:

tar cf ${WORKSPACE}/bigfile.tar.gz --use-compress-program=pigz [files to pack]
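
If pigz is not installed on every build node, the same command can fall back to plain gzip. This is only a sketch; the function name `pack_fast` is made up for illustration, and everything after the archive name is handed to tar unchanged.

```shell
#!/bin/sh
# Hypothetical helper: write a .tar.gz using pigz when available, gzip otherwise.
pack_fast() {
    archive=$1
    shift
    if command -v pigz >/dev/null 2>&1; then
        compressor=pigz     # parallel gzip, uses all cores
    else
        compressor=gzip     # single-threaded fallback, same output format
    fi
    # Remaining arguments are the files to pack (or tar options such as -C dir).
    tar -cf "$archive" --use-compress-program="$compressor" "$@"
}
```

For example, `pack_fast "${WORKSPACE}/bigfile.tar.gz" -C "${WORKSPACE}" build/` would pack a `build/` directory relative to the workspace.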

YMMV

--- Matt

Tim Black

Mar 1, 2021, 4:43:02 PM
to Jenkins Users
I'm trying to do the same, but in both directions (archiving AND copying artifacts from upstream). I wonder how the scp approach to copying artifacts would work in multibranch pipelines? Can one deterministically construct the path to a branch job's artifact folder on the controller's disk?

As I commented here, I'm also seeking massive performance gains by replacing copyArtifact with a shell call in my pipelines. In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API (which I'll be doing later this year), I'm hoping to find a quick alternative. SCP would be a slam dunk for me if I could construct the source path correctly. The problem is that Jenkins is using an algorithm to create a unique folder name, both for workspace names, and for branch job names, but I'm not sure if that's consistent.

E.g. to fetch artifacts from the corresponding branch of an upstream multibranch pipeline job with Full project name of "ProjectFolder/MyProject/feature%2Ffoo", in the downstream multibranch pipeline, I would do something like:

scp -r jenkins-controller:<JENKINS_HOME>/jobs/ProjectFolder/jobs/MyProject/branches/<HOW_DO_I_COMPUTE_THE_BRANCH_PORTION_OF_PATH?>/lastSuccessfulBuild/artifact/<GLOB>
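
The folder part of that path follows a regular pattern (each level of the full project name sits under a `jobs/` directory), so it can be built with a small helper. This sketch covers only that folder part, with a hypothetical function name; it deliberately does not attempt the branch segment, which Jenkins derives separately.

```shell
#!/bin/sh
# Hypothetical helper: map a folder-style full name such as
# "ProjectFolder/MyProject" to "<home>/jobs/ProjectFolder/jobs/MyProject".
# The multibranch "branches/<...>" segment is NOT handled here.
job_dir() {
    home=$1         # value of JENKINS_HOME on the controller
    full_name=$2    # full project name without the branch part
    # Insert "jobs/" before every path component after the first.
    printf '%s/jobs/%s\n' "$home" "$(printf '%s' "$full_name" | sed 's#/#/jobs/#g')"
}
```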

Tim Black

Mar 4, 2021, 11:08:05 PM
to Jenkins Users
To whom it may concern, I ended up finding the code in the Jenkins branch-api plugin that creates that branch path segment (the NameMangler). However, it turned out to be completely unnecessary, since I can just get the build directory (to construct the path to the artifacts on the controller) from the project/job object, obtained by name from the Jenkins instance in Groovy. So my shared library function is even simpler now, works for any project type, and is safer because it doesn't need to use any core or plugin classes.

Melo Vi

May 16, 2022, 1:25:30 AM
to Jenkins Users
> scp -r jenkins-controller:<JENKINS_HOME>/jobs/ProjectFolder/jobs/MyProject/branches/<HOW_DO_I_COMPUTE_THE_BRANCH_PORTION_OF_PATH?>/lastSuccessfulBuild/artifact/<GLOB>



Hi, sorry to revive an old thread but,

I'm going down the `scp` route. However, after copying the artifact into `jobs/${JOB_NAME}/builds/${BUILD_ID}/archive/` on the controller node, refreshing the Jenkins UI (https://{JENKINS_URL}/job/{JOB_NAME}/job/{BRANCH_NAME}/) does not show the copied artifact. Was this a problem you came across? If so, how did you overcome it?


Alternatively, would you mind sharing the implementation of your shared library function?