Speed up Artifact Copy between slave and master


Marcelo Brunken

Oct 19, 2011, 5:36:27 AM
to jenkin...@googlegroups.com
Hello,

There are already a few tickets about this problem. Our bottleneck is the copy process between slave and master. Is a solution on the way? Is someone working on it?
I am trying to figure out how it could be made faster, perhaps by changing the transfer protocol; HTTP is slow for this. (I am almost sure the artifacts are sent via HTTP.)

Thanks

David Karlsen

Oct 19, 2011, 12:28:20 PM
to jenkin...@googlegroups.com

It is also slow over SSH. I saw a fix and a pull request for it here the other day, which uses TCP nodelay. It has not been applied yet AFAIK.

Marcelo Brunken

Oct 21, 2011, 8:47:21 AM
to jenkin...@googlegroups.com
Any idea when that release comes out?

David Karlsen

Oct 21, 2011, 10:44:50 AM
to jenkin...@googlegroups.com
No idea.
I don't even know whether the pull request was handled and merged into master.

--
David J. M. Karlsen - http://www.linkedin.com/in/davidkarlsen

Tim Black

Mar 1, 2021, 4:50:38 PM
to Jenkins Developers
Reviving this very old thread, since this is still very much a problem in Jenkins core a decade later. As I commented here, I'm seeing massive (~13x) performance gains by replacing copyArtifact with a shell call to curl or wget in my pipelines. 

As I understand it, copyArtifact uses a single Jenkins "control channel", which has severely limited I/O and/or CPU resources, and this has been the case for as far back as I can see. This not only causes sluggish copying of artifacts from controller to agent, but is also a major factor in the similarly abysmal performance of archiving artifacts in the other direction (artifact compression being the other factor).

I am experimenting with workarounds. In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API (which I'll be doing later this year), I'm hoping to find a quick alternative. The two I'm considering at the moment are:
  1. HTTP GET each artifact URL in question via curl, wget, etc.
    1. This is nice because it can use the same semantics I was already using with copyArtifact, that is, jobName, branchName, and the lastSuccessfulBuild symlinks.
    2. This is great for known individual artifacts, but HTTP requires significant extra complexity to fetch whole artifact folders or artifacts matching a wildcard/regex, as copyArtifact supports. HTTP has no notion of a directory, so you have to pre-process by fetching an artifact index page, parsing it, and looping over the results.
      1. This guy said that Jenkins supports fetching a zip of any folder over HTTP, but that's not working for me on Jenkins 2.249.2.
    3. Another problem here is that you have to deal with Jenkins authentication / API tokens.
  2. SCP/rsync support rich file/directory pattern matching, but
    1. They require knowledge of the location of the artifacts on the controller's disk. This is non-trivial for multibranch pipeline projects (which I use liberally). scp would be an obvious choice if I could figure out how to deterministically construct the path to a multibranch pipeline branch job on the controller's disk.
    2. Authentication is trivial, since all users/config in my Jenkins infra are managed by Ansible, so my jenkins user can automatically ssh to any other node in the infra without a password.
Any insight into ways of replacing copyArtifact with curl/scp would be greatly appreciated. Thanks for your time.
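
For readers who want to try option 1, here is a minimal sketch of fetching a single known artifact over HTTP. It is not taken from the thread: the function names, the `JENKINS_USER` and `JENKINS_API_TOKEN` environment variables, and the example host are hypothetical. It also assumes that each segment of the full project name maps to a `/job/<segment>` URL component and that a literal `%` in a job name (as in `feature%2Ffoo`) has to be escaped again as `%25`; check this against your own instance before relying on it.

```shell
#!/bin/sh
# Sketch only: names and environment variables below are assumptions.

# Print the URL of one archived artifact.
# Usage: artifact_url BASE_URL FULL_PROJECT_NAME BUILD_SELECTOR ARTIFACT_PATH
artifact_url() (
  base=${1%/}   # drop one trailing slash from the base URL, if present
  # Assumption: escape "%" as "%25", then turn each "/" that separates
  # folder/job names into "/job/".
  job_path=$(printf '%s' "$2" | sed -e 's/%/%25/g' -e 's#/#/job/#g')
  printf '%s/job/%s/%s/artifact/%s\n' "$base" "$job_path" "$3" "$4"
)

# Download one artifact into the current directory.
# Assumes JENKINS_USER and JENKINS_API_TOKEN are set in the environment.
fetch_artifact() {
  curl --fail --silent --show-error --location \
       --user "${JENKINS_USER}:${JENKINS_API_TOKEN}" \
       --remote-name "$(artifact_url "$@")"
}
```

A call would look like `fetch_artifact https://jenkins.example.com 'ProjectFolder/MyProject/feature%2Ffoo' lastSuccessfulBuild out/app.tar.gz`. This covers only individual files; it does nothing about the directory and wildcard limitation described in item 1.2.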

Tim Black

Mar 1, 2021, 4:58:11 PM
to Jenkins Developers
Refining my request a bit further: SCP as a copyArtifact alternative would be a slam dunk for me if I could construct the source path correctly. The problem is that I use multibranch pipelines liberally, and Jenkins uses an algorithm to create a unique folder name for both workspaces and branch jobs. I'm not sure whether that algorithm is consistent, so I do not know if it would be safe to reconstruct and reference job paths on the controller's disk.

E.g., I want to fetch artifacts from the corresponding branch of an upstream multibranch pipeline job whose full project name is "ProjectFolder/MyProject/feature%2Ffoo". In the downstream multibranch pipeline, I would do something like:

scp -r jenkins-controller:<JENKINS_HOME>/jobs/ProjectFolder/jobs/MyProject/branches/<HOW_DO_I_COMPUTE_THE_BRANCH_PORTION_OF_PATH?>/lastSuccessfulBuild/artifact/<GLOB> ./

Jesse Glick

Mar 1, 2021, 5:49:13 PM
to Jenkins Dev
On Mon, Mar 1, 2021 at 4:50 PM Tim Black <timb...@gmail.com> wrote:
In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API

Suggest https://plugins.jenkins.io/artifact-manager-s3/ (or some other JEP-202 implementation) instead. 

Tim Black

Mar 1, 2021, 6:22:02 PM
to Jenkins Developers
I agree external artifact mgmt is the way to go, and I'll be doing that in some work later this year. (Thanks for the link to JEP-202, I've already learned a lot from skimming it.)

I have now gleaned that the branch segment of a multibranch job path is the same for all jobs using a given branch. This indicates there's a common algorithm computing it from the branch name. Can anyone point me in the right direction in the source code where the multibranch job path is created?

If I can compute that from the downstream job, I can then fully form the src path to pass to scp and I'm done. 

Jesse Glick

Mar 2, 2021, 12:48:33 PM
to Jenkins Dev
On Mon, Mar 1, 2021 at 6:22 PM Tim Black <timb...@gmail.com> wrote:
Can anyone point in the right direction in the source code where the multibranch job path is created?

Look in the `branch-api` plugin. 

Tim Black

Mar 2, 2021, 9:03:33 PM
to Jenkins Developers
Thanks. I see where the directory name is constructed for a workspace: 


But not where the branch job dir is created. Any help?

Jesse Glick

Mar 2, 2021, 11:03:57 PM
to Jenkins Dev
The principal class to look at is `MultiBranchProject`.

Tim Black

Mar 3, 2021, 12:44:25 AM
to Jenkins Developers
Think I found it: NameMangler.apply(). Would it be possible/advised to import the NameMangler class in my Shared Library vars/scpArtifacts.groovy (assuming my Jenkins instance has the branch-api plugin installed, which it does)? Something like this:

```
import jenkins.branch.NameMangler
def mangled_branch_name = NameMangler.apply(branch_name)
```

I'll try this out in the morning, just curious if anyone can confirm whether this looks feasible or I'm way off track. Thanks.

Baptiste Mathus

Mar 3, 2021, 2:38:17 AM
to Jenkins Developers
On Wed, Mar 3, 2021 at 06:44, Tim Black <timb...@gmail.com> wrote:
Think I found it: NameMangler.apply(). Would it be possible/advised to import the NameMangler class in my Shared Library vars/scpArtifacts.groovy (assuming my Jenkins instance has branch-api plugin installed, which it does.) Something like this:

```
import jenkins.branch.NameMangler
def mangled_branch_name = NameMangler.apply(branch_name)
```

I'll try this out in the morning, just curious if anyone can confirm whether this looks feasible or I'm way off track. Thanks.

Using core Java classes from a Jenkins Pipeline shared library is generally strongly discouraged.
This could break from one day to the next without notice.
What you're working on looks like it should rather be done in a full-blown Jenkins plugin.
(If not, this discussion should be on the users mailing list.)

Tim Black

Mar 3, 2021, 8:05:04 PM
to Jenkins Developers
Points taken; I'm not surprised by your response. As I have no intention of writing a plugin for this, I'll wrap this up and think about presenting it on the users list (where I already have two posts that still have no responses).

My case is a bit exceptional in that I'm in complete control of all configuration of all Jenkins clusters at my company, which are configured using Ansible and Configuration as Code. The Jenkins version and plugin versions are all locked in, so there should be no surprises if the NameMangler class changed in a subsequent release.

Importing and using the NameMangler class worked just fine. However, it turned out to be completely unnecessary, since I can just get the build directory (to construct the path to the artifacts on the controller) from the project/job object. So my shared library function is now even simpler, and safer, because it doesn't need to use any core or plugin classes.

Most importantly, I've now got a robust and highly performant workaround to the old issue of very sluggish copying of artifacts. I'm getting about a 12-15x performance boost here. (We have several large artifacts.) Thanks for your time.
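
The shared library function itself was not posted, so the following is only a rough sketch of what the copy step might look like once the build directory on the controller is known. The function names are hypothetical, and the `archive` subdirectory is an assumption about where the default artifact manager keeps archived files under a build directory; verify both on your own controller.

```shell
#!/bin/sh
# Sketch only: function names and the "archive" layout are assumptions.

# Print the scp source spec for artifacts matching a glob.
# Usage: remote_artifact_spec HOST BUILD_DIR GLOB
remote_artifact_spec() (
  build_dir=${2%/}   # drop one trailing slash, if present
  # Assumption: archived artifacts live in "<build dir>/archive".
  printf '%s:%s/archive/%s\n' "$1" "$build_dir" "$3"
)

# Copy the matching artifacts into DEST (default: current directory).
# Usage: scp_artifacts HOST BUILD_DIR GLOB [DEST]
scp_artifacts() {
  # -r recurses into directories, -p preserves modification times.
  scp -r -p "$(remote_artifact_spec "$1" "$2" "$3")" "${4:-.}"
}
```

Here `BUILD_DIR` stands for whatever path the job object reports for the build in question, and the glob is quoted locally so that it is expanded on the controller side rather than on the agent.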
