Performance of Jenkins slaves


Alex Demitri

Sep 17, 2014, 4:24:53 PM
to jenkins...@googlegroups.com
Hi! We just started using Jenkins for continuous integration. The code is pulled from Perforce. We have one Jenkins master (a Windows VM) and 3 slaves (Windows VMs). I am more a VMware admin than a programmer.

I have been progressively tweaking the Jenkins slave setup. The slaves are now configured with 16 vCPUs + 48 GB of RAM each. During every build, CPU usage spikes to 100%. Builds currently finish in about 2h20m, but the goal is to get down to 1 hour.

What is the best way to do that? What kind of VMware tweaks should I look at? How can we push the builds through faster?

Thanks!
Alex

Scott Evans

Sep 17, 2014, 4:38:57 PM
to jenkins...@googlegroups.com
Alex,

The first question I'd ask is: what part of the build is running at 100% CPU, and can you tell which processes are consuming the CPU at that time?  You need to figure out where the bottlenecks and pain points are before trying to solve your performance issues (or perceived issues).  If your build is doing a ton of legitimately CPU-intensive work, then I'd think your 1-hour goal is not realistic, given the information you've provided and assuming you've already set up your VMware nodes properly for good performance.
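On the Windows slaves here, Task Manager, perfmon, or Process Explorer are the usual tools for this. Just to illustrate the idea of "sample twice and diff" process accounting, here is a rough, hypothetical sketch for a Linux node that reads per-process CPU jiffies from /proc (the field offsets and function names are mine, not from any tool mentioned in this thread):

```python
import os
import time

def cpu_jiffies(pid):
    """User + system jiffies a process has consumed so far (Linux /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        # Split after the ")" that ends the command name; utime and stime
        # are then at indexes 11 and 12 of the remaining fields.
        fields = f.read().rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])

def top_cpu(interval=1.0, n=5):
    """Sample all processes twice and return the biggest CPU consumers."""
    before = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                before[int(entry)] = cpu_jiffies(int(entry))
            except OSError:
                pass  # process exited between listdir and open
    time.sleep(interval)
    deltas = []
    for pid, start in before.items():
        try:
            deltas.append((cpu_jiffies(pid) - start, pid))
        except OSError:
            pass  # process exited during the interval
    return sorted(deltas, reverse=True)[:n]

print(top_cpu(interval=0.5))
```

Running something like this (or watching perfmon counters) during a build, and logging the output, gives you the raw data to say which processes actually own the 100% spikes.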

Scott

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alex Demitri

Sep 17, 2014, 4:41:52 PM
to jenkins...@googlegroups.com
What does a VMware node set up for good performance look like?

Also, what would you recommend for reaching 1hr?

Scott Evans

Sep 17, 2014, 5:02:38 PM
to jenkins...@googlegroups.com
Alex,

I'm not a VMware admin, so your experience would probably be better than anything I could suggest for setting up VMware nodes.  Is your build set up (such as with a multi-configuration / matrix build) to use all 3 nodes for its work?  If the build already uses all 3 slaves and all 3 are maxing out their CPU for 2+ hours, then I doubt there's much you can do as-is.  To cut the build time by that much, you probably need to split the build into more parallel pieces that can run simultaneously on more nodes.  If the current build runs entirely serially on one slave while the other two sit idle, look at how you can break it into multiple pieces that can run on more than one slave at a time.  For example, if you are building for multiple targets one at a time, see whether you can build those targets in parallel on different slaves rather than sequentially in one build.
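The payoff of the parallel-targets idea can be sketched in a few lines. This is only an illustration of the principle (in Jenkins each target would really be its own job or matrix axis on a different slave); the target names and the `build` stand-in are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical independent build targets with no dependencies on each other.
TARGETS = ["client", "server", "tools"]

def build(target):
    # Stand-in for the real work (e.g. a sync + compile step per target).
    time.sleep(0.1)
    return f"{target}: ok"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
    results = list(pool.map(build, TARGETS))
elapsed = time.perf_counter() - start

print(results)
print(f"wall time: {elapsed:.2f}s")  # roughly one target's time, not three
```

The serial version would take the sum of the per-target times; the parallel version takes roughly the time of the slowest target. The same arithmetic applies when the "workers" are Jenkins slaves instead of threads.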

In the absence of other solutions, the only other thing likely to give you significantly better performance is your hardware configuration: disk speed, processing power, etc.  Make sure your slaves are set for unlimited CPU slices.  We've seen significant build speedups on some of our builds when moving to solid-state disks, but I don't know if that's feasible in your situation.  Also check whether other things running on your slave nodes are slowing them down, such as active virus scanners, inventory management tools, live disk backups, etc.

When you say you're maxing out the CPU, is that on the slave node, or on the host system that's running the VMs?

One thing that will help you figure out what's going on with CPU usage is to watch the slave nodes during a build, see which process(es) are taking up the CPU, and see what you can do to optimize them.  Is the 100% CPU usage continuous during the build, or only at brief moments?  Look at your whole build and see which parts take the most time.  Is it the pull from Perforce, the compile, the packaging, the verification, or some other step(s)?
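A cheap way to get those per-phase numbers is to wrap each build step in a timer and log the result. A minimal sketch of that idea (the phase names and sleep calls are hypothetical stand-ins for the real Perforce sync, compile, and packaging steps):

```python
import time

def timed(label, step):
    """Run one build phase and report how long it took."""
    start = time.perf_counter()
    step()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed

# Stand-ins for the real phases; replace each lambda with the real command.
phases = {
    "p4 sync": lambda: time.sleep(0.01),
    "compile": lambda: time.sleep(0.05),
    "package": lambda: time.sleep(0.01),
}

timings = {name: timed(name, step) for name, step in phases.items()}
slowest = max(timings, key=timings.get)
print("slowest phase:", slowest)
```

Once a few builds' worth of these numbers are collected, it's usually obvious which phase deserves the optimization effort.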

Scott


Alex Demitri

Sep 17, 2014, 7:05:27 PM
to jenkins...@googlegroups.com
Thanks, Scott, for the quick reply.

After talking to DevOps, it seems there is no single process that's the culprit. Right now the issue is also instability: builds don't always take 2h20m, but sometimes, for some reason, they last 5-6 hours. I asked whether any specific process is maxing out the CPU, and it seems to be random. Of two builds that ran long, one time it was process "x" and the other time it was process "y".

Is there anything at the JVM level that needs to be tweaked? Min/max memory settings? I am not familiar with that, but I heard it mentioned in conversations.

We are looking into adding more slaves. As of right now, it is my understanding we have:

- one master
- two slaves
- one slave that writes the build results to a cifs share

It may soon be possible to host on solid-state drive infrastructure, as I am working to implement that. I am also suggesting we write the build results locally rather than to a CIFS share hosted on the network. The 100% CPU happens during certain parts of the build.

Les Mikesell

Sep 18, 2014, 1:36:13 AM
to jenkinsci-users
On Wed, Sep 17, 2014 at 3:24 PM, Alex Demitri <alex.d...@gmail.com> wrote:
>
> I have been trying to tweak more and more the Jenkins slave setup. Now they
> are configured as 16vCPUs + 48GB of RAM per slave.

How does this relate to the physical host's available resources? Are
you overcommitted with the number of active VMs sharing the host?
Likewise, what else is contending for any disk resources that might be
shared?

--
Les Mikesell
lesmi...@gmail.com

Alex Demitri

Sep 18, 2014, 9:48:35 AM
to jenkins...@googlegroups.com
The hosts are dedicated to these VMs. Each host is configured with 20 vCPUs / 384 GB RAM, and there are three hosts in the cluster. The only VMs living on the cluster are the Jenkins slaves (3 VMs) and the Jenkins master (1 VM).

Alex

Les Mikesell

Sep 18, 2014, 10:53:43 AM
to jenkinsci-users
On Thu, Sep 18, 2014 at 8:48 AM, Alex Demitri <alex.d...@gmail.com> wrote:
> Hosts are dedicated to these VMs. Hosts are configured as 20 vCPUs / 384 gb
> ram per host. There are three hosts in the cluster. Only VMs living on the
> cluster are the jenkins slaves (3 vms) - jenkins master (1 vm).
>

Have you run the jobs on native hardware to know what times to expect?
I'd normally expect less than 10% overhead from VM infrastructure -
except where you have disk head contention among guests or memory or
CPU overcommit. I think current versions of VMware are pretty good at
CPU overcommit but older versions would not give a timeslice unless
the full number of virtual CPUs for the guest could be allocated at
once.

Beyond that you have to look at what the jobs are doing and what
resources they use.

--
Les Mikesell
lesmi...@gmail.com

Alex Demitri

Sep 18, 2014, 10:55:49 AM
to jenkins...@googlegroups.com
Also, I received suggestions about moving to SSDs and about contention for disk resources. How do disk resources affect build time?

Alex

Scott Evans

Sep 18, 2014, 11:25:11 AM
to jenkins...@googlegroups.com
Alex,

If your builds do a lot of disk reading/writing and you're running multiple builds in parallel, build performance may be limited by how fast your hardware can read/write to disk.  If multiple builds all generate heavy disk activity at the same time, you may be maxing out the bandwidth of your disk system, slowing everything down.  However, until you know where your performance bottlenecks are, you won't be able to make changes that reliably increase performance.

To use an analogy: you can put nice big fat tires on a little car with a little engine, but unless you know the tires were the limiting factor in the car's performance before the upgrade, you may be throwing money/time/resources at improvements that won't make any difference in the results.

What I suggest you do (and others have alluded to) is to really sit down with your build engineers and work through your build metrics, performance stats, and per-process usage for each piece of the build, start to finish, and analyze where the performance bottlenecks are.  Then put your time/money/resources where the biggest improvements can realistically be made.  Sometimes throwing better hardware at a build performance issue helps, but more often it yields only small percentage gains.  Doing it right takes time: analyze what's being done now and figure out how to streamline it.  Also make sure your Jenkins VMs aren't being throttled by other VMs on the same host that are consuming disk bandwidth, memory, or CPU cycles.

Out of curiosity, in your existing Jenkins system when a build is running, is it utilizing all 3 of your slaves, or are 2 idle and 1 is doing all the work?  Unless you specifically structure your build to do multiple sub-build pieces in parallel, Jenkins won't magically partition the work across all 3 slaves by itself.

Just my opinion, but I think your search for a magical VM tweak to double your build performance is going to be mostly futile, and if your goal is to get your build done in an hour then significant build changes will be needed to reach that goal.

Scott


Mark Waite

Sep 18, 2014, 11:28:32 AM
to jenkins...@googlegroups.com
As another alternative, you could ask several of the developers on your team if they would allow you to temporarily run a slave on their computer for a performance test.  Configure the same job to run first on your VM environment, then on the borrowed developer computers, and compare the results.  If the differences are significant (they were in my case), then you have a basic understanding of one change you could make (switch from virtual to physical) and roughly how much gain that one change might provide.
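The point of such an experiment is simply to quantify the virtualization penalty before deciding where to invest. With two measured wall-clock times, the comparison is one line of arithmetic; the figures below are made up for illustration only:

```python
# Hypothetical measured wall-clock times, in minutes, for the same job.
vm_minutes = 140        # current VMware slave (the 2h20m figure from this thread)
physical_minutes = 95   # borrowed developer workstation (made-up number)

# Relative slowdown of the VM run versus bare metal.
overhead_pct = (vm_minutes - physical_minutes) / physical_minutes * 100
print(f"VM run was {overhead_pct:.0f}% slower than bare metal")
```

If the computed overhead is in the single digits, the VMs are not the problem and the effort belongs in the build itself; if it is large, the VM setup deserves a closer look.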

Mark Waite
--
Thanks!
Mark Waite

Les Mikesell

Sep 18, 2014, 11:39:26 AM
to jenkinsci-users
On Thu, Sep 18, 2014 at 9:55 AM, Alex Demitri <alex.d...@gmail.com> wrote:
> Also, i received suggestions to move to SSD and contention of disk
> resources.. How are disk resources affecting the time of the build?
>

With disks, it is usually head seek time, much more than data transfer speed, that causes problems. If your VMs share the same physical disks for their virtual disk images, the guests will constantly want the heads in different places. You may be able to avoid some of this by putting the virtual disks on different physical host drives, or by using RAID arrays with a very large number of drive members. SSDs avoid the problem entirely because they don't need to move a physical head to access different locations.

--
Les Mikesell
lesmi...@gmail.com