performance labs


Michael Neale

Sep 30, 2015, 10:50:03 PM
to Jenkins Developers
Hey all - I have been thinking it would be a great idea to have some quasi-formal "performance lab" setups for Jenkins.

Recently, in the Jenkins 2.0 planning threads, there have been lots of comments about performance challenges. Often these concern launch time (many minutes to an hour for large workspaces). Launch times are probably a good proxy for a whole lot of issues, but there are other issues too.

At JUC West there was an excellent talk by Akshay Dayal from Google on scaling Jenkins. I highly recommend flicking through the slides or watching the talk if you have time.

Basically, they had some performance goals and started by setting up measurements and test scenarios to validate their progress - both around scalability of slaves (an interesting issue) but also on bootup time (time to recovery) which is very interesting. It reminded me that to improve something like this you kind of need easily repeatable measurements in controlled environments, which currently I don't think the Jenkins project has set up? (correct me if wrong). 

I know Stephen Connolly did some work a few years back on slave scalability which was interesting (building out a test suite infrastructure), but I am not aware of subsequent efforts. 

Is this something people would be interested in? 

Having either large sample JENKINS_HOME specimens or test code that can generate pathological data would be required, as well as automation around running it on a variety of machines (not necessarily cloud, ideally want to be testing code not cloud infrastructure). 


Andrew Bayer

Oct 1, 2015, 3:54:30 AM
to jenkin...@googlegroups.com
+1 - that'd be fantastic. I'd love to help with that.

A.

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/76a12929-8f10-4b50-bf01-04cc77768149%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew Bayer

Oct 1, 2015, 3:55:08 AM
to jenkin...@googlegroups.com
...and I can most likely provide builds.apache.org's jobs/builds/load/etc as a use case.

A.

Michael Neale

Oct 1, 2015, 8:18:33 PM
to Jenkins Developers
Oh wow - they may be a perfect test workload. Do you know if boot up times are in the many many minutes for those instances? Some data on the jenkins_home dir sizes?
It would be ideal to use open source workloads (even if only a point-in-time snapshot) rather than something contrived or a scrubbed version of data donated by a private user. However, it would want to be pretty hefty (not necessarily the 2TB jenkins_homes or 40 minute boot-ups that I have heard of, but something up there would be nice).

I guess the next step is an initial scope of what we want to measure. To keep things focussed I am thinking of boot-up to job load time, and listing a few things.

Artur Szostak

Oct 2, 2015, 5:12:24 AM
to jenkin...@googlegroups.com
Hi,

You do not necessarily need very large setups to do these performance tests. What you need to be doing is performing proper measurements on the right things. I would even go so far as to say that large setups might actually make things difficult to disentangle.

Since it sounds like the project has got nothing in terms of systematic performance measurements, I would advise to start smaller. You certainly want larger setups to be part of the mix, but I think the first focus should be on the smaller setups for a baseline. Also, one should check behaviour when you have a large mix of plugins.

Again, the more important point is measurement. To anyone setting this up who is not a physicist by training (and maybe you need reminding even if you are): a single number is not a measurement. At a minimum, a measurement is two numbers, a lower and an upper bound. Even better is a mean plus a standard deviation or confidence interval. Why am I pointing this out, you may ask? Well, suppose I measure the time for something and tell you it took 30 seconds. Then I change some code, time it again and get 25 seconds. Have I improved things? You would maybe say yes. But what if I tell you the measurement was 30 +/- 10 versus 25 +/- 10 seconds? I haven't really improved anything, now have I? It's just noise. So, if we are serious about test-driven software development, we should also be serious about measurement.
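To make the 30 +/- 10 versus 25 +/- 10 point concrete, here is a minimal sketch in Python. The timings are invented purely for illustration, and the overlap check is a deliberately crude rule of thumb, not a proper significance test.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) of timings in seconds."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_distinguishable(before, after, k=1.0):
    """Crude check: True only if the mean +/- k*stdev bands do not overlap."""
    mean_before, spread_before = summarize(before)
    mean_after, spread_after = summarize(after)
    return abs(mean_before - mean_after) > k * (spread_before + spread_after)


# Hypothetical repeated timings (seconds), invented for this example.
before = [20.0, 30.0, 40.0, 25.0, 35.0]  # mean 30
after = [15.0, 25.0, 35.0, 20.0, 30.0]   # mean 25

print(summarize(before))
print(summarize(after))
print(is_distinguishable(before, after))
```

With spreads this large the 5 second difference in means sits well inside the noise, so the check reports that the two runs cannot be told apart.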

It is also important to record and keep trends of the timings. There will be outliers and there will be weird stuff in the trends from time to time, which needs to be checked, analysed and understood.
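One possible way to spot the "weird stuff" in a recorded trend is sketched below. It assumes the timings are kept as a simple list in chronological order; the window size and threshold are arbitrary illustrative choices, and a flagged point still needs a human to check and understand it.

```python
import statistics


def flag_outliers(timings, window=5, k=3.0):
    """Return indices whose value lies more than k sample standard
    deviations from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(timings)):
        recent = timings[i - window:i]
        mean = statistics.mean(recent)
        spread = statistics.stdev(recent)
        # A zero spread would flag every tiny change, so skip that case.
        if spread > 0 and abs(timings[i] - mean) > k * spread:
            flagged.append(i)
    return flagged


# Hypothetical startup times in seconds, one per nightly run.
history = [30, 31, 29, 30, 31, 30, 60, 30, 29]
print(flag_outliers(history))
```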

As a last point on measurement, I don't know if there is an easy way to get profile information, but a breakdown of how much CPU and I/O each plugin or core service consumes should be a goal. If you can measure that, you can make quick progress on weeding out the culprits.

Kind regards.

Artur




Vojtech Juranek

Oct 2, 2015, 7:59:45 AM
to jenkin...@googlegroups.com
Hi,

> Is this something people would be interested in?

yes, sounds interesting

> Having either large sample JENKINS_HOME specimens or test code that can
> generate pathological data would be required, as well as automation around
> running it on a variety of machines (not necessarily cloud, ideally want to
> be testing code not cloud infrastructure).

IMHO it's better to have some code to generate various types of workspaces and
loads - same as in the mentioned presentation, you should check performance
characteristics for various job types, log sizes, number of plugins used etc.
Using one big workspace can be harder to understand as it can combine multiple
issues together and you can end up with a tuned Jenkins which works fine with
this use case, but performs not that well with other use cases.

I wrote a very simple generator of freestyle jobs [1] for PerfCake [2] to
measure responsiveness of job upload in the past. If you are interested, I can
update it to generate various jobs, or it can be done in any other tool you
prefer (or as a standalone application if you like).
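As a rough illustration of the standalone option, the sketch below writes freestyle job directories straight into a JENKINS_HOME. This is not the PerfCake scenario linked in [1]; it swaps in plain file generation. The `jobs/<name>/config.xml` layout and the minimal `<project>` XML are assumptions that have not been checked against any particular Jenkins version, and the function and prefix names are hypothetical.

```python
from pathlib import Path

# Assumed minimal freestyle job definition; real jobs carry far more
# configuration (SCM, triggers, build steps, publishers).
CONFIG_TEMPLATE = """<?xml version='1.0' encoding='UTF-8'?>
<project>
  <description>generated job {index}</description>
  <builders/>
  <publishers/>
  <buildWrappers/>
</project>
"""


def generate_jobs(jenkins_home, count, prefix="generated-job"):
    """Create `count` job directories under <jenkins_home>/jobs and
    return the list of created directories."""
    jobs_dir = Path(jenkins_home) / "jobs"
    created = []
    for index in range(count):
        job_dir = jobs_dir / f"{prefix}-{index:05d}"
        job_dir.mkdir(parents=True, exist_ok=True)
        config = CONFIG_TEMPLATE.format(index=index)
        (job_dir / "config.xml").write_text(config, encoding="utf-8")
        created.append(job_dir)
    return created
```

Calling `generate_jobs("/tmp/test-home", 1000)` would lay out a thousand empty jobs; varying job types, log sizes and plugin configuration would need richer templates.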

Cheers
Vojta

[1] https://github.com/vjuranek/jenkins-perf-tests/blob/master/perfcake/src/main/resources/scenarios/create-freestyle.xml
[2] https://www.perfcake.org/

Michael Neale

Oct 6, 2015, 6:29:47 PM
to jenkin...@googlegroups.com
Yes that would be quite interesting. A stand alone tool could be useful. There are lots of things to measure but generating a lot of noise and jobs would be a great start. When you say "job upload" what were you measuring?

Michael Neale

Oct 6, 2015, 11:18:00 PM
to Jenkins Developers, aszo...@partner.eso.org
Agree on all counts, as long as it is large and complicated enough to exercise enough code that it represents what people are observing.

Certainly the result would have to be a trend of some meaningful statistic over time, with permutations of versions. I think it is worth sticking to measuring launch time (or time to launch/render some important page).


I guess some tool people can clone and try themselves would be great to encourage experimentation and ultimately profiling. That implies something that generates a large-enough workspace, though downloading a tar.gz of a large one could also do. Obviously to get a trend over time we need somewhere to run it regularly against versions and permutations of plugins. It quickly gets complicated.

So where to start, a repo with some parametrised launch script people can try? Use Jenkins itself to test launch times and calculate and establish trends? 
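A starting point for such a launch-timing script might look like the sketch below. It assumes "launched" can be approximated by some page answering with HTTP 200; which URL to probe and how the instance is started are left to the caller, and all function names here are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ready(url, timeout=5.0):
    """Return True once `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, error status, timeout: not ready yet.
        return False


def time_until_ready(probe, timeout=3600.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True; return the elapsed seconds.

    Raises TimeoutError if the probe is still failing after `timeout`.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("instance did not become ready in time")
        sleep(interval)
```

Usage would be along the lines of starting the instance and then calling `time_until_ready(lambda: http_ready("http://localhost:8080/"))`, where the URL is only an example. Repeating this several times per version gives the spread as well as the mean.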

Andrew Bayer

Oct 7, 2015, 8:33:03 AM
to jenkin...@googlegroups.com
So builds.apache.org is like 1500 jobs plus another ~30k Maven modules (stupid Maven project type!), and $JENKINS_HOME is somewhere around 1 TB. Until recently, startup time was a good 15 minutes or so, but going from 1.565 to 1.609 seems to have made a *massive* difference in startup time - down to like three minutes.

A. 

Michael Neale

Oct 9, 2015, 12:01:52 AM
to jenkin...@googlegroups.com
Wow fantastic. Actually 3 minutes means that the changes are pretty successful - I doubt there would be a whole lot left to optimise in that case, right? Or could even more be lazy loaded? Still, probably a great example. Taking that base and then adding more plugins and config changes to the mix would also shed light on when things suddenly go bad.

Are there publicly available tarball backups of that JENKINS_HOME, or are there secrets in it?


Vojtech Juranek

Oct 9, 2015, 4:18:38 AM
to jenkin...@googlegroups.com
On Tuesday 06 October 2015 22:29:31 Michael Neale wrote:
> When you say "job upload" what were you measuring?

it was actually a measurement of how Jenkins' responsiveness to POST requests
changes when you switch the underlying web server. It was done by one of my
students as part of his master's thesis, see [1] for more details

(perf tests were very basic - the main goal of the thesis was to implement
the Winstone -> Undertow switch in Jenkins, as this thesis was started before
Kohsuke implemented the switch to Jetty)

[1] https://groups.google.com/forum/#!topic/jenkinsci-dev/7dOnX2mNaw0

James Nord

Oct 9, 2015, 5:15:53 AM
to Jenkins Developers
So I actually tried creating test data a year or so ago (maven job type with a large number of sub modules) and creating several of them in folders - but I never saw the issues (3 hour cold startup time) I was seeing on the production instance :(

Maven project is available at https://github.com/jtnord/maven-test-project if you want to experiment.

It may well have been around fingerprinting, as my fingerprint file on production was > 2GB, but I invested in some better storage and got the startup to under 3 minutes, so I no longer had the inclination to try any further...

Robert Sandell

Oct 9, 2015, 6:29:55 AM
to jenkin...@googlegroups.com
A theory of mine is that startup times can depend on how "old" your build records are. If a lot of old data structures need converting for new plugin versions, that could have a measurable impact; maybe even OldDataMonitor gets involved and slows things down.

So there could be a difference in generated test data vs. "real world" data where it has grown over time.

/B

--
Robert Sandell
Software Engineer
CloudBees Inc.

Michael Neale

Oct 13, 2015, 7:29:37 AM
to jenkin...@googlegroups.com
Yes, recently I have heard rumblings about fingerprints and other files accreted over time.

Looking at the variation of times people see, I am questioning the utility of a generic test suite. Things vary so much there may be too many variables at play to make something like this useful right now. It's certainly useful to profile specific cases when people have a problem, and it's great there have been recent improvements (e.g. the Apache example), but it may be a bit hard to justify right now.


Michael Neale

Oct 13, 2015, 7:34:31 AM
to jenkin...@googlegroups.com
Can you tell us more about the hardware used?

Jesse Glick

Oct 13, 2015, 12:42:09 PM
to Jenkins Dev
On Tue, Oct 13, 2015 at 7:29 AM, Michael Neale <michae...@gmail.com> wrote:
> Looking at the variation of times people see, I am questioning the utility
> of a generic test suite. Things vary so much there may be too many variables
> at play to make something like this useful right now.

Well a generic test suite is not going to predict any given
installation’s performance, of course. But it can serve as a controlled
baseline by which you can measure the effects of changes. And many
widely applicable bugs, like the ones Google engineers found, can be
reproduced this way. When Stephen and I were poring over results from
sample tests using his scalability framework, which did really generic
stuff—run lots of builds from lots of jobs, each build producing gobs
of output—it was immediately clear what was broken. You set n=10 and
all is well. You set n=500 and things start to look worse. You set
n=10000 and the system basically hangs, and you look at a thread dump,
and oh yes a thousand threads are waiting on this one lock for no good
reason. So you fix that problem and rerun the test and you find the
next problem.
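The "turn the dial" loop described above can be sketched generically as follows. This is not Stephen's scalability framework; `scenario` is a hypothetical callable that runs whatever workload is being scaled (for example, n builds across n jobs), and the sizes simply mirror the ones mentioned in the message.

```python
import time


def sweep(scenario, sizes, clock=time.perf_counter):
    """Run `scenario(n)` for each n in `sizes`; return [(n, seconds)]."""
    results = []
    for n in sizes:
        start = clock()
        scenario(n)
        results.append((n, clock() - start))
    return results


def per_item_cost(results):
    """Convert total timings into seconds per unit of load.

    A roughly flat series suggests linear scaling; a rising one points
    at contention or some other superlinear cost worth a thread dump.
    """
    return [(n, elapsed / n) for n, elapsed in results]


if __name__ == "__main__":
    # Toy stand-in workload with deliberately quadratic behaviour.
    def toy_scenario(n):
        total = 0
        for i in range(n):
            for j in range(n):
                total += i ^ j

    for n, cost in per_item_cost(sweep(toy_scenario, [10, 500])):
        print(n, cost)
```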

Michael Neale

Oct 13, 2015, 9:52:46 PM
to Jenkins Dev
OK, so it sounds like exhuming Stephen's scalability stuff (not sure if it did startup time, but it doesn't sound like it would be hard to add) would be a great place to start. Like you said, turning the dials and seeing what happens is super useful. Even running on VMs (vs bare metal) would be informative, as it's increasingly common to run Jenkins not on bare hardware.

Artur Szostak

Oct 14, 2015, 7:37:02 AM
to jenkin...@googlegroups.com
The thread has been focusing on performance in terms of speed. But let me add another performance dimension that honestly is much more important to me right now (and causing me a lot of pain): performance as in stability.

The following kinds of tests might go a long way towards first quantitatively evaluating how stable Jenkins is, and then fixing the problems down the line.
- Perform continual start/stop cycles of the Jenkins master under various loads (system stress).
- Perform continual build slave start/build/stop cycles under various loads of the system and network. Ideally one would add simulations of intermittent network failure and check that Jenkins follows the expected error path.
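The cycles listed above could be driven by a small harness along these lines. It is only a sketch: `start`, `wait_ready` and `stop` are hypothetical callables supplied by whoever knows how to launch and tear down the master or a build slave, and load generation and network fault injection are left out entirely.

```python
def run_cycles(start, wait_ready, stop, cycles):
    """Run start / wait / stop `cycles` times.

    Returns a list of (cycle index, "ok" or "failed", error text or None).
    An exception raised by `stop` itself is not caught and aborts the run.
    """
    outcomes = []
    for index in range(cycles):
        handle = None
        try:
            handle = start()
            wait_ready(handle)
            outcomes.append((index, "ok", None))
        except Exception as error:
            outcomes.append((index, "failed", repr(error)))
        finally:
            # Always tear down whatever was started, even after a failure.
            if handle is not None:
                stop(handle)
    return outcomes


def failure_rate(outcomes):
    """Fraction of cycles that failed (0.0 for an empty run)."""
    failed = sum(1 for _, status, _ in outcomes if status == "failed")
    return failed / len(outcomes) if outcomes else 0.0
```

Recording the failure rate per configuration (number of nodes, load level) turns "I see a lot of issues" into a number that can be tracked over releases.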

I don't know about other people's experience, but I see that above a handful of build slave nodes one starts seeing a lot of connectivity and startup/shutdown issues. I also suspect there are a number of race conditions in there.



Andrew Bayer

Oct 14, 2015, 10:04:19 AM
to jenkin...@googlegroups.com
Yeah, stability is my biggest concern, and an even harder thing to test for than performance. Might be worth scraping through JIRA to find examples of behavior that tends to trigger instability in various ways to come up with some ideas...

A.

Michael Neale

Oct 14, 2015, 4:47:03 PM
to jenkin...@googlegroups.com
If broadening scope a bit, I would like to include memory footprint measurements too (something I spend time thinking about). Stability is of more pressing importance, I agree.
