As any serious Jenkins user would know, a big Jenkins instance takes a
long time to start up. One of the problems here is that it loads every
build record before it starts accepting HTTP requests, and therefore a
natural solution to this problem is to try to lazy load build records
on-demand.
I've been working on this on and off, and I finally reached a milestone.
Incidentally, Christian Wolfgang contacted me about this, as this very
topic was discussed at the Copenhagen hackathon [3].
My changes are in the lazy-load branch [1]. You can also see the commits
made in this branch [2] that aren't in master.
What's already done
===================
RunMap is conceptually a map from build number to Run, and Runs form a
bi-directional linked list between their adjacent neighbors. So the
first step was to turn RunMap into a virtual map. That is, it loads each
build record on demand, when it is requested. The same goes for the
bi-directional linking between Runs.
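As a rough illustration of the virtual-map idea, here is a minimal Java
sketch. LazyRecordMap and its loader function are hypothetical names made
up for this example; this is not the code in the branch, which also has
to handle the neighbour linking and the on-disk layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch of a map that loads values only when asked for them. */
final class LazyRecordMap<R> {
    private final Map<Integer, R> loaded = new HashMap<>();
    private final IntFunction<R> loader; // stands in for reading a record from disk

    LazyRecordMap(IntFunction<R> loader) {
        this.loader = loader;
    }

    /** Returns the record for the given build number, loading it on first access. */
    R get(int number) {
        R record = loaded.get(number);
        if (record == null) {
            record = loader.apply(number);
            if (record != null) {
                loaded.put(number, record); // remember it so later lookups are cheap
            }
        }
        return record;
    }

    /** Number of records that have actually been loaded so far. */
    int loadedCount() {
        return loaded.size();
    }
}
```

Nothing is read at construction time; the cost of loading is paid only
for the build numbers that callers actually ask about.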
What makes this interesting is that build records are keyed by their
timestamps on disk, not by numbers (numbers are available as symlinks,
but not on all platforms). So RunMap uses a binary search to locate the
build that's currently needed. For this to work, I made the assumption
that for any two builds #M and #N, M>N iff M.timestamp>N.timestamp.
The search also uses the symlinks, where available, as a hint to speed
things up.
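The search can be sketched roughly as follows, assuming the
timestamp-named directories are already sorted and that the build number
only becomes known once a record is loaded. BuildDirSearch and numberAt
are hypothetical names for this example, and the symlink hint is left
out; this is not the actual search method in AbstractLazyLoadRunMap.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: find a build number among timestamp-ordered directories. */
final class BuildDirSearch {
    /** How many records were loaded, to show that the search stays logarithmic. */
    int loads = 0;

    /**
     * @param dirCount number of build directories, sorted by timestamp (ascending)
     * @param numberAt loads the record at the given index and returns its build
     *                 number (assumed to be the expensive step)
     * @param target   the build number being looked for
     * @return index of the directory holding build #target, or -1 if there is none
     */
    int search(int dirCount, IntUnaryOperator numberAt, int target) {
        int lo = 0;
        int hi = dirCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int number = numberAt.applyAsInt(mid);
            loads++;
            if (number == target) {
                return mid;
            }
            // Relies on the assumption M>N iff M.timestamp>N.timestamp:
            // a smaller number at mid means the target lies in a later directory.
            if (number < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

If the ordering assumption is violated for some pair of builds, a search
like this can miss a record that does exist on disk.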
Much of this logic is in AbstractLazyLoadRunMap, which has no
dependencies on the rest of Jenkins core. This was so that I could test
this class quickly and efficiently without using HudsonTestCase (and
also because, historically, I developed this code outside core). See its
search method, which is the heart of this logic.
RunMap was then modified to extend this class and so pick up this
behaviour.
The next step was to make Job/AbstractProject take advantage of this.
The Job class expects subtypes to provide the _getRuns method that
returns a SortedMap of the builds, and all the other methods on Job that
deal with obtaining build records work on this map. Some of these
methods are overridden in AbstractProject to take advantage of RunMap.
I then proceeded to update RunList, which is a class that's used to list
builds that satisfy certain criteria. In master, this class extends
ArrayList, and every time a new filter is applied, all builds that
satisfy the criteria get copied into a new list. This doesn't work very
well with lazy loading, so I modified this class to walk through the
builds lazily and pick up only the ones that satisfy the criteria.
For example, a typical use case of this class is "take all the builds of
a job, narrow them down to those that have failed, then take the first
10 and render them as RSS". The new lazy implementation works very well
with this.
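To make the "filter, then take the first few" pattern concrete, here is a
minimal Java sketch of a lazily evaluated list view. LazyBuildList and
its filter and limit methods are hypothetical names for this example and
are not RunList's actual API; the point is only that elements are pulled
from the source as the caller iterates, and no further.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical sketch of a lazily filtered, lazily truncated view over builds. */
final class LazyBuildList<T> implements Iterable<T> {
    private final Iterable<T> source;
    private final Predicate<? super T> test;
    private final int max;

    LazyBuildList(Iterable<T> source) {
        this(source, x -> true, Integer.MAX_VALUE);
    }

    private LazyBuildList(Iterable<T> source, Predicate<? super T> test, int max) {
        this.source = source;
        this.test = test;
        this.max = max;
    }

    /** Returns a view that only yields elements matching p; nothing is copied. */
    LazyBuildList<T> filter(Predicate<? super T> p) {
        return new LazyBuildList<>(this, p, Integer.MAX_VALUE);
    }

    /** Returns a view that stops after n elements. */
    LazyBuildList<T> limit(int n) {
        return new LazyBuildList<>(this, x -> true, n);
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> it = source.iterator();
        return new Iterator<T>() {
            private T pending;
            private boolean ready;
            private int emitted;

            @Override
            public boolean hasNext() {
                // Pull from the source only until the next match is found.
                while (!ready && emitted < max && it.hasNext()) {
                    T candidate = it.next();
                    if (test.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                emitted++;
                return pending;
            }
        };
    }
}
```

With a view like this, asking for the first 10 failed builds touches only
as many builds as it takes to find 10 failures, rather than the whole
history.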
Unfortunately, to make this work, I needed to make a signature-breaking
change: the class now extends AbstractList instead of ArrayList. I did
scan the source code of all the plugins to see how they use RunList, and
I didn't spot any place where it's cast to ArrayList, so I think the
impact of this will be small.
And at this point, it passes all the unit tests.
What needs to be done
=====================
Clearly more testing needs to be done. Code coverage of
AbstractLazyLoadRunMap is actually pretty good, but the logic is complex.
I also need to try this with a real Jenkins instance, mainly to see if
there's some code in Jenkins or plugins that tries to eagerly load all
the build records of all the jobs.
To help us find this, we probably need some logging that tells us who is
loading build records, and when.
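One possible shape for such logging, sketched with java.util.logging:
attach a Throwable to the log record so that the stack trace shows the
caller. LoadTracer and recordLoad are hypothetical names for this
example; nothing like this is claimed to exist in the branch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: log each build record load together with its caller. */
final class LoadTracer {
    static final Logger LOGGER = Logger.getLogger(LoadTracer.class.getName());

    /** Call this at the point where a build record is about to be loaded. */
    static void recordLoad(String jobName, int buildNumber) {
        // Creating a Throwable captures a stack trace, which is not free,
        // so only do it when FINE logging is actually enabled.
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.log(Level.FINE,
                    "Loading build record " + jobName + " #" + buildNumber,
                    new Throwable("load requested here"));
        }
    }
}
```

The log timestamp answers "when", and the attached stack trace answers
"who"; with the logger left at its default level the call costs little
more than the isLoggable check.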
I'll try this with some real Jenkins deployments, and when that's ready,
I'd like to merge this into master. If anyone is willing to give this a
shot before it hits master, I'd highly appreciate that.
The next goal after that is to deal with places where someone tries to
load all the build records of one job. Unfortunately, this happens today
in numerous places. On the job's top page alone, the test report trend
and the code coverage trend each walk the entire build history. So this
will be a slow process. More about this later.
WDYT?
[1] http://github.com/jenkinsci/jenkins/tree/lazy-load
[2] https://github.com/jenkinsci/jenkins/compare/master...lazy-load
[3] http://wiki.praqma.net/jcicodecamp12/sessions/improve-start-up-time
--
Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/