We recently migrated our Jenkins server from running on Solaris (a VM)
to Ubuntu (also a VM). The VMs are similarly configured.
After the move to Ubuntu, commonly within a couple of hours (though we've
also seen it survive for most of a week), the memory usage of the
Jenkins master jvm goes vertical, and top shows the CPU usage jump
from perhaps a few tens of a percent to 100%+ (typically
something like 104%), where it stays until we intervene (usually
by killing and restarting Jenkins).
Since pictures are sometimes helpful, I've posted a plot of the RSS
for the jvm process, which includes three instances of the behavior we
see. (So the graph depicts multiple different jvm/jenkins
instances over time):
http://www.purestorage.foxcove.com/jenkins-mem-woes.jpg
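For reference, RSS samples like the ones behind that plot can be collected
with a loop along these lines. This is only a sketch, not our exact
collection script; the pid lookup pattern and the log path are placeholders:

    # Sample the Jenkins master jvm's RSS (in KB) once per minute.
    JENKINS_PID=$(pgrep -f 'java.*jenkins' | head -n 1)
    while true; do
        echo "$(date +%s) $(ps -o rss= -p "$JENKINS_PID")" >> /tmp/jenkins-rss.log
        sleep 60
    done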
The three spikes are easy to see. The first, on the far left,
happened just after we enabled the logging that collected the graph
data; we killed/restarted Jenkins at the end of that spike.
We then see what we consider the "normal" operation of the
server between 100 and 5700 minutes. Over all that time, the process runs
with a mean RSS of something like 600m.
Then our problem happens again near 5700 minutes: within two or
three minutes, the RSS for the jvm climbs to around 2.5g. The CPU also
changes gears at that point, going from a few tens of a percent to 100+%
(presumably this is gc kicking in).
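One way to confirm the gc theory would be to turn on GC logging. Something
like the following flags on the Jenkins java command line (the log path and
war location are just examples; the actual invocation on Ubuntu is managed
by the init script) should show whether the 100%+ CPU corresponds to
back-to-back full collections:

    # GC logging flags for HotSpot/OpenJDK 6.
    java -Xmx2048m \
         -verbose:gc \
         -XX:+PrintGCDetails \
         -XX:+PrintGCTimeStamps \
         -Xloggc:/var/log/jenkins/gc.log \
         -jar jenkins.war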
One other clue: in the first few minutes of the spike, the Jenkins web
UI gets slower and slower, until any operation is painfully slow. I
have, however, been able to peek inside the jvm with the Monitoring plugin
during the spikes, and noticed something that looked odd: a seemingly
immortal request. Unfortunately I don't have a copy of the request
information, but it was a very long-lived thread (many hours) that
would *always* show up in the Current Request list. Sometimes it
was the only current request, but when repeatedly reloading the
page, I'd see other requests coming and going.
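If it helps anyone reproduce the diagnosis, something like the following
(standard JDK tools; the pid and output paths are placeholders) captures a
thread dump and a live-object histogram while a spike is in progress:

    # Thread dump: shows every live thread's stack, including the
    # seemingly immortal request handler.
    jstack -l <jenkins-pid> > /tmp/jenkins-threads.txt

    # Heap histogram of live objects: shows what the extra ~2g consists of.
    jmap -histo:live <jenkins-pid> > /tmp/jenkins-histo.txt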
Here's a little of the possibly-relevant configuration information for
the systems where we are and are not seeing the problem:
===== Old Jenkins Server host (problem never observed here):
Jenkins version: 1.410
root $ uname -a
SunOS hudson 5.11 oi_147 i86pc i386 i86pc
host memory: 4 GB
root $ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) Server VM (build 16.0-b13, mixed mode)
Jenkins jvm heap size: 512m (-Xmx512m when starting java)
===== New Jenkins Server host (where the problem happens):
Jenkins version: 1.417
root $ uname -a
Linux jenkins 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
2011 x86_64 x86_64 x86_64 GNU/Linux
host memory: 4 GB
root $ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
Jenkins jvm heap size: 2048m (-Xmx2048m when starting java; this is an
increased value; we bumped it in an early attempt at troubleshooting
the problem).
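For completeness, assuming the stock Ubuntu package layout, the heap options
are set via JAVA_ARGS in /etc/default/jenkins; a heap-dump-on-OOM flag could
be added there so a future incident leaves something to analyze offline (a
sketch only; the dump path is arbitrary):

    # /etc/default/jenkins
    # Ask HotSpot to write a heap dump if the heap is ever exhausted.
    JAVA_ARGS="-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/jenkins"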
Obviously, there are many things we could change in further
experiments, but given that it can take several days for the problem
to crop up, such experimentation is very slow. I'm hoping that the
description here might be enough for somebody to recognize a known
problem or gotcha.