99% of requests fast, 1% hang forever (80%/20% in the afternoon)


Wayne Walker

Oct 23, 2015, 12:02:48 PM
to Jenkins Users
Jenkins 1.634
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)

Lots more info at https://gist.github.com/wwalker/fc85ea65033ac9b0f714

In short, if I make 1000 simple requests (https://jenkins/hudson/job/$i/config.xml, where $i iterates over a list of job names), most return in a very consistent 0.3-0.4 seconds. The exceptions are the requests that take longer than a second: those never complete and end in a 502 error from Apache after 2 minutes (what we set TimeOut to in Apache).
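A minimal sketch of the kind of test loop described above, not the exact script from the gist. It assumes a hypothetical file jobs.txt with one job name per line, job names that need no URL encoding, and no authentication. The function names, BASE_URL, and the 1-second cutoff are illustrative assumptions.

```shell
# Sketch only: BASE_URL, jobs.txt, and both function names are assumptions.
BASE_URL="${BASE_URL:-https://jenkins/hudson}"

# For each job name on stdin, print "<seconds> <http_status> <job>".
probe_jobs() {
    while IFS= read -r job; do
        [ -n "$job" ] || continue
        # --max-time caps a hung request just above the 2-minute Apache timeout;
        # -w reports the total time and the HTTP status of the response.
        result=$(curl -s -o /dev/null --max-time 130 \
                      -w '%{time_total} %{http_code}' \
                      "$BASE_URL/job/$job/config.xml")
        printf '%s %s\n' "$result" "$job"
    done
}

# Read "<seconds> <status> ..." lines and report how many requests were
# slower than the limit (default 1.0 s) or returned a status other than 200.
summarize() {
    awk -v limit="${1:-1.0}" '
        { total++ }
        ($1 + 0 > limit + 0) || ($2 != "200") { bad++ }
        END {
            if (total == 0) { print "no samples"; exit }
            printf "%d of %d slow or failed (%.1f%%)\n", bad, total, 100 * bad / total
        }'
}
```

Usage would look like `probe_jobs < jobs.txt | summarize`. Rerunning it at intervals after a restart gives the failure rate over time.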

Right after a restart (2 PM), only 3 out of 2000 requests errored out. An hour later it was up to 12 out of 1000 (about 1%). A couple of hours after that (5:30 PM) it was around 15%, while the 85% that don't hang still take about 0.35 seconds.

This morning at 9 AM, 8 out of 1000.

We are seeing this in all requests, whether they are made through the GUI via a browser, through API calls with curl, or through the Python Jenkins API. It doesn't seem to be a complexity problem: more complex requests fail about as often as simple ones.

Wayne Walker

Oct 23, 2015, 12:08:17 PM
to Jenkins Users
Additional basic info:

The Jenkins master is running on a 32-core box. We have 38 slaves (of which 10 are currently down), and everything in the build queue (6 items) is waiting on a specially tagged slave. I can run API or UI requests. System load average on the master is 0.06, and Jenkins is using about 75% of a core most of the time (32 cores, 8 GB Xmx; the GC sawtooth drops by more than 2 GB each time, so it is not starving for memory).

Summary of a recent ThreadDump:

[wwalker@jenkins ~]$ cat jstack.14042.out | grep -A1 ^Thread | sed -e 's/Thread [0-9]*:/Thread <pid>:/' | sort | uniq -c | sort -nr 
    206 --
    115 Thread <pid>: (state = IN_NATIVE)
     92 Thread <pid>: (state = BLOCKED)
     46  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
     44  - java.io.FileInputStream.readBytes(byte[], int, int) @bci=0 (Compiled frame; information may be imprecise)
     32  - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
     25  - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
     22  - java.lang.UNIXProcess.waitForProcessExit(int) @bci=0 (Interpreted frame)
     12  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
      8  - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
      7  - sun.nio.ch.ServerSocketChannelImpl.accept() @bci=7, line=225 (Interpreted frame)
      4  - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
      2  - sun.nio.ch.ServerSocketChannelImpl.accept0(java.io.FileDescriptor, java.io.FileDescriptor, java.net.InetSocketAddress[]) @bci=0 (Interpreted frame)
      2  - java.net.PlainDatagramSocketImpl.receive0(java.net.DatagramPacket) @bci=0 (Interpreted frame)
      2 
      1  - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Interpreted frame)
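Note that the tally above mixes three kinds of lines: the thread state lines, the top frame of each thread, and the `--` separators that `grep -A1` prints between match groups (206 separators for 115 + 92 = 207 threads). A minimal sketch that counts only the thread states, assuming the same `Thread <id>: (state = ...)` dump format, is below. The function name is hypothetical.

```shell
# Sketch only: count_thread_states is an illustrative name, and the input is
# assumed to use the "Thread <id>: (state = STATE)" format shown above.
# Prints "<count> <STATE>" lines, most frequent state first.
count_thread_states() {
    sed -n 's/^Thread [0-9]*: (state = \([A-Z_]*\)).*$/\1/p' |
        sort | uniq -c | sort -nr | awk '{ print $1, $2 }'
}
```

Usage would look like `count_thread_states < jstack.14042.out`.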