Lazy Loading jobs - massive slowdown after upgrade to 1.509.1-LTS on AIX


Martin Kutter

May 10, 2013, 2:00:41 AM
to jenkin...@googlegroups.com
Hi,

I've recently upgraded our Jenkins installation to 1.509.1-LTS. Unfortunately, our system is now experiencing a massive delay in both startup time and list view loading. Loading the main view showing all jobs takes about 5 minutes, and that is without the columns lastSuccess, lastError and lastDuration (I wasn't patient enough to try it with these columns).

This is of course also due to our job structure: we currently have around 300 jobs, with around 700 additional promotion steps.

A closer look into this behavior reveals that:

- on startup, Jenkins tries to update the symlinks for lastSuccessful etc.
- loading a list view (after a garbage collection) means loading all jobs in this view (and, depending on the columns, following the symlinks to lastSuccessful etc.)


Our system runs on Java 6 in Tomcat 6 on AIX 7.1. AIX does not seem to be supported by jna-posix, which means Jenkins execs the readlink program every time a symlink is followed. This is probably the cause of the massive delays.
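For illustration, resolving a symlink by exec'ing `readlink` looks roughly like the following sketch. Every lookup forks a child process, so with hundreds of jobs and promotion steps the per-lookup cost adds up quickly. `ExecReadlink` is a hypothetical helper, not Jenkins's actual implementation, and it assumes a `readlink` binary on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper (not Jenkins code): resolves a symlink by spawning the
// external `readlink` program, i.e. one child process per lookup.
// Assumes a `readlink` binary is available on the PATH.
class ExecReadlink {
    /** Returns the raw link target, or null if readlink exits non-zero (e.g. not a symlink). */
    static String readlink(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("readlink", path)
                .redirectErrorStream(true) // merge stderr so the child cannot block on a full pipe
                .start();
        String target;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            target = r.readLine(); // readlink prints the target on a single line
        }
        return p.waitFor() == 0 ? target : null;
    }
}
```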

As far as I can see, there are two (or three) alternatives available:

1. Upgrade to Java 7, which can follow symlinks via the nio.* classes.
I'm not sure, however, whether Jenkins would use the nio.* methods instead of readlink here.

2. Patch jna-posix to support AIX
This should not be too much of an issue, as AIX is POSIX-compliant AFAIK.

(3. Make lazy loading optional in Jenkins, or change the behaviour of status loading.)
I guess this is not an option...
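Option 1 relies on the NIO.2 API that Java 7 added. `java.nio.file.Files.readSymbolicLink` returns a link's target in-process, without spawning anything. The following is a minimal sketch that makes no assumptions about how Jenkins itself wires this up. `NioReadlink` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of option 1: on Java 7+, java.nio.file reads a symlink's target
// in-process, with no child process involved.
class NioReadlink {
    /** Returns the raw (unresolved) link target, or null if the path is not a symlink. */
    static String readlink(String path) throws IOException {
        Path p = Paths.get(path);
        if (!Files.isSymbolicLink(p)) {
            return null;
        }
        return Files.readSymbolicLink(p).toString();
    }
}
```

Unlike the exec approach, the cost here is a single system call per lookup rather than a fork/exec.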

Can you provide some advice on whether 1 or 2 would be the better option in my case?


Best regards,

Martin

Jesse Glick

May 10, 2013, 7:39:08 AM
to jenkin...@googlegroups.com
On 05/10/2013 02:00 AM, Martin Kutter wrote:
> jenkins execs the readlink program every time a symlink is followed, which is probably the cause for the massive delays

You are speculating, or you measured this (minimally with a series of thread dumps)?

> upgrading to Java7, which can follow readlinks via the nio.* classes

This is what I would recommend. Not sure what IBM’s support policy is, but Oracle has dropped unpaid support for JDK 6.

> I'm not sure, however, whether Jenkins would use the nio.* methods instead of readlink here

It should; if it does not, please investigate ASAP.

> Patch jna-posix to support AIX

There is already PR #770 to update to jnr-posix, which at least has AIX-specific classes in it; please help evaluate this PR to see if it improves performance on AIX.

Martin Kutter

May 12, 2013, 6:13:33 AM
to jenkin...@googlegroups.com
Hi Jesse,

thanks for your reply.

> > jenkins execs the readlink program every time a symlink is followed, which is probably the cause for the massive delays
>
> You are speculating, or you measured this (minimally with a series of thread dumps)?

The readlink invocation is "measured" (observed) - this being the reason
for the delay is speculation, though.

> > upgrading to Java7, which can follow readlinks via the nio.* classes
>
> This is what I would recommend. Not sure what IBM’s support policy is, but Oracle has dropped unpaid support for JDK 6.
>
> > I'm not sure, however, whether Jenkins would use the nio.* methods instead of readlink here

In the meantime I've tried 1.509.1-LTS on AIX with Java 7. Startup and (initial) list view loading times were reduced by a factor of around 2, which in my case is already enough to avoid timeouts.

> It should; if it does not, please investigate ASAP.
>
> > Patch jna-posix to support AIX
>
> There is already PR #770 to update to jnr-posix, which at least has AIX-specific classes in it; please help evaluate this PR to see if it improves performance on AIX.

Thanks for the hint. From looking at the Jenkins code, it seems that Jenkins tries the Java 7 methods first, then GNU libc, then jna-posix.
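An ordered fallback like the one described can be sketched as a chain of strategies. Each strategy is tried in turn until one works on the current platform. This is a hypothetical structure, not Jenkins's `Util.resolveSymlink`. The GNU libc and jna-posix steps are left out because they need native bindings, and the exec'd `readlink` stands in as the last resort.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch of an ordered fallback chain for reading symlinks.
// Not Jenkins's Util.resolveSymlink: the GNU libc and jna-posix steps are
// omitted because they need native bindings.
class FallbackResolver {
    /** One way of reading a symlink; throws if it is unavailable on this platform. */
    interface Strategy {
        String resolve(String path) throws Exception;
    }

    /** Preferred: Java 7 NIO, in-process. */
    static final Strategy NIO = path -> {
        Path p = Paths.get(path);
        return Files.isSymbolicLink(p) ? Files.readSymbolicLink(p).toString() : null;
    };

    /** Last resort: exec the external `readlink` program (one process per call). */
    static final Strategy EXEC = path -> {
        Process proc = new ProcessBuilder("readlink", path).redirectErrorStream(true).start();
        String line;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
            line = r.readLine();
        }
        return proc.waitFor() == 0 ? line : null;
    };

    /**
     * Tries each strategy in order. The first one that returns without throwing
     * wins, including a null "not a symlink" answer; a strategy that throws
     * (e.g. missing class or native library) is skipped.
     */
    static String resolve(String path, List<Strategy> chain) {
        for (Strategy s : chain) {
            try {
                return s.resolve(path);
            } catch (Exception | LinkageError e) {
                // unavailable here: fall through to the next strategy
            }
        }
        return null; // no strategy worked
    }
}
```

The ordering matters for performance. On a platform where the cheap in-process options fail, every lookup ends up paying for the process spawn, which matches the AIX symptoms described above.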

Best regards,

Martin

Jesse Glick

May 13, 2013, 1:23:19 PM
to jenkin...@googlegroups.com
On 05/12/2013 06:13 AM, Martin Kutter wrote:
> from looking at the Jenkins code, it looks like Jenkins tries the java7 methods first, then GNU libc, then jna-posix.

Right. Perhaps we could get rid of the libc usage if jnr-posix proves a good option for Java 6 users.

Chris Graham

May 29, 2013, 9:38:00 PM
to jenkin...@googlegroups.com
Hey All.

IBM Java 6 has a few years left in it yet. From memory, Java 1.5 support was *extended* by a year, to around September 2013.

For WAS, you normally can't change the underlying JDK, although I do believe this is now an option for WAS 8.5.

I've found that readlink is available as part of the coreutils package on AIX. It's installed into /usr/freeware/bin, which is not normally in the path, nor is it symlinked into /usr/...

So, after adding /usr/freeware/bin to the path for the wasadmin user, we see the following timings:

Jenkins 1.510:
<<without readlink>>
[10:36:46.966] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 98563ms]
[10:39:50.486] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 96515ms]

<<with readlink>>
[10:58:11.697] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 43375ms]
[10:59:11.871] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 37594ms]
[11:00:20.580] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 40219ms]


Jenkins 1.500:
<<with readlink>>
[11:30:52.488] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 7953ms]
[11:32:05.379] GET http://10.225.0.69/jenkins/ [HTTP/1.1 200 OK 4609ms]


So somewhere between 1.500 and 1.510, some massive overhead was introduced.
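Averaging the request times above makes the two effects easier to compare. The averages come to roughly 97.5 s for 1.510 without readlink, 40.4 s for 1.510 with readlink, and 6.3 s for 1.500 with readlink. Putting readlink on the PATH therefore speeds up 1.510 by about 2.4x, but 1.510 is still about 6.4x slower than 1.500. With only two or three samples per configuration, treat these ratios as rough.

```java
import java.util.stream.LongStream;

// Averages the page-load times quoted above (milliseconds, GET /jenkins/).
// Only two or three samples per configuration, so the ratios are rough.
class TimingSummary {
    static double mean(long... ms) {
        return LongStream.of(ms).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double noReadlink = mean(98563, 96515);          // 1.510, readlink not on PATH
        double withReadlink = mean(43375, 37594, 40219); // 1.510, readlink on PATH
        double old = mean(7953, 4609);                   // 1.500, readlink on PATH

        System.out.printf("1.510 without readlink: %.0f ms%n", noReadlink);
        System.out.printf("1.510 with readlink:    %.0f ms%n", withReadlink);
        System.out.printf("1.500 with readlink:    %.0f ms%n", old);
        System.out.printf("readlink on PATH: %.1fx faster%n", noReadlink / withReadlink);
        System.out.printf("1.510 vs 1.500 (both with readlink): %.1fx slower%n", withReadlink / old);
    }
}
```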

-Chris

Baptiste Mathus

May 30, 2013, 2:59:59 AM
to jenkin...@googlegroups.com

Hi Chris,
Maybe you could consider doing a git bisect to find the culprit commit? I did that a few years ago to fix an AIX issue with Jenkins, and it's simple and powerful.
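The idea behind `git bisect` is a binary search over the commit history. You mark one revision as good (`git bisect good`) and one as bad (`git bisect bad`), and git repeatedly checks out the midpoint for you to test until it isolates the first bad commit. Here that would mean 1.500 as good and 1.510 as bad. The following sketch shows only the search logic. `Bisect.firstBad` and its predicate are hypothetical; in practice the predicate would be a manual check such as "does the main view load in reasonable time?".

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of the idea behind `git bisect`: binary search over an ordered
// history for the first "bad" entry, assuming every entry after it is bad too.
class Bisect {
    /**
     * Returns the index of the first bad entry. Precondition: history.get(0)
     * is known good and the last entry is known bad.
     */
    static <T> int firstBad(List<T> history, Predicate<T> isBad) {
        int good = 0;
        int bad = history.size() - 1;
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(history.get(mid))) {
                bad = mid;   // culprit is at mid or earlier
            } else {
                good = mid;  // culprit is after mid
            }
        }
        return bad;
    }
}
```

Each test halves the remaining range, so even a few hundred commits need fewer than ten builds to check.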


Jesse Glick

May 30, 2013, 9:29:59 AM
to jenkin...@googlegroups.com
On 05/29/2013 09:38 PM, Chris Graham wrote:
> somewhere between 1.500 and 1.510 there was an introduction of some massive overhead.

Symlinks are simply used more extensively in newer versions of Jenkins, magnifying any existing problems. The fix will be to make Util.resolveSymlink work efficiently on your platform. So please test PR #770.