Memory issues with Jenkins


Sverre Moe

Jul 2, 2019, 8:34:25 AM
to Jenkins Users
We have assigned 8GB of memory to our Jenkins instance.
JAVA_OPTIONS=-Xmx8g

We still experience memory issues after it has been running for a while.
java.lang.OutOfMemoryError: unable to create new native thread

We have:
approx. 40 connected build agents
approx. 400 Pipeline jobs

We have a test Jenkins instance running with the same jobs; it connects to the same build agents (though with a different home directory).

Lately we have been getting disconnected build agents that we cannot bring up again without restarting Jenkins.

Can we assign more memory to a build agent? Would it have any effect on this issue?

We got this from one of our latest Pipeline builds, which failed on a sh("find **** -exec ***") step. It failed on the build agent that is now disconnected.


java.lang.OutOfMemoryError: unable to create new native thread
		at java.lang.Thread.start0(Native Method)
		at java.lang.Thread.start(Thread.java:714)
		at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
		at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
		at java.lang.UNIXProcess.initStreams(UNIXProcess.java:288)
		at java.lang.UNIXProcess.lambda$new$2(UNIXProcess.java:258)
		at java.security.AccessController.doPrivileged(Native Method)
		at java.lang.UNIXProcess.<init>(UNIXProcess.java:257)
		at java.lang.ProcessImpl.start(ProcessImpl.java:134)
		at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
		at hudson.Proc$LocalProc.<init>(Proc.java:249)
Also:   java.io.IOException: error=11, Resource temporarily unavailable



SEVERE: Unexpected error when retrieving changeset
hudson.plugins.git.GitException: Error: git whatchanged --no-abbrev -M "--format=commit %H%ntree %T%nparent %P%nauthor %aN <%aE> %ai%ncommitter %cN <%cE> %ci%n%n%w(76,4,4)%s%n%n%b" -n 1 b2c871830a03ea5f2fd2b21245afb09d51d69686 in /home/build/jenkins/workspace/project_user_work
       at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$6.execute(CliGitAPIImpl.java:1012)
       at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
       at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
       at hudson.remoting.UserRequest.perform(UserRequest.java:212)
       at hudson.remoting.UserRequest.perform(UserRequest.java:54)
       at hudson.remoting.Request$2.run(Request.java:369)
       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to master-sles12.3-x86_64_3
               at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
               at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
               at hudson.remoting.Channel.call(Channel.java:955)
               at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
               at sun.reflect.GeneratedMethodAccessor678.invoke(Unknown Source)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)
               at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
               at com.sun.proxy.$Proxy104.execute(Unknown Source)
               at io.jenkins.blueocean.autofavorite.FavoritingScmListener.getChangeSet(FavoritingScmListener.java:159)
               at io.jenkins.blueocean.autofavorite.FavoritingScmListener.onCheckout(FavoritingScmListener.java:84)
               at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:140)
               at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:93)
               at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:80)
               at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)

Jul 01, 2019 11:51:12 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Timer-9692/111139) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
       at java.lang.Thread.start0(Native Method)
       at java.lang.Thread.start(Thread.java:714)
       at java.util.Timer.<init>(Timer.java:160)
       at java.util.Timer.<init>(Timer.java:132)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
       at java.util.TimerThread.mainLoop(Timer.java:555)
       at java.util.TimerThread.run(Timer.java:505)

INFO: Ping failed. Terminating the channel master-sles12.3-x86_64_3.
java.util.concurrent.TimeoutException: Ping started at 1561982408948 hasn't completed by 1561982648948
       at hudson.remoting.PingThread.ping(PingThread.java:134)
       at hudson.remoting.PingThread.run(PingThread.java:90)

Jul 01, 2019 2:04:11 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel master-sles12.3-x86_64_3
java.io.IOException: Unexpected termination of the channel
WARNING: Failed to monitor master-sles12.3-x86_64_3 for Free Temp Space

Jul 01, 2019 2:04:11 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitorDetailed
WARNING: Failed to monitor master-sles12.3-x86_64_3 for Free Swap Space



Below is the latest problem we got. It did not take down the build node. Every occurrence of this problem has happened while the Pipeline was doing some IO on the Jenkins master. Here we manually restarted the build, and it built fine.

Running on Jenkins in /var/lib/jenkins/workspace/project_master
[Pipeline] {
[Pipeline] parallel
[Pipeline] { (Branch: Setup)
[Pipeline] End of Pipeline
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1366)
	at com.google.common.eventbus.AsyncEventBus.dispatch(AsyncEventBus.java:90)
	at com.google.common.eventbus.AsyncEventBus.dispatchQueuedEvents(AsyncEventBus.java:81)
	at com.google.common.eventbus.EventBus.post(EventBus.java:264)
	at org.jenkinsci.plugins.pubsub.GuavaPubsubBus$1.publish(GuavaPubsubBus.java:70)
	at org.jenkinsci.plugins.pubsub.PubsubBus.publish(PubsubBus.java:141)
	at io.jenkins.blueocean.events.PipelineEventListener.publishEvent(PipelineEventListener.java:196)
	at io.jenkins.blueocean.events.PipelineEventListener.onNewHead(PipelineEventListener.java:85)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1463)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:458)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Finished: FAILURE

Sverre Moe

Jul 2, 2019, 9:30:06 AM
to Jenkins Users
Today it has been chaotic.
Several build agents have disconnected with:

Unexpected termination of the channel

Many builds have failed because of the memory error.

I have tried restarting Jenkins several times today.

Anyone have any suggestions?

Baptiste Mathus

Jul 4, 2019, 5:04:55 PM
to jenkins...@googlegroups.com
Did you enable GC logging to get a better picture of your memory-consumption profile? If not, I would recommend doing that first and analyzing the logs.
https://jenkins.io/blog/2016/11/21/gc-tuning/ explains this part (and much more) quite well.
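
For example, something along these lines in the Jenkins JVM arguments (Java 8 HotSpot flags; the log path and rotation sizes here are just an example to adapt):

JAVA_OPTIONS="-Xmx8g -XX:+UseG1GC \
  -Xloggc:/var/log/jenkins/gc-%t.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20m"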

Then, once you understand better when it crashes, possibly you'll want to analyze a heap dump to see what is causing the problem.

Cheers



Jan Monterrubio

Jul 4, 2019, 9:17:38 PM
to jenkins...@googlegroups.com
Correct me if I'm wrong, but I don't think increasing the heap size will actually affect your ability to create more native threads.
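
The limits that matter for that error are native ones: the per-user process limit, the kernel's thread ceiling, and the per-thread stack size. Something like this on a Linux master shows where you stand (assuming the process was started from jenkins.war):

ulimit -u                                                  # max user processes; threads count against this
cat /proc/sys/kernel/threads-max                           # system-wide thread ceiling
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize  # default native stack per thread

Each thread also reserves its native stack outside the heap, so a leak of a few thousand threads can exhaust memory or limits no matter how large -Xmx is.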


Sverre Moe

Jul 9, 2019, 7:23:20 AM
to Jenkins Users
I will try turning on GC logging.



Sverre Moe

Jul 9, 2019, 7:24:18 AM
to Jenkins Users

Since we don't use a 32-bit JVM, the reason must be:
- the virtual memory of the OS has been fully depleted

How can I check for this, and remedy it?
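
I suppose something like this would show it on Linux (I am guessing at the relevant counters):

free -h                                              # physical memory and swap
vmstat 1 5                                           # paging/swap activity over a few seconds
ps -o nlwp= -p $(pgrep -f jenkins.war | head -n1)    # thread count of the Jenkins JVM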

Sverre Moe

Jul 9, 2019, 7:31:50 AM
to Jenkins Users
Could it be an issue with the virtual memory on the Jenkins server? Jenkins does consume a lot of virtual memory.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                        
13565 jenkins   20   0 12.641g 0.011t  13552 S 0.000 56.62 625:25.18 /usr/bin/java -XX:+UseG1GC -Xmx10g

It has amassed 12 GB of virtual memory. That looks like the known interaction between Java and glibc's per-thread malloc arenas, which can be remedied with:
MALLOC_ARENA_MAX=1

I have tried adding an Environment= line to /etc/systemd/system/jenkins.service, but it is not set before Jenkins runs.
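
Perhaps a systemd drop-in would work instead of editing the unit file directly (untested on my side; a drop-in only helps if this unit actually starts the JVM rather than delegating to an init script):

sudo mkdir -p /etc/systemd/system/jenkins.service.d
printf '[Service]\nEnvironment="MALLOC_ARENA_MAX=1"\n' | sudo tee /etc/systemd/system/jenkins.service.d/malloc.conf
sudo systemctl daemon-reload
sudo systemctl restart jenkins
sudo cat /proc/$(pgrep -f jenkins.war | head -n1)/environ | tr '\0' '\n' | grep MALLOC   # verify it reached the process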

Ivan Fernandez Calvo

Jul 13, 2019, 7:02:15 AM
to Jenkins Users
Hi,

When the issue happens, did you check the number of threads Jenkins has open? How many file descriptors can your process open (run ulimit -a as the jenkins user)? There is a good knowledge-base article about memory and user limits on Jenkins: "Prepare Jenkins for Support".
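
For example, on the master (assuming the process was started from jenkins.war):

JPID=$(pgrep -f jenkins.war | head -n1)
ls /proc/$JPID/task | wc -l    # live threads
ls /proc/$JPID/fd | wc -l      # open file descriptors
cat /proc/$JPID/limits         # the limits the running JVM actually got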

Sverre Moe

Jul 14, 2019, 7:29:22 AM
to Jenkins Users
jenkins@meoscorebs12:~> ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 80229
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 80229
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


I have checked the number of processes with "ps aux", but not the number of threads.
Right now (Jenkins was restarted 2 days ago and only a few builds have run), there are 264 threads for Jenkins.
Is there any way I can find out what each thread is for?
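
I guess jstack can tell me. Something like this should group the thread names (normalizing digits so that e.g. Timer-1 and Timer-2 collapse into one bucket):

jstack $(pgrep -f jenkins.war | head -n1) \
  | sed -n 's/^"\(.*\)".*/\1/p' \
  | sed 's/[0-9][0-9]*/N/g' \
  | sort | uniq -c | sort -rn | head -20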

Sverre Moe

Jul 14, 2019, 7:48:57 AM
to Jenkins Users

Sverre Moe

Jul 29, 2019, 5:30:58 AM
to Jenkins Users

guerkan demirci

Aug 3, 2019, 7:04:10 AM
to Jenkins Users
Hi Sverre Moe,

Were you able to resolve the issue?
I have the same problem with Jenkins.

Best

Sverre Moe

Aug 5, 2019, 5:08:27 AM
to Jenkins Users
We have not yet resolved the issue. I have found neither a solution nor what the cause actually is.

Ivan Fernandez Calvo

Aug 5, 2019, 1:39:47 PM
to Jenkins Users
If it is the same instance as the one with the agent-disconnection errors, your problem is the NFS performance. It blocks tons of threads in IO operations, so agents cannot disconnect or connect, which blocks thousands of threads; the Jenkins instance works until it collapses under the number of blocked threads or the amount of memory they use. As I said in the other email thread, get rid of the NFS and your problems go away. If that is not possible, make sure your NFS performs like a local hard drive (150+ MB/s) under more than 100 concurrent IO operations on small files, sustained for 5-10 minutes; if you cannot manage that, NFS will not work properly with Jenkins.
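
As a rough yardstick, something like this fio run approximates that kind of load (the mount point and the exact numbers are only illustrative):

fio --name=jenkins-nfs --directory=/your/nfs/mount --rw=randrw --bs=4k \
    --size=8m --numjobs=100 --runtime=300 --time_based --group_reporting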

guerkan demirci

Aug 5, 2019, 3:35:40 PM
to Jenkins Users
We don't have NFS for JENKINS_HOME. The master and all agents use the local file system.

Anyway, I can see the thread count going extremely high before the OutOfMemory hits and all agents disconnect.

But we do have a network CIFS share where all agents read and write lots of data.

I don't understand exactly what kind of disk-access operation in Jenkins would spawn lots of threads without waiting for the IO operation to finish.
Might this be caused by a script in one of our projects' Jenkinsfiles?

Please, could you share some additional information, or any ideas for metering numbers other than the thread count?

This started happening three weeks ago after an update. Meanwhile we have added some more agents to the pool and started using Docker for some jobs.

How can I know what is causing all the threads, other than NFS?

Thank you

Ivan Fernandez Calvo

Aug 5, 2019, 4:17:46 PM
to Jenkins Users
Hi,

Sverre has another email thread open; I think it is the same Jenkins instance: https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer. I don't know what is happening on your instance, but it is probably better if you open a separate email thread with a description of your issue.

Sverre Moe

Aug 6, 2019, 3:48:50 AM
to Jenkins Users
Sadly I was mistaken. We do not use NFS for JENKINS_HOME.

We do however use NFS for the location where builds copy the RPM build artifacts.

Sverre Moe

Aug 14, 2019, 7:11:52 AM
to Jenkins Users
We got a free 30-minute CloudBees support session. It was too short to dig deep enough to find the problem, but the person I talked to (after examining our logs) mentioned what he thought the problem was and gave a suggestion.

We should not use the Jenkins master for builds at all (allocated with the node("master") step). We had 15 executors on the Jenkins master.

We could also try to increase the hard limits on nofile and nproc for the jenkins user, but the main recommendation was to remove all executors from the Jenkins master.
> /etc/security/limits.conf
jenkins          soft    core            unlimited
jenkins          hard    core            unlimited
jenkins          soft    fsize           unlimited
jenkins          hard    fsize           unlimited
jenkins          soft    nofile          4096
jenkins          hard    nofile          10240 #Was 8192
jenkins          soft    nproc           30654
jenkins          hard    nproc           60654 #Was 30654


Removing the Jenkins master executors will take some time. We use the Jenkins master when we publish our RPM build artifacts to our NFS file storage.
Since the RPM NFS share is attached only to the Jenkins master, removing them is not possible at the moment, unless we can build on any other agent and then SCP the RPM artifacts onto the Jenkins master, as sketched below.
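
Something like this from any agent might do it (host and paths are made up):

scp build/RPMS/*.rpm jenkins@our-jenkins-master:/srv/nfs/rpm/incoming/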


We had a few other circumstances where we used the Jenkins master, like checking out a file to determine which build agent to actually use. Those I have already changed to use any available build agent instead.

Devin Nusbaum

Aug 14, 2019, 9:38:17 AM
to jenkins...@googlegroups.com
I have not read the whole thread in detail, but the "unable to create new native thread" OutOfMemoryErrors from your original post, where one of the stack traces involves org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing, look like they could be related to https://issues.jenkins-ci.org/browse/JENKINS-58684, a thread leak caused by the SSE Gateway Plugin. You could try reverting the SSE Gateway Plugin to version 1.17 to see if that helps, although that might reintroduce a different, somewhat rarer memory leak (https://issues.jenkins-ci.org/browse/JENKINS-51057). To test my hypothesis: if you are running SSE Gateway Plugin version 1.19, you can collect thread dumps over time and see if you have a large number of threads named "EventDispatcher.retryProcessor" (unfortunately, in version 1.18 and below the threads are automatically named "Timer #n", which is less useful); that would confirm you are hitting JENKINS-58684.
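
For example, something like this run periodically on the master would show whether that count grows over time (assumes the JDK's jstack is on the PATH):

jstack $(pgrep -f jenkins.war | head -n1) | grep -c 'EventDispatcher.retryProcessor'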

The advice to stop building on master is definitely a good idea as well.


Félix Belzunce Arcos

Aug 14, 2019, 10:07:02 AM
to Jenkins Users
Hi Sverre Moe,

I am the person who talked to you this morning :-)

The long-term solution is to avoid building on the master, both to avoid performance issues and to avoid having to keep increasing the number of processes and open files on the machine where the Jenkins master is located. Building on the master is also not recommended from a security point of view.

The short-term solution would be to increase the process limit on this machine and take thread dumps from the master every 10 minutes. For this, you can create a cron-triggered freestyle job that runs every 10 minutes and executes jstack <JENKINS_PID>. When the issue happens, you can look at the last 10 builds and their thread dumps and try to figure out what is actually consuming so many threads on the master.
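
For example, the job's shell step could be as simple as (the dump directory is up to you; keep it outside any workspace that gets cleaned):

mkdir -p /var/lib/jenkins/threaddumps
jstack $(pgrep -f jenkins.war | head -n1) > /var/lib/jenkins/threaddumps/jstack-$(date +%Y%m%d-%H%M%S).txt

with the build trigger set to a cron schedule like H/10 * * * *.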

I hope this helps,



Sverre Moe

Aug 14, 2019, 10:08:20 AM
to Jenkins Users
I only have the option to downgrade the Server Sent Events (SSE) Gateway Plugin to 1.18.
I would have to download 1.17 and downgrade manually.

From the discussions it seems I would also need to downgrade Blue Ocean to 1.17:
> Downgrading to BlueOcean 1.17 (which in turn uses sse-gateway 1.17) appears to have resolved our issue

That would be much more work, as I would need to install all of Blue Ocean 1.17 manually; I can only downgrade to 1.18.0.

I might be willing to try this, even with the risk of https://issues.jenkins-ci.org/browse/JENKINS-51057

I am running SSE 1.19, and have previously recorded jstack output from the Jenkins PID. I could not find any EventDispatcher.retryProcessor threads.



Sverre Moe

Aug 14, 2019, 10:17:06 AM
to Jenkins Users
I created a Pipeline job to run jstack every 10 minutes (it runs on the Jenkins master, since that is where the Jenkins process lives).

Sverre Moe

Sep 6, 2019, 4:47:37 AM
to Jenkins Users
We haven't had this OutOfMemoryError for the 3 weeks Jenkins has now been running.

We did four things:
1) Reduced master executors from 15 to 4.
2) Moved some job steps that ran on "master" to a build agent instead. We still have one stage/step that needs to run on the master.
3) Configured many of our build agents to be offline and come online on demand.
4) Upgraded our Jenkins server: the old server ran SLES12; we set up a new VM with SLES15 and copied JENKINS_HOME over to the new server.