[JIRA] [monitoring-plugin] (JENKINS-20947) Failed to monitor for Free Swap Space


stephenconnolly@java.net (JIRA)

Jun 24, 2015, 4:23:01 AM
to jenkinsc...@googlegroups.com
stephenconnolly commented on Bug JENKINS-20947
 
Re: Failed to monitor for Free Swap Space

I wonder if this is related to JENKINS-25218, where the ping thread intervenes and kills the channel (thereby breaking the livelock in Channel termination).

You could try disabling the ping thread on both sides, e.g. on the master set hudson.slaves.ChannelPinger.pingInterval=-1 and on the slave set hudson.remoting.Launcher.pingIntervalSec=-1. If my theory is correct, your failure mode should turn into the same failure mode as JENKINS-25218.

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

andreas.kuttruff@openmind-tech.com (JIRA)

Jun 30, 2015, 2:08:01 AM
to jenkinsc...@googlegroups.com

Hi Stephen,

I made the changes you mentioned and got the following failure, but I don't know whether it is the same failure as JENKINS-25218:

Building remotely on 01-w7x64DEHS-HyperCAD-S (DE uitest vm Windows7 64bit hypersnapshot) in workspace C:\JenkinsSlave\workspace\Trunk_Smoke_Sim_HCAD-S_DE
FATAL: java.io.EOFException
hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:296)
at hudson.remoting.Channel.terminate(Channel.java:815)
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
at ......remote call to 01-w7x64DEHS-HyperCAD-S(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1361)
at hudson.remoting.Request.call(Request.java:171)
at hudson.remoting.Channel.call(Channel.java:752)
at hudson.FilePath.act(FilePath.java:980)
at hudson.FilePath.act(FilePath.java:969)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:897)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:833)
at hudson.scm.SCM.checkout(SCM.java:485)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1282)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
at hudson.model.Run.execute(Run.java:1744)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:374)
Caused by: java.io.EOFException
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

stephenconnolly@java.net (JIRA)

Jun 30, 2015, 4:14:02 AM
to jenkinsc...@googlegroups.com

On the basis of that stack trace, your issue is not exactly the same as JENKINS-25218. However, if you have not set the hudson.remoting.Launcher.pingIntervalSec=-1 property correctly on the slave (check the slave's system properties screen in Jenkins to verify), then the ping thread would still be causing the disconnect.

At present the stack trace indicates that the TCP/IP socket has been killed. If you have disabled both ping threads, that would point to a network issue between the master and slave (perhaps a firewall killing long-lived sockets or a NAT gateway timeout). If you have not disabled both ping threads, then the test needs repeating.

andreas.kuttruff@openmind-tech.com (JIRA)

Jun 30, 2015, 4:48:02 AM
to jenkinsc...@googlegroups.com

Here is the command line that starts the JNLP slave:
.\Java\bin\java -jar -Dhudson.remoting.Launcher.pingIntervalSec=-1 slave.jar -jnlpUrl http://om-ui1.mum.de:8080/computer/01-w7x64DEHS-HyperCAD-S/slave-agent.jnlp -secret b9d8fde46213d8fe26bfc0f7fc36f7.....

stephenconnolly@java.net (JIRA)

Jun 30, 2015, 5:04:01 AM
to jenkinsc...@googlegroups.com

Isn't the -D supposed to come before the -jar? In any case, it looks like you have it on the correct side of the jar file argument, so I would start looking for networking issues.
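
One way to confirm from the command line that a -D option is being picked up as a JVM system property (rather than passed through as a program argument) is a small stand-alone check like the sketch below. PingPropertyCheck is a hypothetical helper written for this illustration, not part of Jenkins or remoting; it only reads the property name quoted in this thread and makes no claim about how remoting itself consumes it.

```java
// Hypothetical helper for illustration only; not part of Jenkins or remoting.
final class PingPropertyCheck {
    // Property name as quoted in this thread.
    static final String PING_PROPERTY = "hudson.remoting.Launcher.pingIntervalSec";

    /** Describes the property as seen by this JVM. */
    static String describe(String name) {
        String value = System.getProperty(name);
        return value == null ? name + " is not set" : name + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(describe(PING_PROPERTY));
        // Anything listed here was parsed as a program argument, not as a JVM option.
        for (String arg : args) {
            System.out.println("program argument: " + arg);
        }
    }
}
```

Running it as `java -Dhudson.remoting.Launcher.pingIntervalSec=-1 PingPropertyCheck` should report the property as set. Placing the -D after the class name (or, for a jar launch, after the jar file) makes it show up as a program argument instead.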

evernat@free.fr (JIRA)

Feb 22, 2016, 8:13:03 AM
to jenkinsc...@googlegroups.com
evernat updated an issue
 
Jenkins / Bug JENKINS-20947
Change By: evernat
Component/s: monitoring-plugin

magnayn@java.net (JIRA)

May 10, 2016, 6:50:03 AM
to jenkinsc...@googlegroups.com
magnayn commented on Bug JENKINS-20947
 
Re: Failed to monitor for Free Swap Space

I have just come across this on two types of host.

First, on Illumos (SmartOS)/LX zones, it occasionally reports gigantic free memory (probably a bug), after which the build log stops updating:

{{Failed to monitor Triton-455640a8f44f for Free Swap Space
java.util.concurrent.ExecutionException: java.io.IOException: Failed to parse: '18446744073709099244' out of 'MemFree: 18446744073709099244 kB'
at hudson.remoting.Channel$2.adapt(Channel.java:813)
at hudson.remoting.Channel$2.adapt(Channel.java:808)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
Caused by: java.io.IOException: Failed to parse: '18446744073709099244' out of 'MemFree: 18446744073709099244 kB'
at org.jvnet.hudson.ProcMemInfo.monitor(ProcMemInfo.java:56)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:113)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:103)
at hudson.remoting.UserRequest.perform(UserRequest.java:120)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to Triton-455640a8f44f(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
at hudson.remoting.Channel$2.adapt(Channel.java:811)
... 4 more}}
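
The reported value is larger than Long.MAX_VALUE (9223372036854775807), so any parser that reads the field as a signed 64-bit long will reject it. The sketch below reproduces that with the exact line from the trace. It is an illustration only: MemInfoParseSketch and its methods are hypothetical, and the thread does not show how ProcMemInfo actually parses the field. Reinterpreting the digits as an unsigned 64-bit value gives -452372 when viewed as signed, which is consistent with a small negative number having wrapped around on the reporting side.

```java
// Hypothetical illustration; this is not the ProcMemInfo implementation.
final class MemInfoParseSketch {
    /** Extracts the numeric field from a line such as "MemFree: 123 kB". */
    static String numericField(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts[1];
    }

    /** True if the digits can be parsed as a signed 64-bit long. */
    static boolean fitsInSignedLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Parses the digits as unsigned 64-bit and returns the same bits as a signed long. */
    static long asSigned64(String digits) {
        return Long.parseUnsignedLong(digits);
    }

    public static void main(String[] args) {
        String field = numericField("MemFree: 18446744073709099244 kB");
        System.out.println("fits in signed long: " + fitsInSignedLong(field));
        System.out.println("same bits as signed: " + asSigned64(field));
    }
}
```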

I also now get this on Jenkins 2.x on Windows slaves:

Failed to monitor winbuild for Free Swap Space
java.util.concurrent.ExecutionException: java.lang.UnsatisfiedLinkError: C:\Users\magnayn\AppData\Local\Temp\jna-829177819\jna2657881109746216710.dll: A dynamic link library (DLL) initialization routine failed
at hudson.remoting.Channel$2.adapt(Channel.java:813)
at hudson.remoting.Channel$2.adapt(Channel.java:808)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
Caused by: java.lang.UnsatisfiedLinkError: C:\Users\magnayn\AppData\Local\Temp\jna-829177819\jna2657881109746216710.dll: A dynamic link library (DLL) initialization routine failed
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(Unknown Source)
at java.lang.ClassLoader.loadLibrary(Unknown Source)
at java.lang.Runtime.load0(Unknown Source)
at java.lang.System.load(Unknown Source)
at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:851)
at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:826)
at com.sun.jna.Native.<clinit>(Native.java:140)
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)
at org.jvnet.hudson.Windows.monitor(Windows.java:42)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:113)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:103)
at hudson.remoting.UserRequest.perform(UserRequest.java:120)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)
at ......remote call to winbuild(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
at hudson.remoting.Channel$2.adapt(Channel.java:811)
... 4 more

The failure seems to kill the channel:

Ping failed. Terminating the channel winbuild.
java.util.concurrent.TimeoutException: Ping started at 1462866895203 hasn't completed by 1462867135208
at hudson.remoting.PingThread.ping(PingThread.java:126)
at hudson.remoting.PingThread.run(PingThread.java:85)

I don't know why the space monitor re-throws the error as an exception - it might be an idea to swallow it and log.
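
A minimal sketch of the "swallow it and log" idea, assuming the monitoring work can be expressed as a Callable. SwallowingMonitor and callOrNull are hypothetical names made up for this illustration; this is not how Jenkins implements its node monitors, and whether a null result is acceptable to the caller is also an assumption.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper for illustration; not part of Jenkins.
final class SwallowingMonitor {
    private static final Logger LOGGER = Logger.getLogger(SwallowingMonitor.class.getName());

    /**
     * Runs the task and returns its result. If the task fails with an exception
     * or a linkage problem (UnsatisfiedLinkError is a LinkageError), the failure
     * is logged and null is returned instead of propagating it.
     */
    static <T> T callOrNull(String what, Callable<T> task) {
        try {
            return task.call();
        } catch (Exception | LinkageError e) {
            LOGGER.log(Level.WARNING, "Failed to monitor for " + what, e);
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulates the native-library failure from the Windows trace.
        Object result = callOrNull("Free Swap Space", () -> {
            throw new UnsatisfiedLinkError("simulated DLL initialization failure");
        });
        System.out.println("result: " + result);
    }
}
```

Swallowing the error only keeps the failure from propagating to the caller; it does not make the underlying measurement work, so the logged warning still needs attention.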

mtaylor@opto22.com (JIRA)

Nov 3, 2016, 2:25:02 PM
to jenkinsc...@googlegroups.com
Matt Taylor updated an issue
 
Change By: Matt Taylor
Attachment: jenkins.err.zip

mtaylor@opto22.com (JIRA)

Nov 3, 2016, 2:25:03 PM
to jenkinsc...@googlegroups.com
Matt Taylor commented on Bug JENKINS-20947
 
Re: Failed to monitor for Free Swap Space

I am experiencing this same issue with a slave that is used on demand.

jenkins.err.zip

mtaylor@opto22.com (JIRA)

Nov 22, 2016, 3:13:06 PM
to jenkinsc...@googlegroups.com
Matt Taylor updated an issue
Change By: Matt Taylor
Priority: Major -> Critical

mtaylor@opto22.com (JIRA)

Jan 12, 2017, 6:30:01 PM
to jenkinsc...@googlegroups.com
 
Re: Failed to monitor for Free Swap Space

Could someone please help? I am still experiencing this.

randall.becker@nexbridge.ca (JIRA)

Apr 29, 2020, 12:37:08 PM
to jenkinsc...@googlegroups.com

I also just started experiencing this with 2.222.3 on Java 1.8_242 on z/OS USS. It began after I increased the Node SSH timeout to something large. The Node log is showing:

ERROR: Failed to monitor for Response Time
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:57)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
ERROR: Failed to monitor for Free Disk Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
ERROR: Failed to monitor for Free Temp Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
...

in a loop until the SSH timeout hits, which then cancels the job by closing the connection. Are we possibly missing a dependency on the server? The server only has a straight up JDK with no additional tools (other than git and perl).

randall.becker@nexbridge.ca (JIRA)

Apr 29, 2020, 12:38:03 PM
to jenkinsc...@googlegroups.com
Randall Becker edited a comment on Bug JENKINS-20947