[JIRA] [core] (JENKINS-32036) Slave disconnects after moving the windows clock forward

65 views
Skip to first unread message

cameron.spencer@caseware.com (JIRA)

unread,
Dec 11, 2015, 3:46:03 PM12/11/15
to jenkinsc...@googlegroups.com
Cam Spencer created an issue
 
Jenkins / Bug JENKINS-32036
Slave disconnects after moving the windows clock forward
Issue Type: Bug Bug
Assignee: Unassigned
Components: core
Created: 11/Dec/15 8:45 PM
Environment: Jenkins version 1.639
java.version 1.7.0_17
os.arch x86
os.name Windows Server 2008 R2
os.version 6.1

Slave:
Windows 7
Priority: Critical Critical
Reporter: Cam Spencer

During one of our tests, we need to move the windows clock forward. We already have disabled "Clock Difference" from the Preventive Node Monitoring configure page.

Command used to move the clock forward

Process process = Runtime.getRuntime().exec("cmd /C date " + dateToChangeTo);

After this command, our slave will disconnect.

Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@1278fd1[name=b30b19f8-da4f-5463-8ed0-53cbae44a1c7]
	at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
	at sun.nio.ch.SocketDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.read(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
	at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
	... 7 more
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)
Atlassian logo

dbeck@cloudbees.com (JIRA)

unread,
Dec 14, 2015, 11:32:05 AM12/14/15
to jenkinsc...@googlegroups.com
Daniel Beck updated an issue

Looks more like a remoting issue. The log on the slave side would be interesting as well.

Reducing priority, as this does not appear to be a critical issue (it may be for you because your process breaks, but this would then apply to every possible feature behaving different from someone's expectations).

Change By: Daniel Beck
Component/s: remoting
Component/s: core
Priority: Critical Minor

cameron.spencer@caseware.com (JIRA)

unread,
Dec 14, 2015, 11:36:02 AM12/14/15
to jenkinsc...@googlegroups.com
Cam Spencer commented on Bug JENKINS-32036
 
Re: Slave disconnects after moving the windows clock forward

Is this the slave log you're talking about?

JNLP agent connected from /192.168.221.160
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@1016a6[name=b30b19f8-da4f-5463-8ed0-53cbae44a1c7]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 7 more
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.44
This is a Windows slave
just before slave b30b19f8-da4f-5463-8ed0-53cbae44a1c7 gets online ...
executing prepare script ...
setting up slave b30b19f8-da4f-5463-8ed0-53cbae44a1c7 ...
slave setup done.
[StartupTrigger] - Scanning jobs for node b30b19f8-da4f-5463-8ed0-53cbae44a1c7
Slave successfully connected and online

dbeck@cloudbees.com (JIRA)

unread,
Dec 14, 2015, 12:08:04 PM12/14/15
to jenkinsc...@googlegroups.com

cameron.spencer@caseware.com (JIRA)

unread,
Dec 14, 2015, 1:10:02 PM12/14/15
to jenkinsc...@googlegroups.com

The log I posted in my previous comment was from http://jenkins:8080/computer/b30b19f8-da4f-5463-8ed0-53cbae44a1c7/log. Where would I find the log you're talking about?

dbeck@cloudbees.com (JIRA)

unread,
Dec 14, 2015, 1:34:02 PM12/14/15
to jenkinsc...@googlegroups.com

cameron.spencer@caseware.com (JIRA)

unread,
Dec 14, 2015, 2:04:02 PM12/14/15
to jenkinsc...@googlegroups.com

Sorry, just to clarify, is this the directory that is set on the node for the "Remote root directory" property?. If so I'm looking on the slave in the C:\Jenkins directory but all I see is a workspace folder. I don't see a log in the workspace directory either.

dbeck@cloudbees.com (JIRA)

unread,
Dec 14, 2015, 4:26:01 PM12/14/15
to jenkinsc...@googlegroups.com

How is the slave connected? JNLP? SSH? If JNLP,was it installed as a service, or is it run ad-hoc in a console, or another way?

cameron.spencer@caseware.com (JIRA)

unread,
Dec 15, 2015, 9:29:01 AM12/15/15
to jenkinsc...@googlegroups.com

JNLP. After our virtual machine is created we connect to the node with

start /min java -Dhudson.util.ProcessTree.disable=true -jar slave.jar -jnlpUrl http://jenkins:8080/computer/%UUID%/slave-agent.jnlp

dbeck@cloudbees.com (JIRA)

unread,
Dec 15, 2015, 10:52:01 AM12/15/15
to jenkinsc...@googlegroups.com

That probably dumps the logs to NUL

maybe try redirecting the output (both sterr and stdout I think) to a file?

cameron.spencer@caseware.com (JIRA)

unread,
Dec 15, 2015, 10:57:02 AM12/15/15
to jenkinsc...@googlegroups.com

How would I go about doing that? Feel free to link me to a resource if that's too many steps to list out.

dbeck@cloudbees.com (JIRA)

unread,
Dec 15, 2015, 11:03:06 AM12/15/15
to jenkinsc...@googlegroups.com

cameron.spencer@caseware.com (JIRA)

unread,
Dec 17, 2015, 11:44:01 AM12/17/15
to jenkinsc...@googlegroups.com
Cam Spencer updated an issue
 

I attached the slave log. The disconnect happens after we move the clock forward 2 years

Change By: Cam Spencer
Attachment: Slave log.txt

cameron.spencer@caseware.com (JIRA)

unread,
Dec 17, 2015, 11:46:03 AM12/17/15
to jenkinsc...@googlegroups.com
Cam Spencer edited a comment on Bug JENKINS-32036
 
Re: Slave disconnects after moving the windows clock forward
I attached the slave log. The disconnect happens after we move the clock forward 2 years . The clock is rolled forward twice during the entire test which is why entries for 2019 appear.

dbeck@cloudbees.com (JIRA)

unread,
Dec 17, 2015, 12:13:04 PM12/17/15
to jenkinsc...@googlegroups.com

Oleg Nenashev Any idea whether this can be done in a way that doesn't break everything?

ben.apgar@caseware.com (JIRA)

unread,
Jan 6, 2016, 1:35:03 PM1/6/16
to jenkinsc...@googlegroups.com

Hi Cam. It looks like this is related to (or is a duplicate of) JENKINS-9541

teilo@java.net (JIRA)

unread,
Feb 17, 2016, 8:46:02 AM2/17/16
to jenkinsc...@googlegroups.com

What is the version of the slave.jar?

This sounds like the PingThread - and that code was rewritten from System.currentTimeMillis() to System.nanoTime() so that this exact thing should not happen. (https://github.com/jenkinsci/remoting/pull/33)

Even so the logs do not mention the ping thread - so this could be a spurious wake-up that is not handled correctly (just want to make sure the salves are all up to date first)

teilo@java.net (JIRA)

unread,
Feb 17, 2016, 8:47:01 AM2/17/16
to jenkinsc...@googlegroups.com

I assume that the clock is changed on the slave rather than the master - but just want to check.

cameron.spencer@caseware.com (JIRA)

unread,
Feb 17, 2016, 9:16:02 AM2/17/16
to jenkinsc...@googlegroups.com

Hey James Nord, I'm not sure what version the slave.jar is, but I can guarantee it's not up to date. It probably hasn't been touched in a couple of years. As per https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds we never really concerned ourselves with using the latest slave.jar.

"Updating slave.jar
Technically speaking, in this set up you should update slave.jar every time you upgrade Jenkins to a new version. However, in practice slave.jar changes infrequently enough that it's also practical not to update until you see a fatal problem in start-up."

I'll make sure to try this with a new slave.jar (Currently our Jenkins is 1.639) and update this bug with the results. You're correct that the clock change is happening on the slave. None of our tests run on the master.

cameron.spencer@caseware.com (JIRA)

unread,
Feb 18, 2016, 11:42:03 AM2/18/16
to jenkinsc...@googlegroups.com

Even with the updated slave.jar, the disconnecting continues.


Feb 18, 2016 10:36:44 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: 31c75d25-a047-41ef-c1aa-83f121769104
Feb 18, 2016 10:36:44 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://tdjenkins:8080/]
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to tdjenkins:54935
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 18, 2016 10:36:47 AM com.youdevise.hudson.slavestatus.SlaveListener call
INFO: Slave-status listener starting
Feb 18, 2016 10:36:47 AM com.youdevise.hudson.slavestatus.SocketHTTPListener wai
tForConnection
INFO: Slave-status listener ready on port 3141
Feb 18, 2016 10:38:07 AM hudson.Proc$LocalProc join
WARNING: Process leaked file descriptors. See http://wiki.jenkins-ci.org/display
/JENKINS/Spawning+processes+from+build for more information
java.lang.Exception
        at hudson.Proc$LocalProc.join(Proc.java:329)
        at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1153)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvo
cationHandler.java:608)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocat
ionHandler.java:583)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocat
ionHandler.java:542)
        at hudson.remoting.UserRequest.perform(UserRequest.java:120)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecut
orService.java:68)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Unknown Source)

Feb 18, 2020 11:20:35 AM hudson.remoting.SynchronousCommandTransport$ReaderThrea
d run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStr
eam.java:82)
        at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java
:72)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.
java:103)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTrans
port.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(Abs
tractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(Synchron
ousCommandTransport.java:48)

Feb 18, 2020 11:20:35 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://tdjenkins:8080/]
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to tdjenkins:54935
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 18, 2020 11:20:49 AM com.youdevise.hudson.slavestatus.SlaveListener call
INFO: Slave-status listener starting
Feb 18, 2020 11:20:49 AM com.youdevise.hudson.slavestatus.SlaveListener$1 run
SEVERE: Could not listen on port
java.net.BindException: Address already in use: JVM_Bind
        at java.net.DualStackPlainSocketImpl.bind0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketBind(Unknown Source)
        at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
        at java.net.PlainSocketImpl.bind(Unknown Source)
        at java.net.ServerSocket.bind(Unknown Source)
        at java.net.ServerSocket.<init>(Unknown Source)
        at java.net.ServerSocket.<init>(Unknown Source)
        at com.youdevise.hudson.slavestatus.SocketHTTPListener.waitForConnection
(SlaveListener.java:129)
        at com.youdevise.hudson.slavestatus.SlaveListener$1.run(SlaveListener.ja
va:63)
        at com.youdevise.hudson.slavestatus.Daemon.go(Daemon.java:16)
        at com.youdevise.hudson.slavestatus.SlaveListener.call(SlaveListener.jav
a:83)
        at hudson.remoting.UserRequest.perform(UserRequest.java:120)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecut
orService.java:68)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Unknown Source)

o.v.nenashev@gmail.com (JIRA)

unread,
Mar 13, 2018, 10:33:24 PM3/13/18
to jenkinsc...@googlegroups.com
Oleg Nenashev assigned an issue to Unassigned
 

Unfortunately I have no capacity to work on Remoting in medium term, so I will unassign it and let others to take it. If somebody is interested to submit a pull request, I will be happy to help to get it reviewed and released.

Change By: Oleg Nenashev
Assignee: Oleg Nenashev
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)
Atlassian logo

pjdarton@gmail.com (JIRA)

unread,
Apr 24, 2018, 11:33:02 AM4/24/18
to jenkinsc...@googlegroups.com
pjdarton commented on Bug JENKINS-32036
 
Re: Slave disconnects after moving the windows clock forward

TL;DR:  Setting the clock is a Bad Thing; don't do that.  Adjust it gradually instead.

 

To make Jenkins tolerate a clock change, we'd have to find a way of making all of Jenkins' "wait for a duration" code use nanotime instead of UTC.  If the system "long millisecondsSince1970" clock changes, all of Java's normal "wait until" methods will return early or late.  It's only code that only looks at system nanosecondsOfUptime that'll be unaffected.

This (kind of) issue has bitten me many times in my career and it extends to more than just Java & Jenkins, e.g. the Windows DHCP Client doesn't cope with the clock changing either (it'll either re-request an IP early, or it'll fail to renew the lease in time, sometimes causing a brief network outage that Windows cascades into a fatal networking error that kills all current TCP connections and thus briefly disconnects the slave).  Been there; done that ... and have the mental scars to prove it.

The solution is fairly simple - do NOT set the clock while Jenkins (or anything else you care about) is running.  If you need to tweak the clock then use the Windows Time Service (or an ntpd service) to gradually adjust the clock (by running it a bit fast or a bit slow) until it's in sync.

e.g. what we do on our dynamically-provisioned slaves, before the Jenkins slave process starts, is (a) force the time service to re-sync and set the time, (b) halt & disable all services that might adjust the time (c) force Windows to renew the DHCP lease and only after the clock is correct and the network "renewed" do we then allow the Jenkins slave.jar to run, thus ensuring that no other process will adjust the clock while Jenkins is running.

 

Going back to the original problem ... I think you need to refactor your code to make it testable by injecting in a test-controlled time source so that, during this test, the code-under-test isn't actually reading the system's real time clock at all, but is instead reading the time provided by the test harness.  e.g. we declared an interface `ITimeProvider` which had a getUpimeInNanoseconds(), a getUTCInMilliseconds() and a delay(long milliseconds) method, and then had a default implementation that the code-under-test usually used unless the test harness needed to control time.

Reply all
Reply to author
Forward
0 new messages