[JIRA] [core] (JENKINS-32036) Slave disconnects after moving the windows clock forward

cameron.spencer@caseware.com (JIRA)

unread,

Dec 11, 2015, 3:46:03 PM12/11/15

to jenkinsc...@googlegroups.com

Cam Spencer created an issue

Jenkins /

JENKINS-32036

Slave disconnects after moving the windows clock forward

Issue Type:	Bug
Assignee:	Unassigned
Components:	core
Created:	11/Dec/15 8:45 PM
Environment:	Jenkins version 1.639 java.version 1.7.0_17 os.arch x86 os.name Windows Server 2008 R2 os.version 6.1 Slave: Windows 7
Priority:	Critical
Reporter:	Cam Spencer

During one of our tests, we need to move the windows clock forward. We already have disabled "Clock Difference" from the Preventive Node Monitoring configure page.

Command used to move the clock forward

 
                                                    Process process = Runtime.getRuntime().exec("cmd /C date " + dateToChangeTo);

After this command, our slave will disconnect.

 
                                                    Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@1278fd1[name=b30b19f8-da4f-5463-8ed0-53cbae44a1c7]
	at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
	at sun.nio.ch.SocketDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.read(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
	at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
	... 7 more
 
                                                

Add Comment

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

dbeck@cloudbees.com (JIRA)

unread,

Dec 14, 2015, 11:32:05 AM12/14/15

to jenkinsc...@googlegroups.com

Daniel Beck updated an issue

Jenkins /

JENKINS-32036

Slave disconnects after moving the windows clock forward

Looks more like a remoting issue. The log on the slave side would be interesting as well.

Reducing priority, as this does not appear to be a critical issue (it may be for you because your process breaks, but this would then apply to every possible feature behaving different from someone's expectations).

Change By:	Daniel Beck
Component/s:	remoting
Component/s:	core
Priority:	Critical Minor

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 14, 2015, 11:36:02 AM12/14/15

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Is this the slave log you're talking about?

 
                                                    JNLP agent connected from /192.168.221.160
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@1016a6[name=b30b19f8-da4f-5463-8ed0-53cbae44a1c7]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 7 more
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.44
This is a Windows slave
just before slave b30b19f8-da4f-5463-8ed0-53cbae44a1c7 gets online ...
executing prepare script ...
setting up slave b30b19f8-da4f-5463-8ed0-53cbae44a1c7 ...
slave setup done.
[StartupTrigger] - Scanning jobs for node b30b19f8-da4f-5463-8ed0-53cbae44a1c7
Slave successfully connected and online
 
                                                

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 14, 2015, 12:08:04 PM12/14/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

No, the slave records its own log.

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 14, 2015, 1:10:02 PM12/14/15

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

The log I posted in my previous comment was from http://jenkins:8080/computer/b30b19f8-da4f-5463-8ed0-53cbae44a1c7/log. Where would I find the log you're talking about?

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 14, 2015, 1:34:02 PM12/14/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

In the slave home directory.

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 14, 2015, 2:04:02 PM12/14/15

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Sorry, just to clarify, is this the directory that is set on the node for the "Remote root directory" property?. If so I'm looking on the slave in the C:\Jenkins directory but all I see is a workspace folder. I don't see a log in the workspace directory either.

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 14, 2015, 4:26:01 PM12/14/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

How is the slave connected? JNLP? SSH? If JNLP,was it installed as a service, or is it run ad-hoc in a console, or another way?

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 15, 2015, 9:29:01 AM12/15/15

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

JNLP. After our virtual machine is created we connect to the node with

 
                                                    start /min java -Dhudson.util.ProcessTree.disable=true -jar slave.jar -jnlpUrl http://jenkins:8080/computer/%UUID%/slave-agent.jnlp

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 15, 2015, 10:52:01 AM12/15/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

That probably dumps the logs to NUL…

maybe try redirecting the output (both sterr and stdout I think) to a file?

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 15, 2015, 10:57:02 AM12/15/15

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

How would I go about doing that? Feel free to link me to a resource if that's too many steps to list out.

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 15, 2015, 11:03:06 AM12/15/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Don't have a Windows machine around. Maybe try any of these?
http://lmgtfy.com/?q=windows+batch+redirect+output+to+file

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 17, 2015, 11:44:01 AM12/17/15

to jenkinsc...@googlegroups.com

Cam Spencer updated an issue

Jenkins /

JENKINS-32036

Slave disconnects after moving the windows clock forward

I attached the slave log. The disconnect happens after we move the clock forward 2 years

Change By:	Cam Spencer
Attachment:	Slave log.txt

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Dec 17, 2015, 11:46:03 AM12/17/15

to jenkinsc...@googlegroups.com

Cam Spencer edited a comment on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

I attached the slave log. The disconnect happens after we move the clock forward 2 years . The clock is rolled forward twice during the entire test which is why entries for 2019 appear.

Add Comment

dbeck@cloudbees.com (JIRA)

unread,

Dec 17, 2015, 12:13:04 PM12/17/15

to jenkinsc...@googlegroups.com

Daniel Beck commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Oleg Nenashev Any idea whether this can be done in a way that doesn't break everything?

Add Comment

ben.apgar@caseware.com (JIRA)

unread,

Jan 6, 2016, 1:35:03 PM1/6/16

to jenkinsc...@googlegroups.com

Ben Apgar commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Hi Cam. It looks like this is related to (or is a duplicate of) JENKINS-9541

Add Comment

teilo@java.net (JIRA)

unread,

Feb 17, 2016, 8:46:02 AM2/17/16

to jenkinsc...@googlegroups.com

James Nord commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

What is the version of the slave.jar?

This sounds like the PingThread - and that code was rewritten from System.currentTimeMillis() to System.nanoTime() so that this exact thing should not happen. (https://github.com/jenkinsci/remoting/pull/33)

Even so the logs do not mention the ping thread - so this could be a spurious wake-up that is not handled correctly (just want to make sure the salves are all up to date first)

Add Comment

teilo@java.net (JIRA)

unread,

Feb 17, 2016, 8:47:01 AM2/17/16

to jenkinsc...@googlegroups.com

James Nord commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

I assume that the clock is changed on the slave rather than the master - but just want to check.

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Feb 17, 2016, 9:16:02 AM2/17/16

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Hey James Nord, I'm not sure what version the slave.jar is, but I can guarantee it's not up to date. It probably hasn't been touched in a couple of years. As per https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds we never really concerned ourselves with using the latest slave.jar.

"Updating slave.jar
Technically speaking, in this set up you should update slave.jar every time you upgrade Jenkins to a new version. However, in practice slave.jar changes infrequently enough that it's also practical not to update until you see a fatal problem in start-up."

I'll make sure to try this with a new slave.jar (Currently our Jenkins is 1.639) and update this bug with the results. You're correct that the clock change is happening on the slave. None of our tests run on the master.

Add Comment

cameron.spencer@caseware.com (JIRA)

unread,

Feb 18, 2016, 11:42:03 AM2/18/16

to jenkinsc...@googlegroups.com

Cam Spencer commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

Even with the updated slave.jar, the disconnecting continues.

 
                                                    
Feb 18, 2016 10:36:44 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: 31c75d25-a047-41ef-c1aa-83f121769104
Feb 18, 2016 10:36:44 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://tdjenkins:8080/]
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to tdjenkins:54935
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 18, 2016 10:36:45 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 18, 2016 10:36:47 AM com.youdevise.hudson.slavestatus.SlaveListener call
INFO: Slave-status listener starting
Feb 18, 2016 10:36:47 AM com.youdevise.hudson.slavestatus.SocketHTTPListener wai
tForConnection
INFO: Slave-status listener ready on port 3141
Feb 18, 2016 10:38:07 AM hudson.Proc$LocalProc join
WARNING: Process leaked file descriptors. See http://wiki.jenkins-ci.org/display
/JENKINS/Spawning+processes+from+build for more information
java.lang.Exception
        at hudson.Proc$LocalProc.join(Proc.java:329)
        at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1153)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvo
cationHandler.java:608)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocat
ionHandler.java:583)
        at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocat
ionHandler.java:542)
        at hudson.remoting.UserRequest.perform(UserRequest.java:120)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecut
orService.java:68)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Unknown Source)

Feb 18, 2020 11:20:35 AM hudson.remoting.SynchronousCommandTransport$ReaderThrea
d run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStr
eam.java:82)
        at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java
:72)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.
java:103)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTrans
port.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(Abs
tractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(Synchron
ousCommandTransport.java:48)

Feb 18, 2020 11:20:35 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://tdjenkins:8080/]
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to tdjenkins:54935
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 18, 2020 11:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 18, 2020 11:20:49 AM com.youdevise.hudson.slavestatus.SlaveListener call
INFO: Slave-status listener starting
Feb 18, 2020 11:20:49 AM com.youdevise.hudson.slavestatus.SlaveListener$1 run
SEVERE: Could not listen on port
java.net.BindException: Address already in use: JVM_Bind
        at java.net.DualStackPlainSocketImpl.bind0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketBind(Unknown Source)
        at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
        at java.net.PlainSocketImpl.bind(Unknown Source)
        at java.net.ServerSocket.bind(Unknown Source)
        at java.net.ServerSocket.<init>(Unknown Source)
        at java.net.ServerSocket.<init>(Unknown Source)
        at com.youdevise.hudson.slavestatus.SocketHTTPListener.waitForConnection
(SlaveListener.java:129)
        at com.youdevise.hudson.slavestatus.SlaveListener$1.run(SlaveListener.ja
va:63)
        at com.youdevise.hudson.slavestatus.Daemon.go(Daemon.java:16)
        at com.youdevise.hudson.slavestatus.SlaveListener.call(SlaveListener.jav
a:83)
        at hudson.remoting.UserRequest.perform(UserRequest.java:120)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecut
orService.java:68)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Unknown Source)
 
                                                

Add Comment

o.v.nenashev@gmail.com (JIRA)

unread,

Mar 13, 2018, 10:33:24 PM3/13/18

to jenkinsc...@googlegroups.com

Oleg Nenashev assigned an issue to Unassigned

Unfortunately I have no capacity to work on Remoting in medium term, so I will unassign it and let others to take it. If somebody is interested to submit a pull request, I will be happy to help to get it reviewed and released.

Jenkins /

JENKINS-32036

Slave disconnects after moving the windows clock forward

Change By:	Oleg Nenashev
Assignee:	Oleg Nenashev

Add Comment

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

pjdarton@gmail.com (JIRA)

unread,

Apr 24, 2018, 11:33:02 AM4/24/18

to jenkinsc...@googlegroups.com

pjdarton commented on

JENKINS-32036

Re: Slave disconnects after moving the windows clock forward

TL;DR: Setting the clock is a Bad Thing; don't do that. Adjust it gradually instead.

To make Jenkins tolerate a clock change, we'd have to find a way of making all of Jenkins' "wait for a duration" code use nanotime instead of UTC. If the system "long millisecondsSince1970" clock changes, all of Java's normal "wait until" methods will return early or late. It's only code that only looks at system nanosecondsOfUptime that'll be unaffected.

This (kind of) issue has bitten me many times in my career and it extends to more than just Java & Jenkins, e.g. the Windows DHCP Client doesn't cope with the clock changing either (it'll either re-request an IP early, or it'll fail to renew the lease in time, sometimes causing a brief network outage that Windows cascades into a fatal networking error that kills all current TCP connections and thus briefly disconnects the slave). Been there; done that ... and have the mental scars to prove it.

The solution is fairly simple - do NOT set the clock while Jenkins (or anything else you care about) is running. If you need to tweak the clock then use the Windows Time Service (or an ntpd service) to gradually adjust the clock (by running it a bit fast or a bit slow) until it's in sync.

e.g. what we do on our dynamically-provisioned slaves, before the Jenkins slave process starts, is (a) force the time service to re-sync and set the time, (b) halt & disable all services that might adjust the time (c) force Windows to renew the DHCP lease and only after the clock is correct and the network "renewed" do we then allow the Jenkins slave.jar to run, thus ensuring that no other process will adjust the clock while Jenkins is running.

Going back to the original problem ... I think you need to refactor your code to make it testable by injecting in a test-controlled time source so that, during this test, the code-under-test isn't actually reading the system's real time clock at all, but is instead reading the time provided by the test harness. e.g. we declared an interface `ITimeProvider` which had a getUpimeInNanoseconds(), a getUTCInMilliseconds() and a delay(long milliseconds) method, and then had a default implementation that the code-under-test usually used unless the test harness needed to control time.

Add Comment

Reply all

Reply to author

Forward