[JIRA] (JENKINS-51863) EC2 Plugin: Connectivity issues

5 views
Skip to first unread message

dfrank@connectwise.com (JIRA)

unread,
Jun 11, 2018, 9:23:02 AM6/11/18
to jenkinsc...@googlegroups.com
David Frank created an issue
 
Jenkins / Bug JENKINS-51863
EC2 Plugin: Connectivity issues
Issue Type: Bug Bug
Assignee: Francis Upton
Components: ec2-plugin
Created: 2018-06-11 13:22
Environment: EC2 Plugin 1.39
Master
- Jenkins ver. 2.124
- Java 1.8.0_171 (64 bit)
Slave
- M5.Large
- Windows Server 2016
- Java 1.8.0_171 (64 bit)
Labels: connection remoting
Priority: Major Major
Reporter: David Frank

About half of the time a new node is spun up, there is a connectivity issue between master and slave. The master successfully connects to the slave, but then the connection fails soon-after. When a true successful connection occurs (about half of the time), which I define as having the `Agent successfully connected and online` message appearing in the log, the connection remains stable and is not an issue.

 

First time poster - if any additional information is needed, please let me know.

 

Logged error below:

```

FINER: Node ***** is ready

          • booted at 1528718031000
            Connecting to ***** with WinRM as administrator
            Connected with WinRM.
            Creating tmp directory if it does not exist
            slave.jar sent remotely. Bootstrapping it
            <===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.20
            This is a Windows agent
            Connection terminated
            ERROR: Failed to install restarter
            Ouch:
            hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown: Command Close created at
            at hudson.remoting.Request.abort(Request.java:340)
            at hudson.remoting.Channel.terminate(Channel.java:1038)
            at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1271)
            at hudson.remoting.Channel$1.handle(Channel.java:565)
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)
            Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to *****
            at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
            at hudson.remoting.Request.call(Request.java:202)
            at hudson.remoting.Channel.call(Channel.java:954)
            at org.jenkinsci.modules.upstart_slave_installer.SlaveInstallerFactoryImpl.createIfApplicable(SlaveInstallerFactoryImpl.java:29)
            at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createIfApplicable(SlaveInstallerFactory.java:29)
            at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createFor(SlaveInstallerFactory.java:46)
            at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:30)
            at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:620)
            at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:416)
            at hudson.plugins.ec2.win.EC2WindowsLauncher.launch(EC2WindowsLauncher.java:70)
            at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:122)
            at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:288)
            at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
            at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            Caused by: hudson.remoting.Channel$OrderlyShutdown: Command Close created at
            ... 3 more
            Caused by: Command Close created at
            at hudson.remoting.Command.<init>(Command.java:65)
            at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1265)
            at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1263)
            at hudson.remoting.Channel.close(Channel.java:1436)
            at hudson.remoting.Channel.close(Channel.java:1403)
            at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1270)
            ... 2 more
            Command Close created at
            at hudson.remoting.Command.<init>(Command.java:65)
            at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1265)
            at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1263)
            at hudson.remoting.Channel.close(Channel.java:1436)
            at hudson.remoting.Channel.close(Channel.java:1403)
            at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1270)
            Caused: hudson.remoting.Channel$OrderlyShutdown
            Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to *****
            at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
            at hudson.remoting.Request.call(Request.java:202)
            at hudson.remoting.Channel.call(Channel.java:954)
            at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.install(JnlpSlaveRestarterInstaller.java:61)
            at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.access$000(JnlpSlaveRestarterInstaller.java:35)
            at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$Install.call(JnlpSlaveRestarterInstaller.java:49)
            at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$Install.call(JnlpSlaveRestarterInstaller.java:40)
            at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
            at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            Caused: hudson.remoting.RequestAbortedException
            at hudson.remoting.Request.abort(Request.java:340)
            at hudson.remoting.Channel.terminate(Channel.java:1038)
            at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1271)
            at hudson.remoting.Channel$1.handle(Channel.java:565)
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)

```

 

Around the same time as the error appearing above there is an error on detected by the master (also occurs for free temp space, free disk space, etc.), and then the node is made offline:

```

Failed to monitor ***** for Response Time
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:59)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

```

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)
Atlassian logo

dfrank@connectwise.com (JIRA)

unread,
Jun 11, 2018, 9:26:03 AM6/11/18
to jenkinsc...@googlegroups.com
David Frank updated an issue
Change By: David Frank
About half of the time a new node is spun up, there is a connectivity issue between master and slave. The master successfully connects to the slave, but then the connection fails soon-after. When a true successful connection occurs (about half of the time), which I define as having the `Agent successfully connected and online` message appearing in the log, the connection remains stable and is not an issue.

 

First time poster - if any additional information is needed, please let me know.

 

Logged error below:

```

{{ FINER: Node ***** is ready
}}{{ ***** booted at 1528718031000 }}

dfrank@connectwise.com (JIRA)

unread,
Jun 11, 2018, 11:43:03 AM6/11/18
to jenkinsc...@googlegroups.com
David Frank updated an issue
About half of the time a new node is spun up, there is a connectivity issue between master and slave. The master successfully connects to the slave, but then the connection fails soon-after. When a true successful connection occurs (about half of the time), which I define as having the `Agent successfully connected and online` message appearing in the log, the connection remains stable and is not an issue.

 

First time poster - if any additional information is needed, please let me know.

 

Logged error below:

{ { code:java}
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87) {code } }

 

Around the same time as the error appearing above there is an error on detected by the master (also occurs for free temp space, free disk space, etc.), and then the node is made offline:

 

{ { code:java}
Failed to monitor ***** for Response Time }}
{{

java.util.concurrent.TimeoutException }}
{{

at hudson.remoting.Request$1.get(Request.java:316) }}
{{

at hudson.remoting.Request$1.get(Request.java:240) }}
{{

at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59) }}
{{

at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96) }}
{{

at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:59) }}
{{

at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) {code } }
 

It looks like the closest issue to this is https://issues.jenkins-ci.org/browse/JENKINS-34044

dfrank@connectwise.com (JIRA)

unread,
Jun 13, 2018, 10:15:03 AM6/13/18
to jenkinsc...@googlegroups.com
David Frank commented on Bug JENKINS-51863
 
Re: EC2 Plugin: Connectivity issues

I think what may be happening:

  • initial WinRM connection is established (determined by seeing "This is a Windows agent" in the log)
  • initialization step where it is trying to set up jenkins on the instance (install the restarter and other things?)
    • before jenkins is fully initialized on the instance the jenkins monitoring system attempts to monitor the instance and fails because jenkins is not set up yet on it to receive and respond to the request

 

Maybe it is just a timing thing in the logs, as I'm not entirely sure what is being set up after the WinRM connection is established, but it seems like this could be it. I am thinking that the node shouldn't show as "online and functional" until it is fully initialized.

 

 

Log from the master that shows the monitoring turning off the agent:

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Response Time
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:59)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Free Swap Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.ResponseTimeMonitor$1 monitor
Making ***** offline because it is not responding
Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Free Temp Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Free Disk Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Clock Difference
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM WARNING hudson.remoting.Request$2 run
Failed to send back a reply to the request hudson.remoting.Request$2@585264c
java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1447)
at hudson.remoting.Channel.close(Channel.java:1403)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:746)
at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:99)
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:664)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@60a911aa:*****": channel is already closed
at hudson.remoting.Channel.send(Channel.java:717)
at hudson.remoting.Request$2.run(Request.java:382)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:19)
at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Jun 13, 2018 2:02:18 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor ***** for Architecture
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:316)
at hudson.remoting.Request$1.get(Request.java:240)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)

Jun 13, 2018 2:02:18 PM SEVERE hudson.plugins.ec2.win.winrm.WinRMClient sendRequest
I/O Exception in HTTP POST
java.io.IOException: Attempted read from closed stream.
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:167)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:140)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:247)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:291)
at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:264)
at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:191)
at hudson.plugins.ec2.win.winrm.WinRMClient.signal(WinRMClient.java:116)
at hudson.plugins.ec2.win.winrm.WindowsProcess.destroy(WindowsProcess.java:88)
at hudson.plugins.ec2.win.EC2WindowsLauncher$1.onClosed(EC2WindowsLauncher.java:73)
at hudson.remoting.Channel.terminate(Channel.java:1059)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1271)
at hudson.remoting.Channel$1.handle(Channel.java:565)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)

Jun 13, 2018 2:02:18 PM SEVERE hudson.remoting.Channel terminate
Listener hudson.plugins.ec2.win.EC2WindowsLauncher$1@42ad595d propagated an exception for channel hudson.remoting.Channel@60a911aa:*****s close: {2}
java.io.IOException: Attempted read from closed stream.
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:167)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:140)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:247)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:291)
at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:264)
Caused: hudson.plugins.ec2.win.winrm.RuntimeIOException: I/O Exception Attempted read from closed stream.
at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:279)
at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:191)
at hudson.plugins.ec2.win.winrm.WinRMClient.signal(WinRMClient.java:116)
at hudson.plugins.ec2.win.winrm.WindowsProcess.destroy(WindowsProcess.java:88)
at hudson.plugins.ec2.win.EC2WindowsLauncher$1.onClosed(EC2WindowsLauncher.java:73)
at hudson.remoting.Channel.terminate(Channel.java:1059)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1271)
at hudson.remoting.Channel$1.handle(Channel.java:565)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)
Add Comment Add Comment
 

dfrank@connectwise.com (JIRA)

unread,
Jun 19, 2018, 9:04:04 AM6/19/18
to jenkinsc...@googlegroups.com

It seems that if I increase the Boot Delay from 3 minutes (the suggested time per the documentation) to 7.5 minutes that the issue went away.

dfrank@connectwise.com (JIRA)

unread,
Jun 19, 2018, 9:05:02 AM6/19/18
to jenkinsc...@googlegroups.com
David Frank updated an issue
Change By: David Frank
Priority: Major Minor

bitwiseman@gmail.com (JIRA)

unread,
May 13, 2019, 10:44:02 AM5/13/19
to jenkinsc...@googlegroups.com

I believe this is fixed in Jenkins v2.177.
Please comment if this should be reopened.

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

bitwiseman@gmail.com (JIRA)

unread,
May 13, 2019, 10:45:02 AM5/13/19
to jenkinsc...@googlegroups.com
Liam Newman resolved as Fixed
Change By: Liam Newman
Status: Open Resolved
Resolution: Fixed
Reply all
Reply to author
Forward
0 new messages