unable to launch remoting agent on slaves


Seth Galitzer

Aug 28, 2019, 1:46:46 PM
to Jenkins Users
For the last two weeks, I have been unable to launch the remoting agent on Linux slaves. Server version is 2.191, running on Debian 9.9 (stretch), installed from the jenkins.io repo. Slaves are Ubuntu 18.04 (bionic), with openjdk-8 installed. Eventually, one slave will start, but none of the rest will. Between reboots or restarts of the Jenkins server, the slave that successfully connects is different each time. There are 22 Linux slaves total. The working directory is on local disk for each slave. The SSH user is from LDAP.

Can somebody help me figure out what is blocking the start of the agents?

Thanks.
Seth

Sample server log:
2019-08-28 16:11:35.107+0000 [id=740] SEVERE hudson.slaves.ChannelPinger#install: Failed to set up a ping for linux64-santos13-minion
java.io.IOException: Closing all channels
at com.trilead.ssh2.channel.Channel.setReasonClosed(Channel.java:333)
at com.trilead.ssh2.channel.ChannelManager.closeChannel(ChannelManager.java:289)
at com.trilead.ssh2.channel.ChannelManager.closeAllChannels(ChannelManager.java:269)
at com.trilead.ssh2.Connection.close(Connection.java:536)
at com.trilead.ssh2.Connection.close(Connection.java:530)
at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:511)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:484)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
Caused: java.io.IOException: SSH channel is closed
at com.trilead.ssh2.channel.ChannelManager.ioException(ChannelManager.java:1540)
at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:373)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:68)
at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:62)
at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
at hudson.remoting.Channel.send(Channel.java:721)
at hudson.remoting.Request.call(Request.java:213)
at hudson.remoting.Channel.call(Channel.java:954)
at hudson.slaves.ChannelPinger.install(ChannelPinger.java:115)
at hudson.slaves.ChannelPinger.preOnline(ChannelPinger.java:98)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:667)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:435)
at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:607)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:113)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:441)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-08-28 16:11:35.107+0000 [id=846] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel linux64-santos14-minion
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

Sample remoting.log on slave:
Aug 28, 2019 11:11:35 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2763)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3258)
at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:873)
at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:350)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

Seth Galitzer

Aug 29, 2019, 4:07:38 AM
to jenkins...@googlegroups.com
I found a solution to this shortly after posting my question. It seems
Jenkins uses a cache directory in the SSH user's homedir for caching
jars when the agent runs. All of my slaves use the same SSH account to
launch the agent. That account authenticates using LDAP and has a
homedir on a central fileserver that all the slave nodes connect to. So
when the first connection is established, it seems to lock the cache
dir, blocking any subsequent use of it by the other slaves.

This appears to be new behavior, as we have been using this setup for
years without trouble. The solution was to set -Duser.home=<working dir>
in the JVM Options in advanced configuration for the node. Since
<working dir> is always on the local disk of the slave node, there
should be no further file locking problems.
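
As a rough illustration of why that option helps, here is a minimal Java
sketch showing how -Duser.home changes the directory a JVM reports as the
user's home, and therefore where a cache under it would land. The cache
layout <user.home>/.jenkins/cache/jars is an assumption about the agent's
default and is not confirmed anywhere in this thread; the class name and
the /var/jenkins path are made up for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class JarCacheLocation {
    // Assumption (not confirmed in this thread): the agent keeps its jar
    // cache under <user.home>/.jenkins/cache/jars.
    static Path resolveCacheDir(String userHome) {
        return Paths.get(userHome, ".jenkins", "cache", "jars");
    }

    public static void main(String[] args) {
        // user.home normally comes from the account's home directory,
        // but -Duser.home=<dir> on the command line overrides it.
        String home = System.getProperty("user.home");
        System.out.println("user.home = " + home);
        System.out.println("jar cache = " + resolveCacheDir(home));
    }
}
```

Compiling this and running it once as "java JarCacheLocation" and once as
"java -Duser.home=/var/jenkins JarCacheLocation" should show the second
path moving off the shared homedir and onto the directory given.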

It would have saved me some pain if the error logging had been more
descriptive to identify this problem. "I/O error" or "null" was not
super useful in debugging this. And since we haven't had problems in the
past, it didn't occur to me that this might be an issue. Regardless, it
seems to be resolved now and hopefully this post will help somebody in
the future.

Seth

--
Seth Galitzer
The beatings will continue until morale has improved.

Ivan Fernandez Calvo

Aug 29, 2019, 2:06:32 PM
to Jenkins Users
The SSH-slaves plugin recommends against using the same user on the same host for several agent connections, because they would share the remoting and Java cache folders (see https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#overall-recommendations ). That setup was never recommended. In your case you share the home folder on a remote filesystem, which is mostly the same thing. Also, using a remote filesystem for the working folder would hurt the performance of any job you run on those agents. It would become a bottleneck, since it has to support all the I/O operations of all your simultaneously running jobs, and remote filesystems usually do not perform as fast as a local hard disk.
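
One way to audit a setup against that recommendation is to group agents by the storage location that backs their home directory and flag any location used by more than one agent. The following is only a sketch: the class, the method, and the location strings are hypothetical, and the agent-to-home mapping would have to be collected by hand or from your own inventory, since nothing here reads it from Jenkins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SharedHomeCheck {
    // Groups agent names by home location and keeps only the locations
    // that are used by two or more agents.
    static Map<String, List<String>> sharedHomes(Map<String, String> agentToHome) {
        Map<String, List<String>> byHome = new TreeMap<>();
        for (Map.Entry<String, String> e : agentToHome.entrySet()) {
            byHome.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop locations that belong to a single agent only.
        byHome.values().removeIf(names -> names.size() < 2);
        return byHome;
    }

    public static void main(String[] args) {
        // Hypothetical inventory: agent name -> where its home directory lives.
        Map<String, String> agents = new LinkedHashMap<>();
        agents.put("linux64-santos13-minion", "fileserver:/home/jenkins");
        agents.put("linux64-santos14-minion", "fileserver:/home/jenkins");
        agents.put("some-other-agent", "local:/var/jenkins");
        sharedHomes(agents).forEach((home, names) ->
            System.out.println(home + " is shared by " + names));
    }
}
```

The location string should identify the actual storage (for example the file server export), not just the path, because the same path on two hosts with local disks is not shared.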