Jenkins slave (EC2 AWS plugin) fails occasionally when build takes too long with error "Caused: java.io.IOException: Unexpected termination of the channel"


Lax Clarke

Sep 11, 2017, 9:32:30 PM
to Jenkins Users
I have Jenkins configured to launch EC2 instances (via AWS plugin) to execute a build.
The actual build steps use the Execute Shell method and launch Ansible scripts.

I find that if the Ansible script runs for too long, the Jenkins slave on the EC2 instance goes down.
The Jenkins GUI shows this error:

Connection was broken

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)



The GUI suggests that I check these log files:
1) connection logs on master - I checked these, and they contain the same information as the stack trace.
2) slave logs - I do not know where to find these. The suggested locations do not exist on my slave.

I even inspected the files that the slave Java process has open, and the only one that looks like a log file is empty:

ubuntu@ip-172-31-93-175:~/support$ ps auxww | grep java | grep slave
ubuntu    25235  8.0  0.5 12211452 157616 ?     Ssl  22:23   0:05 java -jar /tmp/slave.jar
ubuntu@ip-172-31-93-175:~/support$ sudo lsof | grep 25235  | grep -i log | awk '{print $NF}' | sort | uniq
/home/ubuntu/support/all_2017-09-11_22.23.07.log
ubuntu@ip-172-31-93-175:~/support$ cat /home/ubuntu/support/all_2017-09-11_22.23.07.log 
ubuntu@ip-172-31-93-175:~/support$ 



Has anyone run into this issue, or does anyone know where to find the Java slave logs?

Thanks so much.


VERSION INFO:
Master:
Jenkins ver. 2.60.2
Java:  openjdk version "1.8.0_131"

Build node:
Launched via: AWS plugin: https://wiki.jenkins.io/display/JENKINS/Amazon+EC2+Plugin (latest version)

EC2 instance running Ubuntu 14.04 and Java 8:
ubuntu@ip-172-31-93-175:~$ cat /etc/*release* | grep VERSION
VERSION="14.04.5 LTS, Trusty Tahr"
VERSION_ID="14.04"
ubuntu@ip-172-31-93-175:~$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)





Joshua Noble

Sep 12, 2017, 12:50:26 PM
to Jenkins Users
I've had great success with the EC2 plugin. Might I suggest using HashiCorp Packer to build an AMI, and having the EC2 plugin launch that AMI? That would let you take Ansible (or any provisioning tool) out of the equation and save on agent provisioning time. I know that when our cluster scales up another node using the EC2 plugin, we generally need that node online ASAP.

K S

Sep 12, 2017, 11:16:45 PM
to Jenkins Users
The Ansible script is minimal and runs a Yocto build process.

Are you suggesting that Ansible interacting with this plugin could be the problem?

Mike

Sep 14, 2017, 3:31:08 PM
to Jenkins Users
I added ClientAliveInterval and ClientAliveCountMax parameters to the sshd configuration on our Jenkins agents to help prevent disconnects.  I also removed the monitoring plugin since we didn't use it.  I had noticed JavaMelody errors in the Jenkins log file at the same time the Jenkins agent disconnected.
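
For illustration, a minimal sketch of what those two sshd settings might look like in /etc/ssh/sshd_config on an agent. The values here are assumptions chosen for the example, not the ones used in the post above, and sshd has to be reloaded before they take effect.

```
# /etc/ssh/sshd_config on the Jenkins agent (example values are assumptions; tune as needed)

# After 60 seconds without data from the client, sshd sends a keepalive
# request through the encrypted channel.
ClientAliveInterval 60

# sshd disconnects the client only after this many consecutive
# keepalive requests go unanswered.
ClientAliveCountMax 10
```

With these assumed values, an unresponsive client would be dropped after roughly 600 seconds, while a healthy but quiet connection keeps carrying periodic traffic, which may help when an intermediate firewall or NAT drops idle connections during a long build.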