In my AWS setup, we have two separate clusters: one for the master and another for the agents. Both are in the same VPC and subnet, and have a security group assigned. Other than this, there is nothing fancy going on here, such as an ELB or SSL certs (all communication happens over HTTP; it's a private setup).
1) I have configured the ECS plugin: the Jenkins agent cluster is used here, with the URL pointing to the private IP of the master. I added a simple CloudBees template with a task ARN and gave the template a label that can be used in the pipeline.
2) When the pipeline is triggered, the master, using the plugin config, spins up a new task dynamically and starts a container on one of the instances attached to the agent cluster. However, the agent-to-master connection fails with the error shown below. This is a snippet of the docker logs from the agent container that the ECS task started.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Logs from the Docker container running in ECS agent >>>>>>>>>>>>>>>>>>>>>>>>
INFO: Agent discovery successful
Agent address: 172.31.44.131
Agent port: 50000
Identity: a6:ed:4e:67:6d:e8:0e:53:32:51:8a:b5:80:06:4a:83
Mar 24, 2018 7:08:50 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 24, 2018 7:08:50 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.31.44.131:50000
07:08:50.931 INFO - Using the passthrough mode handler
Mar 24, 2018 7:08:50 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
2018-03-24 07:08:50.974:INFO:osjs.Server:main: jetty-9.4.5.v20170502
2018-03-24 07:08:51.028:WARN:osjs.SecurityHandler:main: Servlet...@o.s.j.s.ServletContextHandler@5a8e6209{/,null,STARTING} has uncovered http methods for path: /
2018-03-24 07:08:51.038:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@5a8e6209{/,null,AVAILABLE}
2018-03-24 07:08:51.076:INFO:osjs.AbstractConnector:main: Started ServerConnector@e4423f5{HTTP/1.1,[http/1.1]}{0.0.0.0:4444}
2018-03-24 07:08:51.077:INFO:osjs.Server:main: Started @857ms
07:08:51.077 INFO - Selenium Server is up and running
Mar 24, 2018 7:08:51 AM org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer onRecv
SEVERE: [JNLP4-connect connection to ip-172-31-44-131.us-west-2.compute.internal/172.31.44.131:50000]
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1478)
at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:392)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
at org.jenkinsci.remoting.protocol.impl.AckFilterLayer.onRecv(AckFilterLayer.java:255)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:48)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:283)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1$1.run(Engine.java:98)
at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
================= Logs from the master ==========================
Mar 24, 2018 7:07:40 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning ECS Slave ecs-build from ecs-agent-cluster with 1 executors. Remaining excess workload: 0
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSService waitForSufficientClusterResources
INFO: Found 2 instances
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSService waitForSufficientClusterResources
INFO: Resources found in instance arn:aws:ecs:us-west-2:316675405544:container-instance/036e70cb-9c10-4c65-a2f4-2561aa77ac7b: [{Name: CPU,Type: INTEGER,DoubleValue: 0.0,LongValue: 0,IntegerValue: 2048,StringSetValue: []}, {Name: MEMORY,Type: INTEGER,DoubleValue: 0.0,LongValue: 0,IntegerValue: 3952,StringSetValue: []}, {Name: PORTS,Type: STRINGSET,DoubleValue: 0.0,LongValue: 0,IntegerValue: 0,StringSetValue: [22, 2376, 2375, 51678, 51679]}, {Name: PORTS_UDP,Type: STRINGSET,DoubleValue: 0.0,LongValue: 0,IntegerValue: 0,StringSetValue: []}]
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSService waitForSufficientClusterResources
INFO: Instance arn:aws:ecs:us-west-2:316675405544:container-instance/036e70cb-9c10-4c65-a2f4-2561aa77ac7b has 3,952mb of free memory. 1,024mb are required
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSService waitForSufficientClusterResources
INFO: Instance arn:aws:ecs:us-west-2:316675405544:container-instance/036e70cb-9c10-4c65-a2f4-2561aa77ac7b has 2,048 units of free cpu. 1,024 units are required
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSCloud$ProvisioningCallback call
INFO: Created Slave: ecs-agent-cluster-1ddce183296b9
Mar 24, 2018 7:07:40 AM com.cloudbees.jenkins.plugins.amazonecs.ECSService registerTemplate
INFO: Created Task Definition: arn:aws:ecs:us-west-2:316675405544:task-definition/ecs-agent-cluster-jenkins-agent-ecs:1
Mar 24, 2018 7:07:41 AM com.cloudbees.jenkins.plugins.amazonecs.ECSCloud$ProvisioningCallback call
INFO: Slave ecs-agent-cluster-1ddce183296b9 - Slave Task Started : arn:aws:ecs:us-west-2:316675405544:task/63a3ee05-ce2a-4492-aa0a-0f13d7d94b59
=================
3) I am using JNLP4 with 50000 as the port number, and the security group is set to allow traffic on ports 80 and 50000.
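To rule out the security group, a check along these lines could be run from inside the agent container. This is only a minimal sketch: the helper names `port_is_open` and `report` are made up for illustration, and it only tests plain TCP reachability, not the TLS handshake that actually fails in the log above (the agent log already shows a TCP connection to port 50000, so this would just confirm that the ports are not the issue).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def report(host: str, ports, timeout: float = 3.0) -> list:
    """Build one human-readable status line per port (hypothetical helper)."""
    lines = []
    for port in ports:
        state = "open" if port_is_open(host, port, timeout) else "unreachable"
        lines.append(f"{host}:{port} is {state}")
    return lines
```

For example, `print("\n".join(report("172.31.44.131", (80, 50000))))` would check the master's private IP from the log against the two ports allowed by the security group.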
I would appreciate any pointers to help understand what is going wrong here.
/Ram