GoCD AWS ECS Elastic Agent allocation is failing


pradeep devaraj

Sep 2, 2024, 12:25:48 PM
to go-cd
We are using the GoCD AWS ECS elastic agent plugin.
GoCD version: 23.4.0

GoCD Elastic Agent Plugin for Amazon ECS
  • Version 7.3.0-416

AMI id: ami-0ba9fb6bc8faf1fe0


The elastic instance comes up, but it is not registered with the ECS cluster. We logged in to the server and found the error below.

[root@ip-******* ~]# systemctl restart docker
Job for docker.service failed because start of the service was attempted too often. See "systemctl status docker.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed docker.service" followed by "systemctl start docker.service" again.
[root@ip- *******   ~]# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ecs.service has finished shutting down.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: start request repeated too quickly for docker.service
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: docker.service failed.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Starting Amazon Elastic Container Service - container agent...
-- Subject: Unit ecs.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ecs.service has begun starting up.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service: control process exited, code=exited status=1
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="post-stop"
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=info time=2024-09-02T16:03:20Z msg="Cleaning up the credentials endpoint setup for Amazon El
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon amazon-ecs-init[6236]: level=error time=2024-09-02T16:03:20Z msg="Error performing action 'delete' for iptables route: ex
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Failed to start Amazon Elastic Container Service - container agent.
-- Subject: Unit ecs.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ecs.service has failed.
--
-- The result is failed.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: Unit ecs.service entered failed state.
Sep 02 16:03:20 ip-10-226-11-63.aws.cloud.epsilon systemd[1]: ecs.service failed.

 

[root@ipXXXX ~]# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  7.7G     0  7.7G   0% /dev
tmpfs          tmpfs     7.7G     0  7.7G   0% /dev/shm
tmpfs          tmpfs     7.7G  376K  7.7G   1% /run
tmpfs          tmpfs     7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/nvme0n1p1 xfs       100G  2.4G   98G   3% /
tmpfs          tmpfs     1.6G     0  1.6G   0% /run/user/0
[root@ip-10-226-11-63 ~]# docker --version
Docker version 25.0.5, build 5dc9bcc

Below is the user data script we are using; it gets executed while the instance spins up and ends in an error.

"ECS_INSTANCE_ATTRIBUTES={"server-id":"31e424ad-e242-45d2-a5bb-0ef7be0d8306"}
EOT
echo 'File /etc/ecs/ecs.config successfully created.'
log "Finished executing GoCD's user data script, now executing custom user data script from use, if present."
#!/bin/bash
echo "ECS_CLUSTER=GoCD-ECS-UAT"  >> /etc/ecs/ecs.config
log "Finished executing user specified user data script."
--//
#cloud-config
cloud_final_modules:
- [scripts-user, always]
--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="initialize_instance_store"

#!/bin/bash
exec > >(tee /var/log/initialize_instance_store.log | logger -t user-data -s 2>/dev/console) 2>&1
function log() {
    echo "[$(date "+%Y-%m-%d %H:%M:%S")] - $1" >> /var/log/initialize_instance_store.log
}
function try() {
    $@
    return 0
}
log "Starting to setup instance store for the docker."
INSTANCE_STORES=$(ls /dev/disk/by-id/*EC2_NVMe_Instance_Storage*-ns-1)
if [ -z "${INSTANCE_STORES}" ]; then
    log "No instance store detected."
fi
VOLUMES="$INSTANCE_STORES"
if [ -e "/dev/xvdcz" ]; then
    log "Instance has /dev/xvdcz EBS volume. Using it for docker logical volume group."
    VOLUMES="$VOLUMES /dev/xvdcz"
fi
if [ -z "${VOLUMES}" ]; then
    log "No addition volumes. Using box standard docker setup."
else
    log "Available instance stores: ${VOLUMES}."
    log "Setting up the docker logical volume group."
    service docker stop
    rm -rf /var/lib/docker/*
    dmsetup remove_all
    VOLUME_GROUP=docker
    LOGICAL_VOLUME=docker-pool
    try vgremove -y "${VOLUME_GROUP}"
    try lvremove -y "${LOGICAL_VOLUME}"
    vgcreate -y "${VOLUME_GROUP}" ${VOLUMES}
    sleep 2
    lvcreate -y -l 5%VG -n ${LOGICAL_VOLUME}\meta ${VOLUME_GROUP}
    lvcreate -y -l 90%VG -n ${LOGICAL_VOLUME} ${VOLUME_GROUP}
    sleep 2
    lvconvert -y --zero n --thinpool ${VOLUME_GROUP}/${LOGICAL_VOLUME} --poolmetadata ${VOLUME_GROUP}/${LOGICAL_VOLUME}\meta
    echo 'DOCKER_STORAGE_OPTIONS=" --storage-driver devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.fs=ext4 --storage-opt dm.use_deferred_deletion=true"' > /etc/sysconfig/docker-storage
    test -f /bin/systemctl && systemctl reset-failed docker.service
    service docker restart
    test -f /bin/systemctl && systemctl enable --no-block --now ecs
fi
log "Setup completed."
--//"

pradeep devaraj

Sep 2, 2024, 1:51:06 PM
to go-cd
Adding++

We are getting agent creation and deletion in a loop:
[go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
[go] No running instance(s) found to build the ECS Task to perform current job.
[go] Creating a new container instance to schedule ECS Task.
[go] Waiting for instance(s) ([i-061187c3d2ea07317]) to register with cluster.
[go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
[go] No running instance(s) found to build the ECS Task to perform current job.
[go] Creating a new container instance to schedule ECS Task.
[go] Waiting for instance(s) ([i-00bb68d594121ab15]) to register with cluster.
[go] Received a request to create an agent for the job: [SpecOps_UAT_Elastic_Img_crt/6/test/1/test]
[go] No running instance(s) found to build the ECS Task to perform current job.
[go] Creating a new container instance to schedule ECS Task.

pradeep devaraj

Sep 3, 2024, 7:23:52 AM
to go-cd
Hi Team / Chad Wilson.

The Docker and ECS services fail when a new server comes up. AMI ID: ami-0a5f593ecaa0f722d (the community one). When we manually spin up the server and attach it via the ASG, it registers with the cluster. When we try the same from the GoCD ECS cluster profile (AWS ECS Elastic Agent plugin), it does not work, and the Docker and ECS services fail.



Sriram Narayanan

Sep 3, 2024, 8:36:02 AM
to go...@googlegroups.com
(I am ill, so please excuse the limited questions.)
- Does the ECS consumer get created and registered if you remove the user data script?
- What changed between when this ECS setup used to work and now?

— Sriram


pradeep devaraj

Sep 3, 2024, 9:55:00 AM
to go-cd
Hi Sriram,

- does the ECS consumer get created and registered if you remove the user data script?
Yes. We took the marketplace AMI ami-0a5f593ecaa0f722d. If we create the server manually via the launch template and add it to the ECS cluster, it works. If we do the same step via GoCD (GoCD Elastic Agent Plugin for Amazon ECS), it fails, and Docker and ECS are not running.
Docker version: Docker version 25.0.5, build 5dc9bcc

- what changed between when this ECS used to work vs now?
Nothing has changed. It was working until last Thursday night.

Chad Wilson

Sep 4, 2024, 4:02:47 AM
to go...@googlegroups.com
Something must have changed, e.g. you changed the AMI, or instances now upgrade pre-installed software to different versions during cloud-init when they start. In future, you need to share the specific name of the AMI, the release date, the region etc. An AMI ID on its own is not useful to look up.

The plugin doesn't work with Docker 25, so I doubt it was using the same AMI before - did you see https://github.com/gocd/gocd-ecs-elastic-agent/issues/345 ? You'll have to find/use an Amazon Linux 2 (not 2023) AMI which still has Docker 20.10 pre-installed until the plugin can be modified to support Docker 25.

According to https://alas.aws.amazon.com/announcements/2024-009.html as of September 3 a yum upgrade --security on AL2 will cause Docker to upgrade to Docker 25, which would break the plugin. Likely if you are using a new ECS AMI it is pre-upgraded. In any case, the last AL2 AMI that will work is https://github.com/aws/amazon-ecs-ami/releases/tag/20240625

Any Amazon Linux 2 ECS AMIs newer than 2024-06-25 will not work, as Docker has been upgraded to v25.
Since the plugin is still working for https://build.gocd.org which uses the ECS plugin, it's definitely possible to have it work - but it does mean using an unpatched ECS image, or managing the patching yourself to upgrade everything except Docker.
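
A minimal sketch of the two ideas above, assuming a POSIX-style shell on the AL2 instance: a guard that reads the major version out of the `docker --version` output, and a security upgrade that leaves Docker alone. The helper name `docker_major` and the `docker*` exclude pattern are assumptions for illustration, not something the plugin provides; check the actual package names on your image before relying on the exclude.

```shell
# Extract the major version from a string such as
# "Docker version 25.0.5, build 5dc9bcc". (Hypothetical helper.)
docker_major() {
    version_line="$1"
    # Drop everything up to "version ", then everything from the first dot.
    rest="${version_line#*version }"
    echo "${rest%%.*}"
}

# Usage on an instance (commented out so the sketch has no side effects):
# major="$(docker_major "$(docker --version)")"
# if [ "$major" -ge 25 ]; then
#     echo "Docker $major found; the plugin currently needs Docker 20.10" >&2
# fi
#
# Apply security updates but skip the Docker packages (assumed pattern):
# yum upgrade --security --exclude='docker*' -y
```

Excluding packages can leave dependencies out of step, so treat the `yum` line as a starting point to test on a throwaway instance rather than a recipe.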

-Chad



Chad Wilson

Sep 4, 2024, 4:16:32 AM
to go...@googlegroups.com
With some trial and error, it seems these are us-east-1 AMIs. The last one you shared is indeed too new (2024-08-21), so it won't work.

{
  "body": {
    "VirtualizationType": "hvm",
    "Description": "Amazon Linux AMI 2.0.20240821 x86_64 ECS HVM GP2",
    "Hypervisor": "xen",
    "ImageOwnerAlias": "amazon",
    "EnaSupport": true,
    "SriovNetSupport": "simple",
    "ImageId": "ami-0a5f593ecaa0f722d",
    "State": "available",
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvda",
        "Ebs": {
          "DeleteOnTermination": true,
          "SnapshotId": "snap-0dc7b37b7792952a7",
          "VolumeSize": 30,
          "VolumeType": "gp2",
          "Encrypted": false
        }
      }
    ],
    "Architecture": "x86_64",
    "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs",
    "RootDeviceType": "ebs",
    "OwnerId": "591542846629",
    "RootDeviceName": "/dev/xvda",
    "CreationDate": "2024-08-22T20:53:11.000Z",
    "Public": true,
    "ImageType": "machine",
    "Name": "amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs"
  }
}

The earlier one you shared is this one (2024-02-01):

{
  "body": {
    "VirtualizationType": "hvm",
    "Description": "Amazon Linux AMI 2.0.20240201 x86_64 ECS HVM GP2",
    "Hypervisor": "xen",
    "ImageOwnerAlias": "amazon",
    "EnaSupport": true,
    "SriovNetSupport": "simple",
    "ImageId": "ami-0ba9fb6bc8faf1fe0",
    "State": "available",
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvda",
        "Ebs": {
          "DeleteOnTermination": true,
          "SnapshotId": "snap-0ca36cd61121c93d2",
          "VolumeSize": 30,
          "VolumeType": "gp2",
          "Encrypted": false
        }
      }
    ],
    "Architecture": "x86_64",
    "ImageLocation": "amazon/amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs",
    "RootDeviceType": "ebs",
    "OwnerId": "591542846629",
    "RootDeviceName": "/dev/xvda",
    "CreationDate": "2024-02-03T00:52:53.000Z",
    "Public": true,
    "ImageType": "machine",
    "Name": "amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs"
  }
}

This second one might work, as it at least had Docker 20.10 on it, but since you have shared two different AMIs and I'm not sure which log is from which, I don't know what the problem is here.
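
As a rough way to sort AMIs by name, here is a small sketch that pulls the release date out of an ECS-optimized AL2 AMI name and compares it with the 20240625 release mentioned in this thread. The helper names `ami_release_date` and `ami_has_old_docker` are made up for illustration, and the check only reflects the naming pattern seen in the AMIs above; it says nothing about what cloud-init upgrades after boot.

```shell
# Pull the 8-digit release date out of a name such as
# "amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs". (Hypothetical helper.)
ami_release_date() {
    echo "$1" | sed -n 's/.*-2\.0\.\([0-9]\{8\}\)-.*/\1/p'
}

# Succeeds (exit 0) when the AMI is no newer than the 20240625 release.
ami_has_old_docker() {
    released="$(ami_release_date "$1")"
    [ -n "$released" ] && [ "$released" -le 20240625 ]
}

# Classify the AMI names that appear in this thread.
for name in \
    amzn2-ami-ecs-hvm-2.0.20240821-x86_64-ebs \
    amzn2-ami-ecs-hvm-2.0.20240201-x86_64-ebs \
    amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs
do
    if ami_has_old_docker "$name"; then
        echo "$name: old enough, may work"
    else
        echo "$name: too new"
    fi
done
```

With the names from this thread, only the 20240821 image is reported as too new.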

https://build.gocd.org is using amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs so this one definitely works. Find the AMI ID for your region (us-east-1 it seems) and try that?
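
One way to do that lookup, sketched with the AWS CLI and assuming it is installed with credentials that allow `ec2:DescribeImages`. The wrapper name `find_ami_id` is hypothetical; the `describe-images` options are standard CLI options.

```shell
# Look up the AMI ID for a given AMI name in a given region.
# (Hypothetical wrapper around `aws ec2 describe-images`.)
find_ami_id() {
    ami_name="$1"
    region="$2"
    aws ec2 describe-images \
        --region "$region" \
        --owners amazon \
        --filters "Name=name,Values=${ami_name}" \
        --query 'Images[0].ImageId' \
        --output text
}

# Example (commented out; it makes a real API call):
# find_ami_id amzn2-ami-ecs-kernel-5.10-hvm-2.0.20240625-x86_64-ebs us-east-1
```

If the call prints `None`, the image may have been deprecated since; adding `--include-deprecated` to the `describe-images` call is worth trying in that case.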

-Chad