[JIRA] (JENKINS-55483) ec2-plugin Wrong order of workers termination


polansky.j@seznam.cz (JIRA)

Jan 9, 2019, 9:44:01 AM
to jenkinsc...@googlegroups.com
Jiri Polansky created an issue
 
Jenkins / Bug JENKINS-55483
ec2-plugin Wrong order of workers termination
Issue Type: Bug
Assignee: FABRIZIO MANFREDI
Attachments: job.log
Components: ec2-plugin
Created: 2019-01-09 14:43
Environment: ec2-plugin 1.41
Labels: ec2-plugin ec2 EC2 aws
Priority: Major
Reporter: Jiri Polansky

Dear colleagues, we found an issue when using the ec2-plugin. The problem appears when an AWS spot instance (Jenkins slave) is being terminated because of the "Idle termination timeout".
The EC2 plugin first cancels and terminates the AWS spot worker (slave), and only after that removes the node from Jenkins.

Cancelling and terminating the AWS instance takes some time, and during this window Jenkins can try to build a new job on the node, which still looks "available". The job then fails because the instance is already in the terminating state in AWS.

A better way to handle node termination would be to put the node offline first, and only then cancel the spot request and terminate the instance in AWS:

1. Put the node offline or disconnect it (I don't know the exact method)
2. ec2.cancelSpotInstanceRequests(...)
3. ec2.terminateInstances(...)
4. Jenkins.getInstance().removeNode(...)
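The effect of the ordering can be illustrated with a small, self-contained Java simulation. This is a sketch under stated assumptions, not the plugin's code: `TerminationOrderSketch`, `SimNode` and `maybeAssignJob` are hypothetical names, and a single scheduling attempt stands in for the window in which AWS is still cancelling and terminating the instance. It contrasts the order reported in the issue (terminate first, remove the node last) with the proposed order (take the node offline first). In the real plugin, step 1 might map to Jenkins core's `Computer.setTemporarilyOffline(boolean, OfflineCause)`, but this thread does not say which method the fix released in 1.45 actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the race described in JENKINS-55483.
 * None of these classes are real Jenkins or AWS SDK types.
 */
public class TerminationOrderSketch {

    /** Minimal stand-in for a Jenkins agent node backed by an EC2 spot instance. */
    static final class SimNode {
        boolean online = true;        // scheduler may assign work while true
        boolean instanceAlive = true; // false once the EC2 instance is cancelled/terminating
        final List<String> log = new ArrayList<>();
    }

    /** Stand-in for the Jenkins queue: assigns a job only if the node looks online. */
    static void maybeAssignJob(SimNode node, String job) {
        if (node.online) {
            // The job fails if the underlying instance is already going away.
            node.log.add(job + (node.instanceAlive ? ":ok" : ":failed"));
        }
    }

    /** Order reported in the issue: cancel/terminate first, remove the node last. */
    static SimNode wrongOrder() {
        SimNode node = new SimNode();
        node.instanceAlive = false;    // cancelSpotInstanceRequests + terminateInstances
        maybeAssignJob(node, "job-1"); // AWS is still terminating; scheduler still sees the node
        node.online = false;           // removeNode(...) happens only now
        return node;
    }

    /** Order proposed in the issue: take the node offline first, then cancel, terminate, remove. */
    static SimNode proposedOrder() {
        SimNode node = new SimNode();
        node.online = false;           // 1. put the node offline
        node.instanceAlive = false;    // 2-3. cancel spot request, terminate instance
        maybeAssignJob(node, "job-1"); // same scheduling attempt, now skipped
        // 4. removeNode(...) would follow here
        return node;
    }

    public static void main(String[] args) {
        System.out.println("wrong order:    " + wrongOrder().log);
        System.out.println("proposed order: " + proposedOrder().log);
    }
}
```

Running `main` prints `[job-1:failed]` for the reported order and `[]` for the proposed order: once the node is offline before AWS starts tearing the instance down, the scheduler has no window in which to hand it new work.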

Please see the attached job.log file, which shows the end of the failed job.

 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

fabrizio.manfredi@gmail.com (JIRA)

Jan 9, 2019, 9:59:08 AM
to jenkinsc...@googlegroups.com
FABRIZIO MANFREDI commented on Bug JENKINS-55483
 
Re: ec2-plugin Wrong order of workers termination

Hi, are you sure that the spot instance was not retired by AWS?

I have to check the spot instance code, because it is a bit different from the on-demand code, but it should not be possible to assign new jobs to a node that has reached the idle timeout.

polansky.j@seznam.cz (JIRA)

Jan 9, 2019, 10:17:02 AM
to jenkinsc...@googlegroups.com
Jiri Polansky edited a comment on Bug JENKINS-55483
Spot instance was terminated by the ec2 plugin (Client.UserInitiatedShutdown: User initiated). The node was still available for a moment between the spot instance termination and the node's removal from Jenkins, so a new job was assigned to it.

polansky.j@seznam.cz (JIRA)

Jan 9, 2019, 10:17:02 AM
to jenkinsc...@googlegroups.com

Spot instance was terminated by the ec2 plugin (Client.UserInitiatedShutdown: User initiated). The node was still available for a moment between the spot instance termination and the node's removal from Jenkins, so a new job was assigned to it.

polansky.j@seznam.cz (JIRA)

Jan 9, 2019, 10:22:01 AM
to jenkinsc...@googlegroups.com
Jiri Polansky edited a comment on Bug JENKINS-55483
Spot instance was terminated by the ec2 plugin (Event name: CancelSpotInstanceRequests). The node was still available for a moment between the spot instance termination and the node's removal from Jenkins, so a new job was assigned to it.

polansky.j@seznam.cz (JIRA)

Jan 9, 2019, 10:28:02 AM
to jenkinsc...@googlegroups.com
Jiri Polansky updated an issue
 
Change By: Jiri Polansky
Attachment: AwsCloudTrail.log

fabrizio.manfredi@gmail.com (JIRA)

Aug 10, 2019, 4:12:02 PM
to jenkinsc...@googlegroups.com
FABRIZIO MANFREDI closed an issue as Fixed
Change By: FABRIZIO MANFREDI
Status: Open → Closed
Resolution: Fixed
Released As: 1.45