[JIRA] (JENKINS-61603) Handle Spot Instance Interruption

jhansche@meetme.com (JIRA)

Mar 20, 2020, 2:29:02 PM
to jenkinsc...@googlegroups.com
Joe Hansche created an issue
 
Jenkins / Bug JENKINS-61603
Handle Spot Instance Interruption
Issue Type: Bug
Assignee: FABRIZIO MANFREDI
Components: ec2-plugin
Created: 2020-03-20 18:28
Labels: feature-request
Priority: Minor
Reporter: Joe Hansche

According to the Spot Instance documentation, AWS notifies a spot instance (best effort) approximately 2 minutes before terminating it.

Currently, when a spot instance is terminated, any executing builds are simply interrupted, leading to a build failure, and we then have to restart the build. Additionally, the slave's executors remain online during the 2-minute warning period, so they are available to take new builds even though the instance will be terminated imminently.

By monitoring the instance-action metadata (or CloudWatch events), we can receive notice that AWS is about to terminate a spot instance, and let the master react by taking the remaining executors offline (i.e., similar to the "Mark this node temporarily offline" button in the node status screen). That will do two things:

  1. Give visual notice in the Jenkins UI that a slave is intentionally going offline
  2. Prevent any additional jobs from being scheduled on that slave, allowing the built-in scheduling to route them to another online host, or possibly bring up a new instance to take its place
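
To illustrate the monitoring step, here is a minimal sketch of how a poller might interpret a response from the instance-action metadata endpoint. It assumes the documented behavior that the endpoint returns HTTP 404 until an interruption is scheduled, and then a small JSON body with `action` and `time` fields. The class name `SpotInterruptionNotice` and its `parse` method are hypothetical, not part of the ec2-plugin; the HTTP fetch itself (and any IMDSv2 session token handling) is left out, and a regex is used only to keep the sketch dependency-free.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not ec2-plugin code: interprets one poll of the
// spot instance-action metadata endpoint.
public class SpotInterruptionNotice {
    // Metadata path described in the AWS spot interruption documentation.
    static final String INSTANCE_ACTION_URL =
            "http://169.254.169.254/latest/meta-data/spot/instance-action";

    private static final Pattern ACTION =
            Pattern.compile("\"action\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern TIME =
            Pattern.compile("\"time\"\\s*:\\s*\"([^\"]+)\"");

    final String action;    // e.g. "terminate", "stop", "hibernate"
    final Instant deadline; // approximate time at which AWS will act

    SpotInterruptionNotice(String action, Instant deadline) {
        this.action = action;
        this.deadline = deadline;
    }

    // Returns a notice only when the endpoint answered 200 with a parseable
    // body; a 404 (no interruption scheduled) or malformed body yields empty.
    static Optional<SpotInterruptionNotice> parse(int httpStatus, String body) {
        if (httpStatus != 200 || body == null) {
            return Optional.empty();
        }
        Matcher a = ACTION.matcher(body);
        Matcher t = TIME.matcher(body);
        if (!a.find() || !t.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(
                    new SpotInterruptionNotice(a.group(1), Instant.parse(t.group(1))));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

Once a notice is present, the master could mark the node offline, for example through Jenkins' `Computer.setTemporarilyOffline(boolean, OfflineCause)`, which is the programmatic counterpart of the "Mark this node temporarily offline" button.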

We can then add configuration options to the SlaveTemplate to forcefully abort (setting status=Result.ABORTED) an executing job when we get notified that the slave will be terminated, so that the build status can reflect what actually happened.

Some ideas for configuration options and jelly template:

[  ] Monitor for spot instance interruption notifications  [default=false, making it opt-in]
        (?) help links to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices
<block  when checkbox is checked>
    Polling Interval (in seconds):  [Default = 5, based on documented recommendation]
    On Terminate / Stop:   [select multiple actions, default=Do Nothing]
    <!-- Actions include:  Do nothing;  Take slave offline;  Abort builds; ...? -->
    <!-- Abort build options: -->
            Abort Builds
                When to abort:  [Immediately;  N seconds after notice;  N seconds before termination deadline]
                <!-- Could be handled similar to "Idle Timeout", for example:
                        "0 => immediately"
                        "15 => 15 seconds after notice"
                        "-15 => 15 seconds before termination deadline"
                  -->
</block>
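
As a sketch of how the signed "When to abort" value might be interpreted, the following maps an offset in seconds to a concrete abort time: zero means immediately, a positive value counts forward from the notice, and a negative value counts back from the termination deadline. The class `AbortSchedule` and method `abortAt` are hypothetical names, and clamping the result to the window between notice and deadline is an added assumption rather than something specified in this issue.

```java
import java.time.Instant;

// Hypothetical helper illustrating the proposed "When to abort" semantics.
public class AbortSchedule {
    // offsetSeconds == 0 : abort as soon as the notice is received
    // offsetSeconds  > 0 : abort that many seconds after the notice
    // offsetSeconds  < 0 : abort that many seconds before the deadline
    static Instant abortAt(Instant noticeReceived,
                           Instant terminationDeadline,
                           int offsetSeconds) {
        Instant target = offsetSeconds >= 0
                ? noticeReceived.plusSeconds(offsetSeconds)
                : terminationDeadline.plusSeconds(offsetSeconds);

        // Assumption: never schedule before the notice or after the deadline.
        if (target.isBefore(noticeReceived)) {
            return noticeReceived;
        }
        if (target.isAfter(terminationDeadline)) {
            return terminationDeadline;
        }
        return target;
    }
}
```

With a notice at 12:00:00 and a deadline at 12:02:00, an offset of 15 would give 12:00:15 and an offset of -15 would give 12:01:45.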

I haven't yet looked into what the implementation would involve, but if I get a chance, I will investigate and see if I can get a PR together.

 
This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)

nigel.armstrong@braincorp.com (JIRA)

Apr 3, 2020, 2:12:03 AM
to jenkinsc...@googlegroups.com
Nigel Armstrong commented on Bug JENKINS-61603
 
Re: Handle Spot Instance Interruption

FYI, the ec2-fleet plugin (https://github.com/jenkinsci/ec2-fleet-plugin) already monitors for interruption, so porting some code from it might be a start.

raihaan.shouhell@autodesk.com (JIRA)

Apr 26, 2020, 10:14:02 PM
to jenkinsc...@googlegroups.com
Raihaan Shouhell updated an issue
 
Jenkins / New Feature JENKINS-61603
Handle Spot Instance Interruption
Change By: Raihaan Shouhell
Issue Type: New Feature (was: Bug)