The Reimage and Restart repair actions are currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability. Some aspects of this feature may change before general availability (GA).
Enabling automatic instance repairs for Azure Virtual Machine Scale Sets helps achieve high availability for applications by maintaining a set of healthy instances. If the Application Health extension or Load balancer health probes find an unhealthy instance, automatic instance repairs attempt to recover it by triggering a repair action: deleting the unhealthy instance and creating a new one to replace it, reimaging the unhealthy instance (Preview), or restarting the unhealthy instance (Preview).
The scale set must have application health monitoring enabled for its instances. Health monitoring can use either the Application Health extension or Load balancer health probes, but only one of them can be enabled at a time. The extension or the probes ping the application endpoint configured on each virtual machine instance to determine its application health status. The scale set orchestrator uses this health status to monitor instance health and perform repairs when required.
Before enabling the automatic instance repairs policy, ensure that your scale set instances have an application endpoint configured to emit the application health status. To report health status through the Application Health extension, you can use either Binary Health States or Rich Health States. To report health status through Load balancer health probes, see probe up behavior.
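To make the endpoint requirement concrete, here is a minimal sketch of an application endpoint that emits a binary health status, using only Python's standard library. The path `/health`, the helper names, and the loopback bind address are hypothetical choices for illustration; the probe path and port configured on the Application Health extension or the load balancer probe must match whatever your application actually serves.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical probe path; it must match the path configured for the
# Application Health extension or the load balancer health probe.
HEALTH_PATH = "/health"


class HealthHandler(BaseHTTPRequestHandler):
    """Answers 200 (OK) on HEALTH_PATH and 404 on every other path."""

    def do_GET(self) -> None:
        if self.path == HEALTH_PATH:
            status, body = 200, b"OK"
        else:
            status, body = 404, b"Not Found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Suppress the default per-request logging to stderr.
        pass


def start_health_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Serve the endpoint on a background thread; port 0 picks a free port.

    Load balancer probes do not originate from loopback, so bind to a
    non-loopback address such as "0.0.0.0" when using them.
    """
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe_status(host: str, port: int, path: str = HEALTH_PATH,
                 timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

The `probe_status` helper is only a local check of the kind a health probe performs; the real probing is done by the extension or the load balancer.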
Automatic repairs are triggered for instances marked as "Unhealthy" or "Unknown" (the Unknown state is only available with the Application Health extension using Rich Health States). Make sure the application endpoint is correctly configured before you enable the automatic repairs policy; otherwise, instances may be repaired unintentionally while the endpoint is still being configured.
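The trigger rule amounts to a set-membership check on the reported health state. This Python sketch encodes it with hypothetical names; the platform evaluates health on its own, so treat it only as a way to reason about which reported states lead to a repair. It deliberately ignores the grace period and instance protection policies, which also affect whether a repair actually runs.

```python
# Health states that cause the scale set to trigger an automatic repair.
# "Unknown" is only reported by the Application Health extension when
# Rich Health States are configured; Binary Health States report only
# "Healthy" or "Unhealthy".
REPAIR_TRIGGER_STATES = frozenset({"Unhealthy", "Unknown"})


def triggers_repair(health_state: str) -> bool:
    """Return True if an instance reporting this state would be repaired."""
    return health_state in REPAIR_TRIGGER_STATES


if __name__ == "__main__":
    for state in ("Healthy", "Unhealthy", "Unknown"):
        print(f"{state}: repair={triggers_repair(state)}")
```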
Automatic repairs currently do not support scenarios where a VM instance is marked Unhealthy due to a provisioning failure. VMs must be successfully initialized to enable health monitoring and automatic repair capabilities.
The repairAction setting is currently in PREVIEW and is not suitable for production workloads. To preview the Restart and Reimage repair actions, you must register your Azure subscription with the AFEC flag AutomaticRepairsWithConfigurableRepairActions, and your compute API version must be 2021-11-01 or higher. For more information, see feature registration.
Replace deletes the unhealthy instance and creates a new instance to replace it. The latest Virtual Machine Scale Set model is used to create the new instance. This repair action is the default.
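The prerequisites for repairAction can be expressed as a small client-side pre-flight check. This is a sketch with hypothetical names, not the service's own validation logic: it falls back to Replace when no action is given and, as the text states, applies the API-version and AFEC-flag requirements only to the preview actions Reimage and Restart.

```python
from typing import Optional

VALID_REPAIR_ACTIONS = frozenset({"Replace", "Reimage", "Restart"})
PREVIEW_REPAIR_ACTIONS = frozenset({"Reimage", "Restart"})
MIN_API_VERSION = "2021-11-01"
DEFAULT_REPAIR_ACTION = "Replace"


def effective_repair_action(
    repair_action: Optional[str],
    api_version: str,
    preview_flag_registered: bool,
) -> str:
    """Resolve the repair action a policy would use, checking preview prerequisites.

    Client-side pre-flight check only; the service validates requests itself.
    """
    if repair_action is None:
        # When repairAction is omitted, Replace is the default.
        return DEFAULT_REPAIR_ACTION
    if repair_action not in VALID_REPAIR_ACTIONS:
        raise ValueError(f"unknown repair action: {repair_action!r}")
    if repair_action in PREVIEW_REPAIR_ACTIONS:
        # API versions start with YYYY-MM-DD, so the date prefix sorts
        # chronologically as a string (a suffix such as "-preview" is ignored).
        if api_version[:10] < MIN_API_VERSION:
            raise ValueError(
                f"{repair_action} requires API version {MIN_API_VERSION} or higher"
            )
        if not preview_flag_registered:
            raise ValueError(
                f"{repair_action} requires the AFEC flag "
                "AutomaticRepairsWithConfigurableRepairActions"
            )
    return repair_action
```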
Automatic instance repair operations are performed in batches. At any given time, no more than 5% of the instances in the scale set are repaired through the automatic repairs policy. This limit avoids deleting and re-creating a large number of instances simultaneously when many of them are found unhealthy at the same time.
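The 5% cap can be illustrated by splitting a list of unhealthy instances into batches. In this sketch the function names are hypothetical, and the rounding rule (round down, but never below one instance so small scale sets can still be repaired) is an assumption; the platform's exact rounding isn't described here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Documented limit: at most 5% of the scale set's instances are repaired
# at any given time.
MAX_REPAIR_PERCENT = 5


def max_concurrent_repairs(instance_count: int) -> int:
    """Upper bound on instances repaired at once.

    ASSUMPTION: rounds down, with a floor of one instance.
    """
    if instance_count < 1:
        raise ValueError("the scale set must have at least one instance")
    return max(1, instance_count * MAX_REPAIR_PERCENT // 100)


def repair_batches(unhealthy: Sequence[T], instance_count: int) -> List[List[T]]:
    """Split unhealthy instances into batches no larger than the 5% cap."""
    size = max_concurrent_repairs(instance_count)
    return [list(unhealthy[i:i + size]) for i in range(0, len(unhealthy), size)]
```

Integer arithmetic (`* 5 // 100`) is used instead of multiplying by 0.05 to avoid floating-point rounding surprises.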
When an instance goes through a state change operation because of a PUT, PATCH, or POST action performed on the scale set, any repair action on that instance is performed only after the grace period ends. The grace period is the amount of time allowed for the instance to return to a healthy state. It starts after the state change has completed, which helps avoid premature or accidental repair operations. The grace period is honored for every newly created instance in the scale set, including instances created by a repair operation. The grace period is specified in minutes in ISO 8601 format (for example, PT30M for 30 minutes) and is set through the property automaticRepairsPolicy.gracePeriod. It can range from 10 to 90 minutes and defaults to 10 minutes.
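The grace-period rules (ISO 8601 minutes, 10 to 90 minutes, default 10) can be checked with a short parser, plus a helper that decides whether a repair may run yet. This is a sketch with hypothetical names; it accepts only the minutes form such as `PT30M`, as the text describes, and deliberately rejects other ISO 8601 forms like `PT1H`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_GRACE_MINUTES = 10
MAX_GRACE_MINUTES = 90
DEFAULT_GRACE_PERIOD = "PT10M"

# Only the minutes form of an ISO 8601 duration (e.g. "PT30M") is accepted.
_MINUTES_DURATION = re.compile(r"PT(\d+)M")


def parse_grace_period(value: Optional[str]) -> timedelta:
    """Parse automaticRepairsPolicy.gracePeriod and enforce the 10-90 minute range."""
    text = DEFAULT_GRACE_PERIOD if value is None else value
    match = _MINUTES_DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a duration like 'PT30M', got {text!r}")
    minutes = int(match.group(1))
    if not MIN_GRACE_MINUTES <= minutes <= MAX_GRACE_MINUTES:
        raise ValueError(
            f"grace period must be {MIN_GRACE_MINUTES}-{MAX_GRACE_MINUTES} minutes, got {minutes}"
        )
    return timedelta(minutes=minutes)


def repair_allowed(state_change_completed_at: datetime, now: datetime,
                   grace_period: Optional[str]) -> bool:
    """True once the grace period has elapsed after the state change completed."""
    return now >= state_change_completed_at + parse_grace_period(grace_period)


if __name__ == "__main__":
    done = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(repair_allowed(done, done + timedelta(minutes=20), "PT30M"))
```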
Virtual Machine Scale Sets let you temporarily suspend automatic instance repairs if needed. The serviceState for automatic repairs, under the property orchestrationServices in the instance view of the scale set, shows the current state of automatic repairs. When a scale set is opted into automatic repairs, serviceState is set to Running. When automatic repairs are suspended for a scale set, serviceState is set to Suspended. If automaticRepairsPolicy is defined on a scale set but the automatic repairs feature isn't enabled, serviceState is set to Not Running.
If the new instances created to replace unhealthy ones remain unhealthy even after repeated repair operations, the platform sets the serviceState for automatic repairs to Suspended as a safety measure. You can resume automatic repairs by setting the serviceState for automatic repairs back to Running. Detailed instructions are provided in the section on viewing and updating the service state of the automatic repairs policy for your scale set.
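The service state is read from the scale set's instance view. This sketch works on an instance view that has already been fetched and parsed into a dictionary, and flags the transition into Suspended described above. The JSON shape (an orchestrationServices list whose entries carry serviceName and serviceState, with AutomaticRepairs as the service name) is an assumption to verify against the API version you use; the helper names are hypothetical.

```python
from typing import Any, Mapping, Optional

AUTOMATIC_REPAIRS = "AutomaticRepairs"  # assumed serviceName value


def automatic_repairs_state(instance_view: Mapping[str, Any]) -> Optional[str]:
    """Return serviceState for automatic repairs, or None if it isn't listed."""
    for service in instance_view.get("orchestrationServices", []):
        if service.get("serviceName") == AUTOMATIC_REPAIRS:
            return service.get("serviceState")
    return None


def became_suspended(previous: Optional[str], current: Optional[str]) -> bool:
    """True when repairs move into Suspended, e.g. after repeated failed repairs.

    Suspended repairs stay off until serviceState is set back to Running.
    """
    return current == "Suspended" and previous != "Suspended"
```

Polling this and reacting to `became_suspended` is a do-it-yourself alternative to the Azure alert rules mentioned below; alert rules are the managed option.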
You can also set up Azure alert rules to monitor serviceState changes and get notified if automatic repairs become suspended on your scale set. For details, see Use Azure alert rules to monitor changes in automatic instance repairs service state.
If an instance in a scale set is protected by one of the protection policies, automatic repairs aren't performed on that instance. This behavior applies to both protection policies: Protect from scale-in and Protect from scale-set actions.
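Combining the health trigger with the protection rule gives the set of instances that are actually eligible for repair. In this sketch each instance is a hypothetical flattened record with `instanceId` and `health` keys; the `protectionPolicy` field names `protectFromScaleIn` and `protectFromScaleSetActions` mirror the scale set VM model, but the overall record shape is an assumption for illustration.

```python
from typing import Any, Iterable, List, Mapping

REPAIR_TRIGGER_STATES = frozenset({"Unhealthy", "Unknown"})


def is_protected(instance: Mapping[str, Any]) -> bool:
    """True if either protection policy is set on the instance."""
    policy = instance.get("protectionPolicy") or {}
    return bool(policy.get("protectFromScaleIn")
                or policy.get("protectFromScaleSetActions"))


def instances_to_repair(instances: Iterable[Mapping[str, Any]]) -> List[str]:
    """IDs of instances in a repair-triggering state that are not protected."""
    return [
        inst["instanceId"]
        for inst in instances
        if inst.get("health") in REPAIR_TRIGGER_STATES and not is_protected(inst)
    ]
```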
Starting November 2023, VM scale sets created using PowerShell and Azure CLI will default to Flexible Orchestration Mode if no orchestration mode is specified. For more information about this change and what actions you should take, see Breaking Change for VMSS PowerShell/CLI Customers - Microsoft Community Hub.
To enable the automatic repairs policy when creating a new scale set, ensure that all the requirements for opting in to this feature are met. The application endpoint should be correctly configured for scale set instances to avoid triggering unintended repairs while the endpoint is being configured. For newly created scale sets, instance repairs are performed only after the grace period completes. To enable automatic instance repairs in a scale set, use the automaticRepairsPolicy object in the Virtual Machine Scale Set model.
You can also use this quickstart template to deploy a Virtual Machine Scale Set. The scale set has a load balancer health probe and automatic instance repairs enabled with a grace period of 30 minutes.
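In the scale set model, the automaticRepairsPolicy object sits under the scale set's properties. The sketch below builds only that fragment as a Python dictionary and serializes it to JSON, using the 30-minute grace period of the template described above. A real create request also needs the rest of the scale set model (location, SKU, profiles, and so on), which is omitted here; the function name is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def automatic_repairs_fragment(grace_minutes: int = 30,
                               repair_action: Optional[str] = None) -> Dict[str, Any]:
    """Model fragment that enables automatic repairs with the given grace period."""
    if not 10 <= grace_minutes <= 90:
        raise ValueError("grace period must be between 10 and 90 minutes")
    policy: Dict[str, Any] = {"enabled": True, "gracePeriod": f"PT{grace_minutes}M"}
    if repair_action is not None:
        # Optional, preview-only setting; omit it to get the default (Replace).
        policy["repairAction"] = repair_action
    return {"properties": {"automaticRepairsPolicy": policy}}


if __name__ == "__main__":
    print(json.dumps(automatic_repairs_fragment(), indent=2))
```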
The automatic instance repair feature can be enabled while creating a new scale set by using the New-AzVmssConfig cmdlet. This sample script walks through the creation of a scale set and associated resources using the configuration file: Create a complete Virtual Machine Scale Set. You can configure automatic instance repairs policy by adding the parameters EnableAutomaticRepair and AutomaticRepairGracePeriod to the configuration object for creating the scale set. The following example enables the feature with a grace period of 30 minutes.
The following example enables the automatic repairs policy while creating a new scale set using az vmss create. First create a resource group, then create a new scale set with the automatic repairs policy grace period set to 30 minutes.
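The two steps can be scripted from Python by building the Azure CLI argument lists and handing them to `subprocess.run`. The flags used here (`--load-balancer`, `--health-probe`, `--automatic-repairs-grace-period`) are the ones commonly used for this scenario, but confirm them with `az vmss create --help` for your CLI version. The resource names and the `Ubuntu2204` image alias are placeholders, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def group_create_args(resource_group: str, location: str) -> List[str]:
    """Arguments for creating the resource group."""
    return ["az", "group", "create", "--name", resource_group, "--location", location]


def vmss_create_args(
    resource_group: str,
    name: str,
    image: str,
    admin_username: str,
    load_balancer: str,
    health_probe: str,
    grace_minutes: int = 30,
) -> List[str]:
    """Arguments for a scale set that uses an existing load balancer probe
    and enables automatic repairs with the given grace period (in minutes)."""
    return [
        "az", "vmss", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,
        "--admin-username", admin_username,
        "--generate-ssh-keys",
        "--load-balancer", load_balancer,
        "--health-probe", health_probe,
        "--automatic-repairs-grace-period", str(grace_minutes),
    ]


def run(args: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        subprocess.run(args, check=True)


if __name__ == "__main__":
    run(group_create_args("myResourceGroup", "eastus"))
    run(vmss_create_args("myResourceGroup", "myScaleSet", "Ubuntu2204",
                         "azureuser", "myLoadBalancer", "myHealthProbe"))
```

Passing a list to `subprocess.run` (rather than one shell string) avoids quoting problems with resource names.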
The above example uses an existing load balancer and health probe to monitor the application health status of instances. If you prefer to use the Application Health extension for monitoring, you can instead create a scale set, configure the Application Health extension, and then enable the automatic instance repairs policy by using az vmss update, as explained in the next section.
Before enabling the automatic repairs policy on an existing scale set, ensure that all the requirements for opting in to this feature are met. The application endpoint should be correctly configured for scale set instances to avoid triggering unintended repairs while the endpoint is being configured. To enable automatic instance repairs in a scale set, use the automaticRepairsPolicy object in the Virtual Machine Scale Set model.
After updating the model of an existing scale set, ensure that the latest model is applied to all the instances of the scale set. Refer to the instructions on how to bring VMs up-to-date with the latest scale set model.
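For an existing scale set, enabling the policy and then rolling the updated model out to the instances can be expressed as two Azure CLI invocations. This Python sketch only builds the argument lists; the flag names reflect common `az vmss update` and `az vmss update-instances` usage but should be checked against `--help` for your CLI version, and the resource names are placeholders. The `update-instances` step is typically needed when the scale set's upgrade policy is Manual.

```python
import shlex
from typing import List


def vmss_enable_repairs_args(resource_group: str, name: str,
                             grace_minutes: int = 30) -> List[str]:
    """Arguments that turn on automatic repairs for an existing scale set."""
    return [
        "az", "vmss", "update",
        "--resource-group", resource_group,
        "--name", name,
        "--enable-automatic-repairs", "true",
        "--automatic-repairs-grace-period", str(grace_minutes),
    ]


def vmss_update_instances_args(resource_group: str, name: str) -> List[str]:
    """Arguments that apply the latest scale set model to all instances."""
    return [
        "az", "vmss", "update-instances",
        "--resource-group", resource_group,
        "--name", name,
        "--instance-ids", "*",
    ]


if __name__ == "__main__":
    for args in (vmss_enable_repairs_args("myResourceGroup", "myScaleSet"),
                 vmss_update_instances_args("myResourceGroup", "myScaleSet")):
        # shlex.join quotes "*" so a shell won't glob-expand it.
        print(shlex.join(args))
```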
From the Protocol dropdown list, choose the network protocol used by your application to report health. Select the appropriate protocol based on your application requirements. Protocol options are HTTP, HTTPS, or TCP.
The Application Health extension pings this path inside each virtual machine in the scale set to get the application health status of each instance. If you're using Binary Health States and the endpoint responds with status 200 (OK), the instance is marked "Healthy". In all other cases (including when the endpoint is unreachable), the instance is marked "Unhealthy". For more health state options, explore Rich Health States.
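The Binary Health States rule for HTTP and HTTPS probes maps a probe result to a state: only a 200 response is Healthy. This sketch uses hypothetical names and represents an unreachable endpoint as `None`; TCP probes, which don't return a status code, aren't covered.

```python
from typing import Dict, Optional


def binary_health_state(status_code: Optional[int]) -> str:
    """Binary Health States: 200 (OK) is Healthy; anything else is Unhealthy.

    status_code is None when the endpoint could not be reached at all.
    """
    return "Healthy" if status_code == 200 else "Unhealthy"


def classify(probe_results: Dict[str, Optional[int]]) -> Dict[str, str]:
    """Map instance ID -> probe status code into instance ID -> health state."""
    return {instance: binary_health_state(code)
            for instance, code in probe_results.items()}
```

Note that other 2xx codes such as 204 still count as Unhealthy under this rule, so the endpoint should return exactly 200.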
The repairAction setting under automaticRepairsPolicy lets you specify the repair action performed in response to an unhealthy instance. If you're updating the repair action on an existing automatic repairs policy, you must first disable automatic repairs on the scale set and then re-enable them with the updated repair action. This process is illustrated in the examples below.
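The disable-then-re-enable sequence for changing the repair action can be sketched as two request bodies sent in order, for example as PATCH requests against the scale set with compute API version 2021-11-01 or later. The transport (SDK, CLI, or REST client) is left out, the function name is hypothetical, and the 30-minute grace period is just an illustrative default.

```python
from typing import Any, Dict, List

VALID_REPAIR_ACTIONS = frozenset({"Replace", "Reimage", "Restart"})


def repair_action_update_bodies(new_action: str,
                                grace_minutes: int = 30) -> List[Dict[str, Any]]:
    """Two bodies to send in order: disable repairs, then re-enable them
    with the new repair action."""
    if new_action not in VALID_REPAIR_ACTIONS:
        raise ValueError(f"unknown repair action: {new_action!r}")
    disable = {"properties": {"automaticRepairsPolicy": {"enabled": False}}}
    enable = {
        "properties": {
            "automaticRepairsPolicy": {
                "enabled": True,
                "gracePeriod": f"PT{grace_minutes}M",
                "repairAction": new_action,
            }
        }
    }
    return [disable, enable]
```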