[Google Compute Engine] details on VM migration and deprecated Debian/CentOS images


Google Compute Team

Nov 13, 2013, 7:17:38 PM

Greetings Google Cloud Platform users,


A couple weeks back, I gave you an early glimpse of a system we've been building to automatically move your virtual machines around scheduled outages or impending failures. Today, I'm pulling back the curtains to tell you all about this new feature, but first I need to announce the deprecation of some of our images.


Deprecated Debian and CentOS Images


We have deprecated all but our most recent Debian 7 and CentOS images. All Debian 6 images are deprecated because of forward-looking kernel incompatibility issues, as are our older Debian 7 images (older than debian-7-wheezy-v20131014) and older CentOS images (older than centos-6-v20130926). For the best experience, we recommend that customers deploying Debian use the latest Debian 7 image (currently debian-7-wheezy-v20131014) and that customers deploying CentOS use the latest CentOS image (currently centos-6-v20130926).


These deprecated images will remain available until February 25, 2014. If you would like to continue using these deprecated images, you can specify the fully qualified image name. An example gcutil command that lists old images and adds an instance with a fully qualified image name is as follows:


$ gcutil --project=<project-id> listimages --old_images

$ gcutil --project=<project-id> addinstance <instance-name> --image=projects/debian-cloud/global/images/debian-7-wheezy-vYYYYMMDD
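
If you manage image names in scripts, the deprecation cutoffs above can be checked mechanically. The following is a minimal Python sketch, not part of gcutil or any Google library: the helper names is_deprecated and fully_qualified are hypothetical, the cutoff versions are the ones stated in this announcement, and the sketch assumes image names end in -vYYYYMMDD as shown above.

```python
import re

# Cutoff versions taken from this announcement; anything older is deprecated.
CUTOFFS = {
    "debian-7-wheezy": "20131014",
    "centos-6": "20130926",
}


def is_deprecated(image_name):
    """Return True if the image falls under the deprecations announced here."""
    if image_name.startswith("debian-6"):
        return True  # every Debian 6 image is deprecated
    match = re.fullmatch(r"(.+)-v(\d{8})", image_name)
    if not match:
        raise ValueError("unrecognized image name: " + image_name)
    family, version = match.groups()
    if family not in CUTOFFS:
        raise ValueError("no cutoff known for image family: " + family)
    # YYYYMMDD strings of equal length sort chronologically.
    return version < CUTOFFS[family]


def fully_qualified(image_project, image_name):
    """Build the fully qualified form passed to --image in the example above."""
    return "projects/{}/global/images/{}".format(image_project, image_name)
```

For instance, fully_qualified("debian-cloud", "debian-7-wheezy-v20131014") yields the same path shape used in the addinstance command above.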


And now, onto VM migration…



Transparent maintenance


In my previous email, I said we were putting the finishing touches on a system that makes our scheduled zone maintenance events transparent to your applications and workloads by automatically moving your virtual machines around scheduled outages or impending failures.


We have already upgraded us-central1-a and us-central1-b, so these two zones will not go offline again for scheduled maintenance.


We will, of course, continue to perform scheduled maintenance: patching our systems with the latest software, performing routine tests and preventative maintenance, and generally ensuring that our infrastructure is as fast and efficient as we know how to make it. Only now, we will be able to automatically move the affected VMs out of the way for you, and the zone itself will remain up and usable through the scheduled maintenance events.


Compute Engine has two ways to move your VMs:

  • migrate

    • Compute Engine will automatically migrate your running VM. The migration process may impact guest performance to some degree. The exact performance impact and its duration depend on many factors, but we expect most applications and workloads won't be adversely affected.

  • terminate

    • Compute Engine will automatically signal your VM to shut down, wait a short time for the guest to shut down cleanly, and restart (by default; see automaticRestart below) the VM away from the scheduled maintenance event.



VM Migration Configuration


You can control how Compute Engine handles your VM, should we need to perform maintenance on the underlying infrastructure, by setting a new scheduling configuration option on your instances:

  • instance.scheduling.onHostMaintenance = migrate, terminate (as defined above)


We've added a second scheduling configuration option for each VM that lets you control whether Compute Engine automatically restarts a VM that was terminated due to anything other than a user-initiated event:

  • instance.scheduling.automaticRestart = true, false


Here's a gcutil example that specifies the new configuration options when creating a new VM:


$ gcutil --project=my-sample-project addinstance myinstance --on_host_maintenance=terminate --automatic_restart=true


Here is the equivalent setting in the API Instance resource:


{
 "kind": "compute#instance",
 "name": "vm1",
 "description": "A VM which will terminate and restart instead of migrate.",
 "scheduling": {
   "onHostMaintenance": "terminate",
   "automaticRestart": true
 }
}
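
If you generate request bodies programmatically, a small helper can validate the two scheduling options before anything is sent. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the accepted values are the ones listed in this email, and automaticRestart is emitted as a JSON boolean, which is an assumption about the wire format. It only builds the fragment and does not call the API.

```python
import json

# Values for onHostMaintenance as listed in this announcement.
ON_HOST_MAINTENANCE_VALUES = ("migrate", "terminate")


def scheduling_block(on_host_maintenance, automatic_restart):
    """Validate and return the 'scheduling' part of an Instance resource."""
    if on_host_maintenance not in ON_HOST_MAINTENANCE_VALUES:
        raise ValueError("onHostMaintenance must be one of %r"
                         % (ON_HOST_MAINTENANCE_VALUES,))
    if not isinstance(automatic_restart, bool):
        raise TypeError("automaticRestart must be True or False")
    return {
        "onHostMaintenance": on_host_maintenance,
        "automaticRestart": automatic_restart,
    }


def instance_fragment(name, description, on_host_maintenance, automatic_restart):
    """Return a partial Instance resource carrying the scheduling options."""
    return {
        "kind": "compute#instance",
        "name": name,
        "description": description,
        "scheduling": scheduling_block(on_host_maintenance, automatic_restart),
    }


# Serialize with json.dumps so the output is always well-formed JSON.
fragment = instance_fragment(
    "vm1",
    "A VM which will terminate and restart instead of migrate.",
    "terminate",
    True,
)
print(json.dumps(fragment, indent=1))
```

Building the fragment as a dictionary and serializing it avoids hand-editing mistakes such as stray trailing commas.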


In a few weeks, these scheduling options will be visible to the instance through new metadata server fields. At the same time, we will add another metadata server field to provide a short amount of advance notice of an upcoming maintenance event that will affect the instance, allowing you to implement automation to react as needed. Full details will be published when these new metadata instance fields are available.


Not all VMs will have the option of being migrated, though. To be eligible for migration, an instance must meet one of the following requirements:

  • use persistent disks exclusively for block storage

  • use a scratch disk for boot only, and persistent disk for additional block storage (see Note below)


This means that instances with a machine type that ends in -d cannot be migrated, and will always be terminated instead (i.e., onHostMaintenance will always be terminate). If you currently use -d machine types for your VMs, we suggest you evaluate our persistent disk storage option; we believe it is a viable alternative and we continue to improve its performance and value.
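
The -d rule can be expressed as a one-line check. The sketch below is illustrative Python, not an API call: effective_on_host_maintenance is a hypothetical helper, and the machine type names in the usage note are only examples of the naming pattern.

```python
def effective_on_host_maintenance(machine_type, requested):
    """Return the maintenance behavior that applies to an instance.

    Machine types ending in -d cannot be migrated, so they are always
    terminated regardless of the requested setting.
    """
    if requested not in ("migrate", "terminate"):
        raise ValueError("requested must be 'migrate' or 'terminate'")
    if machine_type.endswith("-d"):
        return "terminate"
    return requested
```

For example, effective_on_host_maintenance("n1-standard-1-d", "migrate") returns "terminate", while the same request for "n1-standard-1" returns "migrate".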


Note: Due to a known issue, the API does not currently support setting onHostMaintenance to migrate for instances that use a scratch disk for boot only. The fix will roll out next week, at which point the API will allow setting onHostMaintenance=migrate for instances created from that point onward.


For full details of the new configuration options, default and supported values, and the APIs for getting and setting them, refer to the online documentation.



System Events for VM Migration


When Compute Engine moves a VM due to a maintenance event, it will automatically publish one or more System Event Operations, depending on the VM's scheduling settings. Compute Engine publishes the System Event Operations to the Project that owns the VM, adding the Operation(s) to the Project's zone-level Operations Collection. The System Events are described in the following table:


System Event Operation Type                     Description
compute.instances.migrateOnHostMaintenance      Published when a VM is migrated in response to a scheduled maintenance event
compute.instances.terminateOnHostMaintenance    Published when a VM is terminated in response to a scheduled maintenance event
compute.instances.automaticRestart              Published when a VM is automatically restarted after being terminated (for any non-user-initiated reason, not just maintenance)
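
To show how these event types relate to the scheduling settings, here is a minimal Python sketch. It makes several assumptions: each Operation is represented as a dictionary whose operationType key holds one of the type names above, the mapping in expected_events is our reading of the descriptions in this email and not a guarantee of what will be published, and the sample data is invented for illustration.

```python
from collections import Counter

MIGRATE = "compute.instances.migrateOnHostMaintenance"
TERMINATE = "compute.instances.terminateOnHostMaintenance"
RESTART = "compute.instances.automaticRestart"
SYSTEM_EVENT_TYPES = {MIGRATE, TERMINATE, RESTART}


def expected_events(on_host_maintenance, automatic_restart):
    """Event types one would expect for a maintenance event, given the settings."""
    if on_host_maintenance == "migrate":
        return [MIGRATE]
    if on_host_maintenance == "terminate":
        return [TERMINATE, RESTART] if automatic_restart else [TERMINATE]
    raise ValueError("unknown onHostMaintenance value: %r" % on_host_maintenance)


def system_events(operations):
    """Keep only the Operations whose type is one of the System Event types."""
    return [op for op in operations
            if op.get("operationType") in SYSTEM_EVENT_TYPES]


# Hypothetical sample of a zone-level Operations collection.
sample_operations = [
    {"operationType": "insert", "targetLink": "vm1"},
    {"operationType": TERMINATE, "targetLink": "vm1"},
    {"operationType": RESTART, "targetLink": "vm1"},
]
counts = Counter(op["operationType"] for op in system_events(sample_operations))
```

A filter like this lets automation separate maintenance-driven events from user-initiated Operations in the same collection.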


If you decide you want the terminate-and-restart behavior for your VMs, or if you cannot switch from -d machine types to persistent disk, be aware that your VMs may experience more frequent terminate-and-restart events, even outside the normal two-week zone maintenance windows, and plan accordingly.


Starting in late November, some users with VMs running in us-central1-a and us-central1-b may start seeing the new System Event Operation types in their Operations collections as we begin to employ transparent maintenance in those zones.


Visit our Google Developers home page for full API details, FAQs, and more. As always, please send us any feedback via the normal channels.


Thanks, and happy computing!


-- ScottVW, on behalf of the Google Cloud Platform
