Persistent Disk lost on reboot?!?!


Peter Murray

Jul 21, 2017, 1:06:10 PM
to gce-discussion
Greetings folks,  I'm coming from more of an AWS background, so apologies if my terminology is not properly aligned.

Has anyone experienced the following scenario:
  • GCE VM Instance up and running for several days 
  • expanded the default 10GB disk to 200GB without a hassle - loads of locally generated data present - everything working smoothly
  • stop the server to adjust the IAM permissions
  • re-start the server - disk is entirely gone, replaced by the original 10GB Ubuntu installation disk & size.
I've opened a ticket by calling the support number, but wondered if anyone here could help.  

Is that disk image just gone at this point?  Does this happen frequently on this platform?  Am I misunderstanding the meaning of "Standard Persistent Disk"?

Thanks for any tips or help.

-pete

Raymond Hawkins

Jul 21, 2017, 3:04:02 PM
to gce-discussion
We experienced this once, and it was due to the instance being part of a group with an instance template. So any time the instance restarted, it was reset back to the template.
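One way to confirm this diagnosis is to check whether the instance belongs to a managed instance group (the group name and zone below are placeholders for your own values):

```shell
# List the managed instance groups in the project.
gcloud compute instance-groups managed list

# If your instance shows up in one of those groups, it is controlled
# by an instance template and can be recreated from that template.
gcloud compute instance-groups managed list-instances my-group \
    --zone us-central1-a
```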

Peter Murray

Jul 21, 2017, 11:09:09 PM
to gce-discussion
Yeah - I guess that is the case here, Raymond.  We were surprised because we had restarted the server several times without losing the disk - and we didn't destroy and create a new instance.

Theodore Ts'o

Jul 22, 2017, 1:28:20 PM
to Peter Murray, gce-discussion
On Fri, Jul 21, 2017 at 09:48:39AM -0700, Peter Murray wrote:
> Greetings folks, I'm coming from more of an AWS background, so apologies
> if my terminology is not properly aligned.
>
> Has anyone experienced the following scenario:
>
> - GCE VM Instance up and running for several days
> - expanded the default 10GB disk to 200GB without a hassle - loads of
> locally generated data present - everything working smoothly
> - stop the server to adjust the IAM permissions
> - re-start the server - disk is entirely gone and replaced with 10GB
> original ubuntu installation disk & size.

It appears from your description that when you "stopped" the
server, you actually destroyed the VM, and when you "re-started" the
server you created a new one.

So, a couple of things here. When a VM is created, you can specify
either that it use an existing boot disk, or that a boot disk be
created from an image. If the boot disk is created from an image, the
assumption is that it is a "throw-away" disk, since you can always
create a new one from the image. So the disk is marked with an
"auto-delete" flag, which means that when the VM dies, the disk is deleted.
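As a sanity check, you can see whether an instance's attached disks carry that flag (instance name and zone below are placeholders):

```shell
# The JSON output includes an "autoDelete" field for each attached
# disk; "autoDelete": true means the disk dies with the VM.
gcloud compute instances describe my-instance --zone us-central1-a \
    --format="json(disks)"
```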

This is normally considered a *feature*, because otherwise you would
be continually charged for the disk's existence. The basic idea here
is that you create a custom disk image that contains your "VM
appliance", and then you might launch several VMs that are all based
off of the same image. When the VM dies, the disk dies. That way, if
the appliance image gets corrupted, or worse, if the VM gets attacked
and someone leaves malware on the disk, the disk image is flushed when
the VM dies, and when you start a new VM, you get a fresh image that
has your appliance as it was originally set up for you. If you've
heard the phrase "pets versus cattle" applied to servers, that's
basically the model of what's going on.

To give you a concrete example of how this might be done, I have a
system called gce-xfstests[1] where I create a test appliance for doing
kernel regression tests. To build the test appliance, I have a script
which launches a VM[2], and that VM has a startup script which takes a
standard Debian VM and configures it as my test appliance[3]. One of
the last things the image builder script[3] does is run this command:

gcloud compute -q instances delete "$BLD_INST" --zone "$ZONE" \
--keep-disks boot

This kills the VM, but it first clears the "auto-delete" flag on the
boot disk, so that it doesn't die when the VM dies. Then the script
that launched the image builder VM converts that boot disk into an
image, using this command:

gcloud compute --project "$GCE_PROJECT" -q images create "$NEW_IMAGE_NAME" \
--description "Linux Kernel File System Test Appliance" \
--source-disk "$BLD_DISK" --source-disk-zone "$GCE_ZONE" \
--family "$ROOT_FS"

Now in the future, when I create test appliance VMs, I use the
gce-xfstests image as the starting image, which has been customized to
do its thing. I might launch a half-dozen VMs, all running different
tests, and when those VMs die, their boot disks die with them, because
they are considered temporary.

[1] https://thunk.org/gce-xfstests
[2] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/test-appliance/gce-create-image
[3] https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/test-appliance/gce-xfstests-bld.sh

-----

If instead you want to create a "pet" VM, which you take care of very
carefully, and if it gets sick, you nurse it back to health (as
opposed to the "cattle" paradigm, where if the VM gets sick, you
just shoot it in the head), you can either create the VM with the
--no-boot-disk-auto-delete flag:

gcloud compute instances create lassie-vm --machine-type "n1-standard-1" \
--image "debian-9-stretch-v20170717" --image-project "debian-cloud" \
--boot-disk-size "10" --no-boot-disk-auto-delete --boot-disk-type "pd-standard" \
--boot-disk-device-name "lassie"

Now when you delete the VM, the boot disk "lassie" will stick around.
You can then subsequently create a new VM using that disk:

gcloud compute instances create lassie-vm --machine-type "n1-standard-1" \
--disk "name=lassie,device-name=boot,mode=rw,boot=yes"

You can also toggle a particular disk's auto-delete flag on the fly:

gcloud compute -q instances set-disk-auto-delete ${instance} \
--auto-delete --disk "$SCRATCH_DSK" --zone "$ZONE"

> Is that disk image just gone at this point? Does this happen frequently on
> this platform?

Unfortunately, if the disk image was set to auto-delete, then once
it's gone, it's gone. Amazon's AWS works very similarly --- AWS will
auto-delete root device volumes when the VM is terminated. So it's
just a matter of understanding how things work, and telling the
platform what you want it to do. Both AWS and GCP are designed with
the "VM as cattle" paradigm as the default, and so the commands are
set up to work that way by default. You can override the defaults, of
course.
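For comparison, on the AWS side the equivalent override looks roughly like this (the instance ID and device name are hypothetical; check your instance's actual root device name first):

```shell
# Tell EC2 to keep the root EBS volume when this instance is
# terminated, instead of deleting it (the default for most AMIs).
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --block-device-mappings \
    '[{"DeviceName":"/dev/xvda","Ebs":{"DeleteOnTermination":false}}]'
```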

Cheers, and I hope this helps,

- Ted