I'll do my best to answer your question; please correct me if I have missed any details.
-PD SLOs:
Our PD SLOs state: “Each device instantiation shall not have more than 43 unhealthy minutes, corresponding to 99.9% of total minutes in a 30 day sliding window, and will not have more than 15 consecutive unhealthy minutes per 30 day sliding window.”
Note that an "unhealthy minute" is any minute in which at least one IO operation fails, or takes longer than 2 seconds to complete.
PDs perform in line with their SLOs, but transient slowness or unavailability can still happen. We try to minimize it, but there is no way to prevent it entirely.
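To make the two limits concrete, here is a minimal sketch of checking one 30-day window against them. It is not how the SLO is actually monitored: the input format (one line per minute, `failed_ops max_latency_seconds`) and the name `check_pd_slo` are assumptions made up for this illustration, and it looks at a single fixed window rather than a sliding one. The budget follows from 30 × 24 × 60 = 43200 minutes, of which 0.1% is 43.2, rounded down to 43.

```shell
# Minutes in a 30-day window, and the 0.1% unhealthy budget (rounded down).
WINDOW_MINUTES=$((30 * 24 * 60))          # 43200
BUDGET_MINUTES=$((WINDOW_MINUTES / 1000)) # 43

# check_pd_slo: reads "failed_ops max_latency_seconds" lines on stdin,
# one line per minute (hypothetical format), and prints a verdict.
check_pd_slo() {
  awk -v budget="$BUDGET_MINUTES" -v max_run=15 '
    {
      # A minute is unhealthy if any IO failed or any IO took longer than 2 s.
      unhealthy = ($1 > 0 || $2 > 2)
      if (unhealthy) { total++; run++ } else { run = 0 }
      if (run > longest) longest = run
    }
    END {
      verdict = (total <= budget && longest <= max_run) ? "OK" : "VIOLATED"
      printf "unhealthy=%d longest_run=%d %s\n", total, longest, verdict
    }'
}
```

Usage would be something like `check_pd_slo < minutes.log`, where `minutes.log` is whatever per-minute record you keep.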
-Snapshots:
Snapshots need the data to be available in exactly the same way as reads and writes from within a VM. If the data is temporarily unavailable, a snapshot may fail, though typically it hangs until the data becomes available again. Snapshots should generally be entirely consistent: when a snapshot is requested, we freeze the device at that point in time (new writes still come in, but we keep the old data around too). A snapshot may take a long time to finish if there is unavailability, but there is no reason it will not eventually finish, and it will be entirely consistent with the state of the device *at the time it was requested*.

With snapshot locality, all snapshots are created and stored in the location you specify. [1] Snapshots are well suited to disaster recovery (DR): they are faster to create than images, and smaller, since they do not contain the OS.
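Since a snapshot can hang for a while and still finish, a script that depends on one should wait for it rather than give up early. Below is a rough sketch, assuming the gcloud CLI is installed and authenticated. The function names, the default of 60 attempts, and the 10-second delay are arbitrary choices of mine, and the disk, zone, and snapshot names are whatever you pass in.

```shell
# create_snapshot DISK ZONE SNAPSHOT LOCATION
# Requests a snapshot of DISK and pins where it is stored (snapshot locality).
create_snapshot() {
  gcloud compute snapshots create "$3" \
    --source-disk="$1" --source-disk-zone="$2" \
    --storage-location="$4"
}

# wait_for_snapshot SNAPSHOT [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Polls the snapshot status until it is READY.
# Returns 1 if the status becomes FAILED or the attempts run out.
wait_for_snapshot() {
  attempts=${2:-60}
  delay=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    status=$(gcloud compute snapshots describe "$1" --format='value(status)')
    case "$status" in
      READY)  return 0 ;;
      FAILED) return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}
```

The create command normally blocks until the operation completes, so the polling function is mostly useful when the snapshot was requested by another process or a scheduled job.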
-Images:
Public images are provided and maintained by Google, which should make them more dependable.
With custom images, you control access to the boot disk image because you own it. Tailoring everything to your environment gives you the flexibility you need. Choosing where to store an image (for example, in Cloud Storage) is optional; if you do not choose, your image is stored in the multi-region closest to the image source. **“You can create an image from a disk even while it is attached to a running VM instance. However, your image will be more reliable if you put the instance in a state that is easier for the image to capture.”[2]** An image includes the operating system and boot loader, so it is typically larger than a snapshot.
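One way to follow the advice in the quote above is to stop the instance before capturing the image. This is only a sketch, assuming the gcloud CLI is set up; `image_from_vm_disk` is a made-up helper name and all of its arguments are placeholders you supply.

```shell
# image_from_vm_disk VM DISK ZONE IMAGE [LOCATION]
# Stops the VM so nothing is writing to the disk, then creates a custom image.
# If LOCATION is omitted, the flag is left out and the default location applies.
image_from_vm_disk() {
  gcloud compute instances stop "$1" --zone="$3" || return 1
  gcloud compute images create "$4" \
    --source-disk="$2" --source-disk-zone="$3" \
    ${5:+"--storage-location=$5"}
}
```

Stopping the VM is the simplest way to get a quiet disk; whether that downtime is acceptable depends on your setup.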
-Possible best use case for this scenario:
Perhaps you could explore instance templates instead of creating snapshots from your control node? In a template you can define the machine type and the boot disk image, or you can base the template on an existing image (creating the R/O /opt directory). [3] With snapshots, it is easier to resize your boot partition, and they are easier to work with overall.
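As a rough idea of what the template approach could look like, here is a sketch assuming the gcloud CLI is set up and the image lives in your current project. The helper name `create_template` and the defaults (`e2-medium`, `10GB`, `pd-ssd`) are illustrative choices of mine, not recommendations.

```shell
# create_template NAME IMAGE [MACHINE_TYPE] [BOOT_DISK_SIZE]
# Creates an instance template whose boot disk comes from an existing image.
create_template() {
  gcloud compute instance-templates create "$1" \
    --image="$2" \
    --machine-type="${3:-e2-medium}" \
    --boot-disk-size="${4:-10GB}" \
    --boot-disk-type=pd-ssd
}
```

For example, `create_template worker-template burrmill-compute-v001-200105` would use the defaults, with `worker-template` being a hypothetical name.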
[2]:https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images
[3]:https://cloud.google.com/compute/docs/instance-templates/create-instance-templates
I understand that the difference between a disk image and a snapshot can be blurry when looking at cloud providers' high-level implementations.
Here are the differences at a low level:
An image is: all the bytes of a hard drive, stored as a file elsewhere.
A snapshot is: a point in time (to put it simply). A snapshot depends on the content of the disk at the moment the snapshot is made. To explain: you are using a computer and, poof, you create a snapshot. The snapshot starts at 0 bytes. From that point on, any content you modify on the disk is not actually written to the disk; it is written to the snapshot file, which is stored somewhere else. You write a 1 GB file to your disk? The bytes on the disk do not actually change. Instead, the snapshot, which stores the delta between the moment it was created and now, has grown to 1 GB.
A backup is: all the bytes of a hard drive stored as a file elsewhere, and also “a point in time”. It differs from a snapshot because, like an image, a backup depends on nothing but itself.
To be clear, you will not be able to restore data from a snapshot if a nuclear bomb lands on your disk. You would, however, be able to create a new disk with the past content if you had created a backup.
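The delta-style snapshot described above can be sketched as a toy model in which each "block" is a small file. It models only that description, not any particular provider's implementation, and every name in it is hypothetical.

```shell
# Toy model of a delta-style snapshot; each "block" is a small file.
SNAP_ROOT=$(mktemp -d)
mkdir "$SNAP_ROOT/disk" "$SNAP_ROOT/delta"
SNAPSHOT_ACTIVE=0

take_snapshot() { SNAPSHOT_ACTIVE=1; }

# write_block ID DATA: after a snapshot, writes land in the delta, not on the disk.
write_block() {
  if [ "$SNAPSHOT_ACTIVE" -eq 1 ]; then
    printf '%s' "$2" > "$SNAP_ROOT/delta/$1"
  else
    printf '%s' "$2" > "$SNAP_ROOT/disk/$1"
  fi
}

# read_block ID: the current view prefers the delta and falls back to the disk.
read_block() {
  if [ -f "$SNAP_ROOT/delta/$1" ]; then
    cat "$SNAP_ROOT/delta/$1"
  else
    cat "$SNAP_ROOT/disk/$1"
  fi
}

# read_block_at_snapshot ID: the point-in-time view ignores the delta entirely.
read_block_at_snapshot() { cat "$SNAP_ROOT/disk/$1"; }

# delta_bytes: total size of everything written since the snapshot.
delta_bytes() { cat "$SNAP_ROOT"/delta/* 2>/dev/null | wc -c | tr -d ' '; }
```

After `write_block 0 old; take_snapshot; write_block 0 newer`, `read_block 0` prints `newer`, `read_block_at_snapshot 0` prints `old`, and `delta_bytes` prints `5`. Remove the `disk` directory and the point-in-time view is gone, which is the dependency that separates a snapshot from a backup here.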
```
$ gcloud compute images describe burrmill-compute-v001-200105
archiveSizeBytes: '564512384'
diskSizeGb: '10'
storageLocations:
- us

$ time gcloud compute disks create test --zone=us-west1-b --image=burrmill-compute-v001-200105 --type=pd-ssd --size=10GB
Created [https://www.googleapis.com/compute/v1/projects/[REDACTED]/zones/us-west1-b/disks/test].
NAME  ZONE        SIZE_GB  TYPE    STATUS
test  us-west1-b  10       pd-ssd  READY
real 0m18.417s

$ gcloud compute disks snapshot test --zone=us-west1-b --snapshot-names=test --storage-location=us
Creating snapshot(s) test...done.

$ gcloud compute snapshots describe test
diskSizeGb: '10'
storageBytes: '564512384'
storageLocations:
- us

$ gcloud compute disks delete test -q --zone=us-west1-b
Deleted [https://www.googleapis.com/compute/v1/projects/[REDACTED]/zones/us-west1-b/disks/test].

$ time gcloud compute disks create test --zone=us-west1-b --source-snapshot=test --type=pd-ssd --size=10GB
Created [https://www.googleapis.com/compute/v1/projects/[REDACTED]/zones/us-west1-b/disks/test].
NAME  ZONE        SIZE_GB  TYPE    STATUS
test  us-west1-b  10       pd-ssd  READY
real 0m23.606s
```
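To put the numbers from the transcript above side by side: the image archive and the snapshot occupy exactly the same number of bytes, and creating the disk took about 18 s from the image versus about 24 s from the snapshot. The small sketch below just does the size arithmetic. The helper names are made up, and treating the reported disk size as GiB is my assumption.

```shell
# Values copied from the transcript above.
IMAGE_BYTES=564512384      # archiveSizeBytes of the image
SNAPSHOT_BYTES=564512384   # storageBytes of the snapshot
DISK_GB=10                 # diskSizeGb of both

# to_mib BYTES: convert bytes to MiB with one decimal place.
to_mib() { awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'; }

# percent_of_disk BYTES GB: stored size as a percentage of the provisioned disk,
# treating the disk size as GiB (an assumption about how the size is reported).
percent_of_disk() {
  awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * b / (g * 1073741824) }'
}

to_mib "$IMAGE_BYTES"                          # prints 538.4
percent_of_disk "$SNAPSHOT_BYTES" "$DISK_GB"   # prints 5.3
```

So, for this particular disk, both forms store roughly 538 MiB, about 5% of the 10 GB provisioned size, and neither is smaller than the other.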