Automatically Restarting GCE VM and TPU node


Joy Lunkad

Feb 25, 2022, 5:03:22 AM
to gce-discussion
Is there a way to automatically restart a TPU node when it is preempted?

I want to use a non-preemptible GCE VM and a preemptible TPU v3-8. Can I add a script to the VM that restarts the TPU when it is preempted, and then runs some other script once the TPU is back up?

PS: This is the command I am using to create them, but I am not sure if it is doing what I want.

```
gcloud compute tpus execution-groups create --name=tpu-demo --zone=europe-west4-a --disk-size=300 --machine-type=n1-standard-16 --tf-version=2.8.0 --accelerator-type=v3-8 --preemptible
```
Thank you so much
 

Kevin Tielve

Mar 8, 2022, 11:58:04 AM
to gce-discussion

The gcloud command you are using creates a TPU together with a VM. AFAIK, there is no hook for detecting the preemption of a TPU comparable to the shutdown script available on preemptible VMs.

The preemptible TPUs documentation includes this note in the section "Detecting if a TPU has been preempted":

Note: If your Cloud TPU is preempted, you must delete it and create a new one as described in Managing TPUs.



I believe a non-native option would be to monitor the TPU for preemption with a cron job or with Cloud Scheduler, and to delete it and create a new one when it is preempted (e.g. with a Cloud Function). Cloud Composer could also be considered.
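As a rough illustration of the cron-job idea, here is a minimal sketch that checks the TPU node's state and recreates the node if it reports PREEMPTED. The names `tpu_state` and `recreate_if_preempted` are hypothetical helpers, and the default values are copied from the command in the question. It assumes the TPU node has the same name as the execution group and that the default network settings are fine; if the original node used other settings (for example a specific network), the create call would need the matching flags.

```shell
#!/usr/bin/env bash
# Sketch only: defaults are taken from the question, adjust for your project.
TPU_NAME="${TPU_NAME:-tpu-demo}"
ZONE="${ZONE:-europe-west4-a}"
ACCELERATOR_TYPE="${ACCELERATOR_TYPE:-v3-8}"
TF_VERSION="${TF_VERSION:-2.8.0}"

# Print the TPU node's current state (for example READY or PREEMPTED).
tpu_state() {
  gcloud compute tpus describe "$TPU_NAME" --zone="$ZONE" --format='value(state)'
}

# Delete and recreate the TPU node, but only if it has been preempted.
recreate_if_preempted() {
  local state
  state="$(tpu_state)" || return 1
  if [ "$state" = "PREEMPTED" ]; then
    gcloud compute tpus delete "$TPU_NAME" --zone="$ZONE" --quiet &&
      gcloud compute tpus create "$TPU_NAME" --zone="$ZONE" \
        --accelerator-type="$ACCELERATOR_TYPE" \
        --version="$TF_VERSION" --preemptible &&
      echo "recreated"
  else
    echo "no action (state: $state)"
  fi
}
```

One way to use it would be to source this file from a small wrapper on the non-preemptible VM, call `recreate_if_preempted` from cron every few minutes, and start the follow-up script whenever the function prints `recreated`.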

You could also monitor the preemption from the logs (resource.type="tpu_worker") and create a log-based alert.
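Before setting up an alert, it may help to look at what the TPU actually logs. The sketch below reads recent entries with the resource filter mentioned above; `recent_tpu_logs` is a hypothetical helper name, and the freshness and limit values are arbitrary. The exact wording of a preemption entry is not given here, so the filter is left at the resource type only and would need narrowing once you have seen a real entry.

```shell
#!/usr/bin/env bash
# Filter taken from the reply above; no preemption-specific text is assumed.
TPU_LOG_FILTER='resource.type="tpu_worker"'

# Print up to 20 TPU worker log entries from the last day as JSON.
recent_tpu_logs() {
  gcloud logging read "$TPU_LOG_FILTER" --freshness=1d --limit=20 --format=json
}
```

Once the preemption entries have been identified, the same filter string (with any extra conditions) could serve as the basis for the log-based alert.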


Additionally, in case your concern is related to the 24h preemption counter, you can reset this 24-hour counter with certain actions.

The Preemptible TPUs documentation also explains how to create preemptible TPU nodes and describes best practices.

Lastly, if your concern is not related to this 24h preemption counter and the reset-actions workaround doesn't fit your use case, then I suggest raising a Feature Request on the issue tracker.


I hope this helps

Best regards,
