Fellow GCEers, looks like the resource du jour¹ is the famous inferencing price/perf champ, the T4. Our training nodes on preemptible P100/V100 churn, but even non-preemptible T4s play possum. (But of course we skipped purchasing reservations and commitments, which has been The Right Thing every time. In hindsight.) We're running in the major US regions (us-*1) and have no backup setup across the pond, so I can't tell about other regions. So I'm just sitting here, waiting the weather out. Myself, I bumped my dev machine from a T4 to a V100, which is a good thing to do once in a while to shake out data races you haven't caught yet. Every cloud has a silver lining.
This is a small price to pay for having this amazing infra, maintained by the crème de la crème of SREs, at your fingertips. They can do anything, except conjure T4 accelerators out of thin air². Maybe. I mean, last time I checked they couldn't, but that was a whole two weeks ago.
And all this is happening during an unprecedented global pandemic that is disrupting the hardware supply worldwide. Keep calm, thank these S.überR.E. as warmly as you can for keeping this unimaginably huge and complex rig running for ya, despite the inevitable hiccups (did I mention reservations and commitments?), and, of course, don't forget to mark 21-05-25 in your calendars!
DON'T PANIC
____
¹ Or, rather, the complete semantic opposite of its non-literal meaning.
² Jeff Dean can, indeed. The rumor is that he has already sent for a supply of compressed air.
-kkm