We have preemptible instances.
We want to confirm when an exception happens whether it was due to a preemption or some other reason.
In order to do so, we catch socket exceptions and then check in the logs for a preemption event.
gfilter = f'''
log_name="projects/{project}/logs/cloudaudit.googleapis.com%2Fsystem_event" and
resource.type:gce_instance and
resource.labels.instance_id:{instance} and
operation.producer="compute.instances.preempted"
''' glog = google.cloud.logging.Client()
event = glog.list_entries(filter_=gfilter)
We had an instance preempted today, but when we checked in the logs there was no event there.
10:54:10.637 [19645] [DEBUG ] cmd.run: running task
10:54:10.837 [19645] [ERROR ] transport._log: Socket exception: Connection reset by peer (104)
10:54:10.838 [19645] [DEBUG ] vm.was_preempted: search for preemption of vm(foo) instance_id(12345678)
10:54:10.876 [19645] [DEBUG ] requests.__call__: Making request: POST https://oauth2.googleapis.com/token
10:54:10.878 [19645] [DEBUG ] connectionpool._new_conn: Starting new HTTPS connection (1): oauth2.googleapis.com:443
10:54:10.960 [19645] [DEBUG ] connectionpool._make_request: https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
10:54:21.147 [19645] [DEBUG ] vm.was_preempted: vm(foo) was not preempted
However, searching in the logs later showed the preemption did actually occur, it just only appeared in the logs ~30 seconds later:
I 2019-09-26T14:54:46.513992Z compute.instances.preempted {"event_timestamp_us":"1569509686513992","actor":{"user":"system"},"resource":{"name":"foo","type":"instance","zone":"us-east1-b","id":"12345678"},"event_type":"GCE_OPERATION_DONE","trace_id":"systemevent-1569509664110-59375f20035ca-164973da-ccb5efcf","operation":{"name":"systemeven… compute.instances.preempted
I'm not sure how closely our clock is sync'd to Googles, but here are the times:
10:54:10 - preemption started
10:54:21 - log query comes back, nothing about preempting in the logs
10:54:46 - preemption event appears in the logs (14:54:46 UTC)
Is there any other way we can confirm whether a preemption has been initiated without having to rely on logs which can be updated many seconds *after* the fact?