```sh
cronjob.batch/dashboard-v5-2-0-repo1-diff   0 1 * * 0-6   False   0   14h      2d1h
cronjob.batch/dashboard-v5-2-0-repo1-full   0 1 * * 0     False   0   <none>   2d1h
cronjob.batch/dashboard-v5-2-0-repo2-diff   0 3 * * 0-6   False   0   12h      2d1h
cronjob.batch/dashboard-v5-2-0-repo2-full   0 3 * * 0     False   0   <none>   2d1h
```
I had an issue with the deployment where the wrong service key was deployed, and the backup failed. That in itself isn't an issue: without the right credentials I would expect the backup to fail. The problem is that the failing backup jobs seem to bring down the entire cluster, and database connectivity no longer works.
I attached a copy of the manifest I used for the initial deployment for reference.
The event logs look like this:
```json
{
"insertId": "gdvkpxfo9ayla",
"jsonPayload": {
"kind": "Event",
"eventTime": null,
"apiVersion": "v1",
"message": "Readiness probe failed: HTTP probe failed with statuscode: 503",
"type": "Warning",
"source": {
"host": "gke-prod2-dashboard-default-pool-3c07ebdf-tw9p",
"component": "kubelet"
},
"metadata": {
"creationTimestamp": "2023-03-23T12:50:25Z",
"resourceVersion": "3368",
"namespace": "postgres-operator",
"name": "dashboard-v5-2-0-00-jdgd-0.174f0d506cf55d67",
"managedFields": [
{
"time": "2023-03-23T12:50:25Z",
"operation": "Update",
"fieldsType": "FieldsV1",
"manager": "kubelet",
"apiVersion": "v1",
"fieldsV1": {
"f:count": {},
"f:involvedObject": {},
"f:reason": {},
"f:source": {
"f:host": {},
"f:component": {}
},
"f:message": {},
"f:type": {},
"f:lastTimestamp": {},
"f:firstTimestamp": {}
}
}
],
"uid": "28889db4-2523-43ea-9151-e54d9e398ad2"
},
"reportingInstance": "",
"involvedObject": {
"apiVersion": "v1",
"name": "dashboard-v5-2-0-00-jdgd-0",
"namespace": "postgres-operator",
"fieldPath": "spec.containers{database}",
"uid": "c7fce83f-b0a8-4a78-aa41-cc8d74d18334",
"resourceVersion": "1209718",
"kind": "Pod"
},
"lastTimestamp": "2023-03-23T12:52:05Z",
"reason": "Unhealthy",
"reportingComponent": ""
},
"resource": {
"type": "k8s_pod",
"labels": {
"namespace_name": "postgres-operator",
"cluster_name": "prod2-dashboard",
"project_id": "esnet-sd-dev",
"location": "us-central1-c",
"pod_name": "dashboard-v5-2-0-00-jdgd-0"
}
},
"timestamp": "2023-03-23T12:52:05Z",
"severity": "WARNING",
"logName": "projects/esnet-sd-dev/logs/events",
"receiveTimestamp": "2023-03-23T12:52:10.131326261Z"
}
(combined from similar events): command terminated with exit code 39: ERROR: [039]: HTTP request failed with 403 (Forbidden): *** Path/Query ***: GET /storage/v1/b/prod-dashboards/o/pgbackrest%2Fpostgres-operator%2Fdashboard-v5-2-0-gcs%2Fprod2%2Farchive%2Fdb%2Farchive.info?fields=size%2Cupdated *** Request Headers ***: authorization: <redacted> content-length: 0 host: storage.googleapis.com *** Response Headers ***: cache-control: no-cache, no-store, max-age=0, must-revalidate content-length: 598 content-type: application/json; charset=UTF-8 date: Thu, 23 Mar 2023 13:39:06 GMT expires: Mon, 01 Jan 1990 00:00:00 GMT pragma: no-cache server: UploadServer vary: Origin, X-Origin x-guploader-uploadid: ADPycdtw4LkL_Sg2cr4K0H2fQBnbkXeHHz8_EO8JbyxANE8GsE32RIE2S3E9GBHUW_t-YfxNQ468McouKXmDP9OQJ7BjRA *** Response Content ***: { "error": { "code": 403, "message": "staging-bac...@esnet-sd-dev.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).", "errors": [ { "message": "staging-bac...@esnet-sd-dev.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).", "domain": "global", "reason": "forbidden" } ] } }
```
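The last message in that log is long, and the useful parts (HTTP status, bucket, service account, denied permission) are buried in it. Below is a minimal Python sketch for pulling those fields out when triaging similar events. The helper name `summarize_gcs_error` and the regular expression are my own assumptions based on the message format shown above; they are not part of pgBackRest or the operator, and the sample string is a shortened copy of the logged message.

```python
import re
from typing import Optional

# Hypothetical helper (not part of pgBackRest or the operator): extracts the
# key facts from a pgBackRest GCS error message shaped like the one logged above.
GCS_ERROR = re.compile(
    r"HTTP request failed with (?P<status>\d{3}) .*?"
    r"GET /storage/v1/b/(?P<bucket>[^/]+)/o/.*?"
    r"\"message\": \"(?P<account>\S+) does not have (?P<permission>[\w.]+) access",
    re.DOTALL,
)


def summarize_gcs_error(message: str) -> Optional[dict]:
    """Return status, bucket, account and permission, or None if no match."""
    match = GCS_ERROR.search(message)
    return match.groupdict() if match else None


# Shortened copy of the event message, kept only for illustration.
sample = (
    'ERROR: [039]: HTTP request failed with 403 (Forbidden): '
    '*** Path/Query ***: GET /storage/v1/b/prod-dashboards/o/'
    'pgbackrest%2Fpostgres-operator%2Fdashboard-v5-2-0-gcs%2Fprod2'
    '%2Farchive%2Fdb%2Farchive.info?fields=size%2Cupdated '
    '*** Response Content ***: { "error": { "code": 403, "message": '
    '"staging-bac...@esnet-sd-dev.iam.gserviceaccount.com does not have '
    'storage.objects.get access to the Google Cloud Storage object." } }'
)

summary = summarize_gcs_error(sample)
print(summary)
```

In this case the summary points at the `storage.objects.get` permission on the `prod-dashboards` bucket, which matches the wrong service key I mentioned.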
I'm fairly sure I have resolved the issue now that the correct service account has been added, but the bug I'm reporting is that a failed backup job should not bring down a production database, no matter what caused the failure. Or am I wrong to make that assumption?
--
Samir Faci