This is happening to a new cluster I created recently. I had an issue yesterday where the database went down, and since the primary was down I manually failed over, essentially running:

pgo failover dashboard-prod-20-100 --target=gke-dashboard-beta-default-pool-dd4da635-3p3l

That seems to have fixed things, and when I run a test it shows all green, but when I try to take a backup I get this error:

pgo backup --backup-opts="--type=full" dashboard-prod-20-100
Error: primary pod is not in Ready state

I presume 'primary pod' here refers to the primary instance of the database.
The logs in service/dashboard-prod-20-100 all look good:
2021-05-13 15:41:02,987 INFO: no action. i am the leader with the lock
2021-05-13 15:41:12,980 INFO: Lock owner: dashboard-prod-20-100-tnki-74cb96ddc8-5blgx; I am dashboard-prod-20-100-tnki-74cb96ddc8-5blgx
2021-05-13 15:41:13,028 INFO: no action. i am the leader with the lock
2021-05-13 15:41:22,980 INFO: Lock owner: dashboard-prod-20-100-tnki-74cb96ddc8-5blgx; I am dashboard-prod-20-100-tnki-74cb96ddc8-5blgx
2021-05-13 15:41:22,991 INFO: no action. i am the leader with the lock

The replica service also seems fine:

2021-05-13 15:43:43,030 INFO: no action. i am a secondary and i am following a leader
2021-05-13 15:43:52,987 INFO: Lock owner: dashboard-prod-20-100-tnki-74cb96ddc8-5blgx; I am dashboard-prod-20-100-5694d79955-gsvs4
2021-05-13 15:43:52,987 INFO: does not have lock
2021-05-13 15:43:53,011 INFO: no action. i am a secondary and i am following a leader

The API server gives me this error; I'm not sure if it is related:
2021/05/11 04:26:07 http: panic serving 127.0.0.1:60952: runtime error: index out of range [46] with length 46
goroutine 1003161 [running]:
net/http.(*conn).serve.func1(0xc0006830e0)
	/usr/lib/golang/src/net/http/server.go:1801 +0x147
panic(0x165e1a0, 0xc0007927e0)
	/usr/lib/golang/src/runtime/panic.go:975 +0x47a
github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions.parseBackupOpts(0xc0004937d0, 0x2e, 0x1973d60, 0xc00008aaa0, 0xc000470000)
	/opt/cdev/src/github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions/backupoptionsutil.go:115 +0x665
github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions.convertBackupOptsToStruct(0xc0004937d0, 0x2e, 0x14e8b80, 0xc0005f5200, 0x0, 0x0, 0x7, 0xc000492510, 0x27, 0x1975180, ...)
	/opt/cdev/src/github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions/backupoptionsutil.go:62 +0x50
github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions.ValidateBackupOpts(0xc0004937d0, 0x2e, 0x14e8b80, 0xc0005f5200, 0x1e, 0x0)
	/opt/cdev/src/github.com/crunchydata/postgres-operator/internal/apiserver/backupoptions/backupoptionsutil.go:49 +0x10a
github.com/crunchydata/postgres-operator/internal/apiserver/clusterservice.validateDataSourceParms(0xc0005f5200, 0x0, 0x0)
	/opt/cdev/src/github.com/crunchydata/postgres-operator/internal/apiserver/clusterservice/clusterimpl.go:2371 +0x38e
github.com/crunchydata/postgres-operator/internal/apiserver/clusterservice.CreateCluster(0xc0005f5200, 0xc0004af000, 0x3, 0xc000792d60, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/opt/cdev/src/github.com/crunchydata/postgres-operator/internal/apiserver/clusterservice/clusterimpl.go:586 +0x47a
github.com/crunchydata/postgres-operator/internal/apiserver/clusterservice.CreateClusterHandler(0x19a68a0, 0xc0000fa70

The operator looks fine; it just repeats '2021/05/13 00:41:28 INF 82 (localhost:4150) connecting to nsqd'.
The event service looks fine as well, and the same goes for the scheduler. Executing the backup command with no options (i.e. pgo backup dashboard-prod-20-100) gives me the same error. Any thoughts on what is going on here?
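[Editor's note] For anyone reading along: a Go panic of the form "index out of range [46] with length 46" means the code indexed exactly one past the end of a 46-element slice. The sketch below is not the actual PGO parser, just a minimal illustration of how a "--flag value" option parser can hit that failure mode, and the bounds check that turns the panic into an ordinary error:

```go
package main

import (
	"fmt"
	"strings"
)

// splitOpts mimics splitting a --backup-opts string into whitespace-separated tokens.
func splitOpts(opts string) []string {
	return strings.Fields(opts)
}

// parseNaive reads the value following each "--flag" token. If a flag is the
// last token, tokens[i+1] is one past the end of the slice and Go panics with
// "index out of range [n] with length n".
func parseNaive(tokens []string) map[string]string {
	parsed := map[string]string{}
	for i := 0; i < len(tokens); i++ {
		if strings.HasPrefix(tokens[i], "--") && !strings.Contains(tokens[i], "=") {
			parsed[tokens[i]] = tokens[i+1] // panics when i is the last index
			i++
		}
	}
	return parsed
}

// parseSafe bounds-checks before reading the next token and reports a
// malformed option string as an error instead of panicking.
func parseSafe(tokens []string) (map[string]string, error) {
	parsed := map[string]string{}
	for i := 0; i < len(tokens); i++ {
		tok := tokens[i]
		if !strings.HasPrefix(tok, "--") {
			continue
		}
		if eq := strings.Index(tok, "="); eq >= 0 {
			parsed[tok[:eq]] = tok[eq+1:] // "--flag=value" form
			continue
		}
		if i+1 >= len(tokens) {
			return nil, fmt.Errorf("flag %s is missing a value", tok)
		}
		parsed[tok] = tokens[i+1] // "--flag value" form
		i++
	}
	return parsed, nil
}

func main() {
	// A trailing flag with no value: parseNaive would panic here.
	tokens := splitOpts("--type full --repo")
	if _, err := parseSafe(tokens); err != nil {
		fmt.Println("rejected cleanly:", err)
	}
}
```

With input like "--type full --repo", the naive version reads tokens[i+1] past the end of the slice and panics, while the bounds-checked version returns an error the API server can report normally.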
Hi Samir,
Also, we do credit bug reporters, so if you have a GitHub handle you'd like credited, I would be happy to include it in the release notes.
Thanks,
Jonathan

On Thu, May 13, 2021 at 6:04 PM Jonathan S. Katz <jonath...@crunchydata.com> wrote:

Hi Samir,

Thanks for submitting this. Before getting into the issue: overall I do appreciate the reproduction steps you provided, but they are missing a few elements that help with troubleshooting, e.g. which version of PGO this is, which platform you are running on, etc. In the future I suggest following the bug reporting template; anything that refers to a Go panic is definitely a bug:
If you continue to receive the "primary pod is not in Ready state" error, it could mean that the primary is unhealthy, or that not all of the attributes used for primary detection have been set (primary role label, pod phase, etc.).

Thanks for reporting!
Jonathan
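[Editor's note] The primary detection Jonathan describes can be thought of as a predicate over the pod's labels and status. The sketch below is hypothetical, not the operator's actual code: the Pod struct is a stand-in for the relevant Kubernetes fields, and the role=master label value is an assumption based on Patroni conventions.

```go
package main

import "fmt"

// Pod is a minimal stand-in for the Kubernetes pod fields that matter
// for primary detection (hypothetical, not the operator's real types).
type Pod struct {
	Labels map[string]string
	Phase  string // Kubernetes pod phase, e.g. "Running" or "Pending"
	Ready  bool   // aggregated "Ready" condition across containers
}

// isPrimaryReady mirrors the checks listed above: a primary role label,
// a Running phase, and a Ready condition. If any attribute is missing,
// the pod is not considered a ready primary, which would surface as the
// "primary pod is not in Ready state" error.
func isPrimaryReady(p Pod) bool {
	return p.Labels["role"] == "master" && p.Phase == "Running" && p.Ready
}

func main() {
	primary := Pod{Labels: map[string]string{"role": "master"}, Phase: "Running", Ready: true}
	replica := Pod{Labels: map[string]string{"role": "replica"}, Phase: "Running", Ready: true}
	fmt.Println(isPrimaryReady(primary), isPrimaryReady(replica)) // true false
}
```

A manual failover that leaves the role label or readiness condition stale on the new primary would make every one of these checks individually look "green" in the logs while the combined predicate still fails.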
--
You received this message because you are subscribed to the Google Groups "Postgres Operator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to postgres-opera...@crunchydata.com.
Hello Jonathan/CrunchyData Folks,

Sort of a random question: is the mailing list archived and public? When I do find a solution, I try to post it, assuming it would help others find it in the future. If the list isn't persisted, though, the solution can just live in my internal documentation flow and I won't need to fill other people's inboxes.

--
Thank you,
Samir Faci