Failover not working correctly with Percona Server 8.0.18

wayne.l...@gmail.com

Jan 23, 2020, 8:51:11 AM
to orchestrator-mysql
Hello all,

I have Orchestrator 3.1.3 installed, managing three Percona Server 8.0.18 instances: one master with two slaves.

Nine times out of ten, when the master crashes, one slave takes over, but the other slave does not reconnect to the new master and instead becomes a master as well.

I am not sure what config information you may need. Please let me know and I will provide it. 

Thank you. 

wayne.l...@gmail.com

Jan 23, 2020, 8:52:26 AM
to orchestrator-mysql
Let me add one more thing. I also have a 5.7.24 cluster managed by the same Orchestrator 3.1.3, and I am NOT seeing any issues with 5.7.24.

Shlomi Noach

Jan 23, 2020, 9:06:28 AM
to wayne.l...@gmail.com, orchestrator-mysql
If you can paste the logs from around the failover, that would be great. Please run with --debug.

--
You received this message because you are subscribed to the Google Groups "orchestrator-mysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orchestrator-my...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orchestrator-mysql/42ebab49-0458-4842-ab55-6e53d90bcccb%40googlegroups.com.

wayne.l...@gmail.com

Jan 23, 2020, 9:35:43 AM
to orchestrator-mysql
Server dbvrd43248 was stopped; it was the master at the time. Here is the log output:

2020-01-23 08:29:01     emergently-read-topology-instance       dbvrd43249      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:01     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:01     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:02     emergently-read-topology-instance       dbvrd43249      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:02     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:02     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:04     emergently-read-topology-instance       dbvrd43249      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:04     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:04     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:05     emergently-read-topology-instance       dbvrd43249      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:05     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:05     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:06     emergently-read-topology-instance       dbvrd43249      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:06     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:06     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster

2020-01-23 08:29:07     recover-dead-master     dbvrd43249      3306    [dbvrd43249:3306]       problem found; will recover

2020-01-23 08:29:07     begin-maintenance       dbvrd43250      3306    [dbvrd43249:3306]       maintenanceToken: 1, owner: dbvrd43251, reason: move below dbvrd43256:3306

2020-01-23 08:29:07     move-below-gtid dbvrd43250      3306    [dbvrd43249:3306]       moved dbvrd43250:3306 below dbvrd43256:3306

2020-01-23 08:29:07     move-replicas-gtid      dbvrd43256      3306    [dbvrd43249:3306]       moved 1/1 replicas below dbvrd43256:3306 via GTID

2020-01-23 08:29:07     end-maintenance dbvrd43250      3306    [dbvrd43249:3306]       maintenanceToken: 1

2020-01-23 08:29:08     regroup-replicas-gtid   dbvrd43249      3306    [dbvrd43249:3306]       regrouped replicas of dbvrd43249:3306 via GTID; promoted dbvrd43256:3306

2020-01-23 08:29:08     begin-downtime  dbvrd43249      3306    [dbvrd43249:3306]       owner: dbvrd43251, reason: lost-in-recovery

2020-01-23 08:29:08     recover-dead-master     dbvrd43249      3306    [dbvrd43249:3306]       promoted replica: dbvrd43256:3306

2020-01-23 08:29:08     begin-maintenance       dbvrd43256      3306    [dbvrd43249:3306]       maintenanceToken: 2, owner: dbvrd43251, reason: reset replica

2020-01-23 08:29:08     reset-slave     dbvrd43256      3306    [dbvrd43256:3306]       dbvrd43256:3306 replication reset

2020-01-23 08:29:08     end-maintenance dbvrd43256      3306    [dbvrd43256:3306]       maintenanceToken: 2

2020-01-23 08:29:08     read-only       dbvrd43256      3306    [dbvrd43256:3306]       set as false
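
For readers following along, the audit entries above can be condensed into a short timeline. This is a minimal Python sketch, assuming each audit line has the whitespace-separated layout seen in the paste above (timestamp, audit type, host, port, [cluster], message). The names `parse_audit_line` and `condense` are hypothetical helpers for illustration, not part of orchestrator.

```python
import re
from itertools import groupby

# Layout assumed from the audit lines pasted above:
# timestamp, audit type, host, port, [cluster], free-text message.
AUDIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<type>\S+)\s+(?P<host>\S+)\s+(?P<port>\d+)\s+"
    r"\[(?P<cluster>[^\]]*)\]\s+(?P<message>.*)$"
)


def parse_audit_line(line):
    """Return a dict of fields, or None for blank or unrecognised lines."""
    m = AUDIT_RE.match(line.strip())
    return m.groupdict() if m else None


def condense(lines):
    """Collapse consecutive entries of the same audit type into one row."""
    entries = [e for e in map(parse_audit_line, lines) if e]
    rows = []
    for audit_type, group in groupby(entries, key=lambda e: e["type"]):
        group = list(group)
        hosts = sorted({e["host"] for e in group})
        # (first timestamp, audit type, hosts involved, number of entries)
        rows.append((group[0]["ts"], audit_type, hosts, len(group)))
    return rows


# A few lines copied from the audit log above.
sample = """\
2020-01-23 08:29:06     emergently-read-topology-instance       dbvrd43250      3306    [dbvrd43249:3306]       UnreachableMaster
2020-01-23 08:29:06     emergently-read-topology-instance       dbvrd43256      3306    [dbvrd43249:3306]       UnreachableMaster
2020-01-23 08:29:07     recover-dead-master     dbvrd43249      3306    [dbvrd43249:3306]       problem found; will recover
2020-01-23 08:29:07     move-below-gtid dbvrd43250      3306    [dbvrd43249:3306]       moved dbvrd43250:3306 below dbvrd43256:3306
""".splitlines()

for ts, audit_type, hosts, count in condense(sample):
    print(ts, audit_type, ",".join(hosts), f"x{count}")
```

Grouping only consecutive entries keeps the order of events intact, so the repeated emergency reads shrink to one row while the recovery steps stay in sequence.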


Shlomi Noach

Jan 23, 2020, 9:44:55 AM
to wayne.l...@gmail.com, orchestrator-mysql
Thanks; I'm pretty sure there should be more logs following this, and it would be helpful if you could include them. Anyway, it seems like the master was dbvrd43249 and not dbvrd43248, and there was only one replica?

Shlomi Noach

Jan 23, 2020, 9:46:16 AM
to wayne.l...@gmail.com, orchestrator-mysql
To clarify, this is the audit log; I'm looking for the "general" log. If you run with systemctl, that would be journalctl -f -u orchestrator; otherwise /var/log/orchestrator.log

wayne.l...@gmail.com

Jan 23, 2020, 9:49:48 AM
to orchestrator-mysql
Sorry, sir. You are correct: dbvrd43249 was the master. Typo on my part. Getting more logs.

wayne.l...@gmail.com

Jan 23, 2020, 10:25:53 AM
to orchestrator-mysql
I think this is what you need. 

Jan 23 09:16:05 dbvrd43251 orchestrator[14351]: Error 1053: Server shutdown in progress

Jan 23 09:16:05 dbvrd43251 orchestrator[14351]: ReadTopologyInstance(dbvrd43249:3306) show variables like 'maxscale%': Error 1053: Server shutdown in progress

Jan 23 09:16:05 dbvrd43251 orchestrator[14351]: DiscoverInstance(dbvrd43249:3306) instance is nil in 0.013s (Backend: 0.003s, Instance: 0.009s), error=Error 1053: Server shutdown in progress

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with UnreachableMaster detection on dbvrd43249:3306; isActionable?: false; skipProcesses: false

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: invalid connection

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: topology_recovery: Running 1 OnFailureDetectionProcesses hooks

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: topology_recovery: detected UnreachableMaster failure on dbvrd43249:3306

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: echo 'Detected UnreachableMaster on dbvrd43249:3306. Affected replicas: 2' >> /tmp/recovery.log

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 2.863209ms

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: CommandRun(echo 'Detected UnreachableMaster on dbvrd43249:3306. Affected replicas: 2' >> /tmp/recovery.log,[])

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: CommandRun/running: bash /tmp/orchestrator-process-cmd-291707449

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: CommandRun:

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: CommandRun successful. exit status 0

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: topology_recovery: done running OnFailureDetectionProcesses hooks

Jan 23 09:16:06 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with UnreachableMaster recovery on dbvrd43249:3306; isRecoverable?: false; skipProcesses: false

Jan 23 09:16:07 dbvrd43251 orchestrator[14351]: checkAndExecuteFailureDetectionProcesses: could not register UnreachableMaster detection on dbvrd43249:3306

Jan 23 09:16:07 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:09 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:11 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:12 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:13 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: FirstTierSlaveFailingToConnectToMaster; key: dbvrd43256:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with DeadMaster detection on dbvrd43249:3306; isActionable?: true; skipProcesses: false

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: FirstTierSlaveFailingToConnectToMaster; key: dbvrd43250:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Running 1 OnFailureDetectionProcesses hooks

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: detected DeadMaster failure on dbvrd43249:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: echo 'Detected DeadMaster on dbvrd43249:3306. Affected replicas: 2' >> /tmp/recovery.log

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun(echo 'Detected DeadMaster on dbvrd43249:3306. Affected replicas: 2' >> /tmp/recovery.log,[])

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun/running: bash /tmp/orchestrator-process-cmd-712703556

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun:

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun successful. exit status 0

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 2.662655ms

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: done running OnFailureDetectionProcesses hooks

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on dbvrd43249:3306; isRecoverable?: true; skipProcesses: false

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: will handle DeadMaster event on dbvrd43249:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Running 1 PreFailoverProcesses hooks

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Running PreFailoverProcesses hook 1 of 1: echo 'Will recover from DeadMaster on dbvrd43249:3306' >> /tmp/recovery.log

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun(echo 'Will recover from DeadMaster on dbvrd43249:3306' >> /tmp/recovery.log,[])

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun/running: bash /tmp/orchestrator-process-cmd-983626963

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun:

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: CommandRun successful. exit status 0

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: Completed PreFailoverProcesses hook 1 of 1 in 2.579675ms

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: done running PreFailoverProcesses hooks

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: will recover dbvrd43249:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: masterRecoveryType=MasterRecoveryGTID

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: regrouping replicas via GTID

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Stopped slave nicely on dbvrd43250:3306, Self:binarylog.000002:95895, Exec:binarylog.000015:275

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Stopped replication on dbvrd43250:3306, Self:binarylog.000002:95895, Exec:binarylog.000015:275

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Stopped slave nicely on dbvrd43256:3306, Self:binarylog.000016:275, Exec:binarylog.000015:275

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Stopped replication on dbvrd43256:3306, Self:binarylog.000016:275, Exec:binarylog.000015:275

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: promotedReplicaIsIdeal(dbvrd43256:3306)

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: moveReplicasViaGTID: Will move 1 replicas below dbvrd43256:3306 via GTID

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Will move dbvrd43250:3306 below dbvrd43256:3306 via GTID

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Stopped replication on dbvrd43250:3306, Self:binarylog.000002:95895, Exec:binarylog.000015:275

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: ChangeMasterTo: Changed master on dbvrd43250:3306 to: dbvrd43256:3306, binarylog.000016:275. GTID: true

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Started replication on dbvrd43250:3306

Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: Started replication on dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with DeadMaster detection on dbvrd43249:3306; isActionable?: true; skipProcesses: false

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on dbvrd43249:3306; isRecoverable?: true; skipProcesses: false

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: 2020-01-23 09:16:15 ERROR AttemptRecoveryRegistration: cluster dbvrd43249:3306 has recently experienced a failover (of dbvrd43249:3306) and is in active period. It will not be failed over again. You may acknowledge the failure on this cluster (-c ack-cluster-recoveries) or on dbvrd43249:3306 (-c ack-instance-recoveries) to remove this blockage

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: AttemptRecoveryRegistration: cluster dbvrd43249:3306 has recently experienced a failover (of dbvrd43249:3306) and is in active period. It will not be failed over again. You may acknowledge the failure on this cluster (-c ack-cluster-recoveries) or on dbvrd43249:3306 (-c ack-instance-recoveries) to remove this blockage

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: found an active or recent recovery on dbvrd43249:3306. Will not issue another RecoverDeadMaster.

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: ReadInstanceClusterAttributes: in co-master topology dbvrd43249:3306 is not in (dbvrd43256:3306, dbvrd43250:3306). Forcing it to become one of them

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: 0 postponed functions

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: checking if should replace promoted replica with a better candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + checking if promoted replica is the ideal candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + searching for an ideal candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + checking if promoted replica is an OK candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + searching for a candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + searching for a candidate

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: + found no server to promote on top promoted replica

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: replace-promoted-replica-with-candidate: promoted instance dbvrd43256:3306 requires no further action

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: promoted replica: dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: RecoverDeadMaster: successfully promoted dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: promoted server coordinates: binarylog.000016:275

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: will apply MySQL changes to promoted master

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: ReadInstanceClusterAttributes: in co-master topology dbvrd43249:3306 is not in (dbvrd43256:3306, dbvrd43250:3306). Forcing it to become one of them

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: Will reset replica on dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: ReadInstanceClusterAttributes: in co-master topology dbvrd43249:3306 is not in (dbvrd43256:3306, dbvrd43250:3306). Forcing it to become one of them

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: ReadInstanceClusterAttributes: in co-master topology dbvrd43249:3306 is not in (dbvrd43256:3306, dbvrd43250:3306). Forcing it to become one of them

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: Stopped replication on dbvrd43256:3306, Self:binarylog.000016:275, Exec:binarylog.000002:95895

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: ReadInstanceClusterAttributes: in co-master topology dbvrd43249:3306 is not in (dbvrd43256:3306, dbvrd43250:3306). Forcing it to become one of them

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: Reset slave dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: applying RESET SLAVE ALL on promoted master: success=true

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: instance dbvrd43256:3306 read_only: false

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: applying read-only=0 on promoted master: success=true

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Writing KV [mysql/master/v8poc:dbvrd43256:3306 mysql/master/v8poc/hostname:dbvrd43256 mysql/master/v8poc/port:3306 mysql/master/v8poc/ipv4:10.203.9.82 mysql/master/v8poc/ipv6:]

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Distributing KV [mysql/master/v8poc:dbvrd43256:3306 mysql/master/v8poc/hostname:dbvrd43256 mysql/master/v8poc/port:3306 mysql/master/v8poc/ipv4:10.203.9.82 mysql/master/v8poc/ipv6:]

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: applying read-only=1 on demoted master: success=false

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: - RecoverDeadMaster: updating cluster_alias: dbvrd43249:3306 -> dbvrd43256:3306

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Running 1 PostMasterFailoverProcesses hooks

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Running PostMasterFailoverProcesses hook 1 of 1: echo 'Recovered from DeadMaster on dbvrd43249:3306. Failed: dbvrd43249:3306; Promoted: dbvrd43256:3306' >> /tmp/recovery.log

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun(echo 'Recovered from DeadMaster on dbvrd43249:3306. Failed: dbvrd43249:3306; Promoted: dbvrd43256:3306' >> /tmp/recovery.log,[])

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun/running: bash /tmp/orchestrator-process-cmd-086485526

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun:

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun successful. exit status 0

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Completed PostMasterFailoverProcesses hook 1 of 1 in 2.622943ms

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: done running PostMasterFailoverProcesses hooks

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: Topology recovery: {"Id":101,"UID":"1579792574486249888:6f2201fe74aa9035f432734ac251a7a51e955dfe86b0cc49e94bad6a144f5a92","AnalysisEntry":{"AnalyzedInstanceKey":{"Hostname":"dbvrd43249","Port":3306},"AnalyzedInstanceMasterKey":{"Hostname":"","Port":0},"ClusterDetails":{"ClusterName":"dbvrd43249:3306","ClusterAlias":"v8poc","ClusterDomain":"uhc.com","CountInstances":0,"HeuristicLag":0,"HasAutomatedMasterRecovery":true,"HasAutomatedIntermediateMasterRecovery":true},"AnalyzedInstanceDataCenter":"ctc","AnalyzedInstanceRegion":"","AnalyzedInstancePhysicalEnvironment":"","IsMaster":true,"IsCoMaster":false,"LastCheckValid":false,"LastCheckPartialSuccess":false,"CountReplicas":2,"CountValidReplicas":2,"CountValidReplicatingReplicas":0,"CountReplicasFailingToConnectToMaster":2,"CountDowntimedReplicas":0,"ReplicationDepth":0,"SlaveHosts":[{"Hostname":"dbvrd43250","Port":3306},{"Hostname":"dbvrd43256","Port":3306}],"IsFailingToConnectToMaster":false,"Analysis":"DeadMaster","Description":"Master cannot be reached by orchestrator and none of its replicas is replicating","StructureAnalysis":null,"IsDowntimed":false,"IsReplicasDowntimed":false,"DowntimeEndTimestamp":"","DowntimeRemainingSeconds":0,"IsBinlogServer":false,"PseudoGTIDImmediateTopology":false,"OracleGTIDImmediateTopology":true,"MariaDBGTIDImmediateTopology":false,"BinlogServerImmediateTopology":false,"CountLoggingReplicas":2,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":2,"CountRowBasedLoggingReplicas":0,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"ProcessingNodeHostname":"dbvrd43251","ProcessingNodeToken":"00474e6c532b9d665ee92456d9d5ae4c8cebaa8119739f1d0160aacc3fca2f7f","CountAdditionalAgreeingNodes":0,"StartActivePeriod":"","SkippableDueToDowntime":false,"GTIDMode":"ON","MinReplicaGTIDMode":"ON","MaxReplicaGTIDMode":"ON","MaxReplicaGTIDErrant":"","CommandHint":"","IsReadOnly":false},"SuccessorKey":{"Hostname":"dbvrd43256","Port":3306},"SuccessorAlias":"dbvrd43256","IsActive":false,"IsSuccessful":true,"LostReplicas":[],"ParticipatingInstanceKeys":[],"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","ProcessingNodeHostname":"","ProcessingNodeToken":"","Acknowledged":false,"AcknowledgedAt":"","AcknowledgedBy":"","AcknowledgedComment":"","LastDetectionId":0,"RelatedRecoveryId":0,"Type":"MasterRecovery","RecoveryType":"MasterRecoveryGTID"}

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Running 1 PostFailoverProcesses hooks

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Running PostFailoverProcesses hook 1 of 1: echo '(for all types) Recovered from DeadMaster on dbvrd43249:3306. Failed: dbvrd43249:3306; Successor: dbvrd43256:3306' >> /tmp/recovery.log

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun(echo '(for all types) Recovered from DeadMaster on dbvrd43249:3306. Failed: dbvrd43249:3306; Successor: dbvrd43256:3306' >> /tmp/recovery.log,[])

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun/running: bash /tmp/orchestrator-process-cmd-454742653

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun:

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: CommandRun successful. exit status 0

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Completed PostFailoverProcesses hook 1 of 1 in 2.594447ms

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: done running PostFailoverProcesses hooks

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Waiting for 0 postponed functions

Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: Executed 0 postponed functions

Jan 23 09:16:26 dbvrd43251 orchestrator[14351]: dial tcp 10.203.28.192:3306: connect: connection refused

Shlomi Noach

Jan 23, 2020, 10:43:47 AM
to wayne.l...@gmail.com, orchestrator-mysql
It seems that dbvrd43256 was promoted and dbvrd43250 became its replica, so the failover looks successful? Perhaps I'm not reading this right, but I don't see the problem here.

I am a bit confused about why orchestrator thinks there's a co-master setup.
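
The reading of the log in this post can be checked mechanically against the general log. This is a minimal Python sketch, assuming the two message formats seen in the log excerpt in this thread ("ChangeMasterTo: Changed master on X to: Y" and "RecoverDeadMaster: successfully promoted Y"). `summarize_failover` is a hypothetical helper for illustration, not an orchestrator API.

```python
import re

# Patterns assumed from the general-log lines pasted in this thread.
CHANGE_RE = re.compile(r"ChangeMasterTo: Changed master on (\S+) to: ([^\s,]+)")
PROMOTED_RE = re.compile(r"RecoverDeadMaster: successfully promoted (\S+)")


def summarize_failover(log_lines, replicas):
    """Report the promoted server, who was repointed, and who was left out."""
    promoted = None
    repointed = {}  # replica -> master it was pointed at
    for line in log_lines:
        m = PROMOTED_RE.search(line)
        if m:
            promoted = m.group(1)
        m = CHANGE_RE.search(line)
        if m:
            repointed[m.group(1)] = m.group(2)
    # Replicas that were neither promoted nor repointed at the promoted server.
    stragglers = [
        r for r in replicas
        if r != promoted and repointed.get(r) != promoted
    ]
    return promoted, repointed, stragglers


# Two lines copied from the general log above.
log = [
    "Jan 23 09:16:14 dbvrd43251 orchestrator[14351]: ChangeMasterTo: Changed master "
    "on dbvrd43250:3306 to: dbvrd43256:3306, binarylog.000016:275. GTID: true",
    "Jan 23 09:16:15 dbvrd43251 orchestrator[14351]: topology_recovery: "
    "RecoverDeadMaster: successfully promoted dbvrd43256:3306",
]

promoted, repointed, stragglers = summarize_failover(
    log, replicas=["dbvrd43250:3306", "dbvrd43256:3306"]
)
print("promoted:", promoted)
print("repointed:", repointed)
print("stragglers:", stragglers)
```

An empty straggler list only reflects what orchestrator logged during the failover; it does not show whether a replica later stopped replicating or was changed by something else.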

wayne.l...@gmail.com

Jan 23, 2020, 10:51:41 AM
to orchestrator-mysql
You are so nice to assist me.

So we are testing this for a solution. I was wondering, do we need to acknowledge and clear each failure before we do the next test? I am going to keep testing, and get more logs for you. 