PreventCrossDataCenterMasterFailover: will not promote server

Juan Pablo Otero
Jul 27, 2020, 5:31:45 PM
to orchestrator-mysql
Hi,

During a failover, I got the following error:
2020-07-27 12:25:30 ERROR RecoverDeadMaster: failed mariadb-lab-02:33033 promotion; PreventCrossDataCenterMasterFailover: will not promote server in DC1 when failed server in
All nodes in the topology are in the same DC. This value is obtained with the DetectDataCenterQuery setting in orchestrator.conf.json:

"DetectDataCenterQuery": "select ifnull(max(datacenter), '') as datacenter from `orchestrator`.`datacenter` where cluster_name=(select ifnull(max(cluster_name), '') as cluster_name from `orchestrator`.`cluster` where anchor=1) and hostname=@@hostname"

Is this error possible when all nodes are in the same DC? The first error in the log indicates that the datacenter value for the failed master could not be obtained (DetectDataCenterQuery: invalid connection).
Is this behavior correct? If orchestrator cannot get the datacenter value of a node, doesn't it fall back to a cached value? (Maybe I am wrongly assuming a behavior that orchestrator doesn't actually have.)
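
The wording of the error suggests a comparison between the promoted replica's datacenter and the failed master's datacenter. The Go sketch below assumes that is how PreventCrossDataCenterMasterFailover behaves; it is not orchestrator's source. With the failed master's datacenter empty (the recovery entry in the log shows "AnalyzedInstanceDataCenter":""), any replica in DC1 fails the check, which matches a message that ends in "when failed server in" with nothing after it. The `lastKnownDC` cache is a hypothetical illustration of the fallback asked about here, not something orchestrator is known to do.

```go
package main

import "fmt"

// preventCrossDCFailover is a hypothetical sketch of the check implied by the
// log message: refuse the promotion when the two datacenters differ.
func preventCrossDCFailover(enabled bool, promotedDC, failedDC string) error {
	if enabled && promotedDC != failedDC {
		return fmt.Errorf("PreventCrossDataCenterMasterFailover: will not promote server in %s when failed server in %s", promotedDC, failedDC)
	}
	return nil
}

// lastKnownDC is a hypothetical cache of the last successfully detected
// datacenter per instance (illustration only, not orchestrator behavior).
type lastKnownDC map[string]string

// resolve records a freshly detected datacenter, or falls back to the cached
// value (which may itself be empty) when detection returned "".
func (c lastKnownDC) resolve(instance, detected string) string {
	if detected != "" {
		c[instance] = detected
		return detected
	}
	return c[instance]
}

func main() {
	// Without a fallback: the failed master's datacenter is empty, so the check refuses.
	fmt.Println(preventCrossDCFailover(true, "DC1", ""))

	// With the hypothetical cache: an earlier healthy poll recorded DC1.
	cache := lastKnownDC{}
	cache.resolve("mariadb-lab-00:33033", "DC1")          // earlier successful poll
	failedDC := cache.resolve("mariadb-lab-00:33033", "") // DetectDataCenterQuery failed
	fmt.Println(preventCrossDCFailover(true, "DC1", failedDC))
}
```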


* This is part of orchestrator.log, filtered to the failover sequence:
...
2020-07-27 12:24:53 ERROR ReadTopologyInstance(mariadb-lab-00:33033) DetectDataCenterQuery: invalid connection
2020-07-27 12:24:53 WARNING discoverInstance exceeded InstancePollSeconds for mariadb-lab-00:33033, took 7.3000s
2020-07-27 12:25:04 WARNING  DiscoverInstance(mariadb-lab-00:33033) instance is nil in 0.022s (Backend: 0.016s, Instance: 0.007s), error=dial tcp 10.194.128.104:33033: connect: connection refused
2020-07-27 12:25:05 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: FirstTierSlaveFailingToConnectToMaster; key: mariadb-lab-02:33033
2020-07-27 12:25:05 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: FirstTierSlaveFailingToConnectToMaster; key: mariadb-lab-01:33033
2020-07-27 12:25:05 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster detection on mariadb-lab-00:33033; isActionable?: true; skipProcesses: false
2020-07-27 12:25:05 INFO topology_recovery: detected DeadMaster failure on mariadb-lab-00:33033
...
2020-07-27 12:25:30 INFO auditType:take-master instance:mariadb-lab-02:33033 cluster:mariadb-lab-00:33033 message:took master: mariadb-lab-01:33033
2020-07-27 12:25:30 INFO topology_recovery: success promoting mariadb-lab-02:33033 over mariadb-lab-01:33033
2020-07-27 12:25:30 INFO topology_recovery: promoted replica: mariadb-lab-02:33033
2020-07-27 12:25:30 DEBUG replace-promoted-replica-with-candidate: relocating replicas of mariadb-lab-01:33033 below mariadb-lab-02:33033
2020-07-27 12:25:30 DEBUG replace-promoted-replica-with-candidate: + relocated 0 replicas of mariadb-lab-01:33033 below mariadb-lab-02:33033
2020-07-27 12:25:30 INFO topology_recovery: relocated 0 replicas of mariadb-lab-01:33033 below mariadb-lab-02:33033
2020-07-27 12:25:30 INFO auditType:recover-dead-master instance:mariadb-lab-00:33033 cluster:mariadb-lab-00:33033 message:promoted replica: mariadb-lab-02:33033
2020-07-27 12:25:30 INFO topology_recovery: RecoverDeadMaster: failed mariadb-lab-02:33033 promotion; PreventCrossDataCenterMasterFailover: will not promote server in nap when failed server in
2020-07-27 12:25:30 INFO Topology recovery: {"Id":46,"UID":"1595867105702738350:7d01d02edd6600cf8e5af9d782d6592c17cf23e095b56345905bf89023d3f8cb","AnalysisEntry":{"AnalyzedInstanceKey":{"Hostname":"mariadb-lab-00","Port":33033},"AnalyzedInstanceMasterKey":{"Hostname":"","Port":0},"ClusterDetails":{"ClusterName":"mariadb-lab-00:33033","ClusterAlias":"mariadb-lab","ClusterDomain":"mariadb-lab-00:33033","CountInstances":0,"HeuristicLag":0,"HasAutomatedMasterRecovery":true,"HasAutomatedIntermediateMasterRecovery":false},"AnalyzedInstanceDataCenter":"","AnalyzedInstanceRegion":"","AnalyzedInstancePhysicalEnvironment":"","IsMaster":true,"IsCoMaster":false,"LastCheckValid":false,"LastCheckPartialSuccess":false,"CountReplicas":2,"CountValidReplicas":2,"CountValidReplicatingReplicas":0,"CountReplicasFailingToConnectToMaster":2,"CountDowntimedReplicas":0,"ReplicationDepth":0,"SlaveHosts":[{"Hostname":"mariadb-lab-01","Port":33033},{"Hostname":"mariadb-lab-02","Port":33033}],"IsFailingToConnectToMaster":false,"Analysis":"DeadMaster","Description":"Master cannot be reached by orchestrator and none of its replicas is replicating","StructureAnalysis":null,"IsDowntimed":false,"IsReplicasDowntimed":false,"DowntimeEndTimestamp":"","DowntimeRemainingSeconds":0,"IsBinlogServer":false,"PseudoGTIDImmediateTopology":false,"OracleGTIDImmediateTopology":false,"MariaDBGTIDImmediateTopology":true,"BinlogServerImmediateTopology":false,"CountLoggingReplicas":2,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":2,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"ProcessingNodeHostname":"orc-lab-02","ProcessingNodeToken":"954c4d4385695f0c2042c2bd99cc9e0c3b52150d8cae3048c49801f560b78aae","CountAdditionalAgreeingNodes":0,"StartActivePeriod":"","SkippableDueToDowntime":false,"GTIDMode":"","MinReplicaGTIDMode":"","MaxReplicaGTIDMode":"","MaxReplicaGTIDErrant":"","CommandHint":"","IsReadOnly":false},"SuccessorKey":null,"SuccessorAlias":"","IsActive":false,"IsSuccessful":false,"LostReplicas":[],"ParticipatingInstanceKeys":[],"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","ProcessingNodeHostname":"","ProcessingNodeToken":"","Acknowledged":false,"AcknowledgedAt":"","AcknowledgedBy":"","AcknowledgedComment":"","LastDetectionId":0,"RelatedRecoveryId":0,"Type":"MasterRecovery","RecoveryType":"MasterRecoveryGTID"}
2020-07-27 12:25:30 INFO topology_recovery: Running PostUnsuccessfulFailoverProcesses hook 1 of 2: echo '(for all types) Unsuccessful Failover Processes from DeadMaster on mariadb-lab Failed: mariadb-lab-00:33033; Successor: {successorHost}:{successorPort}' &>> /var/log/orchestrator/recovery.log
2020-07-27 12:25:30 INFO CommandRun(echo '(for all types) Unsuccessful Failover Processes from DeadMaster on mariadb-lab Failed: mariadb-lab-00:33033; Successor: {successorHost}:{successorPort}' &>> /var/log/orchestrator/recovery.log,[])
2020-07-27 12:25:30 INFO topology_recovery: Running PostUnsuccessfulFailoverProcesses hook 2 of 2: /var/lib/orchestrator/scripts/PostUnsuccessfulFailoverProcesses -c mariadb-lab -f mariadb-lab-00 -t DeadMaster -d "Master cannot be reached by orchestrator and none of its replicas is replicating" &>> /var/log/orchestrator/PostUnsuccessfulFailoverProcesses.log
2020-07-27 12:25:30 INFO CommandRun(/var/lib/orchestrator/PostUnsuccessfulFailoverProcesses -c mariadb-lab -f mariadb-lab-00 -t DeadMaster -d "Master cannot be reached by orchestrator and none of its replicas is replicating" &>> /var/log/orchestrator/PostUnsuccessfulFailoverProcesses.log,[])
2020-07-27 12:25:30 INFO auditType:emergently-read-topology-instance instance:mariadb-lab-00:33033 cluster:mariadb-lab-00:33033 message:FirstTierSlaveFailingToConnectToMaster
2020-07-27 12:25:30 INFO topology_recovery: Executed postponed functions: replace-promoted-replica-with-candidate: relocate replicas of mariadb-lab-01:33033
2020-07-27 12:25:30 ERROR RecoverDeadMaster: failed mariadb-lab-02:33033 promotion; PreventCrossDataCenterMasterFailover: will not promote server in DC1 when failed server in
2020-07-27 12:25:32 INFO auditType:emergently-read-topology-instance instance:mariadb-lab-00:33033 cluster:mariadb-lab-00:33033 message:FirstTierSlaveFailingToConnectToMaster
2020-07-27 12:25:33 INFO auditType:emergently-read-topology-instance instance:mariadb-lab-00:33033 cluster:mariadb-lab-00:33033 message:FirstTierSlaveFailingToConnectToMaster
...

Thanks in advance.