Hey,
my config is the same on all nodes:
cluster.initial_master_nodes:
- "indexer-m"
- "indexer-d1"
- "indexer-d2"
cluster.name: "opensearch"
discovery.seed_hosts:
- "172.16.2.137"
- "172.16.2.135"
- "172.16.2.136"
Node overview:
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                                        cluster_manager name
172.16.2.137           41          95   2    0.20    0.22     0.19 dimr      cluster_manager,data,ingest,remote_cluster_client -               indexer-m
172.16.2.136           35          91   1    0.07    0.24     0.38 dimr      cluster_manager,data,ingest,remote_cluster_client -               indexer-d2
172.16.2.135           54          96   1    0.35    0.33     0.38 dimr      cluster_manager,data,ingest,remote_cluster_client *               indexer-d1
Logs after stopping indexer-d2:
indexer-m:
[2023-09-07T15:34:18,251][INFO ][o.o.c.s.ClusterApplierService] [indexer-m] removed {{indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true}}, term: 68, version: 15939, reason: ApplyCommitRequest{term=68, version=15939, sourceNode={indexer-d1}{gQQ4mVt5TAOT6IwxyAf5Zw}{k9_YNutaQemKYUazhJ_7Og}{172.16.2.135}{172.16.2.135:9300}{dimr}{shard_indexing_pressure_enabled=true}}
[2023-09-07T15:34:18,265][INFO ][o.o.a.c.ADClusterEventListener] [indexer-m] Cluster node changed, node removed: true, node added: false
[2023-09-07T15:34:18,265][INFO ][o.o.a.c.HashRing ] [indexer-m] Node removed: [64jy12WdR0eV2_fBYU5wqA]
[2023-09-07T15:34:18,266][INFO ][o.o.a.c.HashRing ] [indexer-m] Remove data node from AD version hash ring: 64jy12WdR0eV2_fBYU5wqA
[2023-09-07T15:34:18,266][INFO ][o.o.a.c.ADClusterEventListener] [indexer-m] Hash ring build result: true
[2023-09-07T15:34:18,267][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:34:18,285][INFO ][o.o.i.s.IndexShard ] [indexer-m] [.kibana_2][0] primary-replica resync completed with 0 operations
[2023-09-07T15:34:18,287][INFO ][o.o.i.s.IndexShard ] [indexer-m] [.opensearch-observability][0] detected new primary with primary term [36], global checkpoint [-1], max_seq_no [-1]
[2023-09-07T15:34:18,288][INFO ][o.o.i.s.IndexShard ] [indexer-m] [.opendistro_security][0] detected new primary with primary term [52], global checkpoint [147], max_seq_no [147]
[2023-09-07T15:34:18,311][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:34:39,133][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:34:39,487][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:35:18,209][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:35:18,340][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-m] Detected cluster change event for destination migration
[2023-09-07T15:35:18,406][INFO ][o.o.i.r.RecoverySourceHandler] [indexer-m] [.kibana_2][0][recover to indexer-d1] finalizing recovery took [6.4ms]
indexer-d1:
[2023-09-07T15:34:18,164][INFO ][o.o.c.c.FollowersChecker ] [indexer-d1] FollowerChecker{discoveryNode={indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
[2023-09-07T15:34:18,165][INFO ][o.o.c.c.FollowersChecker ] [indexer-d1] FollowerChecker{discoveryNode={indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty
[2023-09-07T15:34:18,188][INFO ][o.o.c.r.a.AllocationService] [indexer-d1] updating number_of_replicas to [1] for indices [.opensearch-observability, .opendistro_security]
[2023-09-07T15:34:18,206][INFO ][o.o.c.s.MasterService ] [indexer-d1] node-left[{indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true} reason: disconnected], term: 68, version: 15939, delta: removed {{indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true}}
[2023-09-07T15:34:18,270][INFO ][o.o.c.s.ClusterApplierService] [indexer-d1] removed {{indexer-d2}{64jy12WdR0eV2_fBYU5wqA}{56AxYY-jRgi1r-PUyaApRw}{172.16.2.136}{172.16.2.136:9300}{dimr}{shard_indexing_pressure_enabled=true}}, term: 68, version: 15939, reason: Publication{term=68, version=15939}
[2023-09-07T15:34:18,285][INFO ][o.o.a.c.ADClusterEventListener] [indexer-d1] Cluster node changed, node removed: true, node added: false
[2023-09-07T15:34:18,285][INFO ][o.o.a.c.HashRing ] [indexer-d1] Node removed: [64jy12WdR0eV2_fBYU5wqA]
[2023-09-07T15:34:18,286][INFO ][o.o.a.c.HashRing ] [indexer-d1] Remove data node from AD version hash ring: 64jy12WdR0eV2_fBYU5wqA
[2023-09-07T15:34:18,286][INFO ][o.o.a.c.ADClusterEventListener] [indexer-d1] Hash ring build result: true
[2023-09-07T15:34:18,286][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:34:18,287][INFO ][o.o.c.r.DelayedAllocationService] [indexer-d1] scheduling reroute for delayed shards in [59.8s] (148 delayed shards)
[2023-09-07T15:34:18,300][INFO ][o.o.i.s.IndexShard ] [indexer-d1] [.opensearch-observability][0] primary-replica resync completed with 0 operations
[2023-09-07T15:34:18,300][INFO ][o.o.i.s.IndexShard ] [indexer-d1] [.opendistro_security][0] primary-replica resync completed with 0 operations
[2023-09-07T15:34:18,316][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:34:37,817][INFO ][o.o.j.s.JobSweeper ] [indexer-d1] Running full sweep
[2023-09-07T15:34:39,085][WARN ][o.o.c.r.a.AllocationService] [indexer-d1] [.opendistro_security][0] marking unavailable shards as stale: [HBavrEhcSFSvhiIkFtHjlw]
[2023-09-07T15:34:39,144][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:34:39,439][WARN ][o.o.c.r.a.AllocationService] [indexer-d1] [.opensearch-observability][0] marking unavailable shards as stale: [aAHzDp9ITwCDzrXl8hbnAA]
[2023-09-07T15:34:39,497][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:34:49,056][WARN ][o.o.s.a.BackendRegistry ] [indexer-d1] No 'Authorization' header, send 401 and 'WWW-Authenticate Basic'
[2023-09-07T15:34:49,073][WARN ][o.o.s.a.BackendRegistry ] [indexer-d1] No 'Authorization' header, send 401 and 'WWW-Authenticate Basic'
[2023-09-07T15:35:18,217][INFO ][o.o.p.PluginsService ] [indexer-d1] PluginService:onIndexModule index:[.kibana_2/kVu9440FTv-oM7IseJuWqg]
[2023-09-07T15:35:18,237][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:35:18,294][WARN ][o.o.c.r.a.AllocationService] [indexer-d1] [.kibana_2][0] marking unavailable shards as stale: [635P-g0jT6qAFocpGDQZhQ]
[2023-09-07T15:35:18,350][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
[2023-09-07T15:35:18,442][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [indexer-d1] Detected cluster change event for destination migration
indexer-d2:
[2023-09-07T15:33:16,679][INFO ][o.o.j.s.JobSweeper ] [indexer-d2] Running full sweep
[2023-09-07T15:34:18,157][INFO ][o.o.n.Node ] [indexer-d2] stopping ...
[2023-09-07T15:34:18,158][INFO ][o.o.s.a.r.AuditMessageRouter] [indexer-d2] Closing AuditMessageRouter
[2023-09-07T15:34:18,159][INFO ][o.o.s.a.s.SinkProvider ] [indexer-d2] Closing DebugSink
[2023-09-07T15:34:18,161][INFO ][o.o.c.c.Coordinator ] [indexer-d2] cluster-manager node [{indexer-d1}{gQQ4mVt5TAOT6IwxyAf5Zw}{k9_YNutaQemKYUazhJ_7Og}{172.16.2.135}{172.16.2.135:9300}{dimr}{shard_indexing_pressure_enabled=true}] failed, restarting discovery
org.opensearch.transport.NodeDisconnectedException: [indexer-d1][172.16.2.135:9300][disconnected] disconnected
[2023-09-07T15:34:18,547][INFO ][o.o.n.Node ] [indexer-d2] stopped
[2023-09-07T15:34:18,547][INFO ][o.o.n.Node ] [indexer-d2] closing ...
[2023-09-07T15:34:18,552][INFO ][o.o.s.a.i.AuditLogImpl ] [indexer-d2] Closing AuditLogImpl
[2023-09-07T15:34:18,557][INFO ][o.o.n.Node ] [indexer-d2] closed
Regards
Nico