One of the nodes in a 3-node cluster fails to join the cluster after a restart


Shobhana

<shobhana@quickride.in>
Feb 16, 2022, 12:53:22 AM
to ScyllaDB users
Hi,

We have a 3-node cluster (single DC) running on Ubuntu 18.04. We had to upgrade all nodes in the cluster (to increase the disk space), so we stopped node-3 first, upgraded the instance, and attempted to start it back up. It started after 300 seconds (after failing to connect to the seed nodes), but it came up as if it were a separate cluster: nodetool status on node-3 shows nodes 1 and 2 as down (DN), while nodetool status on nodes 1 and 2 shows node-3 as down (DN) and nodes 1 and 2 as up (UN).

node-3:~$ nodetool status
Datacenter: scylla_data_center
==============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
DN  a.b.c.176  ?          256          ?       b1642399-9596-4bc7-8f01-d875f0584e77  scylla_rack
DN  a.b.c.177  ?          256          ?       e7f6e8f4-c07e-47e7-946d-ba76a272776f  scylla_rack
UN  a.b.c.198  126.83 GB  256          ?       b29ae510-1edc-4ae4-bd4d-fea2de229750  scylla_rack

node-2:~$ nodetool status
Datacenter: scylla_data_center
==============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
UN  172.16.121.176  138.98 GB  256          ?       b1642399-9596-4bc7-8f01-d875f0584e77  scylla_rack
UN  172.16.121.177  128.26 GB  256          ?       e7f6e8f4-c07e-47e7-946d-ba76a272776f  scylla_rack
DN  172.16.121.198  127.66 GB  256          ?       b29ae510-1edc-4ae4-bd4d-fea2de229750  scylla_rack

node-1:~$ nodetool describecluster
Cluster Information:
        Name: Scylla_Cluster
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: disabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                224341ff-6870-30a9-b9c2-977007111e00: [172.16.121.177, 172.16.121.176]

How can I analyze the cause of this? Please help.

Logs from syslog:
Feb 16 05:24:40 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (299 seconds passed)
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] storage_service - Shadow round failed with std::runtime_error (Unable to gossip with any seeds (ShadowRound)), checking remote features with system tables only
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] gossip - Node a.b.c.176 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] gossip - Node a.b.c.177 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] gossip - Feature check passed. Local node a.b.c.198 features = {COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] storage_service - Restarting a node in NORMAL status
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] database - Schema version changed to e0df65b5-0794-39a7-b95f-58df4b065456
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 0] storage_service - Starting up server gossip
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 4] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1576-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1564-big-Data.db:level=0, ]
Feb 16 05:24:41 e2e-71-39 scylla:  [shard 4] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1588-big-Data.db:level=0, ]. 18905 bytes to 12286 (~64% of original) in 54ms = 0.22MB/s. ~256 total partitions merged to 1.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] gossip - No gossip backlog; proceeding
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] gossip - Node a.b.c.177 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] gossip - Node a.b.c.176 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, NONFROZEN_UDTS, PER_TABLE_PARTITIONERS, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, UNBOUNDED_RANGE_TOMBSTONES, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature COMPUTED_COLUMNS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature CORRECT_COUNTER_ORDER is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature CORRECT_STATIC_COMPACT_IN_MC is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature COUNTERS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature DIGEST_INSENSITIVE_TO_EXPIRY is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature DIGEST_MULTIPARTITION_READ is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature HINTED_HANDOFF_SEPARATE_CONNECTION is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature INDEXES is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature LARGE_PARTITIONS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature LWT is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature MATERIALIZED_VIEWS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature MC_SSTABLE_FORMAT is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature NONFROZEN_UDTS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature PER_TABLE_PARTITIONERS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature RANGE_TOMBSTONES is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature ROLES is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature ROW_LEVEL_REPAIR is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature SCHEMA_TABLES_V3 is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature STREAM_WITH_RPC_STREAM is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature TRUNCATION_TABLE is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] system_keyspace - Got cluster agreement on truncation table feature. Removing legacy records.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature UNBOUNDED_RANGE_TOMBSTONES is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature VIEW_VIRTUAL_COLUMNS is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature WRITE_FAILURE_REPLY is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] features - Feature XXHASH is enabled
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] system_keyspace - Legacy records deleted.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting system distributed keyspace
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] storage_service - Using saved tokens {981359583743598234, 914643842153801541, 9010590306704705381, 8978199967542709796, 8823634646786289099, 8796898105082258136, 8735550987922770836, 8716862277181876955, 8409105211318848833, 8377749213602607469, 8368791134507409577, 8279480215642442291, 8226961371913237686, 818381322587006401, 8015510533693741419, 7861744753126011670, 7421129492214354842, 7214971269794117680, 7134961561941729532, 6923641246652266152, 6776104411408891615, 6695671858569773702, 6693992036470326300, 6629009954816350525, 6548856613543647119, 6389196725922261142, 6163371136002122168, 6133043365797828528, 585649641794656355, 5738577221751324225, 5738248321319428204, 5705043832523143106, 5668216508314928077, -4557764141784572195, -1644045961680410408, -7337241265398333233, -843795661146775055, -6302540392497142197, -5833743234028982803, -634705565880395230, -4958499958501506050, -5721899000934394495, -5662226590528258745, -5863550396435924306, 7405132976089028394, -1483338121755382290, -5599651753802208674, -5497935070567060245, -3287241193203949349, -5480724239573015458, -1768913238822924244, -4313920312240004082, -7742269385760950209, 1544104103817535351, -5151210696294284952, -4945370009733546535, -4736232796159710631, 1297334564753141491, -7314483105723765852, 9155415407225907444, -8238109602967798162, -4055299875677524731, -6225897398415140129, -4464250693382842193, -4139729551782525945, -4101875752422732840, -4036058009622413045, -3887089173831867306, -5949312886484657205, -4302826775874228972, 4860225828015527099, -4176767073335097420, -2529787299615517648, -2329419726267129792, 5787431231402222772, -3301430467437933287, -2723124257130426774, -4926578624848477465, 1550847609577933440, -1916156740536922152, 7192748273674972870, -2878270740901973302, -2485123870808844519, 5413629940898600116, -6037901051187131011, -3817121916838484812, -688606435673695662, 8247134048417264463, -5322928324362422086, 4767049724017491120, 3591737829389263358, -38199495392631203, -2115530598853125641, -1170251189550237325, 1751289155726942140, -214498304572404919, 2198492660263745610, -1896243756967692565, 7738028362278836365, -4681327999969571817, -6307738690076505542, -1180128817601957970, -1247294995264304572, -1276317355778147260, -2680829330196023582, -6337067260532457853, -2600503851729509055, -6307321555293480304, 9106645826555318592, 9061774124564711711, -3180210182939961985, -2748991993166122115, 3422651550806358808, -3717690279312985028, -3995930376652802796, 8547898083999725455, -1043835251045960321, -2642715296731302325, -5149132603055791464, -3861004888145475545, 9157071785062909445, 9032911546674574035, 3305983580718934862, -1981477152297299877, -1058782092572298128, 7822587515481166661, -1368605066756232610, 6455877678004719235, -1505957052533080322, -3814037245040110293, -1566295029164646162, -8371461501928103821, -8897274447384318218, -5316828967805419512, 5745484637631331604, -2678930901140445786, -5136598720956101600, -663801927112347845, -6739283263411693711, 1749206398117464736, -3287143799763388087, -2598234819930738431, -4082234711130884355, -9209787921805962716, 3010101704879133373, -1890238888283138344, -7761958934793014252, -8172674007873106523, -739123274252957022, 3265753453995204282, -2915992172628861199, -3667783081752373538, -6501236203315282760, 8931169065516500523, 5533486863258912614, -4919654394039939902, -3395543248856121953, -4970767617117537957, 423393396060418976, -6562888536124186750, 8620915609834225408, 
-768438852331168683, 3409000770900983238, 4432443222351601018, 6418490594918376620, 4497995499735905959, -6636956636360348345, -6665320506662753578, -5305167683859848462, 3179171907273646765, -2087973484340967048, -6667986014290191580, 4156577717195337895, -3034639017138832321, -6687940317915620534, -6728397100427179352, 1889439023822131084, -3660003847260911784, -6749908253833615569, -4826694828955081272, -6808535190151648746, -5882251120294352269, -6614373203765155209, -7048971061032113655, -7117961121566796258, 2142964552067073128, -7223572769323795131, 9197226287558170824, -7375309003048271910, 85073829603582502, 1283453278591625454, -7492617147315786480, 1388676553039166118, -7682939894669667664, -8137558297361045108, -4041246415814689872, -8345018714663555081, -2783546563957281471, -8791971152774881062, 3101156310359519828, -8380646024208568298, -8670052273494495724, -8670748993509014291, 9144871785596149756, -8689026017675532195, -879492027863855393, -2539032810908864131, -8984181823612140772, -3122711311429953308, -9015557707182576916, -9104214729339575714, 6752494505809604050, -960614186048457472, -6498505452476582165, -7925996298075583396, 3875232282660286389, 1260220645871937735, -1413254618414978453, 1480566843288401576, 4398483809836019694, -6782135258406156880, -7612520485750890120, 2371298605054022667, 6035261906275539740, -784248190169511198, 2508674685785936233, 2659675948217842415, -5053961063111697967, 2724877485090627519, 1864306643481604233, 2876361943714477319, 2888513980906119706, 2895077279165987494, 2916672112075092115, -6837236943880328093, 30200552277592369, 3085579160270189285, 3103812910155078786, 2063390855526692088, 3183184163975239163, -5370407169213155335, 3239379584315210024, 330763901795203252, 3619116542448789429, 4090208882828735628, -3244616077337361530, 4485785961072150850, -5805291180915492837, 4942364365952111313, 8098572163962579030, 5091212439202166244, 660083585913023736, 534770842014008233, 3650585966843974153, 5406231698757539610, 5483719450797787800}
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] database - Schema version changed to 224341ff-6870-30a9-b9c2-977007111e00
Feb 16 05:24:53 e2e-71-39 scylla: message repeated 2 times: [  [shard 0] database - Schema version changed to 224341ff-6870-30a9-b9c2-977007111e00]
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 4] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1600-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1588-big-Data.db:level=0, ]
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] storage_service - Node a.b.c.198 state jump to normal
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] storage_service - Remove node a.b.c.198 from pending replacing endpoint
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 4] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1624-big-Data.db:level=0, ]. 18365 bytes to 12119 (~65% of original) in 43ms = 0.27MB/s. ~256 total partitions merged to 1.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 4] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1612-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1624-big-Data.db:level=0, ]
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] storage_service - NORMAL: node is now in normal status
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] cdc - No generation seen during startup.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting tracing
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] storage_service - SSTable data integrity checker is disabled.
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting batchlog manager
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting load meter
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting cf cache hit rate calculator
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - starting view update backlog broker
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 0] init - Waiting for gossip to settle before accepting client requests...
Feb 16 05:24:53 e2e-71-39 scylla:  [shard 4] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/mc-1636-big-Data.db:level=0, ]. 18198 bytes to 12115 (~66% of original) in 37ms = 0.31MB/s. ~256 total partitions merged to 1.
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] gossip - No gossip backlog; proceeding
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - allow replaying hints
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - Launching generate_mv_updates for non system tables
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - starting the view builder
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 1] compaction - Compacting [/var/lib/scylla/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/mc-229-big-Data.db:level=0, /var/lib/scylla/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/mc-217-big-Data.db:level=0, ]
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - starting native transport
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] storage_service - Starting listening for CQL clients on a.b.c.198:9042 (unencrypted)
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] storage_service - Thrift server listening on a.b.c.198:9160 ...
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - serving
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 0] init - Scylla version 4.1.7-0.20200918.2251a1c577 initialization completed.
Feb 16 05:25:05 e2e-71-39 scylla:  [shard 1] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/mc-241-big-Data.db:level=0, ]. 10782 bytes to 5549 (~51% of original) in 50ms = 0.11MB/s. ~256 total partitions merged to 1.

Asias He

<asias@scylladb.com>
Feb 16, 2022, 1:16:40 AM
to ScyllaDB users
Hello,

Can you share the scylla.yaml for all 3 nodes?

Please make sure network access between node 198 and nodes 176/177 is not blocked. It looks like node 198 cannot talk to nodes 176 and 177 for some reason.

Your Scylla version (4.1.7-0.20200918) is too old.



--
Asias

Shobhana .

<shobhana@quickride.in>
Feb 16, 2022, 1:21:39 AM
to scylladb-users@googlegroups.com
Thanks for the quick response, Asias!
I have attached the scylla.yaml for all 3 nodes.

I am able to connect via cqlsh to 176 and 177 from 198. So I assume the network between these nodes is fine?


node2-scylla.yaml
node1-scylla.yaml
node3-scylla.yaml

Asias He

<asias@scylladb.com>
Feb 16, 2022, 1:29:58 AM
to ScyllaDB users
Check TCP port 7000, which is used for node-to-node communication.
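
For example, a quick check in both directions (a sketch, assuming nc/netcat is available on the hosts):

node3:~$ nc -zv a.b.c.176 7000
node3:~$ nc -zv a.b.c.177 7000
node1:~$ nc -zv a.b.c.198 7000
node2:~$ nc -zv a.b.c.198 7000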

What version are node1 and node2 using at the moment? It seems you want to upgrade node3 to 4.1.7?

Your config looks sane.

Shobhana

<shobhana@quickride.in>
Feb 16, 2022, 1:35:41 AM
to ScyllaDB users
I am able to telnet to nodes 1 and 2 from node3 successfully.

node3:~$ telnet a.b.c.177 7000
Trying a.b.c.177...
Connected to a.b.c.177.
Escape character is '^]'.

All nodes are running the same version - 4.1.7-0.20200918.2251a1c577

We wanted to increase the hard disk space and hence triggered these steps!

Asias He

<asias@scylladb.com>
Feb 16, 2022, 1:46:39 AM
to ScyllaDB users
OK. If you can tolerate a cluster shutdown, you can try a rolling restart of node1 and then restart node3.

You can also try restarting node3 with 4.2.

4.1 is way too old and is not supported any more. Once you can boot up all 3 nodes, please upgrade following the doc as soon as possible.
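
A minimal sketch of that, assuming scylla-server is managed by systemd (wait until every node shows UN in nodetool status before moving to the next step):

node1:~$ nodetool drain
node1:~$ sudo systemctl restart scylla-server
node1:~$ nodetool status
node3:~$ sudo systemctl restart scylla-server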



--
Asias

Shobhana

<shobhana@quickride.in>
Feb 16, 2022, 1:52:40 AM
to ScyllaDB users
I forgot to mention another change - adding it here just in case it matters: initially, when the cluster was created more than a year ago, the seeds list was set to only node-1 on all 3 nodes. Before node-3 was stopped last night, the seeds list was updated to node-1,node-2,node-3 on all nodes, but nodes 1 and 2 were not restarted. Only node-3 was restarted after upgrading the hard disk.
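
For reference, the part of scylla.yaml I changed is the standard seed_provider stanza; it looks roughly like this (with node-1's address in the seeds string):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<node-1 address>"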

Now I have changed the seeds list back to node-1 on all the nodes and attempted to restart the scylla service on node-3, but it fails, and I see the following logs in syslog. My understanding of a seed node was that it only matters when a node starts up, when it is used to decide which node to contact to learn the cluster topology. Since nodes 1 and 2 are running fine, I assumed that changing the seeds list back to node-1 would not make any difference, but it did! What is the reason?

Feb 16 06:44:22 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (298 seconds passed)
Feb 16 06:44:23 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (299 seconds passed)
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down gossiping
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] gossip - gossip is already stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down gossiping was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down storage service notifications
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down storage service notifications was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down repair message handlers
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down repair message handlers was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down repair service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down repair service was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down streaming service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down streaming service was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down storage proxy RPC verbs
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down storage proxy RPC verbs was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down cdc
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down cdc was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down migration manager
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 1] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 5] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 9] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 8] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 3] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 10] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 7] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 4] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 2] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 11] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 6] migration_manager - stopping migration service
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down migration manager was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down database
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 4] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 2] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 3] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 11] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 1] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 9] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 4] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 11] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 7] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 10] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 5] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 9] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 2] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 3] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 1] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 10] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 7] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 5] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 8] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 8] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 6] compaction_manager - Asked to stop
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 6] compaction_manager - Stopped
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down database: waiting for background jobs...
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down database was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down migration manager notifier
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down migration manager notifier was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down prometheus API server
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down prometheus API server was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down sighup
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Shutting down sighup was successful
Feb 16 06:44:24 e2e-71-39 scylla:  [shard 0] init - Startup failed: std::runtime_error (Unable to gossip with any seeds (ShadowRound))
Feb 16 06:44:24 e2e-71-39 systemd[1]: scylla-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 06:44:24 e2e-71-39 systemd[1]: Stopped Run Scylla Housekeeping daily mode.
Feb 16 06:44:24 e2e-71-39 systemd[1]: Stopped Run Scylla Housekeeping restart mode.
Feb 16 06:44:25 e2e-71-39 systemd[1]: scylla-server.service: Failed with result 'exit-code'.

Asias He

<asias@scylladb.com>
Feb 16, 2022, 1:58:40 AM
to ScyllaDB users
On Wed, Feb 16, 2022 at 2:52 PM Shobhana <shob...@quickride.in> wrote:
I forgot to mention another change - adding here just in case it matters:  Initially when the cluster was created more than a year ago, seeds list was set to only node-1 in all 3 nodes. Before the node-3 was stopped last night, the seeds list was updated to node-1,node-2,node3 in all nodes, but nodes 1 and 2 were not restarted. Only node3 was restarted after upgrading the hard disk.

Now I changed the seeds list back to node1 in all the nodes and attempted to restart scylla service in node3, but it fails and I see the following logs in syslog. My understanding of seed node was that it is useful only when a node starts up and is used to decide which node to talk to to understand the cluster topology. Since nodes 1 and 2 are running fine, I assumed that changing the seeds list back to node1 would not make any difference, but it did! What is the reason?

Feb 16 06:44:22 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (298 seconds passed)
Feb 16 06:44:23 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (299 seconds passed)

I noticed the seed change on node3. It looks OK. But here, node3 cannot talk to either node1 or node2 for some reason. Did you see any warnings on node1 and node2?
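
For example, something like this on node1 and node2 (a rough sketch, since your logs go to syslog):

node1:~$ grep scylla /var/log/syslog | grep -iE 'warn|error|gossip'
node2:~$ grep scylla /var/log/syslog | grep -iE 'warn|error|gossip'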

Asias He

<asias@scylladb.com>
Feb 16, 2022, 2:04:56 AM
to ScyllaDB users
On Wed, Feb 16, 2022 at 2:58 PM Asias He <as...@scylladb.com> wrote:


On Wed, Feb 16, 2022 at 2:52 PM Shobhana <shob...@quickride.in> wrote:
I forgot to mention another change - adding here just in case it matters:  Initially when the cluster was created more than a year ago, seeds list was set to only node-1 in all 3 nodes. Before the node-3 was stopped last night, the seeds list was updated to node-1,node-2,node3 in all nodes, but nodes 1 and 2 were not restarted. Only node3 was restarted after upgrading the hard disk.

Now I changed the seeds list back to node1 in all the nodes and attempted to restart scylla service in node3, but it fails and I see the following logs in syslog. My understanding of seed node was that it is useful only when a node starts up and is used to decide which node to talk to to understand the cluster topology. Since nodes 1 and 2 are running fine, I assumed that changing the seeds list back to node1 would not make any difference, but it did! What is the reason?

Feb 16 06:44:22 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (298 seconds passed)
Feb 16 06:44:23 e2e-71-39 scylla:  [shard 0] gossip - Connect seeds again ... (299 seconds passed)

I noticed the seed change in node3. It looks ok. But here, node3 can not talk to either node1 or node2 for some reason. Did you see any warnings on node1 and node2?

Can you try:

1) Run on node1

$ nodetool gossipinfo|grep gen

Get the generation number of node1: X
Y = X + 100

2) Start node3, adding the following option:
--force-gossip-generation Y
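
For example (a sketch; the generation value below is made up, and I am assuming your Ubuntu package install passes extra options via SCYLLA_ARGS in /etc/default/scylla-server):

node1:~$ nodetool gossipinfo | grep -i generation
(say it reports a generation of 1644000000, so Y = 1644000100)

node3:~$ sudo vi /etc/default/scylla-server
(append --force-gossip-generation 1644000100 to SCYLLA_ARGS)
node3:~$ sudo systemctl restart scylla-server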


--
Asias

Shobhana

<shobhana@quickride.in>
Feb 16, 2022, 2:50:02 AM
to ScyllaDB users
Thank you so much for your time and effort, Asias! 
I figured out the problem - I was always checking whether node3 is able to connect to node1 and node2, and that was always successful, so I did not bother to check whether nodes 1 and 2 are able to connect to node3. I just checked that now and found that nodes 1 and 2 were not able to connect to node3! We use the Shorewall firewall service; after I restarted that service, nodes 1 and 2 were able to connect to node3, and node3 joined the cluster successfully. I now have to repeat the same steps on nodes 1 and 2.

Is the procedure to upgrade the Scylla version simple? If it is, I'll plan for this upgrade soon.

Thanks again,
Shobhana

Asias He

<asias@scylladb.com>
Feb 16, 2022, 3:09:02 AM
to ScyllaDB users
You are welcome. Good to know the issue is fixed ;)

You can find the upgrade procedure here:


Shobhana .

<shobhana@quickride.in>
Feb 16, 2022, 3:26:32 AM
to scylladb-users@googlegroups.com
Thanks Asias, I'll check it out.

Should we upgrade one major version at a time? I.e., upgrade from 4.1 to 4.2, then to 4.3, and so on?

Asias He

<asias@scylladb.com>
Feb 16, 2022, 3:28:34 AM
to ScyllaDB users
Yes, it is best to follow the one-major-release-at-a-time procedure. It is the safest.
