nodetool repair crashed

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 7:52:26 AM

to ScyllaDB users

We are running a test on 6 machines, each with a 7 TB SSD and about 2 TB of data.

We followed the procedure in

https://docs.scylladb.com/operating-scylla/procedures/cluster-management/replace_dead_node/

We replaced node .11 with node .14 using

replace_address_first_boot: 172.24.221.11

UN  172.24.221.10  1.8 TB     256          77.3%             
UN  172.24.221.12  1.53 TB    256          73.7%             
UN  172.24.221.13  1.87 TB    256          74.8%             
UN  172.24.221.14  8.82 GB    256          74.2%             

We killed the old node (172.24.221.11). Everything looked fine, but nodetool repair crashed:

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 15] rpc - client 172.24.221.11: fail to connect: Connection refused

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] repair - Checksum of range [-9167189631503328870, -9167077041512644608) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] repair - Checksum or sync of partial range failed

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] rpc - client 172.24.221.11: fail to connect: Connection refused

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] repair - Checksum of range (-8792884207473498521, -8792827912478156390) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] repair - Checksum or sync of partial range failed

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] rpc - client 172.24.221.11: fail to connect: Connection refused

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] repair - Checksum of range [-8788436902841470156, -8788324312850785894) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] repair - Checksum or sync of partial range failed

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] rpc - client 172.24.221.11: fail to connect: Connection refused

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 39] repair - Checksum of range [-8786635462990521958, -8786522872999837696) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)
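For anyone triaging a similar log, here is a minimal sketch that counts the checksum failures per peer and reason. The regular expression is inferred only from the lines quoted above, so treat the pattern and the function name `summarize_repair_failures` as assumptions rather than a stable Scylla log format.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption, not a
# documented format): "[shard N] repair - Checksum of range ... on <peer> failed: <reason>"
CHECKSUM_FAILURE = re.compile(
    r"\[shard (?P<shard>\d+)\] repair - Checksum of range .* "
    r"on (?P<peer>\S+) failed: (?P<reason>.*)$"
)


def summarize_repair_failures(lines):
    """Count checksum failures per (peer, reason) pair."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_FAILURE.search(line)
        if match:
            counts[(match["peer"], match["reason"].strip())] += 1
    return counts


# Four lines copied from the log above; only two are checksum failures.
sample = [
    "Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 15] rpc - client 172.24.221.11: fail to connect: Connection refused",
    "Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] repair - Checksum of range [-9167189631503328870, -9167077041512644608) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)",
    "Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] repair - Checksum or sync of partial range failed",
    "Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 23] repair - Checksum of range (-8792884207473498521, -8792827912478156390) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)",
]

for (peer, reason), count in summarize_repair_failures(sample).items():
    print(f"{peer}: {count} x {reason}")
```

If every failure points at the same peer, as it does here, the problem is likely that one address rather than the repair as a whole.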

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 7:55:04 AM

to ScyllaDB users

We killed this node.

Why does the process keep connecting to it after a restart?

In the connection string we have

172.24.221.10:9042,172.24.221.12:9042

2019/09/26 14:52:29 logger.go:27: Found invalid peer '[HostInfo connectAddress="172.24.221.11" peer="172.24.221.11" rpc_address="<nil>" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="172.24.221.11" connect_addr_source="connect_address" port=9042 data_centre="" rack="" host_id="00000000-0000-0000-0000-000000000000" version="v0.0.0" state=UP num_tokens=256]' Likely due to a gossip or snitch issue, this host will be ignored

On Thursday, September 26, 2019 at 14:52:26 UTC+3, Vasily Popkov wrote:

Asias He

<asias@scylladb.com>

Sep 26, 2019, 8:03:46 AM

to ScyllaDB users

What is the IP address of plck-pixeldb-05? I assume it is another node. If you kill 172.24.221.11 while a repair that involves 172.24.221.11 is running, the repair is expected to fail.

By "repair crashed", I guess you mean the repair failed.

--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/7ddf8a25-de4e-4c29-a72b-c08dccca447e%40googlegroups.com.

--

Asias

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 8:09:55 AM

to ScyllaDB users

.11 is pixeldb-02
.14 is pixeldb-05

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 8:13:39 AM

to ScyllaDB users

Please read the first message.

On Thursday, September 26, 2019 at 15:03:46 UTC+3, Asias He wrote:

IMG_20190926_143551.jpg

Asias He

<asias@scylladb.com>

Sep 26, 2019, 8:20:26 AM

to ScyllaDB users

On Thu, Sep 26, 2019 at 8:13 PM Vasily Popkov <p1dl...@gmail.com> wrote:

Please read the first message.

Which message?

--

Asias

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 8:39:44 AM

to ScyllaDB users

https://groups.google.com/d/msg/scylladb-users/Yfse6ewtCcs/Ac6p9kMZCAAJ

On Thursday, September 26, 2019 at 15:20:26 UTC+3, Asias He wrote:

Asias He

<asias@scylladb.com>

Sep 26, 2019, 8:52:22 AM

to ScyllaDB users

On Thu, Sep 26, 2019, 20:39 Vasily Popkov <p1dl...@gmail.com> wrote:

https://groups.google.com/d/msg/scylladb-users/Yfse6ewtCcs/Ac6p9kMZCAAJ

Of course I read this message. Asking me to read it again does not help. It is not clear when you ran the repair: during the replace operation or after it finished. The original message also does not say on which node you ran the repair.

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 26, 2019, 9:11:51 AM

to ScyllaDB users

I ran the repair on the new node, as described in

https://docs.scylladb.com/operating-scylla/procedures/cluster-management/replace_dead_node/

following every step in order.

On Thursday, September 26, 2019 at 15:52:22 UTC+3, Asias He wrote:

Asias He

<asias@scylladb.com>

Sep 27, 2019, 3:03:16 AM

to ScyllaDB users

On Thu, Sep 26, 2019 at 9:11 PM Vasily Popkov <p1dl...@gmail.com> wrote:

I ran the repair on the new node, as described in
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/replace_dead_node/

following every step in order.

That does not make sense, because once the replace operation is done the new node will no longer see the old node.

```

Sep 26 14:42:56 plck-pixeldb-05 scylla: [shard 19] repair - Checksum of range [-9167189631503328870, -9167077041512644608) on 172.24.221.11 failed: seastar::rpc::closed_error (connection is closed)

```

The log suggests that plck-pixeldb-05 (the new node) is trying to repair with 172.24.221.11 (the old node, which was replaced).

Please provide the output of

```

nodetool gossipinfo

nodetool ring

```

from all nodes.

If you run repair again on the new node, does it succeed?
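One way to gather that output from every node in a single pass is sketched below. It assumes passwordless `ssh` to each node and `nodetool` on the remote PATH, neither of which is stated in this thread; the node list is just the four addresses shown in the status output, and `collect_nodetool_output` is a hypothetical helper name.

```python
import subprocess

# Addresses taken from the nodetool status output in this thread; extend the
# list to cover every node in the cluster.
NODES = ["172.24.221.10", "172.24.221.12", "172.24.221.13", "172.24.221.14"]
COMMANDS = ["nodetool gossipinfo", "nodetool ring"]


def collect_nodetool_output(nodes=NODES, commands=COMMANDS, run=subprocess.run):
    """Run each command on each node over ssh and return the captured stdout.

    The result is a dict keyed by (node, command). Passwordless ssh and a
    remote nodetool on PATH are assumptions of this sketch.
    """
    results = {}
    for node in nodes:
        for command in commands:
            completed = run(
                ["ssh", node, command],
                capture_output=True,
                text=True,
                timeout=120,
            )
            results[(node, command)] = completed.stdout
    return results
```

Calling `collect_nodetool_output()` and writing each value to a file named after the node and command gives one attachment per node. The `run` parameter exists only so the function can be exercised without a real cluster.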

--

Asias

Vasily Popkov

<p1dl0.vp@gmail.com>

Sep 27, 2019, 7:08:31 AM

to ScyllaDB users

We wiped the new node and re-ran the repair, and the process is very slow.

On FoundationDB, repair is much faster.

After 1 day of repair we have:

--  Address        Load     Tokens  Owns (effective)  Host ID                               Rack
UN  172.24.221.10  2.61 TB  256     75.8%             b7a00907-6054-4ee1-83c7-a023f74a1894  rack1
UN  172.24.221.12  2.29 TB  256     71.8%             2cdfe372-0b3f-49a8-b16b-148efc29ebad  rack1
UN  172.24.221.13  2.69 TB  256     76.9%             fd34f5b4-6a09-4020-bd58-d5d720e74867  rack1
UN  172.24.221.14  1.79 TB  256     75.5%             4d083b8c-83ab-4003-84d2-0a6508df01be  rack1

The test is writing at 10k op/s while the repair is in progress, and the repair is very, very slow.

How can we speed up the process?

We have a 10 Gbit/s network and SSDs capable of 2 GB/s.

Why is the rebalance so slow?
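To put a number on "slow", here is a rough back-of-the-envelope estimate. It assumes the new node started empty, that exactly 24 hours elapsed, that all 1.79 TB on 172.24.221.14 arrived through repair, and that the units are decimal; none of that is confirmed in this thread, and the ongoing test writes add to the load as well.

```python
TB = 10**12  # decimal terabyte in bytes (assumed unit for the Load column)


def average_rate_mb_per_s(bytes_moved, seconds):
    """Average transfer rate in decimal megabytes per second."""
    return bytes_moved / seconds / 10**6


load_bytes = 1.79 * TB          # Load reported for 172.24.221.14
elapsed_seconds = 24 * 60 * 60  # "1 day" taken as exactly 24 hours

rate = average_rate_mb_per_s(load_bytes, elapsed_seconds)
link_mb_per_s = 10 * 10**9 / 8 / 10**6  # 10 Gbit/s expressed in MB/s

print(f"average rate: {rate:.1f} MB/s")
print(f"share of a 10 Gbit/s link: {rate / link_mb_per_s:.1%}")
```

Under those assumptions the node received roughly 20.7 MB/s on average, under 2% of what a 10 Gbit/s link can carry, so neither the network nor the SSDs look like the limiting factor.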

On Friday, September 27, 2019 at 10:03:16 UTC+3, Asias He wrote:
