The alternator latency goes up when rebuilding a dead node.

Yang Liyuan

<yangly0815@gmail.com>

Jul 5, 2022, 4:27:14 AM

to ScyllaDB users

The cluster is deployed across 3 DCs, with 3 nodes in each DC. A disk on one node failed, so we replaced the disk and rebuilt the node's data.

After we re-created the RAID and restarted scylla-server to rebuild the data, the Alternator latency increased by 10 ms to 20 ms on every node.

This may affect customer service in production. I think the impact should be limited to the rebuilt node's own DC rather than reaching the nodes in all DCs.

Dor Laor

<dor@scylladb.com>

Jul 5, 2022, 5:06:52 AM

to ScyllaDB users

The CQL API handles this better, as the client is topology aware and receives the cache utilization numbers, so the client can choose a coordinator node closer to it (it's part of the load-balancing algorithm selection on the client). The Alternator API doesn't have this.
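As a rough illustration of the idea only (not the CQL driver's actual algorithm), an application talking to Alternator could approximate DC-local coordinator selection on its own side. The sketch below assumes a hypothetical, hand-maintained topology map and made-up node addresses; `TOPOLOGY` and `local_coordinators` are illustrative names, not part of any Scylla API.

```python
from itertools import cycle
from typing import Dict, FrozenSet, Iterator, List

# Hypothetical topology: DC name -> node addresses (assumed, not from this thread).
TOPOLOGY: Dict[str, List[str]] = {
    "dc1": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
    "dc2": ["10.0.2.1", "10.0.2.2", "10.0.2.3"],
    "dc3": ["10.0.3.1", "10.0.3.2", "10.0.3.3"],
}


def local_coordinators(
    topology: Dict[str, List[str]],
    local_dc: str,
    excluded: FrozenSet[str] = frozenset(),
) -> Iterator[str]:
    """Return an endless round-robin iterator over nodes in the client's own DC.

    Nodes listed in `excluded` (for example, a node that is being rebuilt)
    are skipped, so requests are never coordinated by them.
    """
    nodes = [node for node in topology[local_dc] if node not in excluded]
    if not nodes:
        raise ValueError(f"no usable coordinator left in {local_dc}")
    return cycle(nodes)


# Usage: a client in dc1 avoids the node that is currently being rebuilt.
picker = local_coordinators(TOPOLOGY, "dc1", excluded=frozenset({"10.0.1.2"}))
endpoint = next(picker)  # address to send the next Alternator request to
```

Each request would then go to the next address from `picker`. This only controls which node coordinates the request; it does not change which replicas serve the data.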

In Scylla 5.0 and in the master branch we also deal with streaming better: the new I/O scheduler tracks priorities better, and the latency increase on node operations is 2-5 ms.

--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/0470f29c-75ba-45d4-ba56-7c991df0c165n%40googlegroups.com.

Avi Kivity

<avi@scylladb.com>

Jul 5, 2022, 6:50:08 AM

to scylladb-users@googlegroups.com, Yang Liyuan

Can you give information about the disk type (make, model)? And provide screenshots from the Advanced dashboard in monitoring, showing queue delay and scheduling group utilization?

What version are you using? If it's a disk problem, then 5.0 provides better isolation.

Yang Liyuan

<yangly0815@gmail.com>

Jul 7, 2022, 10:32:47 PM

to ScyllaDB users

Each node has 8 x 8 TB NVMe disks. The queue delay and scheduling group charts are as follows:

queque delay and scheduler info.zip

Avi Kivity

<avi@scylladb.com>

Jul 11, 2022, 7:16:36 AM

to scylladb-users@googlegroups.com, Yang Liyuan

Please switch to the per-shard view; the aggregated views can hide problems in a single shard.

What's the make and model of the disks?

