The alternator latency goes up when rebuilding a dead node.


Yang Liyuan

<yangly0815@gmail.com>
Jul 5, 2022, 4:27:14 AM
to ScyllaDB users

The cluster is deployed across 3 DCs, with 3 nodes in each DC. The disk of one node failed, so we replaced it with a new disk and rebuilt the node's data.
After we re-created the RAID and restarted scylla-server to rebuild the data, the Alternator latency increased by 10 ms to 20 ms on every node.
This may affect customer service in production, and I think the impact should be limited to the node's own DC rather than affecting the nodes in all DCs.

Dor Laor

<dor@scylladb.com>
Jul 5, 2022, 5:06:52 AM
to ScyllaDB users
The CQL API handles this better, as the client is topology aware and receives the
cache utilization numbers, so the client can choose a coordinator node closer
to it (it's part of the load balancing algorithm selection on the client). The Alternator
API doesn't have this.
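
As a rough illustration of the client-side idea (a sketch only, not part of any ScyllaDB driver or of Alternator itself), an application could keep its own list of nodes per DC and rotate requests across the nodes of its local DC. The class name, the DC names, the node addresses, and the port below are all hypothetical, and this sketch ignores cache utilization entirely:

```python
from itertools import cycle


class LocalDcRoundRobin:
    """Hypothetical helper: rotate requests across the nodes of one DC only."""

    def __init__(self, nodes_by_dc, local_dc):
        local_nodes = nodes_by_dc.get(local_dc, [])
        if not local_nodes:
            raise ValueError(f"no nodes known for DC {local_dc!r}")
        # cycle() yields the local nodes in order, forever.
        self._nodes = cycle(local_nodes)

    def next_endpoint(self, port=8000):
        # The port is an assumption; use whatever port Alternator listens on.
        return f"http://{next(self._nodes)}:{port}"


# Made-up topology: the application maintains this mapping itself.
nodes_by_dc = {
    "dc1": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
    "dc2": ["10.0.2.1", "10.0.2.2", "10.0.2.3"],
}

balancer = LocalDcRoundRobin(nodes_by_dc, "dc1")
endpoints = [balancer.next_endpoint() for _ in range(4)]
```

Each returned URL could then be used as the endpoint of a DynamoDB-compatible client (for example boto3's `endpoint_url` parameter), so that requests from an application in one DC never pick a coordinator in another DC.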

In Scylla 5.0 and in the master branch we also deal with streaming better: the new I/O
scheduler tracks priorities better, and the latency increase on node operations is
2-5 ms.


Avi Kivity

<avi@scylladb.com>
Jul 5, 2022, 6:50:08 AM
to scylladb-users@googlegroups.com, Yang Liyuan

Can you give information about the disk type (make, model)? And provide screenshots from the Advanced dashboard in monitoring, showing queue delay and scheduling group utilization?


What version are you using? If it's a disk problem, then 5.0 provides better isolation.

Yang Liyuan

<yangly0815@gmail.com>
Jul 7, 2022, 10:32:47 PM
to ScyllaDB users
Each node has 8 x 8 TB NVMe drives. The queue delay and scheduling group graphs are as follows:
queque delay and scheduler info.zip

Avi Kivity

<avi@scylladb.com>
Jul 11, 2022, 7:16:36 AM
to scylladb-users@googlegroups.com, Yang Liyuan

Please switch to the per-shard view; the aggregated views can hide problems in a single shard.


What's the make and model of the disks?
