On 8/6/21 11:50 AM,
hor...@gmail.com wrote:
> Hi,
>
> in one installation (which is on Azure) we have latency issues. I do
> not think it's CPU or disk related; instead, it seems to me that
> network latency is to blame.
>
> Are there any good metrics in Scylla to see the network latency and
> its effects?
>
>
I don't think we have good tools to distinguish network latency from
other sources. Maybe the closest thing is to use tracing. It will report
the time at which Scylla thinks it sent a message, and the time at which
the other node thinks it received it.
To get good results, you'll need the clocks of all machines to be
accurately synchronized with NTP (a good idea anyway, but here you need
to be even more sure), and a lightly loaded or even idle cluster. If the
cluster is busy, then some latency will be introduced by internal
queueing. It's a good idea to measure using different nodes as
coordinators, to see if there is some asymmetry.
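To make the clock caveat concrete, here is a rough sketch of the arithmetic, not a Scylla tool. It assumes you have already copied one send timestamp and one receive timestamp per direction out of the traces by hand; the function names and the sample numbers are made up for illustration. The sum of the two apparent one-way delays is the round trip regardless of clock offset, while the half-difference is either clock offset or real path asymmetry, and the traces alone cannot tell you which.

```python
from datetime import datetime, timedelta


def apparent_delay_us(sent_at, received_at):
    """Receive timestamp minus send timestamp, in microseconds.

    The two timestamps come from different machines, so the result
    includes any offset between their clocks.
    """
    return (received_at - sent_at) / timedelta(microseconds=1)


def summarize(a_to_b_us, b_to_a_us):
    """Combine apparent delays measured with A and then B as coordinator."""
    return {
        # A constant clock offset cancels out in the sum.
        "round_trip_us": a_to_b_us + b_to_a_us,
        # Clock offset or path asymmetry; indistinguishable from traces alone.
        "half_difference_us": (a_to_b_us - b_to_a_us) / 2,
    }


# Made-up example: true latency is 750 us each way, B's clock is 200 us ahead.
t0 = datetime(2021, 8, 6, 11, 50, 0)
a_to_b = apparent_delay_us(t0, t0 + timedelta(microseconds=750 + 200))
b_to_a = apparent_delay_us(t0, t0 + timedelta(microseconds=750 - 200))
result = summarize(a_to_b, b_to_a)
print(result)
```

If the round trip figure is close to what ping reports, the time is going to the network; if it is much larger, look at queueing on the nodes instead.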
> A regular ping reports 1.5 ms on the internal network, which is
> much higher than the 0.2 ms for other installations. Does anyone use
> Microsoft Azure? Is this normal?
I don't have much experience with Azure, but latency between
availability zones is often ~1 ms. It's good practice to spread your
cluster across multiple availability zones (and have Scylla recognize
them as racks) so that the loss of a zone doesn't take down the cluster.
As a result, many clusters do work with ~1 ms within-region latency.