How to trace network latency

hor...@gmail.com

<horschi@gmail.com>

Aug 6, 2021, 4:50:34 AM

to ScyllaDB users

Hi,

In one installation (which is on Azure) we have latency issues. I do not think it's CPU or disk related; instead, it seems to me that network latency is to blame.

Are there any good metrics in Scylla to see the network latency and its effects?

A regular ping reports 1.5 ms on the internal network, which is much higher than the 0.2 ms I see in other installations. Does anyone use Microsoft Azure? Is this normal?

regards,

Christian

Tzach Livyatan

<tzach@scylladb.com>

Aug 6, 2021, 5:04:11 AM

to ScyllaDB users

As a starting point, get the Scylla Monitoring Stack up and running:

https://monitoring.docs.scylladb.com/stable/

The Overview and detailed dashboards have latency charts.

Tzach


Avi Kivity

<avi@scylladb.com>

Aug 7, 2021, 5:50:02 AM

to scylladb-users@googlegroups.com, hor...@gmail.com

On 8/6/21 11:50 AM, hor...@gmail.com wrote:
> Hi,
>
> in one installation (which is on Azure) we have latency issues. I do
> not think it's CPU or disk related. But instead it seems to me that the
> network latency is to blame.
>
> Are there any good metrics in scylla to see the network latency and
> its effects?
>
>

I don't think we have good tools to distinguish network latency from
other sources. Maybe the closest thing is to use tracing. It will report
the time at which Scylla thinks it sent a message, and the time at which
the other node thinks it received it.

To get good results, you'll need the clocks of all machines to be
accurately synchronized with NTP (a good idea anyway, but here you need
to be even more sure), and a lightly loaded or even idle cluster. If the
cluster is busy, then some latency will be introduced by internal
queueing. It's a good idea to measure using different nodes as
coordinators, to see if there is some asymmetry.
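
As a rough sketch of the tracing approach, the snippet below pairs "sent" and "received" trace events and reports the gap between them per node pair. It is only an illustration under several assumptions: the activity wording matched by the two regular expressions is hypothetical and should be checked against what your Scylla version actually writes, the input is assumed to be (source node, activity text, wall-clock timestamp) tuples that you have already extracted from a trace session, and the result is only meaningful if the node clocks are tightly synchronized.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# ASSUMPTION: hypothetical activity wording. Adjust both patterns to match
# the trace messages your Scylla version really emits.
SEND_RE = re.compile(r"sending a message to /(?P<peer>[\d.]+)", re.IGNORECASE)
RECV_RE = re.compile(r"message received from /(?P<peer>[\d.]+)", re.IGNORECASE)


def one_way_delays(events):
    """Estimate per-pair one-way delays from trace events.

    events: iterable of (source_ip, activity, timestamp) where timestamp is
    the wall-clock datetime recorded by the node that emitted the event.
    Returns {(sender_ip, receiver_ip): [delay_ms, ...]}.
    """
    sends = defaultdict(list)
    recvs = defaultdict(list)
    for source, activity, ts in events:
        m = SEND_RE.search(activity)
        if m:
            sends[(source, m["peer"])].append(ts)
            continue
        m = RECV_RE.search(activity)
        if m:
            recvs[(m["peer"], source)].append(ts)

    delays = {}
    for pair, sent in sends.items():
        received = recvs.get(pair, [])
        # Pair the i-th send with the i-th receive for the same node pair.
        # Clock skew between the nodes shows up directly in these numbers
        # (and can even make them negative).
        delays[pair] = [
            (r - s) / timedelta(milliseconds=1)
            for s, r in zip(sorted(sent), sorted(received))
        ]
    return delays


# Made-up sample data, for illustration only.
t0 = datetime(2021, 8, 7, 9, 0, 0)
sample = [
    ("10.0.0.1", "read_data: sending a message to /10.0.0.2", t0),
    ("10.0.0.2", "read_data: message received from /10.0.0.1",
     t0 + timedelta(microseconds=1500)),
]
for (sender, receiver), ms in one_way_delays(sample).items():
    print(f"{sender} -> {receiver}: {ms} ms")
```

Running the same query with each node as coordinator and comparing the A -> B numbers with the B -> A numbers is one way to spot the asymmetry mentioned above; a constant offset in one direction usually points at clock skew rather than the network.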

> A regular ping is reporting 1.5ms for the internal network. Which is
> much higher than the 0.2 ms for other installations. Does anyone use
> Microsoft Azure, is this normal?

I don't have much experience with Azure, but latency between availability
zones is often ~1 ms. It's good practice to spread your cluster across
multiple availability zones (and have Scylla recognize them as racks) so
a loss of a zone doesn't impact the cluster. So many clusters do work
with ~1 ms within-region latency.
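
To check whether the 1.5 ms figure comes from crossing zones, one option is to time TCP connections between pairs of nodes and compare same-zone pairs with cross-zone pairs. The sketch below is a minimal example, not a replacement for ping: a TCP connect takes roughly one round trip, plus some connection setup overhead on both hosts. The host address in the usage comment is hypothetical, and port 7000 is assumed to be the inter-node port (the usual default; use whatever your configuration sets).

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host, port, samples=10, timeout=2.0):
    """Time `samples` TCP connects to host:port; return the times in ms."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def summarize(rtts):
    """Return min / median / max of a non-empty list of measurements."""
    return {
        "min": min(rtts),
        "median": statistics.median(rtts),
        "max": max(rtts),
    }


# Usage (hypothetical address; run it from one node against another):
#   print(summarize(tcp_connect_rtt_ms("10.0.0.2", 7000)))
```

The minimum is usually the most useful number here, since it is the least affected by scheduling noise on either host.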
