End-to-end p90 latency too long (> 40 ms) as observed from the client side


Cong Guo

unread,
Nov 3, 2021, 11:25:47 PM11/3/21
to ScyllaDB users
Hi experts, I've hit a weird issue. I ran 10,000 DynamoDB queries against Scylla from a pod on the same node, and the end-to-end p99 latency is above 40 ms, while the p99 latency observed from monitoring is less than 2 ms. How can I debug this?

Context: Scylla is deployed on an EKS cluster with 4x i3.4xlarge nodes; the client runs on the same pod as the coordinator.

Monitoring from the Scylla side:
https://files.slack.com/files-pri/T2NLYA215-F02L4SNSCH1/image.png

Monitoring from the client side:
https://files.slack.com/files-pri/T2NLYA215-F02LHETCWC9/image.png

Testing code (I scan some items from DynamoDB and query them back on Scylla):

public static void readRandom(String sourceTableName, AmazonDynamoDBAsync sourceCli,
                              String destTableName, AmazonDynamoDBAsync destCli) {
  try {
    DynamoDB ddb = new DynamoDB(sourceCli);
    Table ddbTable = ddb.getTable(sourceTableName);
    ItemCollection<ScanOutcome> scan = ddbTable.scan();
    IteratorSupport<Item, ScanOutcome> ddbIter = scan.iterator();
    long i = 0;
    BufferedWriter file = new BufferedWriter(new FileWriter("/tmp/perf.csv", false));
    while (ddbIter.hasNext()) {
      i++;
      Item item = ddbIter.next();
      String id = item.getString("id");
      // Read the same key back from the destination (Alternator) table and
      // record the per-request wall-clock latency in milliseconds.
      GetItemRequest request = new GetItemRequest()
          .withTableName(destTableName)
          .withKey(buildKey(id, "BATCH"));
      long start = System.currentTimeMillis();
      GetItemResult item1 = destCli.getItem(request);
      long duration = System.currentTimeMillis() - start;
      file.write(String.format("%d\n", duration));
      if (i == 10000) {
        break;
      }
    }
    file.close();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
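To turn that CSV into the p90/p99 numbers discussed below, a minimal sketch is enough. This is not part of the original test code; it just assumes `/tmp/perf.csv` holds one millisecond value per line, as written by the loop above, and uses the nearest-rank percentile definition:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class Percentiles {
    // Nearest-rank percentile over an ascending-sorted list of latencies:
    // the value below which p percent of observations fall.
    static long percentile(List<Long> sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(0, rank - 1));
    }

    public static void main(String[] args) throws IOException {
        Path csv = Paths.get("/tmp/perf.csv");
        if (!Files.exists(csv)) {
            System.out.println("no " + csv + " - run the readRandom loop first");
            return;
        }
        List<Long> latencies = Files.readAllLines(csv).stream()
                .filter(s -> !s.isEmpty())
                .map(Long::parseLong)
                .sorted()
                .collect(Collectors.toList());
        System.out.println("p90: " + percentile(latencies, 90) + " ms");
        System.out.println("p99: " + percentile(latencies, 99) + " ms");
    }
}
```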



Cong Guo

unread,
Nov 3, 2021, 11:27:28 PM11/3/21
to ScyllaDB users
To sum up: e2e p90 latency is > 40 ms with the client running on the same pod/node as the coordinator, while Scylla monitoring shows a p90 latency < 2 ms. Inconsistent.

Cong Guo

unread,
Nov 4, 2021, 4:38:20 AM11/4/21
to ScyllaDB users
Can anyone help with this?

Dor Laor

unread,
Nov 4, 2021, 4:44:26 AM11/4/21
to ScyllaDB users
Try running the client on a pod on another server; Scylla doesn't like
to share resources in general (unless you pin everything perfectly).

Also, your test isn't parallel, so you send one request at a time.
That's fine for a test, but it doesn't represent a real workload and will leave
Scylla underutilized.
You can use scylla-bench, a test tool written in Go, to run many test patterns.

--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/0d6792f6-31ca-4da0-83db-c3cd8980a462n%40googlegroups.com.

Cong Guo

unread,
Nov 4, 2021, 5:31:10 AM11/4/21
to ScyllaDB users
Actually I did run the client on another pod and hit the same problem. I know Scylla is not under heavy load in my case, but I didn't expect p99 latency > 40 ms while the server-side monitoring (Prometheus + Grafana) shows a p99 latency of only 2 ms, so I want to find where the problem is.

To rule out any impact from k8s, I installed Scylla directly on 4x i3.4xlarge EC2 instances and reproduced the > 40 ms latency there as well. Do you need any extra info?

Cong Guo

unread,
Nov 4, 2021, 5:35:43 AM11/4/21
to ScyllaDB users
What configuration info do you need?

Piotr Sarna

unread,
Nov 4, 2021, 5:40:16 AM11/4/21
to scyllad...@googlegroups.com

Hi Cong,

I see that you're using Alternator, so the thing that would really help us is query tracing. For Alternator, the only way is to turn tracing on node-wide, which you can do via Scylla's REST API:

curl -X POST localhost:10000/storage_service/trace_probability?probability=1

Either run it locally on each node, or replace `localhost` with each node's address.

Then, run your code again.

After it's done, you can browse the traces with a helper browser script I published at https://alternator-traces.sarna.dev/ - it lets you configure the endpoint and will extract tracing information from a node. Traces contain the elapsed time for each step, so they will greatly help in seeing what happens inside Scylla: after looking at the traces we'll know whether Scylla spent ~1 ms or ~40 ms serving the query. If it spent ~1 ms, then the problem is on the driver side. Here's example tracing output - source_elapsed is the number of microseconds that have passed since the query started:

session_id event_id activity scylla_parent_id scylla_span_id source source_elapsed thread
----
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905cffda-3d52-11ec-bd5d-331fd6e9788e Scan 0 253230276965299 127.0.0.1 10 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d056f-3d52-11ec-bd5d-331fd6e9788e Performing a database query 0 253230276965299 127.0.0.1 153 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d06f5-3d52-11ec-bd5d-331fd6e9788e read_data: querying locally 0 253230276965299 127.0.0.1 192 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d0744-3d52-11ec-bd5d-331fd6e9788e Start querying token range (-inf, {minimum token, end}] 0 253230276965299 127.0.0.1 200 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d096a-3d52-11ec-bd5d-331fd6e9788e Creating shard reader on shard: 0 0 253230276965299 127.0.0.1 255 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d09e0-3d52-11ec-bd5d-331fd6e9788e Scanning cache for range (-inf, {minimum token, end}] and slice {(-inf, +inf)} 0 253230276965299 127.0.0.1 267 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d0af7-3d52-11ec-bd5d-331fd6e9788e Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) 0 253230276965299 127.0.0.1 295 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d18c5-3d52-11ec-bd5d-331fd6e9788e Querying is done 0 253230276965299 127.0.0.1 648 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d19d9-3d52-11ec-bd5d-331fd6e9788e read_data: querying locally 0 253230276965299 127.0.0.1 676 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d19f3-3d52-11ec-bd5d-331fd6e9788e Start querying token range ({minimum token, end}, +inf) 0 253230276965299 127.0.0.1 678 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d1a22-3d52-11ec-bd5d-331fd6e9788e Creating shard reader on shard: 0 0 253230276965299 127.0.0.1 683 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d1a48-3d52-11ec-bd5d-331fd6e9788e Scanning cache for range ({minimum token, end}, +inf) and slice {(-inf, +inf)} 0 253230276965299 127.0.0.1 687 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d1ad6-3d52-11ec-bd5d-331fd6e9788e Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) 0 253230276965299 127.0.0.1 701 shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e 905d20a5-3d52-11ec-bd5d-331fd6e9788e Querying is done 0 253230276965299 127.0.0.1 850 shard 0
----

Cong Guo

unread,
Nov 4, 2021, 5:44:22 AM11/4/21
to ScyllaDB users
Great, will do that, thanks.

Cong Guo

unread,
Nov 4, 2021, 5:56:04 AM11/4/21
to ScyllaDB users
Is there any other way to check the Alternator trace log? I cannot connect to the cluster via public IP through https://alternator-traces.sarna.dev/ - it shows "Failed to fetch data: NetworkingError: Network Failure."


Cong Guo

unread,
Nov 4, 2021, 5:58:04 AM11/4/21
to ScyllaDB users
Logs from journalctl:

Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 3] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 7] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 9] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 10] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 5] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 13] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 11] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 12] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 8] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]:  [shard 6] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)

Piotr Sarna

unread,
Nov 4, 2021, 6:03:50 AM11/4/21
to ScyllaDB users
There is - the output is less convenient, but it works. Just query the special system table exposed through Alternator. Example with the AWS CLI:

aws --endpoint-url 'http://127.0.0.1:8000' dynamodb scan --table-name .scylla.alternator.system_traces.events
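The same scan can also be issued from Java with the SDK already used in the test code. A sketch, assuming an Alternator endpoint at `http://127.0.0.1:8000` (the endpoint URL and region are placeholders; the attribute names match the trace output shown elsewhere in this thread):

```java
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;

public class DumpTraces {
    public static void main(String[] args) {
        // Point the client at the Alternator endpoint instead of AWS.
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "http://127.0.0.1:8000", "us-east-1"))
                .build();
        // Alternator exposes system_traces.events under this special table name.
        ScanRequest scan = new ScanRequest()
                .withTableName(".scylla.alternator.system_traces.events");
        ScanResult result = client.scan(scan);
        result.getItems().forEach(item ->
                System.out.println(item.get("source_elapsed") + "us: " + item.get("activity")));
    }
}
```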

Cong Guo

unread,
Nov 4, 2021, 6:07:27 AM11/4/21
to ScyllaDB users
Seems it comes from system_traces.events; I will extract the info.

Cong Guo

unread,
Nov 4, 2021, 6:09:11 AM11/4/21
to ScyllaDB users

session_id | event_id | activity | scylla_parent_id | scylla_span_id | source | source_elapsed | thread
----
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf2467-3d55-11ec-b12d-bc6d814a812d | GetItem | 0 | 310160318929845 | 172.16.89.251 | 0 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf255b-3d55-11ec-b12d-bc6d814a812d | Creating read executor for token -5751088081278505224 with all: {172.16.89.45, 172.16.89.191, 172.16.89.194} targets: {172.16.89.45} repair decision: NONE | 0 | 310160318929845 | 172.16.89.251 | 24 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf257b-3d55-11ec-b12d-bc6d814a812d | read_data: sending a message to /172.16.89.45 | 0 | 310160318929845 | 172.16.89.251 | 27 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf3424-3d55-11ec-b12d-bc6d814a812d | read_data: got response from /172.16.89.45 | 0 | 310160318929845 | 172.16.89.251 | 403 | shard 0
----
Piotr Dulikowski

unread,
Nov 4, 2021, 6:09:22 AM11/4/21
to scyllad...@googlegroups.com
This sounds like "the 40 millisecond bug": https://vorner.github.io/2020/11/06/40-ms-bug.html . Your issue might be caused by a bad interaction between Nagle's algorithm and TCP's delayed-ACK mechanism, which increases latency on the client side (more details in the linked article).

I'm not familiar with the DynamoDB library, but check whether it is possible to disable either Nagle's algorithm or delayed ACKs, and try that.
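At the plain-socket level, disabling Nagle's algorithm is the `TCP_NODELAY` option. A minimal sketch in Java (the loopback listener is just scaffolding so the example is self-contained; whether and where the AWS SDK exposes this setting is not confirmed here):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayDemo {
    public static void main(String[] args) throws IOException {
        // Local listener purely for demonstration; in practice the peer
        // would be the Scylla/Alternator endpoint.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // TCP_NODELAY=true disables Nagle's algorithm: small writes go out
            // immediately instead of waiting for the ACK of previously sent
            // data, which is what interacts badly with delayed ACKs and
            // produces the ~40 ms stalls.
            client.setTcpNoDelay(true);
            System.out.println("TCP_NODELAY enabled: " + client.getTcpNoDelay());
        }
    }
}
```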

Piotr Sarna

unread,
Nov 4, 2021, 6:12:31 AM11/4/21
to scyllad...@googlegroups.com

The traces indicate that Scylla processed your query in ~400 microseconds, which translates to sub-1 ms latency, as reported by monitoring. This strongly suggests the problem lies on the driver side - in particular, take a look at Piotr Dulikowski's suggestion about the "40 ms bug", which comes from using Nagle's algorithm and delayed ACKs at the same time. If that's indeed the case, the fix is to disable one of the two mechanisms, either system-wide or at least in the library you're using.

Cong Guo

unread,
Nov 4, 2021, 6:22:05 AM11/4/21
to ScyllaDB users
I will look into the article. Thanks a lot to you all!

Dor Laor

unread,
Nov 4, 2021, 12:43:02 PM11/4/21
to ScyllaDB users

Cong Guo

unread,
Nov 4, 2021, 11:46:47 PM11/4/21
to ScyllaDB users
Hi dor,

Thank you. Actually I want to simulate the real scenario in our production: my clients will access Scylla via the DynamoDB API. I found that about 10% of the ACKs are delayed by 40 ms even when the PSH flag is set, which is weird. Note that I do access the pod in EKS directly. Even stranger: if I access Scylla via a classic load balancer, the latency is pretty low, < 10 ms.

Capture of the 40 ms latency:
https://files.slack.com/files-pri/T02B9QAPR-F02L33C0T46/image.png

Cong Guo

unread,
Nov 5, 2021, 3:17:06 AM11/5/21
to ScyllaDB users
It may be related to the DynamoDB SDK.