--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/0d6792f6-31ca-4da0-83db-c3cd8980a462n%40googlegroups.com.
Hi Cong,
I see that you're using alternator, so the thing that could really help us would be query tracing. For alternator, the only way is to turn on tracing node-wide, which you can do by Scylla's REST API:
curl -X POST "localhost:10000/storage_service/trace_probability?probability=1"
Either run it on each node locally, or just replace `localhost` with each node's address.
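If you'd rather script it, here's a small sketch that hits that same REST endpoint on every node (the node addresses are placeholders for your own; Scylla's REST API listens on port 10000 by default):

```python
# Enable node-wide tracing on each node via Scylla's REST API.
import urllib.request

NODES = ["172.16.89.45", "172.16.89.191", "172.16.89.194"]  # placeholders

def trace_url(host: str, probability: float = 1.0) -> str:
    """Build the REST endpoint URL that sets the tracing probability."""
    return (f"http://{host}:10000/storage_service/trace_probability"
            f"?probability={probability}")

def enable_tracing(nodes=NODES):
    # POST with no body, same as the curl invocation above.
    for host in nodes:
        req = urllib.request.Request(trace_url(host), method="POST")
        with urllib.request.urlopen(req) as resp:
            print(host, resp.status)
```

Remember to set the probability back to 0 afterwards, since tracing every query adds overhead.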
Then, run your code again.
After it's done, you can browse the traces with a helper browser script I published here: https://alternator-traces.sarna.dev/ - it lets you configure the endpoint and will extract tracing information from a node. Tracing records the elapsed time for each step, so it will greatly help in seeing what happens inside Scylla - after looking at the traces we'll know whether Scylla spent ~1ms or ~40ms serving the query. If it spent ~1ms, then the problem is on the driver side. Here's an example output from tracing - source_elapsed is the number of microseconds that have passed since the query started:
session_id | event_id | activity | scylla_parent_id | scylla_span_id | source | source_elapsed | thread
---|---|---|---|---|---|---|---
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905cffda-3d52-11ec-bd5d-331fd6e9788e | Scan | 0 | 253230276965299 | 127.0.0.1 | 10 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d056f-3d52-11ec-bd5d-331fd6e9788e | Performing a database query | 0 | 253230276965299 | 127.0.0.1 | 153 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d06f5-3d52-11ec-bd5d-331fd6e9788e | read_data: querying locally | 0 | 253230276965299 | 127.0.0.1 | 192 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d0744-3d52-11ec-bd5d-331fd6e9788e | Start querying token range (-inf, {minimum token, end}] | 0 | 253230276965299 | 127.0.0.1 | 200 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d096a-3d52-11ec-bd5d-331fd6e9788e | Creating shard reader on shard: 0 | 0 | 253230276965299 | 127.0.0.1 | 255 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d09e0-3d52-11ec-bd5d-331fd6e9788e | Scanning cache for range (-inf, {minimum token, end}] and slice {(-inf, +inf)} | 0 | 253230276965299 | 127.0.0.1 | 267 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d0af7-3d52-11ec-bd5d-331fd6e9788e | Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) | 0 | 253230276965299 | 127.0.0.1 | 295 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d18c5-3d52-11ec-bd5d-331fd6e9788e | Querying is done | 0 | 253230276965299 | 127.0.0.1 | 648 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d19d9-3d52-11ec-bd5d-331fd6e9788e | read_data: querying locally | 0 | 253230276965299 | 127.0.0.1 | 676 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d19f3-3d52-11ec-bd5d-331fd6e9788e | Start querying token range ({minimum token, end}, +inf) | 0 | 253230276965299 | 127.0.0.1 | 678 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d1a22-3d52-11ec-bd5d-331fd6e9788e | Creating shard reader on shard: 0 | 0 | 253230276965299 | 127.0.0.1 | 683 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d1a48-3d52-11ec-bd5d-331fd6e9788e | Scanning cache for range ({minimum token, end}, +inf) and slice {(-inf, +inf)} | 0 | 253230276965299 | 127.0.0.1 | 687 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d1ad6-3d52-11ec-bd5d-331fd6e9788e | Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) | 0 | 253230276965299 | 127.0.0.1 | 701 | shard 0
905cdba0-3d52-11ec-bd5d-331fd6e9788e | 905d20a5-3d52-11ec-bd5d-331fd6e9788e | Querying is done | 0 | 253230276965299 | 127.0.0.1 | 850 | shard 0
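Since source_elapsed is cumulative, the per-step cost is the difference between consecutive events. A minimal sketch of that arithmetic, using a few (activity, source_elapsed) pairs from the trace above:

```python
# Compute how long each traced step took. source_elapsed is cumulative
# microseconds since the query started, so each step's duration is the
# delta between consecutive events.
events = [
    ("Scan", 10),
    ("Performing a database query", 153),
    ("read_data: querying locally", 192),
    ("Querying is done", 648),
]

def step_durations(events):
    prev = 0
    out = []
    for activity, elapsed in events:
        out.append((activity, elapsed - prev))
        prev = elapsed
    return out

for activity, us in step_durations(events):
    print(f"{us:>5} us  {activity}")
```

This makes it easy to spot which single step dominates when a query is slow.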
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 3] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 7] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 9] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 10] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 5] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 13] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 11] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 12] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 8] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
Nov 04 09:47:16 ip-172-16-89-251 scylla[15656]: [shard 6] tracing - Setting tracing probability to 1.0 (normalized 281474976710656)
session_id | event_id | activity | scylla_parent_id | scylla_span_id | source | source_elapsed | thread
--------------------------------------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+---------------+----------------+---------
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf2467-3d55-11ec-b12d-bc6d814a812d | GetItem | 0 | 310160318929845 | 172.16.89.251 | 0 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf255b-3d55-11ec-b12d-bc6d814a812d | Creating read executor for token -5751088081278505224 with all: {172.16.89.45, 172.16.89.191, 172.16.89.194} targets: {172.16.89.45} repair decision: NONE | 0 | 310160318929845 | 172.16.89.251 | 24 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf257b-3d55-11ec-b12d-bc6d814a812d | read_data: sending a message to /172.16.89.45 | 0 | 310160318929845 | 172.16.89.251 | 27 | shard 0
acaf0af0-3d55-11ec-b12d-bc6d814a812d | acaf3424-3d55-11ec-b12d-bc6d814a812d | read_data: got response from /172.16.89.45 | 0 | 310160318929845 | 172.16.89.251 | 403 | shard 0
The traces indicate that Scylla served your query in about
400 microseconds, which is consistent with the sub-1ms latency
reported by monitoring. This strongly suggests that the problem
lies on the driver side. In particular, take a look at Piotr
Dulikowski's suggestion about the "40ms bug", which comes from
using Nagle's algorithm and delayed ACK at the same time. If
that's indeed the case, the fix is to disable one of the two
mechanisms, either system-wide or at least in the library
you're using.
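For reference, disabling Nagle's algorithm on a socket is a one-line setsockopt; whether you can reach the underlying socket depends on the driver/library you're using. A minimal sketch:

```python
# Disable Nagle's algorithm (TCP_NODELAY) on a TCP socket - one half
# of the usual "40ms bug" workaround. The other half would be turning
# off delayed ACK, which is typically an OS-level setting.
import socket

def disable_nagle(sock: socket.socket) -> None:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(s)
# getsockopt returns non-zero once the option is set
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
s.close()
```

Many drivers expose this as a connection option (often called something like "no delay"), so check your library's configuration before reaching for raw sockets.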