ScyllaDB performance is significantly decreased on AWS Graviton for mixed operations


Uttam Giri

<uttameast@gmail.com>
May 4, 2023, 12:34:47 PM
to ScyllaDB users

Hello, we measured ScyllaDB performance on AWS Graviton and found that throughput (ops/sec) drops significantly when the consistency level is set to ALL or QUORUM for the mixed scenario (which involves both READ and WRITE operations). However, performance was much better with consistency level ONE or TWO. Are there any potential solutions or workarounds for this issue?

Please note the following details:

Scylla version: 5.2.0~rc4-0.20230402.d70751fee3f9

Instance type: AWS Graviton3

Performance tool: Cassandra Stress Tool


Thank you.

Avi Kivity

<avi@scylladb.com>
May 4, 2023, 1:10:19 PM
to scylladb-users@googlegroups.com
It's impossible to say anything given the lack of details.

Note that consistency level TWO is equivalent to QUORUM (assuming single-datacenter and RF=3), so check whether your tests are repeatable.
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/c50235f4-1b7a-4508-b8ff-9b14f4c40575n%40googlegroups.com.

Uttam Giri

<uttameast@gmail.com>
May 4, 2023, 4:46:51 PM
to ScyllaDB users
Hi Avi, thank you for your response.
Our current setup is a cluster of four datacenters, all located in the same availability zone except for DC1G. All the Scylla nodes are in the same AWS Region.

h-4.4$ nodetool status
Datacenter: DC1G
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  172.31.13.106  7.42 MB    256          ?       295dd4f4-6602-493f-99c7-e40a9d17273a  beta
UN  172.31.1.89    7.22 MB    256          ?       bf22ae21-b746-4651-8543-21d4ac133131  alpha
Datacenter: DC2G
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  172.31.82.185  7.63 MB    256          ?       29cc7adb-8f12-46f6-bdd4-941e965e9f9b  alpha
UN  172.31.86.168  8.49 MB    256          ?       43df2b25-0d60-495e-975f-d713cf30d056  beta
Datacenter: DC3G
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  172.31.91.62   8.36 MB    256          ?       1fa5e3ee-2720-411c-8ea6-8362a8bddd93  alpha
UN  172.31.95.202  7.92 MB    256          ?       433d56dd-7959-42b8-b331-738d22fe444b  beta
Datacenter: DC4G
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  172.31.82.111  6.74 MB    256          ?       f78c1877-b544-42b8-8023-9e3c855f5852  beta
UN  172.31.91.124  7.62 MB    256          ?       61792510-b388-4bdf-8b26-6a2aedce7449  alpha


Cassandra Stress tool results:
(MIXED, Replication Factor: 7, Consistency Level: ALL)

Results:
Op rate                   :    1,032 op/s  [data: 505 op/s, insert: 527 op/s]
Partition rate            :      530 pk/s  [data: 3 pk/s, insert: 527 pk/s]
Row rate                  :      530 row/s [data: 3 row/s, insert: 527 row/s]
Latency mean              :   12.4 ms [data: 13.2 ms, insert: 11.7 ms]
Latency median            :    0.7 ms [data: 0.8 ms, insert: 0.7 ms]
Latency 95th percentile   :   48.0 ms [data: 48.5 ms, insert: 47.3 ms]
Latency 99th percentile   :   54.7 ms [data: 56.6 ms, insert: 50.4 ms]
Latency 99.9th percentile :   59.4 ms [data: 59.4 ms, insert: 59.3 ms]
Latency max               :   63.1 ms [data: 63.1 ms, insert: 60.2 ms]
Total partitions          :      5,795 [data: 35, insert: 5,760]
Total errors              :          0 [data: 0, insert: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:10

(MIXED, Replication Factor: 7, Consistency Level: QUORUM)

Results:
Op rate                   :    1,460 op/s  [data: 718 op/s, insert: 742 op/s]
Partition rate            :      749 pk/s  [data: 6 pk/s, insert: 742 pk/s]
Row rate                  :      749 row/s [data: 6 row/s, insert: 742 row/s]
Latency mean              :    9.1 ms [data: 12.7 ms, insert: 5.6 ms]
Latency median            :    0.6 ms [data: 0.7 ms, insert: 0.4 ms]
Latency 95th percentile   :   42.3 ms [data: 42.7 ms, insert: 39.8 ms]
Latency 99th percentile   :   48.2 ms [data: 48.8 ms, insert: 47.3 ms]
Latency 99.9th percentile :   51.6 ms [data: 56.0 ms, insert: 49.8 ms]
Latency max               :   59.7 ms [data: 59.7 ms, insert: 59.0 ms]
Total partitions          :      7,909 [data: 67, insert: 7,842]
Total errors              :          0 [data: 0, insert: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:10

(MIXED, Replication Factor: 7, Consistency Level: ONE)

Results:
Op rate                   :   45,505 op/s  [data: 22,756 op/s, insert: 22,749 op/s]
Partition rate            :   26,757 pk/s  [data: 4,008 pk/s, insert: 22,749 pk/s]
Row rate                  :   26,757 row/s [data: 4,008 row/s, insert: 22,749 row/s]
Latency mean              :    0.3 ms [data: 0.3 ms, insert: 0.3 ms]
Latency median            :    0.2 ms [data: 0.2 ms, insert: 0.2 ms]
Latency 95th percentile   :    0.7 ms [data: 0.7 ms, insert: 0.7 ms]
Latency 99th percentile   :    1.3 ms [data: 1.3 ms, insert: 1.3 ms]
Latency 99.9th percentile :    8.5 ms [data: 8.5 ms, insert: 8.5 ms]
Latency max               :   39.9 ms [data: 39.9 ms, insert: 14.4 ms]
Total partitions          :    279,100 [data: 41,807, insert: 237,293]
Total errors              :          0 [data: 0, insert: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:10

(MIXED, Replication Factor: 7, Consistency Level: TWO)

Results:

Op rate                   :   24,650 op/s  [data: 12,302 op/s, insert: 12,348 op/s]
Partition rate            :   13,678 pk/s  [data: 1,330 pk/s, insert: 12,348 pk/s]
Row rate                  :   13,678 row/s [data: 1,330 row/s, insert: 12,348 row/s]
Latency mean              :    0.5 ms [data: 0.7 ms, insert: 0.4 ms]
Latency median            :    0.3 ms [data: 0.3 ms, insert: 0.2 ms]
Latency 95th percentile   :    0.8 ms [data: 1.0 ms, insert: 0.7 ms]
Latency 99th percentile   :    7.8 ms [data: 9.6 ms, insert: 1.5 ms]
Latency 99.9th percentile :   32.8 ms [data: 38.1 ms, insert: 9.2 ms]
Latency max               :   53.9 ms [data: 53.9 ms, insert: 10.9 ms]
Total partitions          :    149,845 [data: 14,567, insert: 135,278]
Total errors              :          0 [data: 0, insert: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:10
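
To put the four runs side by side, here is a minimal Python sketch that pulls the total op/s out of each "Op rate" summary line and expresses it relative to the ONE run. It is not part of the original test; the helper name `op_rate` and the `SUMMARIES` table are illustrative, and the lines are copied from the results above. Each run lasted only 10 seconds, so the ratios should be read as rough indications.

```python
import re

# "Op rate" summary lines copied from the four runs above, keyed by consistency level.
SUMMARIES = {
    "ALL":    "Op rate                   :    1,032 op/s  [data: 505 op/s, insert: 527 op/s]",
    "QUORUM": "Op rate                   :    1,460 op/s  [data: 718 op/s, insert: 742 op/s]",
    "ONE":    "Op rate                   :   45,505 op/s  [data: 22,756 op/s, insert: 22,749 op/s]",
    "TWO":    "Op rate                   :   24,650 op/s  [data: 12,302 op/s, insert: 12,348 op/s]",
}


def op_rate(line: str) -> int:
    """Extract the total op/s figure from a cassandra-stress 'Op rate' line."""
    match = re.search(r"Op rate\s*:\s*([\d,]+)\s*op/s", line)
    if match is None:
        raise ValueError(f"not an Op rate line: {line!r}")
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


rates = {cl: op_rate(line) for cl, line in SUMMARIES.items()}
baseline = rates["ONE"]

# Print the runs from fastest to slowest, with the slowdown relative to ONE.
for cl, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{cl:<6} {rate:>6} op/s  {baseline / rate:5.1f}x slower than ONE")
```

On these numbers, ALL and QUORUM are both more than 30 times slower than ONE, while TWO is a little under 2 times slower.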


Thank you.

Avi Kivity

<avi@scylladb.com>
May 5, 2023, 6:23:07 AM
to scylladb-users@googlegroups.com
Note that for this setup QUORUM reads contact five nodes.


By itself that doesn't explain the huge drop in performance. You likely have some nodes misconfigured. I recommend installing scylla-monitor and looking at the dashboards, especially Advanced, in shard mode.

Also, what instance type are you using? "Graviton3" doesn't say anything about the storage.
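
The replica counts mentioned in this thread follow from the usual quorum rule: QUORUM waits for a majority of the replicas, counted over the summed replication factor of all datacenters. A minimal sketch of that arithmetic follows, assuming this standard rule. The per-datacenter replication split is not given in the thread, so the totals below are illustrative: five replicas corresponds to a total RF of 8 (for example 2 per datacenter), whereas the RF of 7 quoted with the stress results would give 4.

```python
def quorum(total_rf: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return total_rf // 2 + 1


# Single datacenter with RF=3: QUORUM needs 2 replicas, the same as TWO.
print(quorum(3))  # 2

# Assumption: 2 replicas in each of the four datacenters (total RF 8).
print(quorum(8))  # 5

# The RF of 7 quoted alongside the stress results.
print(quorum(7))  # 4
```

Either way, ALL waits for every replica, so it is always at least as expensive as QUORUM on the same keyspace.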

Uttam Giri

<uttameast@gmail.com>
May 5, 2023, 3:02:02 PM
to scylladb-users@googlegroups.com
I am only seeing a bunch of C++ exceptions in the Scylla Monitoring dashboard.
Here are the exception details from the log file:

Scylla version 5.2.0~rc4-0.20230402.d70751fee3f9 with build-id 488322ffbfc5a64cc0b9a38e1170aba7efa64fe3

May 05 18:33:47 ip-172-31-91-62.ec2.internal scylla[1441]:  [shard  1] exception - Throw exception at:
                                                           0x30fed77 0x30ff2cb 0x2f66c3b /opt/scylladb/libreloc/libstdc++.so.6+0xa2bcf 0x2f7547b 0x2639f0b 0x104f5af 0x2fb0adb 0x2fb1a6f 0x2fd017f 0x2f895c7 /opt/scylladb/libreloc/libc.so.6+0x84fb7 /opt/scylladb/libreloc/libc.so.6+0xf075b
                                                              --------
                                                              seastar::internal::coroutine_traits_base<void>::promise_type
                                                              --------
                                                              seastar::internal::coroutine_traits_base<bool>::promise_type
                                                              --------
                                                              seastar::internal::coroutine_traits_base<void>::promise_type



We are using the "m7g.4xlarge" instance type (Graviton3), and each node has a storage capacity of 200 GiB.

 
Thank you
